Podcast Statistics - How we collect and process data

This article explains how Spreaker measures listener data and, more specifically, how that data is processed to better represent actual user behavior and to filter out fraudulent and fake traffic.

Preface: How Listeners Consume Spreaker Content

Content published on Spreaker can be accessed in two different ways:

On platform: content is consumed through an application made by Spreaker (the website or the mobile apps). This is reported as a download.
Off platform: content is consumed through a third-party listening app (podcatcher). This is also reported as a download.

Definitions

Download

A download is counted in response to a user-initiated action that starts playback of a specific media file. Subsequent actions by the same user on the same media file (e.g. play/stop/play sequences, seeks) are not counted if they fall within the Time Window; in that case they are considered part of the same play (see Download Time Windows for details).

A download is also counted in response to an application-initiated HTTP GET request for a media file hosted on Spreaker. To avoid counting the same download more than once, raw CDN logs are filtered so that only a single request for a specific media file from the same IP address and client is counted within the Time Window, which currently spans a single day.

Download Time Windows

A time interval within which multiple downloads initiated by the same user or client application are counted as a single event. This filtering prevents subsequent HTTP GET requests that are actually part of the same download session (e.g. byte-range requests) from being counted more than once, and at the same time prevents fraudulent behavior (repeated plays/stops) aimed at inflating download counts.

Any download action that occurs before the time window has elapsed is not counted, and the time window's starting point resets to the time of that action.

The current time window is one day.
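The Time Window rule can be sketched as a per-client, per-file check in which every action restarts the window. This is a minimal Python illustration, not Spreaker's implementation: `TimeWindowCounter`, `client_id`, and `media_id` are hypothetical names, and the sketch assumes the client has already been identified (see Uniqueness Algorithm).

```python
from datetime import datetime, timedelta

# Window length taken from the article: the current Time Window is one day.
TIME_WINDOW = timedelta(days=1)


class TimeWindowCounter:
    """Hypothetical counter: one download per (client, media file), ignoring
    repeat actions that occur before the window has elapsed since the last one."""

    def __init__(self, window: timedelta = TIME_WINDOW) -> None:
        self.window = window
        # (client_id, media_id) -> timestamp of the most recent action
        self.last_action: dict[tuple[str, str], datetime] = {}
        self.count = 0

    def record(self, client_id: str, media_id: str, at: datetime) -> bool:
        """Register an action; return True if it counts as a new download."""
        key = (client_id, media_id)
        previous = self.last_action.get(key)
        counted = previous is None or at - previous >= self.window
        # Every action, counted or not, resets the window's starting point.
        self.last_action[key] = at
        if counted:
            self.count += 1
        return counted


t0 = datetime(2024, 1, 1, 8, 0)
counter = TimeWindowCounter()
counter.record("client-a", "episode-1", t0)                          # counted
counter.record("client-a", "episode-1", t0 + timedelta(minutes=5))   # not counted
counter.record("client-a", "episode-1", t0 + timedelta(hours=20))    # not counted
counter.record("client-a", "episode-1", t0 + timedelta(hours=30))    # not counted: only 10 h since last action
counter.record("client-a", "episode-1", t0 + timedelta(days=3))      # counted
print(counter.count)  # 2
```

Because the window restarts on every action, a client that keeps replaying the same episode more often than once a day is counted only once until it stays idle for a full window.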

On-Platform Downloads

All the listening activity that happens on Spreaker's applications:

The Spreaker website
Spreaker's mobile apps (iOS, Android)
Customized mobile apps (iOS, Android) made by Spreaker
Spreaker embedded players

Off-Platform Downloads

These occur when a third-party listening application (e.g. Apple Podcasts, Google Podcasts, Spotify, Overcast, Stitcher, …) requests content from Spreaker's servers, usually by following the media links contained in the podcast's RSS feed.

All Spreaker sees on its end is an HTTP GET request, which includes information such as the source IP address and a set of headers. Among these, the User-Agent header helps identify the originating application and operating system.

In this case, Spreaker has no definitive information about whether the user subsequently listened to the media.
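To make the off-platform case concrete, here is a hedged sketch of extracting the IP address, method, path, referrer, and User-Agent from a single access-log entry. It assumes the CDN writes logs in the widely used "combined" log format; the article does not say which format Spreaker's CDN uses, and `parse_log_line`, `MediaRequest`, and the sample line are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: logs use the "combined" format:
# IP - user [time] "METHOD PATH PROTOCOL" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


@dataclass
class MediaRequest:
    ip: str
    method: str
    path: str
    status: int
    referrer: str
    user_agent: str


def parse_log_line(line: str) -> Optional[MediaRequest]:
    """Extract the request fields visible in one access-log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed or unsupported line
    return MediaRequest(
        ip=match["ip"],
        method=match["method"],
        path=match["path"],
        status=int(match["status"]),
        referrer=match["referrer"],
        user_agent=match["user_agent"],
    )


line = ('203.0.113.7 - - [01/Jan/2024:08:00:00 +0000] '
        '"GET /episodes/123.mp3 HTTP/1.1" 206 1048576 "-" '
        '"Overcast/3.0 (+http://overcast.fm/; iOS podcast app)"')
request = parse_log_line(line)
print(request.ip, request.user_agent)
```

Nothing in such a record says whether the episode was actually played, which is why the fields are only used to group and filter requests.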

The raw number of HTTP GET requests for media files is not a reliable indicator of the number of downloads because:

Some clients make multiple GET requests for the same file (byte-range requests) as they try to minimize the data transferred over a connection
Some sources of downloads are considered illegitimate (for example, download bots) and are filtered out of the count

Before reporting this data, Spreaker applies a number of filtering mechanisms to avoid multiple tracking and ensure accurate counting. See Download Filtering below for details on how this filtering is performed.

Other downloads (Unreported)

Some platforms (e.g. Google Play, …) create a local copy of the media files and serve it to their own user base. This means Spreaker cannot get any information about these downloads, because client requests never reach the Spreaker CDN. If a platform supports a reporting API, its plays are counted; otherwise, Spreaker's dashboard has no information about this media being accessed.

Download Counts

A download is counted when one of the following happens:

A user clicks on the play button for a media file (on one of Spreaker's apps)
A media file starts autoplaying
A user completes an action that results in a media file playing (e.g. clicks a position in the waveform and the media starts playing)
An HTTP GET request for a file (or for a portion of the file, in the case of byte-range requests) is detected

AND

The same file has not been requested by the same user/client application during the Time Window.

Multiple requests for the same media file originating from the same client application may also be filtered out if they exceed the maximum hourly and daily thresholds for such traffic; this screens out fake traffic generators and other bots.
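Hourly and daily thresholds can be sketched as sliding counters per client and media file. The class name and the limits below are hypothetical: the article does not publish the actual thresholds, nor whether flagged traffic is dropped entirely or only partially.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical limits for illustration; the real thresholds are not published.
MAX_PER_HOUR = 10
MAX_PER_DAY = 50


class RateThreshold:
    """Flags a client whose requests for one media file exceed hourly or daily limits."""

    def __init__(self, max_per_hour: int = MAX_PER_HOUR, max_per_day: int = MAX_PER_DAY) -> None:
        self.max_per_hour = max_per_hour
        self.max_per_day = max_per_day
        # (client_id, media_id) -> timestamps of requests within the last day
        self.history: defaultdict[tuple[str, str], deque] = defaultdict(deque)

    def exceeds(self, client_id: str, media_id: str, at: datetime) -> bool:
        """Record a request and return True if it pushes the client over a limit.

        Assumes requests are processed in chronological order.
        """
        times = self.history[(client_id, media_id)]
        times.append(at)
        # Forget requests older than one day.
        while at - times[0] > timedelta(days=1):
            times.popleft()
        in_last_hour = sum(1 for t in times if at - t <= timedelta(hours=1))
        return in_last_hour > self.max_per_hour or len(times) > self.max_per_day


limiter = RateThreshold(max_per_hour=3, max_per_day=5)
start = datetime(2024, 1, 1, 9, 0)
flags = [limiter.exceeds("client-b", "episode-1", start + timedelta(minutes=m)) for m in range(5)]
print(flags)  # [False, False, False, True, True]
```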

Download Filtering

Spreaker uses a proprietary algorithm to analyze downloads in order to eliminate redundant requests, bots, and fraudulent traffic and arrive at a consistent measure of actual listener activity. This gives advertisers and publishers the most reliable data set for understanding listener activity.

Specifically, these actions are performed when filtering downloads:

Only GET requests are tracked (most podcatchers also send HTTP HEAD requests before downloading a file, and these are not counted)
Known bots and crawlers are filtered out
Multiple tracking is avoided by not counting multiple requests from the same client (see Uniqueness Algorithm) that fall within the Time Window (see Download Time Windows)
Multiple downloads from the same source are filtered out based on hourly and daily rates (to screen out fake traffic generators)
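The first two filtering steps can be sketched as a simple pass over parsed requests. This is a minimal illustration under stated assumptions: `BOT_MARKERS` is a hypothetical list far shorter than a real one, and plain substring matching is crude (it can misfire on an innocent User-Agent that happens to contain "bot").

```python
from dataclasses import dataclass

# Hypothetical, deliberately short list of User-Agent markers. Real filtering
# relies on maintained lists of known bots and crawlers.
BOT_MARKERS = ("bot", "crawler", "spider", "curl/", "python-requests/")


@dataclass
class Request:
    method: str
    user_agent: str
    path: str


def is_known_bot(user_agent: str) -> bool:
    """Crude case-insensitive substring match against BOT_MARKERS."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def candidate_downloads(records: list[Request]) -> list[Request]:
    """Keep GET requests from non-bot clients; HEAD and other methods are dropped.

    Surviving requests would still go through the Time Window and the
    hourly/daily threshold checks before being counted.
    """
    return [
        r for r in records
        if r.method == "GET" and not is_known_bot(r.user_agent)
    ]


log = [
    Request("HEAD", "Overcast/3.0 (+http://overcast.fm/; iOS podcast app)", "/ep1.mp3"),
    Request("GET", "Overcast/3.0 (+http://overcast.fm/; iOS podcast app)", "/ep1.mp3"),
    Request("GET", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "/ep1.mp3"),
]
print(len(candidate_downloads(log)))  # 1
```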

Measurement Methodology

Uniqueness Algorithm

By relying on user account information (when available), Spreaker can easily identify the unique downloads of its media files. When user account information is not available, Spreaker uses a proprietary algorithm that combines cookies, the IP address, the user agent, and other factors to aggregate multiple requests into a single unique download. This algorithm keeps evolving along with the industry.

Although Spreaker's algorithm aims to capture the activity of unique people, it is limited to interpreting the actions of unique clients. When multiple people share the same computer or device and all listen to the same media file, the Unique Download metric understates the number of unique listeners who accessed the episode. Fortunately, the proliferation of personal handheld devices has significantly reduced how often this happens.
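A simplified stand-in for this grouping step is sketched below. Spreaker's real algorithm is proprietary and uses additional factors; the function name `unique_client_key`, the precedence order (account, then cookie, then IP address plus User-Agent), and the hashing are assumptions made purely for illustration.

```python
import hashlib
from typing import Optional


def unique_client_key(
    user_id: Optional[str],
    cookie: Optional[str],
    ip: str,
    user_agent: str,
) -> str:
    """Return a key that groups requests assumed to come from the same client.

    Hypothetical precedence: account > cookie > (IP address, User-Agent).
    """
    if user_id:
        # Logged-in listener: the account identifies the listener directly.
        return f"user:{user_id}"
    if cookie:
        # Anonymous client that carries a cookie (e.g. a web browser).
        return f"cookie:{cookie}"
    # No account or cookie: fall back to a hash of IP address and User-Agent.
    digest = hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()
    return f"anon:{digest[:16]}"


print(unique_client_key("42", None, "198.51.100.4", "Spotify/8.8.0"))  # user:42
```

Two people listening on the same device with the same app produce the same key, which is exactly the understatement of unique listeners described above.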

Geolocation and Geocoding

User geolocation comes from the use of:

GPS or other location information when available
IP address geocoding lookup

IP addresses are assigned by Internet Service Providers (ISPs). For some ISPs, IP addresses don't reflect the actual location of the end user, although this is becoming less of a problem with the growing prevalence of always-on, high-speed internet connections and the mapping of user devices with GPS data.
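The fallback between the two location sources can be sketched with Python's standard `ipaddress` module, assuming GPS or other device location data, when present, takes precedence over IP geocoding. The prefix table uses documentation-only address ranges and invented country codes; a real system would query a maintained GeoIP database instead.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    country: str
    source: str  # "gps" or "ip"


# Hypothetical lookup table built from documentation-only address ranges.
IP_PREFIXES = {
    ipaddress.ip_network("203.0.113.0/24"): "IT",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def locate(ip: str, gps_country: Optional[str] = None) -> Optional[Location]:
    """Prefer device location data when present, else fall back to IP lookup."""
    if gps_country:
        return Location(gps_country, "gps")
    address = ipaddress.ip_address(ip)
    for network, country in IP_PREFIXES.items():
        # Membership is always False across IPv4/IPv6, so IPv6 simply misses here.
        if address in network:
            return Location(country, "ip")
    return None  # unknown location


print(locate("203.0.113.9"))  # Location(country='IT', source='ip')
```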

Source Information

Sources are identified in various ways:

Spreaker apps: these downloads are counted directly within Spreaker's infrastructure
Embedded player: these downloads are reported as coming from the host domain where the player is embedded
API-reported downloads: counted directly
Other download sources: the originating app and device are identified using information such as the User-Agent or the referrer field of the HTTP GET request.
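The embedded-player and other-source cases can be sketched with the Python standard library: the embedding site's domain comes from the referrer URL, and the app and OS from User-Agent matching. The prefix table and function names are hypothetical and much simpler than a production classifier.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical User-Agent prefixes, for illustration only.
APP_PREFIXES = {
    "Overcast/": "Overcast",
    "Spotify/": "Spotify",
}


def embed_host(referrer: str) -> Optional[str]:
    """Domain hosting an embedded player, taken from the referrer URL."""
    # A missing or placeholder referrer such as "-" has no hostname.
    return urlparse(referrer).hostname if referrer else None


def app_from_user_agent(user_agent: str) -> str:
    """Map a User-Agent to an app name using the hypothetical prefix table."""
    for prefix, app in APP_PREFIXES.items():
        if user_agent.startswith(prefix):
            return app
    return "Unknown app"


def os_from_user_agent(user_agent: str) -> str:
    """Very rough OS detection based on well-known tokens."""
    if "Android" in user_agent:
        return "Android"
    if any(token in user_agent for token in ("iOS", "iPhone", "iPad")):
        return "iOS"
    return "Unknown OS"


print(embed_host("https://blog.example.com/episode-page"))  # blog.example.com
ua = "Overcast/3.0 (+http://overcast.fm/; iOS podcast app)"
print(app_from_user_agent(ua), os_from_user_agent(ua))  # Overcast iOS
```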