Listener Analytics (Introduction)
This document provides additional information and insights on how Spreaker measures listener data. More specifically, it provides information on how this data is processed in order to better represent actual user behavior and avoid fraudulent and fake traffic.
Content published on Spreaker can be accessed in two different ways:
On platform: content is consumed through a Spreaker-made application (the website or mobile apps). This gives the most measurement details and is also the only way to consume LIVE streams, which are reported as plays.
Off platform: when content is consumed through third party client listening apps (podcatchers). This can either be reported as a play (if the listening platform supports a reporting API) or a download.
A play is counted in response to a user-initiated action that starts the audio playback of a specific media file. Subsequent actions made by the same user on the same media file (e.g. play/stop/play sequences, seeks, etc.) are not counted if they fall within a specific Time Window, in which case they are considered part of the same play. (See Time Windows for details.)
Since plays can only be reliably counted on listening apps, plays are usually reported when they happen on Spreaker's apps; however, more and more third-party applications are starting to implement some sort of API for reporting actual play counts. When a play count is available, this data is displayed on Spreaker.
A download is counted in response to an application-initiated HTTP GET request for a media file hosted on Spreaker. To avoid counting the same download twice (or more), raw CDN logs are filtered so that only a single request for a specific media file, coming from a unique IP and client, is counted within a specific time window, within a single day.
A Time Window is a temporal interval within which multiple events (plays, downloads) initiated by the same user or client application are counted as a single event. This filtering avoids counting subsequent HTTP GET requests (e.g. byte-range requests) that are actually part of the same download session more than once, and at the same time it counters fraudulent behavior (multiple plays/stops) aimed at increasing play counts.
Each play or download action that happens before the time window has elapsed is not counted, and the time window's starting point resets.
The current time window is one day. Please keep in mind that on platform plays are not included in this calculation (see the following paragraph for details).
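The time window rule described above can be sketched as follows. This is a minimal illustration, not Spreaker's actual implementation: the class name `TimeWindowCounter`, the `(client_id, media_id)` key, and the in-memory dictionaries are all assumptions made for the example. Only the one-day window and the "uncounted events reset the window" behavior come from the text.

```python
from datetime import datetime, timedelta

TIME_WINDOW = timedelta(days=1)  # the text states the current window is one day


class TimeWindowCounter:
    """Hypothetical helper: counts an event only if the same client has not
    touched the same media file within the time window."""

    def __init__(self, window=TIME_WINDOW):
        self.window = window
        self.last_seen = {}  # (client_id, media_id) -> time of the most recent event
        self.counts = {}     # media_id -> number of counted events

    def record(self, client_id, media_id, when):
        key = (client_id, media_id)
        previous = self.last_seen.get(key)
        # Every event, counted or not, resets the window's starting point.
        self.last_seen[key] = when
        if previous is not None and when - previous < self.window:
            return False  # still inside the window: part of the same event
        self.counts[media_id] = self.counts.get(media_id, 0) + 1
        return True


counter = TimeWindowCounter()
start = datetime(2024, 1, 1, 8, 0)
counter.record("client-a", "episode-1", start)                        # counted
counter.record("client-a", "episode-1", start + timedelta(hours=20))  # ignored, window resets
```

Because the window resets on every event, a client that keeps requesting the same file at short intervals never produces a second counted event until it has been quiet for a full window.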
On platform covers all the listening activity that happens on Spreaker's applications.
When Spreaker's technology is used as a listening application, Spreaker is able to reliably measure how users engage with its content.
This enables measurements such as:
Plays (counted after 5 seconds of consumption, pre-roll excluded)
Time Spent Listening (TSL)
In addition, these plays usually carry more demographic information about the audience (including age and gender) due to the fact that a sizable portion of the audience signs up to Spreaker and contributes their demographic data.
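The 5-second rule for on platform plays listed above can be illustrated with a small check. This is only a sketch: the `(kind, seconds)` segment format and the function name `qualifies_as_play` are assumptions for the example, and the way Spreaker's players actually report playback progress is not described in this document.

```python
MIN_PLAY_SECONDS = 5.0  # threshold stated in the text


def qualifies_as_play(segments):
    """Return True once at least 5 seconds of content have been consumed.

    `segments` is a hypothetical list of (kind, seconds) tuples, where kind
    is "preroll" or "content". Pre-roll time is excluded from the total.
    """
    content_seconds = sum(seconds for kind, seconds in segments if kind != "preroll")
    return content_seconds >= MIN_PLAY_SECONDS


# 30 seconds of pre-roll followed by 3 seconds of content is not yet a play.
print(qualifies_as_play([("preroll", 30.0), ("content", 3.0)]))
```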
On platform plays happen through:
The Spreaker website
Spreaker's mobile apps (iOS, Android)
Customized mobile apps (iOS, Android) made by Spreaker
Spreaker embedded players
When a third party listening application (e.g. Apple podcast apps, iTunes, Overcast, Stitcher, …) requests content from Spreaker's servers – usually referencing media links contained inside RSS feeds – Spreaker refers to this as a "download". This also happens when files are downloaded using the direct media link, for example when users click on the download button on the web user interface or when a direct media link is shared publicly.
All that Spreaker sees from its end is an HTTP "GET" request, which includes info such as a source IP address and a set of headers. Among this info, the “User Agent” helps identify the originating application and OS.
In simpler terms, Spreaker has no definitive information about the media that was or was not subsequently listened to by the user, so we report this information as "downloads".
The raw number of HTTP "GET" requests of media files is not a reliable indicator of the number of downloads because:
Some clients make multiple "GET" requests for the same file (byte-range requests) as they try to minimize the data transfer through a connection
Some sources of downloads are considered illegitimate (for example download bots) and are filtered from the count
Before reporting this data to listeners, Spreaker applies a number of filtering mechanisms to avoid counting the same request multiple times and to ensure accurate counting. See below for details on how this filtering is performed.
Some platforms (for example iHeartRadio, YouTube, …) offer an API which can be used to query the number of times a media file has been played or streamed. In these cases, Spreaker treats them as Plays.
The number of these platforms is going to increase in the foreseeable future, and reporting will become a standard among many players.
Downloads and plays are never duplicated; if a platform supports API reporting, only plays, and not downloads, are tracked in Spreaker's stats.
Some platforms (e.g. Soundcloud, Google Play, …) create a local copy of the media files to be served or hosted to their user base. This means that Spreaker cannot get any information on their downloads because the client requests do not hit the Spreaker CDN. If these platforms support a reporting API, then plays are counted. Otherwise, there is no information about this media getting accessed on Spreaker's dashboard.
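The rule that downloads and plays are never duplicated can be sketched as a merge step. The function name `merge_stats`, the dictionary shapes, and the source name "ExamplePlatform" are hypothetical. The only behavior taken from the text is that a source with API-reported plays contributes plays and no downloads.

```python
def merge_stats(downloads_by_source, api_plays_by_source):
    """Combine download counts and API-reported play counts per source.

    Both arguments are hypothetical {source_name: count} dictionaries.
    A source that reports plays through an API never contributes downloads.
    """
    stats = {}
    for source, downloads in downloads_by_source.items():
        if source in api_plays_by_source:
            continue  # API-reporting source: its download requests are ignored
        stats[source] = {"plays": 0, "downloads": downloads}
    for source, plays in api_plays_by_source.items():
        stats[source] = {"plays": plays, "downloads": 0}
    return stats


merged = merge_stats(
    {"Overcast": 10, "ExamplePlatform": 4},
    {"ExamplePlatform": 7},
)
print(merged)
```

In this sketch a platform that hosts its own copy of the media simply never appears in `downloads_by_source`, so it shows up only if it reports plays.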
A play is counted when one of the following happens:
A user clicks on the play button for a media file (on one of Spreaker's apps)
A media file starts autoplaying
A user completes an action which results in a media file playing (e.g. clicks to a position in the waveform and the media starts playing)
In all of these cases, the play is counted only if the same file has not been played by the same user within the Time Window.
Plays are also generated when sources that report data through APIs send back information.
Spreaker uses a proprietary algorithm to analyze plays in order to eliminate redundant requests, bots, and fraudulent traffic, and to arrive at a consistent measure of actual listener activity. This provides advertisers and publishers with the most reliable data set in order to better understand listener activity.
A download is counted when an HTTP "GET" request for a file is detected and the same file (or a portion of the same file in case of byte-range requests) has not been requested by the same user/client application during the Time Window.
Multiple requests for the same media file originating from the same client application might also get filtered if they exceed the maximum hourly and daily thresholds for such traffic; this is to avoid fake traffic generators and other bots.
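A rate-based filter of this kind might look like the sketch below. The actual thresholds are not published, so `MAX_PER_HOUR` and `MAX_PER_DAY` are placeholder values, and the class name `RateFilter` and its per-client history are assumptions made for illustration. The sketch also assumes requests arrive in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

MAX_PER_HOUR = 10  # hypothetical threshold, not a documented Spreaker value
MAX_PER_DAY = 50   # hypothetical threshold, not a documented Spreaker value


class RateFilter:
    """Hypothetical filter that rejects requests from a client once it exceeds
    an hourly or daily request rate for the same media file."""

    def __init__(self, max_per_hour=MAX_PER_HOUR, max_per_day=MAX_PER_DAY):
        self.max_per_hour = max_per_hour
        self.max_per_day = max_per_day
        self.history = defaultdict(deque)  # (client_id, media_id) -> recent timestamps

    def allow(self, client_id, media_id, when):
        timestamps = self.history[(client_id, media_id)]
        # Forget requests that are a day old or older.
        while timestamps and when - timestamps[0] >= timedelta(days=1):
            timestamps.popleft()
        in_last_hour = sum(1 for t in timestamps if when - t < timedelta(hours=1))
        timestamps.append(when)  # rejected requests still count toward the rate
        return in_last_hour < self.max_per_hour and len(timestamps) <= self.max_per_day


rate_filter = RateFilter(max_per_hour=2, max_per_day=3)
now = datetime(2024, 1, 1, 12, 0)
print(rate_filter.allow("client-a", "episode-1", now))
```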
If the source of the request (as identified either through the user-agent, IP address range, or by any other means) is identified as an app that provides API reporting, download requests are ignored and not counted.
Spreaker uses a proprietary algorithm to analyze downloads in order to eliminate redundant requests, bots, and fraudulent traffic, and to arrive at a consistent measure of actual listener activity. This provides advertisers and publishers with the most reliable data set in order to better understand listener activity.
Specifically, these actions are performed when filtering downloads:
Only GET requests are tracked (most podcatchers also issue HTTP HEAD requests before downloading a file; these are not counted)
Known bots and crawlers are filtered out
Multiple tracking is avoided by not counting multiple requests by the same client (see uniqueness) that fall within the Time Window (see Time Windows)
Multiple downloads from the same source are filtered out based on hourly and daily rates (to exclude fake traffic generators)
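The first three filtering steps listed above can be sketched as a single pass over request logs. Spreaker's algorithm is proprietary, so this is only an illustration: the `Request` fields, the bot markers in `KNOWN_BOT_MARKERS`, and the use of IP plus user agent as the client identity are all assumptions. Rate-based filtering is left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TIME_WINDOW = timedelta(days=1)                   # the text states a one-day window
KNOWN_BOT_MARKERS = ("bot", "crawler", "spider")  # hypothetical marker list


@dataclass(frozen=True)
class Request:
    """Hypothetical shape of one parsed CDN log line."""
    method: str
    media_id: str
    ip: str
    user_agent: str
    when: datetime


def count_downloads(requests):
    """Return {media_id: downloads} after applying the filtering steps."""
    last_seen = {}  # (ip, user_agent, media_id) -> time of the latest request
    counts = {}
    for req in sorted(requests, key=lambda r: r.when):
        if req.method.upper() != "GET":
            continue  # step 1: HEAD and other methods are not tracked
        if any(marker in req.user_agent.lower() for marker in KNOWN_BOT_MARKERS):
            continue  # step 2: known bots and crawlers
        key = (req.ip, req.user_agent, req.media_id)
        previous = last_seen.get(key)
        last_seen[key] = req.when
        if previous is not None and req.when - previous < TIME_WINDOW:
            continue  # step 3: same client inside the Time Window (e.g. byte ranges)
        counts[req.media_id] = counts.get(req.media_id, 0) + 1
    return counts
```

With this sketch, a HEAD request followed by two byte-range GET requests from the same client yields one download.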
By relying on user account information (when available) Spreaker is able to easily identify the unique plays of its media files. When user account information is not available, Spreaker uses a proprietary algorithm involving cookies, the IP address, the user agent, and other factors to aggregate multiple requests into a single Unique Play or Unique Download request. This algorithm is constantly evolving as the industry evolves.
Oftentimes, specifically in the case of downloads, while Spreaker’s algorithm aims to capture the activity of unique people, it is limited to interpreting the actions of unique clients. When multiple people share the same computer or device and all listen to the same media file, this "Unique Download" metric will understate the number of Unique Listeners who accessed the episode. Fortunately, the proliferation of personal handheld devices has significantly reduced the likelihood and frequency of these scenarios.
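The idea of aggregating requests into unique listeners can be sketched with a simple identity key. The real algorithm is proprietary and uses additional factors, so the precedence shown here (account, then cookie, then a hash of IP and user agent) and the function names `listener_key` and `unique_count` are assumptions for illustration only.

```python
import hashlib


def listener_key(user_id=None, cookie_id=None, ip=None, user_agent=None):
    """Build a hypothetical identity key for one play or download event."""
    if user_id:
        return f"user:{user_id}"      # account information, when available
    if cookie_id:
        return f"cookie:{cookie_id}"  # otherwise fall back to a cookie
    # Last resort: fingerprint the client by IP address and user agent.
    digest = hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()
    return f"anon:{digest[:16]}"


def unique_count(events):
    """Count distinct identity keys across a list of event dictionaries."""
    return len({listener_key(**event) for event in events})


events = [
    {"user_id": "u1", "ip": "192.0.2.1", "user_agent": "PodApp/1.0"},
    {"user_id": "u1", "ip": "192.0.2.7", "user_agent": "Browser/2.0"},
    {"ip": "192.0.2.1", "user_agent": "PodApp/1.0"},
    {"ip": "192.0.2.1", "user_agent": "PodApp/1.0"},
]
print(unique_count(events))
```

The last two events show the limitation described above: two different people on the same device produce the same anonymous key and are counted as one unique client.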
User geolocation is derived through the use of:
GPS or other location information when available
IP address geocoding lookup
IP addresses are assigned by Internet Service Providers (ISPs). For some ISPs, IP addresses don’t relate to the actual location of the end user, though this is becoming less of a problem with the growing prevalence of always-on, high-speed internet connections and the mapping of user devices with GPS data.
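The two geolocation sources can be combined with a simple fallback, sketched below. The lookup table `HYPOTHETICAL_IP_TABLE` uses reserved documentation address ranges with made-up country codes. A real system would query a geocoding database, which this document does not name. Preferring device location over the IP lookup is also an assumption.

```python
import ipaddress

# Hypothetical lookup table: documentation-only address ranges, made-up countries.
HYPOTHETICAL_IP_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "IT",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def resolve_country(ip, gps_country=None):
    """Return (country, method), preferring device location when available."""
    if gps_country:
        return gps_country, "gps"
    address = ipaddress.ip_address(ip)
    for network, country in HYPOTHETICAL_IP_TABLE.items():
        if address in network:
            return country, "ip"
    return None, "unknown"


print(resolve_country("203.0.113.5"))
print(resolve_country("203.0.113.5", gps_country="FR"))
```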
When listeners are registered Spreaker users and have supplied their demographic information (either directly or through a social network), Spreaker is able to retrieve that specific information. Users are generally interested in signing up to Spreaker when they want to use features such as:
The chat/comment system
The follow user/subscribe to show features
The like button
Since the number of registered users is a statistically significant percentage of listeners (usually around 20%), the information reported is usually meaningful, though its accuracy may vary for reasons related to users sampling the platform. Also, content producers primarily use the Spreaker platform as their main listening platform.
Sources are identified in various ways:
Spreaker apps: these are directly counted within Spreaker's infrastructure
Embedded player: these plays are reported as coming from the host domain where the player is embedded
API reported plays: directly counted
Other download sources: the originating app and device are identified by relying on information such as the User Agent or the referrer field of the HTTP GET request
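Identifying a download source from request headers can be sketched as below. The substrings in `USER_AGENT_APPS` are illustrative assumptions rather than a documented matching list, and falling back to the referrer's host name is one possible reading of how the referrer field is used.

```python
from urllib.parse import urlparse

# Hypothetical (marker, app name) pairs; real user-agent strings vary by version.
USER_AGENT_APPS = [
    ("overcast", "Overcast"),
    ("stitcher", "Stitcher"),
    ("itunes", "iTunes"),
]


def identify_source(user_agent="", referrer=""):
    """Guess the originating app from the User Agent, else from the referrer."""
    lowered = user_agent.lower()
    for marker, app in USER_AGENT_APPS:
        if marker in lowered:
            return app
    host = urlparse(referrer).hostname if referrer else None
    return host or "unknown"


print(identify_source("Overcast/3.0 (+http://overcast.fm/; iOS podcast app)"))
print(identify_source("Mozilla/5.0", "https://blog.example.com/post"))
```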