Video Central

Datasets API overview

The Datasets API is part of our new partner data product, Slate Analytics. Unlike other Slate reports, datasets are append-only (each file contains new data). They are available only through the API and cannot be downloaded in the Slate UI. They are built specifically for partner data engineers who want to consume granular data and perform analytics. This topic helps data engineers set up pipelines to retrieve datasets, defines the values in the dataset files, and provides sample queries and suggestions for how partners can best use this data.

Practical use of datasets

We provide datasets to consumers in the form of a changelog. Each event is published only once. However, if any column values for a previously provided row need to be updated, we will publish a new version of the record to reflect the changes in your next available file. The changelog is append-only, to ensure that all data modifications are captured. Data engineers can use this changelog to update their data tables directly.

When you process the changelog, always use the latest version of each record, based on the last_update_time_utc column. A record is identified by its dataset's primary key, which is marked pk in Dataset definitions (for example, subscription_event_id). Using the latest version ensures that you always have the most up-to-date copy of each record. Deletions are reflected in the is_deleted column: a value of 1 indicates that the record has been deleted, and a value of 0 represents an active record. This changelog approach lets you manage new and changed data effectively and keeps your data tables accurate and up to date.
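The following is a minimal Python sketch of this merge logic. It assumes that each dataset file has already been parsed into dictionaries keyed by column name. It also assumes that last_update_time_utc values compare correctly, meaning they are datetime objects or identically formatted UTC strings. The PRIMARY_KEYS labels and the function names merge_changelog and active_rows are hypothetical; the column names come from Dataset definitions.

```python
from typing import Dict, Iterable, List

# Primary-key column per dataset, from the (pk) markers in Dataset definitions.
# The dictionary keys are illustrative labels, not API dataset_id values.
PRIMARY_KEYS = {
    "subscription": "subscription_event_id",
    "playback": "session_id",
    "catalog": "id",
}


def merge_changelog(table: Dict[str, dict], changes: Iterable[dict], pk: str) -> Dict[str, dict]:
    """Upsert changelog rows into `table`, keeping only the newest version per key.

    Assumes last_update_time_utc values compare correctly (datetime objects or
    identically formatted UTC strings).
    """
    for row in changes:
        key = row[pk]
        current = table.get(key)
        # A newer (or equal-time) version replaces the stored one; older versions are ignored.
        if current is None or row["last_update_time_utc"] >= current["last_update_time_utc"]:
            table[key] = row
    return table


def active_rows(table: Dict[str, dict]) -> List[dict]:
    """Return rows whose latest version is not flagged for deletion (is_deleted = 0)."""
    return [row for row in table.values() if int(row["is_deleted"]) == 0]
```

Deleted rows stay in the merged table as tombstones and are filtered out only when the table is read. This means a delayed, older version of a record cannot reappear after its deletion has been applied.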

Datasets API preliminaries

Before you make requests to the Datasets API, it’s important to understand the basic requirements for authentication and pagination. This section covers how to securely access the API and navigate large datasets efficiently.

Onboarding to Analytics API
To retrieve datasets, you must first onboard to the Analytics API suite. More details can be found here.

The base URI is https://videocentral.amazon.com/apis/v2. All requests must include a valid Login with Amazon (LWA) authentication token in the request's Authorization header.
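The following is a minimal sketch of building an authenticated request with Python's standard library. It makes three assumptions that you should confirm against your onboarding details:

- The token is sent using the Bearer scheme.
- Endpoint paths such as /v2/accounts resolve under https://videocentral.amazon.com/apis, so the /v2 segment is not repeated.
- Responses are JSON.

The names API_ROOT, build_request, and get_json are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: documented paths (for example /v2/accounts) are appended to this
# prefix, yielding URLs under the base URI https://videocentral.amazon.com/apis/v2.
API_ROOT = "https://videocentral.amazon.com/apis"


def build_request(path: str, lwa_token: str, params: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request for a Datasets API path with the LWA token attached."""
    url = API_ROOT + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # Assumption: the LWA token is sent with the Bearer scheme.
    headers = {"Authorization": f"Bearer {lwa_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def get_json(path: str, lwa_token: str, params: Optional[dict] = None) -> dict:
    """Send the request and decode the (assumed JSON) response body.

    A missing or expired token surfaces as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(build_request(path, lwa_token, params), timeout=60) as resp:
        return json.load(resp)
```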

If the token is missing from the request header or has expired, the Datasets API returns an unauthorized exception.

Pagination
All Slate API responses are paginated. Pagination is controlled with the following request parameters.

Request parameter	Default value	Description
limit	10	The number of documents returned in a single page (the page size).
offset	0	The number of pages to skip (the page number).

All paginated responses contain the following fields.

Field	Description
total	The total document count in all pages.
next	The URL of the next page, or null on the last page.
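The following is a minimal sketch of walking every page by following next until it is null. It assumes that each page is a JSON object containing the next field described above, and that next is an absolute URL that can be requested as-is. fetch_json is a hypothetical callable, such as a wrapper that adds the Authorization header. Injecting it lets the loop be tested without network access.

```python
from typing import Callable, Iterator, Optional


def iter_pages(first_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each page of a paginated response, following `next` until it is null."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield page
        # JSON null decodes to None, which ends the loop on the last page.
        url = page.get("next")
```

Yielding whole pages, rather than individual items, avoids assuming the name of the field that holds each page's documents.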

Use the Datasets API

To access datasets programmatically, clients make a series of API calls that enumerate the available resources (such as accounts, groups, businesses, and datasets) before retrieving downloadable URLs for the data files. This sequence is designed to support automation and can be integrated into recurring data pipelines or scheduled workflows.
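The endpoint paths in this section nest in this order: accounts, groups, businesses, datasets, and dataset files. The following is a minimal sketch of helpers that build each documented path from the IDs returned by the previous call. The function names are hypothetical. IDs are percent-encoded as a precaution in case they contain reserved characters.

```python
from urllib.parse import quote


def _segment(value: str) -> str:
    # Percent-encode each ID so characters such as "/" cannot change the path.
    return quote(str(value), safe="")


def accounts_path() -> str:
    return "/v2/accounts"


def groups_path(account_id: str) -> str:
    return f"{accounts_path()}/{_segment(account_id)}"


def businesses_path(account_id: str, group_id: str) -> str:
    return f"{groups_path(account_id)}/{_segment(group_id)}"


def datasets_path(account_id: str, group_id: str, business_id: str) -> str:
    return f"{businesses_path(account_id, group_id)}/{_segment(business_id)}/datasets"


def dataset_files_path(account_id: str, group_id: str, business_id: str, dataset_id: str) -> str:
    return f"{datasets_path(account_id, group_id, business_id)}/{_segment(dataset_id)}"
```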

List accounts
/v2/accounts

This resource returns the list of Slate accounts that the user can access. The set of accounts is accessible in Slate through the accounts dropdown list near the top right corner of the portal. You can also use these links to find your account_id or your channel/studio_id.

List groups (business lines)
/v2/accounts/{account_id}

This resource returns the groups of business lines (such as channels) that the user can access.

List businesses
/v2/accounts/{account_id}/{group_id}

This resource returns the list of businesses (such as specific channel names) available to this account within the given group (business line).

List available datasets
/v2/accounts/{account_id}/{group_id}/{business_id}/datasets

This resource returns the list of datasets available for a given channel or studio. (The available datasets and their attributes are described in Dataset definitions, later in this topic.) The datasets currently available to download are:

Subscription: Events in the customer lifecycle, such as when a customer subscribed.
Playback: Playback session events where customers engaged with content.
Catalog: Events where your catalog metadata has changed, such as when a new title was added.

Obtain dataset file(s)
/v2/accounts/{account_id}/{group_id}/{business_id}/datasets/{dataset_id}

This resource provides a list of dataset files. Depending on the requested time range, the list may include a large number of files. The total field indicates how many files to expect. After completing a full backfill, you can stay up to date by continuing to request files using a startDateTime equal to the last retrieved timestamp and an endDateTime set to the current time.

New datasets are published approximately every 4 hours and may contain events that occurred within the previous 12 hours. We recommend calling the API several times per day, approximately every 4-6 hours, to keep your local data as complete and up to date as possible. If publishing is delayed, we will notify you by email as soon as possible.

The following table describes the available request parameters for dataset files.

Request parameter	Description
startDateTime	Start of the requested time range. We recommend setting it to the time of your last pull. (Format: YYYY-MM-DDTHH:MM:SSZ, UTC timestamp)
endDateTime	End of the requested time range. We recommend setting it to the time of the current pull. (Format: YYYY-MM-DDTHH:MM:SSZ, UTC timestamp)
limit	The number of file links per page. The maximum is 1000.

Note: Our maximum data retention is 2 years. Requests for data with timestamps more than 2 years in the past return no results.
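The following is a minimal sketch of building the query parameters for this endpoint from the rules above:

- Timestamps use the YYYY-MM-DDTHH:MM:SSZ format, in UTC.
- limit is capped at 1000.
- The start is kept inside the retention window.

The function name is hypothetical. Clamping an out-of-range start, rather than rejecting it, is a design choice of this sketch. The sketch uses 730 days as an approximation of 2 years.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # YYYY-MM-DDTHH:MM:SSZ, UTC
MAX_LIMIT = 1000                   # documented maximum links per page
RETENTION = timedelta(days=730)    # approximation of the 2-year retention window


def dataset_file_params(start: datetime, end: datetime, limit: int = MAX_LIMIT,
                        now: Optional[datetime] = None) -> dict:
    """Build startDateTime, endDateTime, and limit for the dataset files endpoint."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware datetimes")
    if now is None:
        now = datetime.now(timezone.utc)
    # Data older than the retention window returns nothing, so clamp the start.
    start = max(start, now - RETENTION)
    if start >= end:
        raise ValueError("the requested window is empty or entirely outside retention")
    return {
        "startDateTime": start.astimezone(timezone.utc).strftime(TS_FORMAT),
        "endDateTime": end.astimezone(timezone.utc).strftime(TS_FORMAT),
        "limit": min(max(limit, 1), MAX_LIMIT),
    }
```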

Notes:

The maximum zip file size is 300 MB. Files that exceed this limit are split into multiple files.
Presigned URL time-to-live (TTL) is 60 minutes.
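Presigned links expire after 60 minutes, so a pipeline that downloads many files may need to request the file list again partway through. The following is a minimal sketch that assumes the downloads are zip archives, as the notes above imply. Each member is returned as raw bytes for downstream parsing. The five-minute safety margin and the function names are hypothetical choices. Split files are presumably listed as separate links, so each part would be downloaded and unzipped independently.

```python
import io
import zipfile
from datetime import datetime, timedelta

URL_TTL = timedelta(minutes=60)        # documented presigned URL time-to-live
SAFETY_MARGIN = timedelta(minutes=5)   # hypothetical buffer so downloads finish in time


def links_still_usable(listed_at: datetime, now: datetime) -> bool:
    """Return True if presigned links obtained at `listed_at` can still be used at `now`."""
    return now - listed_at < URL_TTL - SAFETY_MARGIN


def unzip_members(zip_bytes: bytes) -> dict:
    """Return {member name: raw bytes} for every file in a downloaded zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist() if not name.endswith("/")}
```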

Dataset definitions

The tables in this section list the columns, data types, and definitions for each of the 3 available datasets.

Subscription dataset

Column	Type	Definition
subscription_event_id (pk)	string	The unique ID for each subscription event vended through this log.
subscription_event_type	string	The type of subscription event that occurred: Start: Customer subscribed to a channel they were not subscribed to previously. Renewal: Customer subscribed to a channel and was already active for that channel prior to the subscription event. Cancel: Customer canceled their subscription. Active - AR ON: Customer is active and has turned autorenew on. Active - AR OFF: Customer is active, but has turned autorenew off.
subscription_event_time_utc	timestamp	The time the subscription event occurred, standardized to UTC.
subscription_event_time_zone	string	The time zone of the subscription marketplace.
cid	string	Anonymized customer identifier (CID). This customer identifier will persist for all events under a single parent channel to enable inter-tier movement and customer lifecycle tracking.
offer_id	string	The ID of the subscription offer associated with the event.
offer_name	string	The human-readable name of the offer.
offer_type	string	The type of offer.
offer_marketplace	string	The marketplace where the subscription offer was live.
offer_billing_type	string	The type of payment required for the offer: HO: Hard offer; payment required. FT: Free trial; no payment required.
offer_payment_amount	string	The billing amount of the offer_id.
benefit_id	string	The ID of the Prime Video benefit the offer is configured under.
channel_label	string	The name of the channel the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM.
channel_tier_label	string	The name of the channel tier the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM.
is_promo	int	Indicates whether the offer was on a promotion at the time of the event (0 = not on promotion, 1 = on promotion).
create_time_utc	timestamp	The time the subscription event log record was created, standardized to UTC.
last_update_time_utc	timestamp	The time the subscription event log record was last updated, standardized to UTC.
is_deleted	int	Indicates whether a previously created record should be deleted (0 = should persist, 1 = should be deleted).

Playback dataset

Column	Type	Definition
session_id (pk)	string	The unique ID for the playback session.
marketplace_id	int	The unique ID for the playback marketplace.
marketplace_desc	string	A friendly description for the playback marketplace.
cid	string	The user identifier, anonymized with UUID.
benefit_id	string	The benefit associated with content that was streamed.
catalog_id	string	Foreign key (FK) used to join to catalog table.
subscription_offer_id	string	The offer_id of the subscription the customer held at the time of the stream (Active or ApprovalPending).
subscription_event_id	string	Foreign key (FK) used to join to the subscription event log to get the subscriber's exact status at the time of playback (Active).
start_segment_utc	timestamp	Start of playback segment in UTC.
end_segment_utc	timestamp	End of playback segment in UTC.
seconds_viewed	int	Number of seconds the customer streamed content during playback.
position_start	double	Second of stream where playback session started.
position_end	double	Second of stream where playback session ended.
connection_type	string	Connection used by the customer to stream the content.
stream_type	string	Classification between Video-On-Demand, Live, or Just After Broadcast (JAB) streams.
device_class	string	Type of device (such as Living Room, Mobile, Web, or Others).
device_sub_class	string	Granular type of device (such as game console, smart_tv, roku).
geo_dma	string	The 3-digit Designated Market Area (DMA) code of the area where the stream was generated.
playback_method	string	Whether playback was Online or Offline.
quality	string	Playback quality (such as 1080p or 4K).
event_type	string	The defining event type (playback_segments)
create_time_utc	timestamp	Timestamp when record was added to table, in UTC.
last_update_time_utc	timestamp	Last updated timestamp when record was modified, in UTC.
is_deleted	int	Flag that tells partners whether the record should be deleted in their system (0 = should persist, 1 = should be deleted).

Catalog dataset

Column	Type	Definition
id (pk)	string	The unique ID for the title.
marketplace_id	int	The unique ID for the offer marketplace.
benefit_id	string	The benefit associated with the content.
title	string	The title of the series/movie.
vendor_sku	string	An arbitrary identifier that the vendor generates for each of their movies or episodes.
season	integer	The season number (for episodic content).
episode	integer	The episode number.
episode_name	string	The episode name (optional).
runtime_minutes	integer	The runtime of the content, in minutes.
live_linear_channel_name	string	The channel name for live content.
content_type	string	Either TV or Movie.
content_quality	string	Either HD or SD.
content_group	string	3P_SUBS
create_time_utc	timestamp	Timestamp when record was added to table, in UTC.
last_update_time_utc	timestamp	Last updated timestamp when record was modified, in UTC.
is_deleted	int	Flag that tells partners whether the record should be deleted in their system (0 = should persist, 1 = should be deleted).

Sample queries

The following SQL example demonstrates how the dataset tables connect. You can join playback data to the subscription event log on the subscription_event_id column. This provides the latest subscription status prior to that stream. In this example, the catalog_id column in the playback dataset is joined to the id field in catalog_event_log to provide all catalog metadata.
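The following is a minimal sketch of these joins, run with Python's built-in sqlite3 module against an in-memory database. catalog_event_log is the table name used above. subscription_event_log and playback are hypothetical table names, the sample rows are illustrative, and only a few columns from Dataset definitions are included. The tables are assumed to hold the latest version of each record, and rows flagged is_deleted = 1 are excluded.

```python
import sqlite3

SCHEMA = """
CREATE TABLE subscription_event_log (subscription_event_id TEXT PRIMARY KEY,
    subscription_event_type TEXT, offer_id TEXT, is_deleted INTEGER);
CREATE TABLE playback (session_id TEXT PRIMARY KEY, cid TEXT, catalog_id TEXT,
    subscription_event_id TEXT, seconds_viewed INTEGER, is_deleted INTEGER);
CREATE TABLE catalog_event_log (id TEXT PRIMARY KEY, title TEXT,
    content_type TEXT, is_deleted INTEGER);
"""

JOIN_SQL = """
SELECT p.session_id, p.cid, s.subscription_event_type, s.offer_id,
       c.title, c.content_type, p.seconds_viewed
FROM playback AS p
LEFT JOIN subscription_event_log AS s
       ON s.subscription_event_id = p.subscription_event_id AND s.is_deleted = 0
LEFT JOIN catalog_event_log AS c
       ON c.id = p.catalog_id AND c.is_deleted = 0
WHERE p.is_deleted = 0
ORDER BY p.session_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Illustrative rows only; real values come from the dataset files.
conn.executemany("INSERT INTO subscription_event_log VALUES (?, ?, ?, ?)",
                 [("se1", "Start", "offer-1", 0)])
conn.executemany("INSERT INTO playback VALUES (?, ?, ?, ?, ?, ?)", [
    ("s1", "c1", "t1", "se1", 1200, 0),
    ("s2", "c1", "t2", "se1", 300, 1),   # deleted, so excluded
    ("s3", "c2", "t9", "se9", 60, 0),    # no matching subscription event or title
])
conn.executemany("INSERT INTO catalog_event_log VALUES (?, ?, ?, ?)",
                 [("t1", "Example Movie", "Movie", 0), ("t2", "Example Series", "TV", 0)])
rows = conn.execute(JOIN_SQL).fetchall()
```

LEFT JOIN keeps playback rows that have no matching subscription event or catalog entry, with NULL in the joined columns. Use JOIN instead to drop such rows.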

The following SQL example returns the top 10 titles that customers watched first after starting a subscription.
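The following is a minimal sketch of one way to express this query, again using sqlite3 with hypothetical table names and sample rows. It treats "first watched after starting a subscription" as the earliest playback segment whose subscription_event_id points to a Start event. This works because that column carries the customer's latest subscription status at playback time. Counting is per subscription start, so a customer who resubscribes contributes once for each Start event.

```python
import sqlite3

SCHEMA = """
CREATE TABLE subscription_event_log (subscription_event_id TEXT PRIMARY KEY,
    subscription_event_type TEXT, is_deleted INTEGER);
CREATE TABLE playback (session_id TEXT PRIMARY KEY, catalog_id TEXT,
    subscription_event_id TEXT, start_segment_utc TEXT, is_deleted INTEGER);
CREATE TABLE catalog_event_log (id TEXT PRIMARY KEY, title TEXT, is_deleted INTEGER);
"""

TOP_FIRST_WATCHED_SQL = """
WITH post_start AS (
    -- Playback whose latest subscription status (via subscription_event_id) is a Start.
    SELECT p.subscription_event_id, p.catalog_id, p.start_segment_utc
    FROM playback AS p
    JOIN subscription_event_log AS s
      ON s.subscription_event_id = p.subscription_event_id
    WHERE s.subscription_event_type = 'Start'
      AND s.is_deleted = 0 AND p.is_deleted = 0
),
first_watch AS (
    -- Earliest segment after each subscription start.
    SELECT subscription_event_id, MIN(start_segment_utc) AS first_start
    FROM post_start
    GROUP BY subscription_event_id
)
SELECT c.title, COUNT(*) AS first_watches
FROM post_start AS ps
JOIN first_watch AS f
  ON f.subscription_event_id = ps.subscription_event_id
 AND f.first_start = ps.start_segment_utc
JOIN catalog_event_log AS c
  ON c.id = ps.catalog_id AND c.is_deleted = 0
GROUP BY c.title
ORDER BY first_watches DESC, c.title
LIMIT 10
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical rows; ISO-8601 UTC strings make MIN() order timestamps correctly.
conn.executemany("INSERT INTO subscription_event_log VALUES (?, ?, ?)", [
    ("se1", "Start", 0), ("se2", "Start", 0), ("se3", "Renewal", 0), ("se4", "Start", 0),
])
conn.executemany("INSERT INTO playback VALUES (?, ?, ?, ?, ?)", [
    ("p1", "tA", "se1", "2024-05-01T10:00:00Z", 0),
    ("p2", "tB", "se1", "2024-05-01T12:00:00Z", 0),
    ("p3", "tA", "se2", "2024-05-02T09:00:00Z", 0),
    ("p4", "tB", "se3", "2024-05-02T09:00:00Z", 0),
    ("p5", "tC", "se4", "2024-05-03T08:00:00Z", 1),
    ("p6", "tB", "se4", "2024-05-03T09:00:00Z", 0),
])
conn.executemany("INSERT INTO catalog_event_log VALUES (?, ?, ?)", [
    ("tA", "Title A", 0), ("tB", "Title B", 0), ("tC", "Title C", 0),
])
top_titles = conn.execute(TOP_FIRST_WATCHED_SQL).fetchall()
```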

Sample orchestration

If you want to automate data extraction from the Datasets API on a recurring schedule, the following sample Python script demonstrates how to make incremental API calls every 6 hours. The script persists the timestamp of the last successful request locally and uses that value, plus one second, as the startDateTime for the next call. It sets endDateTime to the current time, builds the query parameters, and sends an authenticated GET request. This approach provides continuous, non-overlapping data retrieval across time windows and can be scheduled with cron or another job scheduler.
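The following is a minimal sketch of such a script using only the Python standard library. Several details are hypothetical: the state file name, the six-hour lookback on the first run, the Bearer token scheme, and the assumption that paths resolve under https://videocentral.amazon.com/apis. The sketch requests only the first page of file links (follow next for the rest) and leaves downloading to the rest of your pipeline.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional, Tuple

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"                    # YYYY-MM-DDTHH:MM:SSZ, UTC
API_ROOT = "https://videocentral.amazon.com/apis"   # assumption: paths start with /v2
STATE_FILE = Path("datasets_last_pull.txt")         # hypothetical state location


def load_last_pull(state_file: Path, default: datetime) -> datetime:
    """Read the end of the last successful window, or fall back to `default`."""
    if state_file.exists():
        text = state_file.read_text().strip()
        return datetime.strptime(text, TS_FORMAT).replace(tzinfo=timezone.utc)
    return default


def save_last_pull(state_file: Path, ts: datetime) -> None:
    state_file.write_text(ts.strftime(TS_FORMAT))


def next_window(last_pull: datetime, now: datetime) -> Tuple[str, str]:
    """startDateTime is one second after the last pull so windows never overlap."""
    start = last_pull + timedelta(seconds=1)
    return start.strftime(TS_FORMAT), now.strftime(TS_FORMAT)


def run_once(files_path: str, lwa_token: str, state_file: Path = STATE_FILE,
             now: Optional[datetime] = None) -> dict:
    """Fetch the first page of dataset file links for the next time window."""
    if now is None:
        now = datetime.now(timezone.utc).replace(microsecond=0)
    last = load_last_pull(state_file, default=now - timedelta(hours=6))
    start, end = next_window(last, now)
    query = urllib.parse.urlencode({"startDateTime": start, "endDateTime": end, "limit": 1000})
    request = urllib.request.Request(
        f"{API_ROOT}{files_path}?{query}",
        headers={"Authorization": f"Bearer {lwa_token}"},  # Bearer scheme assumed
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        page = json.load(resp)
    # Record the window end only after the request succeeds; a fuller pipeline
    # would wait until every page and file has been processed.
    save_last_pull(state_file, now)
    return page
```

To run it every 6 hours with cron, a crontab entry such as `0 */6 * * *` can invoke a small wrapper that calls run_once with your dataset files path and a valid LWA token.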