The new Datasets API in Prime Video Slate enables developers to build clients that retrieve event-grain exports (datasets) and any related dimensional datasets.
Important: The new endpoint documented here supports both subscription and playback datasets. Playback datasets are available only through this new endpoint.
Datasets API overview
The Datasets API is part of our new partner data product, Slate Analytics. Unlike other Slate reports, datasets are append-only (each file contains new data), are not available for download in the Slate UI (they are accessible via the API only), and are built explicitly for partner data engineers who want to consume granular data and perform analytics. This topic helps data engineers set up their pipelines to retrieve datasets, defines the values in the dataset files, and provides sample queries and suggestions for the optimal ways partners can use this data.
Practical use of datasets
We provide datasets to consumers in the form of a changelog. Each event is published only once. However, if any column values for a previously provided row need to be updated, we will publish a new version of the record to reflect the changes in your next available file. The changelog is append-only, to ensure that all data modifications are captured. Data engineers can use this changelog to update their data tables directly.
When you process the changelog, it’s essential to always use the latest record for a given event_id, based on the last_update_time_utc column. This ensures that you always have the most up-to-date version of each record. If a record needs to be deleted, this action is reflected in the is_deleted column. A value of 1 indicates that the record has been deleted, while a value of 0 represents an active record. This changelog approach allows you to effectively manage new and changing data, and ensures that your data tables remain accurate and up to date with the latest information.
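The update rule above can be sketched in Python. This is a minimal illustration, not a production pipeline: it assumes each changelog row arrives as a dict keyed by the subscription dataset's columns, with last_update_time_utc as an ISO 8601 UTC string (so lexicographic comparison matches chronological order).

```python
def apply_changelog(current, changelog_rows):
    """Merge a batch of changelog rows into a table keyed by event ID.

    `current` maps each subscription_event_id to its latest known record;
    `changelog_rows` is an iterable of dicts with subscription_event_id,
    last_update_time_utc (ISO 8601, UTC), and is_deleted fields.
    """
    for row in changelog_rows:
        key = row["subscription_event_id"]
        existing = current.get(key)
        # Keep only the most recent version of each record.
        if existing is None or row["last_update_time_utc"] > existing["last_update_time_utc"]:
            current[key] = row
    # Drop records whose latest version is flagged as deleted (is_deleted = 1).
    return {k: v for k, v in current.items() if v["is_deleted"] == 0}
```

The same pattern applies to the playback and catalog datasets, substituting their respective primary key columns.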
Datasets API preliminaries
Before you make requests to the Datasets API, it’s important to understand the basic requirements for authentication and pagination. This section covers how to securely access the API and navigate large datasets efficiently.
Authentication
To retrieve datasets you must have:
- A Login with Amazon (LWA) security profile
- An authorization code to request a token
- A token for all curl requests
The base URI is: https://videocentral.amazon.com/apis/v2. All requests should include a valid LWA authentication token in the request's Authorization header. For example:

curl -X GET \
  -H "Authorization: Bearer Atza|auth_token" \
  https://videocentral.amazon.com/apis/v2/accounts/123456
If the request header doesn’t include the token, or if the token is expired, the Datasets API will return an unauthorized exception.
Pagination
All Slate API responses are paginated. Pagination is controlled through request parameters.
| Request parameter | Default value | Description |
| --- | --- | --- |
| limit | 10 | The number of documents returned in a single page (the page size). |
| offset | 0 | The number of pages to skip (the page number). |
All paginated responses contain the following fields.
| Field | Description |
| --- | --- |
| total | The total document count across all pages. |
| next | The URL of the next page; null on the last page. |
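To walk a paginated listing to the end, a client can keep requesting pages until next comes back null. The sketch below is HTTP-client-agnostic: `fetch_page(offset, limit)` stands in for whatever function you use to perform the request and parse the JSON body, and the `documents` key holding each page's items is an assumption to adapt to the actual payload.

```python
def iter_documents(fetch_page, limit=100):
    """Yield every document across all pages of a paginated response.

    `fetch_page(offset, limit)` is any callable returning a parsed
    response dict with `documents`, `total`, and `next` fields
    (field names follow the pagination table above; `documents`
    is an assumed key for the page's items).
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["documents"]
        if page.get("next") is None:
            break
        offset += 1  # offset counts pages, per the table above
```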
Use the Datasets API
To programmatically access datasets, clients should follow a series of API calls that enumerate available resources—such as accounts, groups, businesses, and datasets—before retrieving downloadable URLs for the data files. This sequence is designed to support automation and can be integrated into recurring data pipelines or scheduled workflows.
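The enumeration order can be made explicit as a small helper that builds each resource path from the IDs discovered at the previous step. This is a sketch that only assembles paths; the field names in each response from which you extract the IDs depend on the actual payloads, and each path is passed to your HTTP client with the Authorization header shown earlier.

```python
def discovery_paths(account_id=None, group_id=None, business_id=None, dataset_id=None):
    """Return the ordered list of resource paths (under /apis/v2) a client
    walks, stopping at the deepest level for which an ID is known."""
    paths = ["/accounts"]                        # list accounts
    if account_id:
        paths.append(f"/accounts/{account_id}")  # list groups (business lines)
    if account_id and group_id:
        paths.append(f"/accounts/{account_id}/{group_id}")  # list businesses
    if account_id and group_id and business_id:
        # list available datasets
        paths.append(f"/accounts/{account_id}/{group_id}/{business_id}/datasets")
    if account_id and group_id and business_id and dataset_id:
        # obtain dataset file listings
        paths.append(
            f"/accounts/{account_id}/{group_id}/{business_id}/datasets/{dataset_id}"
        )
    return paths
```

The IDs here are placeholders in the same shape as the sample URL used in the orchestration script later in this topic.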
List accounts
/v2/accounts
This resource returns the list of Slate accounts that the user can access. The same set of accounts appears in Slate in the accounts dropdown list near the top right corner of the portal. You can also use this resource to find your account_id and your channel or studio_id.
List groups (business lines)
/v2/accounts/{account_id}
This resource returns the groups of business lines (such as channels) that the user can access.
List businesses
/v2/accounts/{account_id}/{group_id}
This resource returns a list of businesses (such as specific channel names) available for this account, depending on the given business line.
List available datasets
/v2/accounts/{account_id}/{group_id}/{business_id}/datasets
This resource returns the list of datasets available for a given channel or studio. (The list of available datasets and their attributes are included in Dataset definitions, later in this topic.) The datasets currently available to download are:
- Subscription: Events in the customer lifecycle, such as when a customer subscribed.
- Playback: Playback session events where customers engaged with content.
- Catalog: Events where your catalog metadata has changed, such as when a new title was added.
Obtain dataset file(s)
/v2/accounts/{account_id}/{group_id}/{business_id}/datasets/{dataset_id}
This resource provides a list of dataset files. Depending on the requested time range, the list may include a large number of files. The total field indicates how many files to expect. After completing a full backfill, you can stay up to date by continuing to request files using a startDateTime equal to the last retrieved timestamp and an endDateTime set to the current time.
New datasets are published approximately every 4 hours, and may contain events that have occurred within the previous 12 hours. We recommend calling our API multiple times per day, approximately every 4-6 hours, to ensure your local data is as complete and up-to-date as possible. If we experience a delay in publishing, we will communicate through email as soon as possible.
The following table describes the available request parameters for dataset files.
| Request parameter | Description |
| --- | --- |
| startDateTime | Recommended: set to the timestamp of your last pull. |
| endDateTime | Recommended: set to the time of the current pull. |
| limit | Maximum of 1,000 links per page. |
Note: Our maximum data retention is 2 years. Requests for datasets with timestamps more than 2 years in the past will not return any results.
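Given the 2-year retention window, a backfill client may want to clamp its requested start time so it never asks for data the API will not return. A minimal sketch, assuming timezone-aware datetimes and approximating the window as 730 days:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365 * 2)  # approximation of the 2-year retention window

def clamp_start(start, now=None):
    """Return a startDateTime no earlier than the retention cutoff,
    since requests older than 2 years return no results."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return max(start, cutoff)
```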
Dataset definitions
The tables in this section list the columns, data types, and definitions for each of the three available datasets.
Subscription dataset
| Column | Type | Definition |
| --- | --- | --- |
| subscription_event_id (pk) | string | The unique ID for each subscription event vended through this log. |
| subscription_event_type | string | The type of subscription event that occurred: Start: Customer subscribed to a channel they were not subscribed to previously. |
| subscription_event_time_utc | timestamp | The time the subscription event occurred, standardized to UTC. |
| subscription_event_time_zone | string | The time zone of the subscription marketplace. |
| cid | string | Anonymized customer identifier (CID). This customer identifier will persist for all events under a single parent channel to enable inter-tier movement and customer lifecycle tracking. |
| offer_id | string | The ID of the specific subscription offer the event occurred in relation to. |
| offer_name | string | The human-readable name of the offer. |
| offer_type | string | The type of offer. |
| offer_marketplace | string | The marketplace where the subscription offer was live. |
| offer_billing_type | string | The type of payment required for the offer: HO: Hard offer; payment required. |
| offer_payment_amount | string | The billing amount of the offer_id. |
| benefit_id | string | The ID of the Prime Video benefit the offer is configured under. |
| channel_label | string | The name of the channel the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM. |
| channel_tier_label | string | The name of the channel tier the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM. |
| is_promo | int | Indicates whether an offer is on a promotion at the time of the event (0 = no promo, 1 = promo). |
| create_time_utc | timestamp | The time the subscription event log record was created, standardized to UTC. |
| last_update_time_utc | timestamp | The time the subscription event log record was last updated, standardized to UTC. |
| is_deleted | int | Indicates whether a record that was previously created should be deleted (0 = should persist, 1 = should be deleted). |
Playback dataset
| Column | Type | Definition |
| --- | --- | --- |
| session_id (pk) | string | The unique ID for the playback session. |
| marketplace_id | int | The unique ID for the playback marketplace. |
| marketplace_desc | string | A friendly description of the playback marketplace. |
| cid | string | The user identifier, anonymized with UUID. |
| benefit_id | string | The benefit associated with the content that was streamed. |
| catalog_id | string | Foreign key (FK) used to join to the catalog table. |
| subscription_offer_id | string | The subscription offer_id the customer is subscribed to at the time of the stream (Active or ApprovalPending). |
| subscription_event_id | string | Foreign key (FK) used to join to the subscription event log to get the exact status of the subscriber at the time of playback (Active). |
| start_segment_utc | timestamp | Start of the playback segment, in UTC. |
| end_segment_utc | timestamp | End of the playback segment, in UTC. |
| seconds_viewed | int | Seconds the user streamed content during playback. |
| position_start | double | Second of the stream where the playback session started. |
| position_end | double | Second of the stream where the playback session ended. |
| connection_type | string | Connection used by the customer to stream the content. |
| stream_type | string | Classification between Video-On-Demand, Live, or Just After Broadcast (JAB) streams. |
| device_class | string | Type of device (such as Living Room, Mobile, Web, or Others). |
| device_sub_class | string | Granular type of device (such as game console, smart_tv, roku). |
| geo_dma | string | The 3-digit geographical Designated Market Area (DMA) of the area where the stream was generated. |
| playback_method | string | Whether playback was Online or Offline. |
| quality | string | Playback quality (such as 1080p or 4K). |
| event_type | string | The defining event type (playback_segments). |
| create_time_utc | timestamp | Timestamp when the record was added to the table, in UTC. |
| last_update_time_utc | timestamp | Timestamp when the record was last modified, in UTC. |
| is_deleted | int | Flag that denotes whether the record should be deleted in the partner's system. |
Catalog dataset
| Column | Type | Definition |
| --- | --- | --- |
| id (pk) | string | The unique ID for the title. |
| marketplace_id | int | The unique ID for the offer marketplace. |
| benefit_id | string | The benefit associated with the content extended. |
| title | string | The title of the series/movie. |
| vendor_sku | string | An arbitrary identifier that the vendor generates for each of their movies or episodes. |
| season | integer | The season number (for episodic content). |
| episode | integer | The episode number. |
| episode_name | string | The episode name (optional). |
| runtime_minutes | integer | The runtime of the content viewed. |
| live_linear_channel_name | string | The channel name for live content. |
| content_type | string | Either TV or Movie. |
| content_quality | string | HD or SD. |
| content_group | string | 3P_SUBS. |
| create_time_utc | timestamp | Timestamp when the record was added to the table, in UTC. |
| last_update_time_utc | timestamp | Timestamp when the record was last modified, in UTC. |
| is_deleted | int | Flag that denotes whether the record should be deleted in the partner's system. |
Sample queries
The following SQL example demonstrates how the dataset tables connect. You can join playback data to the subscription event log on the subscription_event_id column. This provides the latest subscription status prior to that stream. In this example, the catalog_id column in the playback dataset is joined to the id field in catalog_event_log to provide all catalog metadata.
select *
from playback_event_log a
left join subscription_event_log b on a.subscription_event_id = b.subscription_event_id
left join catalog_event_log c on a.catalog_id = c.id;
The following SQL example returns the top 10 first-watched titles for customers after they started a subscription.
with main as (
    select a.*,
           row_number() over (partition by a.cid order by start_segment_utc asc) as rn
    from playback_event_log a
    inner join (
        select distinct cid
        from subscription_event_log
        where subscription_event_type = 'Start'
    ) b on a.cid = b.cid
)
select c.id, c.title, c.episode_name, c.content_type,
       sum(seconds_viewed) as total_seconds_viewed
from main a
inner join catalog_event_log c on a.catalog_id = c.id
where rn = 1
group by c.id, c.title, c.episode_name, c.content_type
order by sum(seconds_viewed) desc
limit 10;
Sample orchestration
If you want to automate data extraction from the Datasets API on a recurring schedule, the following sample Python script demonstrates how to make incremental API calls every 6 hours. It tracks the timestamp of the last successful request by persisting it locally, and uses that value—plus one second—as the startDateTime for the next call. The script calculates endDateTime as the current time, builds the appropriate query parameters, and sends a GET request with authentication. This approach ensures continuous, non-overlapping data retrieval across time windows and can be scheduled via cron or another job scheduler.
import requests
import datetime
import os

# Constants for the API and token
AUTH_TOKEN = "Atza|auth_token"
ACCOUNT_URL = (
    "https://videocentral.amazon.com/apis/v2/"
    "accounts/123456/7890/abc123/datasets/data987"
)

# File where we persist the timestamp of the last successful API call
LAST_CALL_FILE = "last_call_time.txt"


def load_last_call_time():
    """
    Reads the timestamp of the last API call from a local file.
    If the file doesn't exist, defaults to a specific start time.
    """
    if not os.path.exists(LAST_CALL_FILE):
        # If no record exists, assume we're starting fresh from this date
        return datetime.datetime(2023, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc)
    with open(LAST_CALL_FILE, "r") as f:
        # Read and parse the ISO timestamp from the file
        return datetime.datetime.fromisoformat(f.read().strip())


def save_current_call_time(dt):
    """
    Saves the current timestamp to the local file so it can be used
    as the starting point for the next API call.
    """
    with open(LAST_CALL_FILE, "w") as f:
        f.write(dt.isoformat())


def main():
    # Current UTC time will be used as the end of the range
    now = datetime.datetime.now(datetime.timezone.utc)

    # Start time is one second after the last recorded call
    start_time = load_last_call_time() + datetime.timedelta(seconds=1)
    end_time = now

    # Set the query parameters
    params = {
        "startDateTime": start_time.isoformat(),
        "endDateTime": end_time.isoformat(),
        "offset": 0,
        "limit": 50,
    }

    # Set the auth header
    headers = {
        "Authorization": f"Bearer {AUTH_TOKEN}"
    }

    # Make the GET request to the API
    response = requests.get(ACCOUNT_URL, headers=headers, params=params)

    # Check if the request was successful
    if response.ok:
        print("Data fetched successfully.")
        # Persist the end time as the last successful call time
        save_current_call_time(end_time)
    else:
        print(f"Error fetching data: {response.status_code} - {response.text}")


if __name__ == "__main__":
    main()