Datasets API - Prime Video Tech Docs

Datasets API

Last updated 2025-07-21

The new Datasets API in Prime Video Slate lets developers build clients that retrieve event-grain exports (datasets) and any related dimensional datasets.

Important: The new endpoint documented here supports both subscription and playback. Playback datasets are only available through this new endpoint.

Datasets API overview

The Datasets API is part of our new partner data product, Slate Analytics. Unlike other Slate reports, datasets are append-only (each file contains new data), are available only through the API (not for download in the Slate UI), and are built explicitly for partner data engineers to consume granular data and perform analytics. This topic helps data engineers set up their pipelines to retrieve datasets, defines the values in the dataset files, and provides sample queries and suggestions for the optimal ways partners can use this data.

Practical use of datasets

We provide datasets to consumers in the form of a changelog. Each event is published only once. However, if any column values for a previously provided row need to be updated, we will publish a new version of the record to reflect the changes in your next available file. The changelog is append-only, to ensure that all data modifications are captured. Data engineers can use this changelog to update their data tables directly.

When you process the changelog, always use the latest record for a given event ID (the primary key column of the dataset, marked pk in Dataset definitions), based on the last_update_time_utc column. This ensures that you always have the most up-to-date version of each record. If a record needs to be deleted, this action is reflected in the is_deleted column: a value of 1 indicates that the record has been deleted, while a value of 0 represents an active record. This changelog approach lets you manage new and changing data, and keeps your data tables accurate and up to date.
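The changelog rule described here can be sketched in Python as follows. This is a minimal illustration, not the documented implementation: the function name apply_changelog is hypothetical, rows are assumed to be already parsed into dictionaries, and timestamps are assumed to be UTC strings in one uniform format so that they sort correctly as text.

```python
from typing import Dict, Iterable, List


def apply_changelog(rows: Iterable[dict], key: str = "subscription_event_id") -> List[dict]:
    """Collapse changelog rows to the latest active version of each record.

    `key` is the primary key column of the dataset being processed
    (for example session_id for playback, id for catalog).
    """
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row[key])
        # Assumption: timestamps share one UTC string format, so text
        # comparison orders them chronologically.
        if current is None or row["last_update_time_utc"] >= current["last_update_time_utc"]:
            latest[row[key]] = row
    # Drop records whose latest version is flagged as deleted.
    return [r for r in latest.values() if int(r["is_deleted"]) == 0]
```

Keeping the key column as a parameter lets the same function serve all three datasets, since each one uses a different primary key column.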

Datasets API preliminaries

Before you make requests to the Datasets API, it’s important to understand the basic requirements for authentication and pagination. This section covers how to securely access the API and navigate large datasets efficiently.

Authentication
To retrieve datasets you must have:

  1. A Login with Amazon (LWA) security profile
  2. An authorization code to request a token
  3. A token for all curl requests

The base URI is https://videocentral.amazon.com/apis/v2. All requests should include a valid LWA authentication token in the request authorization header.

If the request header doesn’t include the token, or if the token is expired, the Datasets API will return an unauthorized exception.
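As an illustration of attaching the token, the sketch below builds an authenticated GET request with the Python standard library. The function name build_request is hypothetical, and the Bearer scheme is an assumption, since this topic does not state the exact header format. Paths are assumed to be relative to the base URI, which already ends in /v2.

```python
import urllib.request

BASE_URI = "https://videocentral.amazon.com/apis/v2"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the LWA token in the Authorization header."""
    if not token:
        raise ValueError("an LWA access token is required")
    # Assumption: the token is sent using the Bearer scheme.
    return urllib.request.Request(
        BASE_URI + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

The request object can then be passed to urllib.request.urlopen. A missing or expired token is expected to produce the unauthorized error described above.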

Pagination
All Slate API responses are paginated. Pagination is controlled through request parameters.

Request parameter | Default value | Description
limit | 10 | The number of documents returned in a single page (the page size).
offset | 0 | The number of pages to skip (the page number).

All paginated responses contain the following fields.

Field | Description
total | The total document count in all pages.
next | The URL to the next page. Null if this is the last page.
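One way to walk a paginated response is to follow the next field until it is null. The sketch below assumes a caller-supplied fetch function (hypothetical) that performs the authenticated GET and returns the parsed JSON body as a dictionary. Only the total and next fields come from this topic.

```python
from typing import Callable, Iterator, Optional


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each page of a paginated response, following `next` until it is null."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield page
        # `next` is null (None after JSON parsing) on the last page.
        url = page.get("next")
```

Passing fetch in as a parameter keeps the loop independent of any HTTP library and makes it easy to exercise with canned pages.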

Use the Datasets API

To programmatically access datasets, clients should follow a series of API calls that enumerate available resources—such as accounts, groups, businesses, and datasets—before retrieving downloadable URLs for the data files. This sequence is designed to support automation and can be integrated into recurring data pipelines or scheduled workflows.

List accounts
/v2/accounts

This resource returns the list of Slate accounts that the user can access. The set of accounts is accessible in Slate through the accounts dropdown list near the top right corner of the portal. You can also use these links to find your account_id or your channel/studio_id.

Example request

Example response

List groups (business lines)
/v2/accounts/{account_id}

This resource returns the groups of business lines (such as channels) that the user can access.

Example request

Example response

List businesses
/v2/accounts/{account_id}/{group_id}

This resource returns a list of businesses (such as specific channel names) available for this account, depending on the given business line.

Example request

Example response

List available datasets
/v2/accounts/{account_id}/{group_id}/{business_id}/datasets

This resource returns the list of datasets available for a given channel or studio. (The list of available datasets and their attributes are included in Dataset definitions, later in this topic.) The datasets currently available to download are:

  • Subscription: Events in the customer lifecycle, such as when a customer subscribed.
  • Playback: Playback session events where customers engaged with content.
  • Catalog: Events where your catalog metadata has changed, such as when a new title was added.
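The enumeration resources share one path hierarchy: accounts, then group, then business, then datasets. A small helper can build each path from whichever IDs are already known. This is a sketch only: resource_path is a hypothetical name, the IDs shown in the usage note are made up, and the returned path is assumed to be appended to the base URI, which already ends in /v2.

```python
from urllib.parse import quote


def resource_path(*ids: str, datasets: bool = False) -> str:
    """Build a resource path from account_id, group_id, and business_id, in that order."""
    if len(ids) > 3:
        raise ValueError("expected at most account_id, group_id, business_id")
    if datasets and len(ids) != 3:
        raise ValueError("the datasets listing needs all three IDs")
    # Percent-encode each ID so unexpected characters cannot alter the path.
    segments = ["accounts", *(quote(i, safe="") for i in ids)]
    if datasets:
        segments.append("datasets")
    return "/" + "/".join(segments)
```

For example, resource_path() gives the accounts listing, resource_path("A1") the groups for account A1, and resource_path("A1", "G1", "B1", datasets=True) the datasets available to business B1.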

Example request

Example response

Obtain dataset file(s)
/v2/accounts/{account_id}/{group_id}/{partition_id}/datasets/{dataset_id}

This resource provides a list of dataset files. Depending on the requested time range, the list may include a large number of files. The total field indicates how many files to expect. After completing a full backfill, you can stay up to date by continuing to request files using a startDateTime equal to the last retrieved timestamp and an endDateTime set to the current time.

New datasets are published approximately every 4 hours, and may contain events that have occurred within the previous 12 hours. We recommend calling our API multiple times per day, approximately every 4-6 hours, to ensure your local data is as complete and up-to-date as possible. If we experience a delay in publishing, we will communicate through email as soon as possible.

The following table describes the available request parameters for dataset files.

Request parameter | Description
startDateTime | Recommended value: the time of your last pull. Format: YYYY-MM-DDTHH:MM:SSZ (UTC timestamp).
endDateTime | Recommended value: the time of the current pull (the current time). Format: YYYY-MM-DDTHH:MM:SSZ (UTC timestamp).
limit | The maximum limit is 1000 links per page.

Note: Our maximum data retention is 2 years. Requests for datasets with a timestamp earlier than 2 years prior will not return any results.
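The request parameters can be assembled as in the sketch below, which formats both timestamps as YYYY-MM-DDTHH:MM:SSZ and enforces the 1000-link page limit. The helper names to_utc_stamp and files_query are hypothetical, and the sketch does not check the two-year retention window.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

MAX_LIMIT = 1000  # maximum links per page


def to_utc_stamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def files_query(start: datetime, end: datetime, limit: int = MAX_LIMIT) -> str:
    """Return the URL-encoded query string for a dataset files request."""
    if start >= end:
        raise ValueError("startDateTime must be earlier than endDateTime")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return urlencode({
        "startDateTime": to_utc_stamp(start),
        "endDateTime": to_utc_stamp(end),
        "limit": limit,
    })
```

The resulting string is appended to the dataset files path after a question mark. Colons in the timestamps are percent-encoded by urlencode.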

Example request

Example response

Notes:

  • Maximum file zip size is 300 MB. Files that exceed that limit will be split into multiple files.
  • Presigned URL time-to-live (TTL) is 60 minutes.

Dataset definitions

The tables in this section list the columns, data types, and definitions for each of the 3 available datasets.

Subscription dataset

Column | Type | Definition
subscription_event_id (pk) | string | The unique ID for each subscription event vended through this log.
subscription_event_type | string | The type of subscription event that occurred. Start: Customer subscribed to a channel they were not subscribed to previously. Renewal: Customer subscribed to a channel and was already active for that channel prior to the subscription event. Cancel: Customer canceled their subscription. Active - AR ON: Customer is active and has turned autorenew on. Active - AR OFF: Customer is active, but has turned autorenew off.
subscription_event_time_utc | timestamp | The time the subscription event occurred, standardized to UTC.
subscription_event_time_zone | string | The time zone of the subscription marketplace.
cid | string | Anonymized customer identifier (CID). This customer identifier will persist for all events under a single parent channel to enable inter-tier movement and customer lifecycle tracking.
offer_id | string | The ID of the specific subscription offer the event occurred in relation to.
offer_name | string | The human-readable name of the offer.
offer_type | string | The type of offer.
offer_marketplace | string | The marketplace where the subscription offer was live.
offer_billing_type | string | The type of payment required for the offer. HO: Hard offer; payment required. FT: Free trial; no payment required.
offer_payment_amount | string | The billing amount of the offer_id.
benefit_id | string | The ID of the Prime Video benefit the offer is configured under.
channel_label | string | The name of the channel the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM.
channel_tier_label | string | The name of the channel tier the offer is under. Note: If this column shows a null value, and you have concerns, please contact your CAM or PsM.
is_promo | int | Indicates whether an offer is on a promotion at time of event (0 = no promo, 1 = yes promo).
create_time_utc | timestamp | The time the subscription event log record was created, standardized to UTC.
last_update_time_utc | timestamp | The time the subscription event log record was last updated, standardized to UTC.
is_deleted | int | Indicates whether a record that was previously created should be deleted (0 = should persist, 1 = should be deleted).

Playback dataset

Column | Type | Definition
session_id (pk) | string | The unique ID for the playback session.
marketplace_id | int | The unique ID for the playback marketplace.
marketplace_desc | string | A friendly description for the playback marketplace.
cid | string | The user identifier, anonymized with UUID.
benefit_id | string | The benefit associated with content that was streamed.
catalog_id | string | Foreign key (FK) used to join to catalog table.
subscription_offer_id | string | The subscription offer_id the customer is subscribed to at time of stream (Active or ApprovalPending).
subscription_event_id | string | Foreign key (FK) to join to subscription event log to get the exact status of subscriber at time of playback (Active).
start_segment_utc | timestamp | Start of playback segment in UTC.
end_segment_utc | timestamp | End of playback segment in UTC.
seconds_viewed | int | Seconds user streamed content during playback.
position_start | double | Second of stream where playback session started.
position_end | double | Second of stream where playback session ended.
connection_type | string | Connection used by the customer to stream the content.
stream_type | string | Classification between Video-On-Demand, Live, or Just After Broadcast (JAB) streams.
device_class | string | Type of device (such as Living Room, Mobile, Web, or Others).
device_sub_class | string | Granular type of device (such as game console, smart_tv, roku).
geo_dma | string | The 3-digit geographical Designated Market Area (DMA) of the area where the stream was generated.
playback_method | string | Accounts for whether playback is Online or Offline.
quality | string | Playback quality (such as 1080p or 4K).
event_type | string | The defining event type (playback_segments).
create_time_utc | timestamp | Timestamp when record was added to table, in UTC.
last_update_time_utc | timestamp | Last updated timestamp when record was modified, in UTC.
is_deleted | int | Flag to denote to partners if the record should be deleted in their system.

Catalog dataset

Column | Type | Definition
id (pk) | string | The unique ID for the title.
marketplace_id | int | The unique ID for the offer marketplace.
benefit_id | string | The benefit associated with the content extended.
title | string | The title of the series/movie.
vendor_sku | string | An arbitrary identifier that the vendor generates for each of their movies or episodes.
season | integer | The season number (for episodic content).
episode | integer | The episode number.
episode_name | string | The episode name (optional).
runtime_minutes | integer | The runtime of the content viewed.
live_linear_channel_name | string | The channel name for live content.
content_type | string | Either TV or Movie.
content_quality | string | HD or SD.
content_group | string | 3P_SUBS
create_time_utc | timestamp | Timestamp when record was added to table, in UTC.
last_update_time_utc | timestamp | Last updated timestamp when record was modified, in UTC.
is_deleted | int | Flag to denote to partners if the record should be deleted in their system.

Sample queries

The following SQL example demonstrates how the dataset tables connect. You can join playback data to the subscription event log on the subscription_event_id column. This provides the latest subscription status prior to that stream. In this example, the catalog_id column in the playback dataset is joined to the id field in catalog_event_log to provide all catalog metadata.
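The join described here can be tried locally with Python's built-in sqlite3 module. The sketch below is illustrative only: the table names playback_event_log and subscription_event_log are assumptions (only catalog_event_log is named in this topic), the tables carry just the columns needed for the join, and they are assumed to hold the latest, non-deleted version of each record.

```python
import sqlite3

JOIN_SQL = """
SELECT p.session_id,
       s.subscription_event_type,
       c.title,
       p.seconds_viewed
FROM playback_event_log AS p
LEFT JOIN subscription_event_log AS s
       ON p.subscription_event_id = s.subscription_event_id
LEFT JOIN catalog_event_log AS c
       ON p.catalog_id = c.id
ORDER BY p.session_id
"""


def demo_join() -> list:
    """Load one made-up row per table into an in-memory database and run the join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE subscription_event_log (
            subscription_event_id TEXT PRIMARY KEY, subscription_event_type TEXT);
        CREATE TABLE catalog_event_log (id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE playback_event_log (
            session_id TEXT PRIMARY KEY, subscription_event_id TEXT,
            catalog_id TEXT, seconds_viewed INTEGER);
        INSERT INTO subscription_event_log VALUES ('sub-1', 'Start');
        INSERT INTO catalog_event_log VALUES ('cat-1', 'Example Title');
        INSERT INTO playback_event_log VALUES ('sess-1', 'sub-1', 'cat-1', 1200);
    """)
    rows = con.execute(JOIN_SQL).fetchall()
    con.close()
    return rows
```

LEFT JOIN keeps playback rows even when the matching subscription or catalog record has not arrived yet, which can happen because files are published incrementally.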

The following SQL example returns the top 10 titles that customers watched first after starting a subscription.
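A query of this kind could look like the sketch below, again run through sqlite3 so it can be tested locally. Several points are assumptions: the playback_event_log and subscription_event_log table names, that cid values match across the subscription and playback datasets, that timestamps are stored as UTC text in one format, and that each customer's earliest Start event is the one of interest.

```python
import sqlite3

TOP_FIRST_WATCHED_SQL = """
WITH starts AS (
    SELECT cid, MIN(subscription_event_time_utc) AS started_at
    FROM subscription_event_log
    WHERE subscription_event_type = 'Start'
    GROUP BY cid
),
first_play AS (
    SELECT p.cid, MIN(p.start_segment_utc) AS first_at
    FROM playback_event_log AS p
    JOIN starts AS s ON s.cid = p.cid
    WHERE p.start_segment_utc >= s.started_at
    GROUP BY p.cid
)
SELECT c.title, COUNT(DISTINCT f.cid) AS customers
FROM first_play AS f
JOIN playback_event_log AS p
  ON p.cid = f.cid AND p.start_segment_utc = f.first_at
JOIN catalog_event_log AS c ON c.id = p.catalog_id
GROUP BY c.title
ORDER BY customers DESC, c.title
LIMIT 10
"""


def top_first_watched(con: sqlite3.Connection) -> list:
    """Return (title, customer count) pairs for the most common first-watched titles."""
    return con.execute(TOP_FIRST_WATCHED_SQL).fetchall()


def sample_connection() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE subscription_event_log (
            subscription_event_id TEXT PRIMARY KEY, subscription_event_type TEXT,
            subscription_event_time_utc TEXT, cid TEXT);
        CREATE TABLE catalog_event_log (id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE playback_event_log (
            session_id TEXT PRIMARY KEY, cid TEXT, catalog_id TEXT,
            start_segment_utc TEXT);
        INSERT INTO subscription_event_log VALUES
            ('e1', 'Start', '2025-01-01T00:00:00Z', 'c1'),
            ('e2', 'Start', '2025-01-02T00:00:00Z', 'c2'),
            ('e3', 'Start', '2025-01-03T00:00:00Z', 'c3');
        INSERT INTO catalog_event_log VALUES ('t1', 'Title A'), ('t2', 'Title B');
        INSERT INTO playback_event_log VALUES
            ('s1', 'c1', 't1', '2025-01-01T01:00:00Z'),
            ('s2', 'c1', 't2', '2025-01-01T02:00:00Z'),
            ('s3', 'c2', 't1', '2025-01-02T05:00:00Z'),
            ('s4', 'c3', 't2', '2025-01-03T01:00:00Z'),
            ('s5', 'c3', 't1', '2025-01-02T00:00:00Z');
    """)
    return con
```

In the sample data, session s5 predates customer c3's subscription start, so it is ignored and c3's first watched title is Title B.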

Sample orchestration

If you want to automate data extraction from the Datasets API on a recurring schedule, the following sample Python script demonstrates how to make incremental API calls every 6 hours. It tracks the timestamp of the last successful request by persisting it locally, and uses that value—plus one second—as the startDateTime for the next call. The script calculates endDateTime as the current time, builds the appropriate query parameters, and sends a GET request with authentication. This approach ensures continuous, non-overlapping data retrieval across time windows and can be scheduled via cron or another job scheduler.
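The core of such a script might look like the following sketch. It is not the original sample: the state file layout, the function names, and the six-hour window used for the very first run are assumptions, and fetch stands in for a caller-supplied function (hypothetical) that sends the authenticated GET request and raises an exception on failure.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Callable, Optional, Tuple

STAMP = "%Y-%m-%dT%H:%M:%SZ"


def load_last_end(state_file: Path) -> Optional[datetime]:
    """Read the end time of the last successful request, if one was saved."""
    if not state_file.exists():
        return None
    raw = json.loads(state_file.read_text())["last_end"]
    return datetime.strptime(raw, STAMP).replace(tzinfo=timezone.utc)


def save_last_end(state_file: Path, end: datetime) -> None:
    """Persist the end time of a successful request."""
    state_file.write_text(json.dumps({"last_end": end.strftime(STAMP)}))


def next_window(last_end: Optional[datetime], now: datetime,
                first_run: timedelta = timedelta(hours=6)) -> Tuple[str, str]:
    """Return (startDateTime, endDateTime) for the next call; `now` must be in UTC."""
    # Start one second after the previous window so windows never overlap.
    start = last_end + timedelta(seconds=1) if last_end else now - first_run
    return start.strftime(STAMP), now.strftime(STAMP)


def run_once(state_file: Path, fetch: Callable[[dict], object],
             now: Optional[datetime] = None) -> Tuple[str, str]:
    """Request one time window and record it only if the request succeeds."""
    now = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    start, end = next_window(load_last_end(state_file), now)
    fetch({"startDateTime": start, "endDateTime": end})  # expected to raise on failure
    save_last_end(state_file, now)
    return start, end
```

A scheduler such as cron would call run_once every 6 hours. Because the state file is written only after fetch returns, a failed run is retried from the same start time on the next invocation.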
