Extract Data from Video
Extract structured data from an online video’s transcript using a custom schema.
Provide a video_url and a JSON schema describing the fields to extract. Optionally include what_to_extract to guide the extraction.
Auto-transcription: For non-YouTube videos without an existing transcript (e.g. Instagram, TikTok, Facebook), the API automatically transcribes the video audio when transcribe is true (the default). This uses speech-to-text credits (video_uploads quota). YouTube videos rely on platform captions and cannot be auto-transcribed. Set transcribe=false to disable this behavior.
Schema format: Each field must have type and description. Supported types: String, Number, Boolean, Integer, Object, Array, Enum. Max 10 root fields, max 3 nesting levels.
Content-Type: Accepts application/json or YAML (application/x-yaml, text/yaml).
Token usage: Set include_usage=true to include prompt/completion token counts in the response.
Billing: Each extraction consumes at least 1 analysis credit. For longer transcripts, billing scales as ceil(total_tokens / 15000) credits. If auto-transcription is triggered, speech-to-text hours are also charged based on video duration. All charges are reverted if the request fails.
Overview
The extraction endpoint lets you pull structured, typed data from any video transcript by providing a JSON schema describing the fields you need. This is ideal for building automated pipelines that need consistent, machine-readable output from video content.
How It Works
- Provide a video_url and a schema defining the fields to extract
- VidNavigator fetches the platform transcript when available
- For supported non-YouTube sources, it can auto-transcribe audio when no transcript exists
- VidNavigator runs AI extraction against your schema
- You receive structured JSON matching your schema definition, plus video_info
Schema Rules
- Each field must have type and description
- Supported types: String, Number, Boolean, Integer, Object, Array, Enum
- Maximum 10 root-level fields
- Maximum 3 nesting levels
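The rules above can be checked on the client before a request is sent. The following is a minimal sketch, not VidNavigator's own validation: the function names are hypothetical, root-level fields are assumed to count as nesting level 1, and nested Object fields are assumed to live under a "properties" key (only "items" for Array is shown in this page's example).

```python
SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Object", "Array", "Enum"}
MAX_ROOT_FIELDS = 10
MAX_NESTING_LEVELS = 3


def validate_field(name, field, level, errors):
    """Check one field definition, then recurse into nested definitions."""
    if level > MAX_NESTING_LEVELS:
        errors.append(f"{name}: deeper than {MAX_NESTING_LEVELS} nesting levels")
        return
    if not isinstance(field, dict):
        errors.append(f"{name}: field definition must be an object")
        return
    for key in ("type", "description"):
        if key not in field:
            errors.append(f"{name}: missing '{key}'")
    field_type = field.get("type")
    if "type" in field and (
        not isinstance(field_type, str) or field_type not in SUPPORTED_TYPES
    ):
        errors.append(f"{name}: unsupported type {field_type!r}")
    # Array element definitions are nested under "items" (as in the example schema).
    if field_type == "Array" and isinstance(field.get("items"), dict):
        validate_field(f"{name}[]", field["items"], level + 1, errors)
    # Assumption: Object children are nested under "properties".
    if field_type == "Object" and isinstance(field.get("properties"), dict):
        for child_name, child in field["properties"].items():
            validate_field(f"{name}.{child_name}", child, level + 1, errors)


def validate_schema(schema):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"schema has {len(schema)} root fields, maximum is {MAX_ROOT_FIELDS}")
    for name, field in schema.items():
        validate_field(name, field, 1, errors)
    return errors
```

Calling validate_schema on a schema dict before sending it lets a pipeline reject malformed schemas locally rather than spending a request on them.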
Automatic Transcription
The transcribe parameter controls whether VidNavigator should automatically fall back to speech-to-text when a transcript is not available.
- transcribe=true by default
- applies to non-YouTube videos only
- useful for platforms like Instagram, TikTok, Facebook, X, and similar sources
- YouTube extraction relies on platform captions and does not support speech-to-text fallback through this endpoint
If you set transcribe=false, the request will fail when no transcript is available instead of triggering speech-to-text processing.
Example Usage
Basic Extraction
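A basic request might be assembled as in the sketch below, using only the Python standard library. The base URL is a hypothetical placeholder and the POST method is an assumption; the /extract/video path, the X-API-Key header, and the video_url and schema fields come from this page. The helper name build_extract_request is also illustrative.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the real VidNavigator API base URL.
BASE_URL = "https://api.example.com/v1"


def build_extract_request(api_key, video_url, schema, **options):
    """Build (but do not send) a JSON request for the /extract/video endpoint."""
    payload = {"video_url": video_url, "schema": schema, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/extract/video",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",  # assumption: the endpoint accepts POST with a JSON body
    )


request = build_extract_request(
    api_key="YOUR_API_KEY",
    video_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
    schema={
        "key_takeaway": {
            "type": "String",
            "description": "The single most important takeaway",
        }
    },
)

# To send it once BASE_URL points at the real API:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Optional body fields such as include_usage=True can be passed as keyword arguments and are merged into the JSON payload.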
With Extraction Guidance
Use what_to_extract to provide additional context to the AI about what to focus on:
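As an illustration, a request body with guidance could look like the sketch below. The schema field product_names is made up for this example; the what_to_extract string is the sample value from the Body section.

```python
import json

payload = {
    "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "schema": {
        # Illustrative field, not part of the API itself.
        "product_names": {
            "type": "Array",
            "description": "Product names mentioned in the video",
            "items": {"type": "String", "description": "A product name"},
        }
    },
    # Free-text hint that steers the extraction toward specific content.
    "what_to_extract": "Extract the main topics and any product names mentioned",
}

body = json.dumps(payload, indent=2)
print(body)
```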
Disable Auto-Transcription
Use transcribe=false when you want extraction to run only if a platform transcript already exists:
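A request body that opts out of the speech-to-text fallback might be built as follows. This is a sketch: the TikTok URL is a made-up placeholder, and the schema field is illustrative.

```python
import json

payload = {
    # Hypothetical non-YouTube URL, used only to illustrate the flag.
    "video_url": "https://www.tiktok.com/@example/video/1234567890",
    "schema": {
        "summary": {
            "type": "String",
            "description": "One-sentence summary of the video",
        }
    },
    # Python False serializes to JSON false; no speech-to-text fallback is requested.
    "transcribe": False,
}

body = json.dumps(payload)
print(body)
```

With this flag set, a caller should expect the request to fail when the platform has no transcript, so no speech-to-text usage is incurred.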
Response Example
The usage field is only included when include_usage=true in the request. video_info is included in the response metadata.
Billing
- AI extraction consumes analysis_request units in blocks of 15,000 total tokens
- Formula: ceil(total_tokens / 15000)
- Examples:
  - 14,000 tokens -> 1 analysis_request unit
  - 17,000 tokens -> 2 analysis_request units
  - 31,000 tokens -> 3 analysis_request units
- If transcribe=true triggers speech-to-text fallback, speech-to-text is charged separately as transcription_hour usage based on video/audio duration
- Failed requests are not charged
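The token-based part of the billing rule can be estimated locally. The sketch below applies the formula together with the one-credit minimum stated at the top of this page; the function name is illustrative, and speech-to-text charges are not modeled.

```python
import math

TOKENS_PER_UNIT = 15_000


def analysis_units(total_tokens: int) -> int:
    """Estimate analysis_request units: ceil(total_tokens / 15000), at least 1."""
    return max(1, math.ceil(total_tokens / TOKENS_PER_UNIT))


for tokens in (14_000, 17_000, 31_000):
    print(tokens, "->", analysis_units(tokens))
```

Feeding the total token count from the usage field (returned when include_usage=true) into this function gives a rough per-request cost check.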
Use Cases
Data Pipelines
Content Cataloging
Market Research
Compliance Monitoring
Authorizations
API key authentication. Include your VidNavigator API key in the X-API-Key header.
Body
video_url: URL of the video to extract data from
"https://youtube.com/watch?v=dQw4w9WgXcQ"
schema: Custom extraction schema defining the fields to extract. Max 10 root-level fields, max 3 nesting levels. Each field must have type and description.
{
  "main_topics": {
    "type": "Array",
    "description": "List of main topics discussed",
    "items": {
      "type": "String",
      "description": "A topic"
    }
  },
  "sentiment": {
    "type": "Enum",
    "description": "Overall sentiment of the video",
    "enum": ["positive", "negative", "neutral"]
  },
  "key_takeaway": {
    "type": "String",
    "description": "The single most important takeaway"
  }
}
what_to_extract: Optional guidance for what to extract from the transcript
"Extract the main topics and any product names mentioned"
transcribe: When true, automatically transcribes the video audio if no platform transcript is available. Applies to non-YouTube videos only (Instagram, TikTok, Facebook, X, etc.). Uses speech-to-text credits based on video duration.
include_usage: When true, includes token usage statistics in the response.
Response
Data extracted successfully
success Extracted data matching the provided schema. The shape of this object mirrors the input schema fields.
Video metadata (title, channel, duration, views, etc.). Only present for /extract/video requests.
File metadata (name, size, type, duration, etc.). Only present for /extract/file requests.
Token usage statistics. Only present when include_usage=true.

