Extract structured data from an online video’s transcript using a custom schema.
Provide a video_url and a JSON schema describing the fields to extract. Optionally include what_to_extract to guide the extraction.
Auto-transcription: For non-YouTube videos without an existing transcript (e.g. Instagram, TikTok, Facebook), the API automatically transcribes the video audio when transcribe is true (the default). This uses speech-to-text credits (video_uploads quota). YouTube videos rely on platform captions and cannot be auto-transcribed. Set transcribe=false to disable this behavior.
Schema format: Each field must have type and description. Supported types: String, Number, Boolean, Integer, Object, Array, Enum. Max 10 root fields, max 3 nesting levels.
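The limits above (every field needs type and description, at most 10 root fields, at most 3 nesting levels) can be checked client-side before sending a request. A minimal sketch in Python; the recursion assumes Array fields nest via an items key (as in the example schema on this page) and Object fields via a properties key, which is an assumption:

```python
ALLOWED_TYPES = {"String", "Number", "Boolean", "Integer", "Object", "Array", "Enum"}

def validate_schema(schema, max_root_fields=10, max_depth=3):
    """Check an extraction schema against the documented limits."""
    if len(schema) > max_root_fields:
        raise ValueError(f"too many root fields: {len(schema)} > {max_root_fields}")

    def check(field, depth):
        if depth > max_depth:
            raise ValueError(f"nesting deeper than {max_depth} levels")
        if "type" not in field or "description" not in field:
            raise ValueError("each field must have 'type' and 'description'")
        if field["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {field['type']!r}")
        # Recurse into nested definitions: 'items' follows the Array example
        # on this page; 'properties' for Object is an assumption.
        if field["type"] == "Array":
            check(field["items"], depth + 1)
        elif field["type"] == "Object":
            for sub in field.get("properties", {}).values():
                check(sub, depth + 1)

    for field in schema.values():
        check(field, 1)
    return True
```

Validating locally catches limit violations before an API round-trip, but the server remains the source of truth.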
Content-Type: Accepts application/json or YAML (application/x-yaml, text/yaml).
Token usage: Set include_usage=true to include prompt/completion token counts in the response.
Billing: Each extraction consumes at least 1 analysis credit. For longer transcripts, billing scales as ceil(total_tokens / 15000) credits. If auto-transcription is triggered, speech-to-text hours are also charged based on video duration. All charges are reverted if the request fails.
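Putting the pieces together, a request sketch using only the standard library. The base URL and the "schema" parameter name are assumptions (the /extract/video path is taken from the response field descriptions below); substitute your own API key:

```python
import json
import urllib.request

API_BASE = "https://api.vidnavigator.com/v1"  # assumed base URL; check the docs
API_KEY = "your-api-key"

payload = {
    "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "schema": {  # parameter name assumed
        "key_takeaway": {
            "type": "String",
            "description": "The single most important takeaway",
        }
    },
    "include_usage": True,
}

def extract_video(payload):
    """POST the extraction request with the API key in the X-API-Key header."""
    req = urllib.request.Request(
        API_BASE + "/extract/video",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```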
Requests must include video_url and a schema defining the fields to extract; the response includes video_info metadata. Each schema field must have type and description; supported types are String, Number, Boolean, Integer, Object, Array, Enum.
The transcribe parameter controls whether VidNavigator should automatically fall back to speech-to-text when a transcript is not available. transcribe is true by default. With transcribe=false, the request will fail when no transcript is available instead of triggering speech-to-text processing.
Use what_to_extract to provide additional context to the AI about what to focus on:
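For instance, a payload reusing this page's example values (parameter names as in the sketches above):

```python
payload = {
    "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "schema": {  # parameter name assumed
        "main_topics": {
            "type": "Array",
            "description": "List of main topics discussed",
            "items": {"type": "String", "description": "A topic"},
        }
    },
    # Optional guidance narrowing what the AI should focus on:
    "what_to_extract": "Extract the main topics and any product names mentioned",
}
```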
Set transcribe=false when you want extraction to run only if a platform transcript already exists:
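A payload sketch (the video URL is a hypothetical placeholder):

```python
payload = {
    "video_url": "https://www.instagram.com/reel/XXXX/",  # hypothetical URL
    "schema": {  # parameter name assumed
        "key_takeaway": {
            "type": "String",
            "description": "The single most important takeaway",
        }
    },
    # Fail fast instead of spending speech-to-text credits:
    "transcribe": False,
}
```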
The usage field is only included when include_usage=true in the request. video_info is included in the response metadata.
Extraction is billed in analysis_request units in blocks of 15,000 total tokens: ceil(total_tokens / 15000).
- 14,000 tokens -> 1 analysis_request unit
- 17,000 tokens -> 2 analysis_request units
- 31,000 tokens -> 3 analysis_request units
If transcribe=true triggers the speech-to-text fallback, speech-to-text is charged separately as transcription_hour usage based on video/audio duration.
Authentication: API key authentication. Include your VidNavigator API key in the X-API-Key header.
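The block-based billing above can be computed directly; a sketch, including the minimum of 1 credit per extraction:

```python
import math

def analysis_credits(total_tokens, block_size=15_000):
    """Analysis credits for one extraction: at least 1, then one per
    started block of 15,000 total tokens -> ceil(total_tokens / 15000)."""
    return max(1, math.ceil(total_tokens / block_size))

for tokens in (14_000, 17_000, 31_000):
    print(tokens, "->", analysis_credits(tokens))  # 1, 2, 3 credits
```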
URL of the video to extract data from
"https://youtube.com/watch?v=dQw4w9WgXcQ"
Custom extraction schema defining the fields to extract. Max 10 root-level fields, max 3 nesting levels. Each field must have type and description.
{
"main_topics": {
"type": "Array",
"description": "List of main topics discussed",
"items": {
"type": "String",
"description": "A topic"
}
},
"sentiment": {
"type": "Enum",
"description": "Overall sentiment of the video",
"enum": ["positive", "negative", "neutral"]
},
"key_takeaway": {
"type": "String",
"description": "The single most important takeaway"
}
}
Optional guidance for what to extract from the transcript
"Extract the main topics and any product names mentioned"
When true, automatically transcribes the video audio if no platform transcript is available. Applies to non-YouTube videos only (Instagram, TikTok, Facebook, X, etc.). Uses speech-to-text credits based on video duration.
When true, includes token usage statistics in the response
Data extracted successfully
success
Extracted data matching the provided schema. The shape of this object mirrors the input schema fields.
Video metadata (title, channel, duration, views, etc.). Only present for /extract/video requests.
File metadata (name, size, type, duration, etc.). Only present for /extract/file requests.
Token usage statistics. Only present when include_usage=true.
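Reading those fields from a successful response might look like the following; the shape is assembled from the descriptions above, and the exact key names ("data", "video_info", "usage") are assumptions:

```python
# Hypothetical response assembled from the field descriptions above.
response = {
    "success": True,
    "data": {"key_takeaway": "Always check the docs."},
    "video_info": {"title": "Example video", "duration": 212},
    "usage": {"prompt_tokens": 1200, "completion_tokens": 80},
}

if response["success"]:
    extracted = response["data"]        # mirrors the input schema fields
    tokens = response.get("usage", {})  # only present with include_usage=true
```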