If transcripts are for humans, extraction is for software. The Extract API lets you define the exact fields you want, then returns structured JSON that matches your schema so you can send the result straight into a database, CRM, workflow, or AI pipeline.

What This Guide Helps You Do

By the end of this guide, you will know how to:
  • choose the right extraction endpoint
  • design a schema the model can follow reliably
  • use what_to_extract to improve focus
  • understand auto-transcription, caching, and billing
  • move from one test video to a repeatable production workflow

Why This Is More Than Just an LLM Prompt

The real challenge is not only extracting structured data. It is getting the video content into a usable text layer in the first place. With VidNavigator, the workflow is end to end:
  • fetch the transcript when the platform already exposes one
  • auto-transcribe supported non-YouTube videos with speech-to-text when no transcript exists
  • run AI extraction on that text using your schema
  • return validated JSON you can use immediately in software
That is the main difference. You are not expected to manually retrieve captions, run a separate speech-to-text step, clean the text, and then prompt a model yourself. Compared with building that flow manually, VidNavigator gives you:
  • transcript retrieval and speech-to-text built into the same extraction workflow
  • one API call instead of stitching together multiple tools
  • a fixed response shape defined by your schema
  • validated output instead of free-form text
  • prompt compilation that is cached for 2 hours for repeated schemas
This is what makes video data extraction practical at scale, especially for platforms where transcripts are inconsistent, missing, or not easily accessible.

Choose the Right Endpoint

Online Videos

Use /v1/extract/video when you have a public video URL from YouTube, Instagram, TikTok, Facebook, X, Vimeo, Dailymotion, Loom, and similar platforms.

Uploaded Files

Use /v1/extract/file when the content is already in your VidNavigator library and you want to extract from a file_id.
/v1/extract/video can auto-transcribe supported non-YouTube videos when transcribe=true (the default). /v1/extract/file does not transcribe for you, so the file must already have a transcript.
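If your own code receives either a URL or a library file, the routing rule above is small enough to encode directly. This is a minimal sketch in Python. `choose_extract_endpoint` and `BASE_URL` are illustrative names, and the base URL comes from the cURL examples in this guide.

```python
from typing import Optional

BASE_URL = "https://api.vidnavigator.com"


def choose_extract_endpoint(video_url: Optional[str] = None,
                            file_id: Optional[str] = None) -> str:
    """Return the extract endpoint URL for exactly one input source.

    Hypothetical helper for illustration; not part of any VidNavigator SDK.
    """
    if (video_url is None) == (file_id is None):
        raise ValueError("pass exactly one of video_url or file_id")
    if video_url is not None:
        # Public URL: VidNavigator fetches the transcript, or auto-transcribes
        # supported non-YouTube videos when transcribe=true.
        return f"{BASE_URL}/v1/extract/video"
    # Library file: the file must already have a transcript.
    return f"{BASE_URL}/v1/extract/file"
```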

How Extraction Works

The Extract API runs a simple four-step pipeline:
  1. You send a schema and optional what_to_extract instruction.
  2. VidNavigator compiles an optimized extraction plan and caches it for 2 hours.
  3. The plan is applied to the transcript for the current video or file.
  4. The response is validated so the output matches your schema.
That cache matters. The first call for a new schema usually has a small compilation overhead. Reusing the same schema on later calls is faster because the compiled prompt is reused automatically.
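The plan cache lives on the VidNavigator side, and this guide does not say exactly how two schemas are judged identical. On the client side, one cheap habit is to keep each schema as a single constant and log a fingerprint of it next to every result. That way you can see when an edit introduced a new schema (and, presumably, a fresh compilation). This is a minimal Python sketch. `schema_fingerprint` is a hypothetical helper, not part of the API.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Short, stable hash of a schema for logging which schema version ran.

    sort_keys applies recursively, so dicts with the same content hash the
    same way regardless of key insertion order. This is purely client-side;
    it says nothing about how the server keys its plan cache.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```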

Start with a Small Schema

The fastest way to succeed is to start with 2-4 fields, validate the output, then expand. Here is a good starter schema for product review videos:
{
  "products_mentioned": {
    "type": "Array",
    "description": "Products mentioned in the video",
    "items": {
      "type": "String",
      "description": "Product name"
    }
  },
  "main_claim": {
    "type": "String",
    "description": "The single most important claim or conclusion made in the video"
  },
  "sentiment": {
    "type": "Enum",
    "description": "Overall sentiment toward the main product discussed",
    "enum": ["positive", "neutral", "negative"]
  }
}

Quickstart

Extract from an Online Video

Use this when you have a public video URL:
cURL
curl -X POST "https://api.vidnavigator.com/v1/extract/video" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "what_to_extract": "Focus on product names, pricing, and the speaker'\''s overall verdict",
    "schema": {
      "products_mentioned": {
        "type": "Array",
        "description": "Products mentioned in the video",
        "items": { "type": "String", "description": "Product name" }
      },
      "pricing_signals": {
        "type": "Array",
        "description": "Any price points, pricing plans, or cost references",
        "items": { "type": "String", "description": "Pricing detail" }
      },
      "verdict": {
        "type": "String",
        "description": "Final recommendation or overall verdict from the speaker"
      }
    }
  }'
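The same request can be built from Python using only the standard library. This is a sketch. `build_extract_request` and `API_BASE` are illustrative names, and the function builds the request without sending it. Because the payload is serialized with `json.dumps`, an apostrophe such as the one in "speaker's" needs no shell escaping.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.vidnavigator.com/v1"


def build_extract_request(endpoint: str, api_key: str, schema: dict,
                          what_to_extract: Optional[str] = None,
                          **source: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/extract/<endpoint>.

    `source` carries the input field, e.g. video_url=... or file_id=...
    Hypothetical helper for illustration.
    """
    payload = dict(source)
    payload["schema"] = schema
    if what_to_extract:  # optional focus instruction; omitted when empty
        payload["what_to_extract"] = what_to_extract
    return urllib.request.Request(
        f"{API_BASE}/extract/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the body with `json.load`. This guide does not describe the response envelope, so inspect a real response before relying on particular keys. The same function covers `/v1/extract/file` if you pass `"file"` and `file_id=...`.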

Extract from an Uploaded File

Use this when the transcript already lives in your VidNavigator workspace:
cURL
curl -X POST "https://api.vidnavigator.com/v1/extract/file" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "file_abc123",
    "what_to_extract": "Extract action items and owners from this meeting recording",
    "schema": {
      "action_items": {
        "type": "Array",
        "description": "Tasks agreed on during the meeting",
        "items": { "type": "String", "description": "Action item" }
      },
      "owners": {
        "type": "Array",
        "description": "People assigned to specific tasks",
        "items": { "type": "String", "description": "Person name and task if available" }
      },
      "deadline_mentions": {
        "type": "Array",
        "description": "Any due dates or timing commitments mentioned",
        "items": { "type": "String", "description": "Deadline detail" }
      }
    }
  }'

When to Use what_to_extract

Your schema defines the output shape. what_to_extract tells the model where to focus. Use it when:
  • the transcript covers multiple topics and you care about one of them
  • you want the model to prioritize a specific lens such as pricing, claims, objections, or action items
  • the field descriptions are correct but still too broad
Good examples:
  • Focus on feature claims, competitor mentions, and pricing strategy.
  • Extract only factual claims and cited statistics. Ignore jokes and off-topic banter.
  • Prioritize buyer intent, objections, and next steps from the sales call.

Schema Rules

Your schema must follow these limits:
  • maximum 10 root fields
  • maximum 3 nesting levels
  • maximum 10 subfields per Object
  • supported field types: String, Number, Boolean, Integer, Array, Object, Enum
  • every field must include both type and description
Write descriptions like instructions, not labels. "Primary pricing strategy discussed, in one sentence" is much better than "pricing".
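Checking these limits locally catches the violations listed above before a request is sent. This is a minimal Python sketch with two labeled assumptions. First, Object subfields are read from a `properties` key, since this guide does not show which key Object uses. Second, the `items` of an Array count as one extra nesting level. The non-empty `enum` check is an extra safeguard inferred from the Enum example in this guide, not a documented rule.

```python
SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Array", "Object", "Enum"}
MAX_ROOT_FIELDS = 10
MAX_DEPTH = 3
MAX_OBJECT_SUBFIELDS = 10
# ASSUMPTION: this guide does not show the key an Object uses for its subfields.
OBJECT_CHILDREN_KEY = "properties"


def check_schema(schema: dict) -> list[str]:
    """Return rule violations; an empty list means the local checks passed."""
    problems: list[str] = []
    if len(schema) > MAX_ROOT_FIELDS:
        problems.append(f"{len(schema)} root fields (max {MAX_ROOT_FIELDS})")
    for name, field in schema.items():
        _check_field(name, field, 1, problems)  # root fields are level 1
    return problems


def _check_field(path: str, field: dict, depth: int, problems: list[str]) -> None:
    if depth > MAX_DEPTH:
        problems.append(f"{path}: nesting level {depth} exceeds {MAX_DEPTH}")
        return
    if not isinstance(field, dict):
        problems.append(f"{path}: field definition must be an object")
        return
    for key in ("type", "description"):  # every field needs both
        if not field.get(key):
            problems.append(f"{path}: missing {key}")
    ftype = field.get("type")
    if ftype and ftype not in SUPPORTED_TYPES:
        problems.append(f"{path}: unsupported type {ftype!r}")
    if ftype == "Enum" and not field.get("enum"):
        problems.append(f"{path}: Enum needs a non-empty enum list")
    if ftype == "Array" and "items" in field:
        # ASSUMPTION: Array items count as one nesting level.
        _check_field(f"{path}[]", field["items"], depth + 1, problems)
    if ftype == "Object":
        children = field.get(OBJECT_CHILDREN_KEY, {})
        if len(children) > MAX_OBJECT_SUBFIELDS:
            problems.append(f"{path}: {len(children)} subfields (max {MAX_OBJECT_SUBFIELDS})")
        for child_name, child in children.items():
            _check_field(f"{path}.{child_name}", child, depth + 1, problems)
```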

Best Practices That Improve Results

1. Be specific in field descriptions

Weak:
{
  "topic": {
    "type": "String",
    "description": "Topic"
  }
}
Better:
{
  "primary_topic": {
    "type": "String",
    "description": "Primary topic discussed in the video, in 5 to 10 words"
  }
}
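A crude lint can flag label-style descriptions before you test a schema. This sketch counts the words in each root field's description. `weak_descriptions` and the four-word threshold are arbitrary choices for illustration, not a VidNavigator rule, and nested fields are not inspected.

```python
MIN_WORDS = 4  # arbitrary threshold for this sketch


def weak_descriptions(schema: dict) -> list[str]:
    """Return root field names whose description looks like a bare label."""
    flagged = []
    for name, field in schema.items():
        desc = field.get("description", "").strip()
        if len(desc.split()) < MIN_WORDS:
            flagged.append(name)
    return flagged
```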

2. Use enums when the answer should come from a fixed list

If a field is really a classification, use Enum instead of String. That makes downstream automation much easier.
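With an Enum, downstream code can branch on exact values instead of fuzzy-matching free text. This minimal sketch uses the sentiment field from the starter schema. It assumes `extracted` is the schema-shaped object you have already pulled out of the API response, since the envelope is not described in this guide. The queue names are hypothetical.

```python
ALLOWED_SENTIMENTS = ("positive", "neutral", "negative")  # matches the Enum values

# Hypothetical destination queues for each sentiment value.
QUEUES = {"positive": "testimonials", "neutral": "archive", "negative": "follow_up"}


def route_review(extracted: dict) -> str:
    """Pick a queue from the sentiment Enum; reject anything outside the list."""
    sentiment = extracted.get("sentiment")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    return QUEUES[sentiment]
```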

3. Start simple, then expand

Do not begin with a 10-field nested schema unless you already know it works. Start with the 2-4 fields that matter most, test them on a few videos, then add more.

4. Write descriptions in the output language you want

The response is returned in the same language as your schema descriptions. If you write field descriptions in French, the output will also come back in French.
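For example, a schema whose descriptions are written in French should produce French values. In this illustrative sketch only the descriptions are in French. The field names are left in English to show that this guide ties the output language to the descriptions. The field names and wording are the sketch's own.

```python
# Descriptions in French; per this guide, extracted values should come back in French.
# Field names are illustrative and left in English.
SCHEMA_FR = {
    "primary_topic": {
        "type": "String",
        "description": "Sujet principal abordé dans la vidéo, en 5 à 10 mots",
    },
    "key_points": {
        "type": "Array",
        "description": "Points clés présentés dans la vidéo",
        "items": {"type": "String", "description": "Un point clé, en une phrase"},
    },
}
```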

Ready-to-Use Ideas

Common extraction patterns include:
  • lead generation: companies, buyer intent, pain points, next steps
  • market research: competitor mentions, feature claims, pricing strategy, objections
  • creator analysis: hooks, quotes, sponsored mentions, content format
  • RAG ingestion: dense summary, entities, claims, topics, language code
  • fact-checking: claims, statistics, cited sources, controversy markers
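As an illustration, the lead generation pattern might translate into a schema like the one below. The field names, descriptions, and `buyer_intent` enum values are this sketch's own choices, not a VidNavigator template. The schema stays within the limits in Schema Rules.

```python
# Illustrative lead generation schema; names, descriptions, and enum values are assumptions.
LEAD_GEN_SCHEMA = {
    "companies": {
        "type": "Array",
        "description": "Companies or organizations mentioned by name",
        "items": {"type": "String", "description": "Company name"},
    },
    "buyer_intent": {
        "type": "Enum",
        "description": "Strength of the speaker's intent to buy, based only on what is said",
        "enum": ["high", "medium", "low", "none"],
    },
    "pain_points": {
        "type": "Array",
        "description": "Problems or frustrations the speaker describes, one per item",
        "items": {"type": "String", "description": "Pain point, in one sentence"},
    },
    "next_steps": {
        "type": "Array",
        "description": "Concrete follow-up actions mentioned, such as demos or calls",
        "items": {"type": "String", "description": "Next step"},
    },
}
```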

Online Videos vs. Uploaded Files

POST /v1/extract/video

Choose this endpoint when:
  • you already have a public video URL
  • you want VidNavigator to fetch the transcript for you
  • you want automatic speech-to-text fallback for supported non-YouTube videos
Important behavior:
  • transcribe=true by default
  • automatic transcription applies to non-YouTube platforms only
  • YouTube relies on platform captions and cannot use speech-to-text fallback through this endpoint
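The rules above can be checked before you call the endpoint, for example to warn that a YouTube URL without captions will not benefit from auto-transcription. This is a minimal sketch, and the YouTube host list is an assumption. A `True` result only means the fallback is not ruled out, because the check does not know which non-YouTube platforms are supported.

```python
from urllib.parse import urlparse

# ASSUMPTION: hosts treated as YouTube for this check.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def may_auto_transcribe(video_url: str, transcribe: bool = True) -> bool:
    """True when /v1/extract/video is not ruled out from speech-to-text fallback.

    Fallback needs transcribe=true and a non-YouTube platform.
    """
    host = (urlparse(video_url).hostname or "").lower()
    return transcribe and host not in YOUTUBE_HOSTS
```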

POST /v1/extract/file

Choose this endpoint when:
  • the media is already uploaded to your VidNavigator account
  • you want to extract from private or internal content
  • you already completed transcription earlier in your workflow
Typical file workflow:
  1. Upload the file.
  2. Generate a transcript if needed.
  3. Call /v1/extract/file with the resulting file_id.

Billing and Performance

Each extraction consumes analysis_request units. One unit covers up to 15,000 total tokens; longer transcripts consume more units:

units = ceil(total_tokens / 15000)

A few practical notes:
  • the first call for a new schema includes prompt compilation overhead
  • repeated calls with the same schema reuse the cached plan for 2 hours
  • extract/video may also trigger transcription_hour billing when auto-transcription is used
  • failed requests are reverted and are not billed

Troubleshooting

transcript_not_available

This usually means the source does not have a usable transcript and auto-transcription is either disabled or unavailable.
  • For non-YouTube URLs, keep transcribe=true on /v1/extract/video.
  • For uploaded files, transcribe the file before calling /v1/extract/file.

invalid_schema

This usually means a field is missing type or description, nesting is too deep, or the schema exceeds the limits above.

Results are too broad

Tighten the schema descriptions and add a more targeted what_to_extract instruction.

First request is slower than later requests

That is expected when a schema is new. The compiled extraction plan is cached for 2 hours, so repeated schemas are faster.

Next Steps