Transcripts and summaries are useful for reading, but impossible to automate reliably. The Extract API lets you define exactly what you need and get clean, structured JSON back from any video.

The Problem: Video Content Doesn’t Scale

Every day, thousands of hours of video are published across YouTube, TikTok, Instagram, X, and other platforms. Buried inside are competitor mentions, product reviews, pricing signals, customer pain points, expert insights, and buying intent — data that teams across your organization need. But video data extraction today is broken:
  • Sales teams manually watch webinars to find lead signals.
  • Market researchers hire interns to catalog competitor mentions.
  • Content teams scrub through hours of footage to pull quotes.
When teams try to automate with standard LLMs, they get inconsistent, free-form text that changes shape with every call — unusable for databases, CRMs, or pipelines. VidNavigator’s Video Data Extraction API solves this. You define a JSON schema describing exactly the data points you need, and the API returns clean, validated, structured JSON that matches your schema every single time. No prompt engineering. No parsing code. No inconsistency.

Key Takeaways

  • Define a custom schema (JSON) to extract exactly the data you need from any video.
  • 2-phase AI pipeline: prompt compilation (cached) → structured extraction (Pydantic-enforced) guarantees consistent output.
  • Works with online videos (/v1/extract/video) and uploaded files (/v1/extract/file) — supporting YouTube, TikTok, Instagram, X, and many more.
  • Prompt caching means repeat extractions are instant: define a schema once, extract from hundreds of videos.
  • Cost-effective: At 100 extractions per credit, processing videos at scale is extremely cheap.

Why Not Just Use Standard LLMs?

You could paste a transcript into a standard LLM and ask for structured data. It works for one video, but it breaks at scale:
  • Inconsistent output shape: Standard LLMs return slightly different JSON keys, structures, and formatting every time. You can’t reliably pipe it into a database or API.
  • No schema enforcement: The Extract API uses Pydantic to enforce your exact schema. Every response is guaranteed to match your field names, types, and nesting.
  • No transcript pipeline: You have to manually fetch the transcript, paste it in, and copy the result. The Extract API handles transcript retrieval, caching, and extraction in a single call.
  • No prompt caching: Every standard LLM call re-generates the prompt. VidNavigator caches the optimized extraction prompt, so repeat schemas are faster and cheaper.
  • No batch automation: The Extract API is a REST endpoint. Loop over 1,000 video URLs, feed results into your pipeline. No copy-paste needed.
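Because the Extract API is a plain REST endpoint, batching really is just a loop. Here is a minimal Python sketch that prepares /v1/extract/video calls for a list of URLs. The base URL, auth header, and payload field names (video_url, schema, what_to_extract) are assumptions — check the API reference for the exact request shape.

```python
import json
import urllib.request

API_BASE = "https://api.vidnavigator.com"  # assumed base URL -- see the API reference
API_KEY = "YOUR_API_KEY"

# One schema, reused across every video -- so prompt compilation is cached after call #1.
SCHEMA = {
    "buying_intent": {
        "type": "Enum",
        "description": "Level of buying intent expressed",
        "values": ["High", "Medium", "Low", "None"],
    },
    "pain_points": {
        "type": "Array",
        "description": "Customer pain points or challenges discussed",
        "items": {"type": "String", "description": "Specific pain point"},
    },
}

def build_request(video_url: str) -> urllib.request.Request:
    """Prepare one /v1/extract/video call (payload field names are assumptions)."""
    body = json.dumps({
        "video_url": video_url,
        "schema": SCHEMA,
        "what_to_extract": "Sales and buying signals",
    }).encode()
    return urllib.request.Request(
        f"{API_BASE}/v1/extract/video",
        data=body,
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )

# Batch loop: call urllib.request.urlopen(build_request(url)) for each URL,
# then feed the validated JSON straight into your CRM or database.
```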

How It Works: The 2-Phase Pipeline

Phase 1 — Prompt Compilation (one-time, cached)

The API takes your schema and optional what_to_extract instruction and generates an optimized pair of AI prompts. This compiled “extraction plan” is cached with a 2-hour TTL using a fingerprint of your schema. The next time you send the exact same schema within the cache window, the compilation step is skipped entirely.

Phase 2 — Structured Extraction

The cached prompt template is filled with the video’s transcript text, then sent to the AI model with strict structured output enforcement. The result is validated JSON that exactly matches your custom schema.

Use Case Templates

1. Lead Generation

Built for sales and BD teams. Extract companies, decision-makers, pricing signals, pain points, buying intent, and calls-to-action from sales calls, webinars, or product demos.
{
  "company_names": {
    "type": "Array",
    "description": "Names of any companies mentioned",
    "items": { "type": "String", "description": "Company name" }
  },
  "decision_makers": {
    "type": "Array",
    "description": "Names and roles of key decision-makers",
    "items": { "type": "String", "description": "Person and their role" }
  },
  "pain_points": {
    "type": "Array",
    "description": "Customer pain points or challenges discussed",
    "items": { "type": "String", "description": "Specific pain point" }
  },
  "pricing_signals": {
    "type": "Array",
    "description": "Mentions of budget, pricing, or cost",
    "items": { "type": "String", "description": "Pricing discussion" }
  },
  "buying_intent": {
    "type": "Enum",
    "description": "Level of buying intent expressed",
    "values": ["High", "Medium", "Low", "None"]
  },
  "calls_to_action": {
    "type": "Array",
    "description": "Next steps or action items agreed upon",
    "items": { "type": "String", "description": "Action item" }
  }
}
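Server-side, Pydantic already guarantees the response matches this schema, but a lightweight client-side sanity check is cheap insurance before writing to a database. Here is a stdlib-only sketch against a trimmed version of the lead-generation schema; the checks and the sample payload are illustrative, not part of the API.

```python
def validate_fields(data: dict, schema: dict) -> list:
    """Return a list of mismatches between a response payload and a schema sketch."""
    problems = []
    for field, spec in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value, ftype = data[field], spec["type"]
        if ftype == "Array" and not isinstance(value, list):
            problems.append(f"{field}: expected a list")
        elif ftype == "Enum" and value not in spec["values"]:
            problems.append(f"{field}: {value!r} is not one of {spec['values']}")
        elif ftype == "String" and not isinstance(value, str):
            problems.append(f"{field}: expected a string")
    return problems

LEAD_GEN_SCHEMA = {
    "company_names": {"type": "Array", "description": "Names of any companies mentioned"},
    "buying_intent": {
        "type": "Enum",
        "description": "Level of buying intent expressed",
        "values": ["High", "Medium", "Low", "None"],
    },
}

sample = {"company_names": ["Acme Corp"], "buying_intent": "High"}
```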

2. Market Research

Competitive intelligence for product and strategy teams. Map competitor mentions, feature claims, pricing strategies, target audiences, and objections addressed in industry talks and reviews.
{
  "competitors_mentioned": {
    "type": "Array",
    "description": "List of competitors discussed",
    "items": { "type": "String", "description": "Competitor name" }
  },
  "feature_claims": {
    "type": "Array",
    "description": "Key product features or capabilities highlighted",
    "items": { "type": "String", "description": "Feature claim" }
  },
  "pricing_strategies": {
    "type": "Array",
    "description": "Details about product pricing or tiers",
    "items": { "type": "String", "description": "Pricing detail" }
  },
  "target_audience": {
    "type": "String",
    "description": "The primary audience the product is aimed at"
  },
  "objections_addressed": {
    "type": "Array",
    "description": "Customer objections or concerns answered",
    "items": { "type": "String", "description": "Objection addressed" }
  }
}

3. Content & Creator Analysis

Designed for marketing and content teams. Capture hooks, key quotes, content format, sponsored product mentions, and audience engagement cues from creator videos and branded content.
{
  "hooks": {
    "type": "Array",
    "description": "Engaging statements used to grab attention early",
    "items": { "type": "String", "description": "Hook quote" }
  },
  "key_quotes": {
    "type": "Array",
    "description": "Memorable or highly quotable statements",
    "items": { "type": "String", "description": "Quote" }
  },
  "content_format": {
    "type": "Enum",
    "description": "The primary format or style of the video",
    "values": ["Tutorial", "Review", "Vlog", "Interview", "Other"]
  },
  "sponsored_mentions": {
    "type": "Array",
    "description": "Products or brands explicitly sponsored or promoted",
    "items": { "type": "String", "description": "Brand or product" }
  },
  "engagement_cues": {
    "type": "Array",
    "description": "Calls to like, subscribe, comment, or interact",
    "items": { "type": "String", "description": "Engagement cue" }
  }
}

4. AI Pipeline / RAG Ingestion

For AI builders and data engineers. Produce vector-ready summaries, named entities, factual claims, topic labels, language codes, and sentiment.
{
  "summary_vector_ready": {
    "type": "String",
    "description": "A dense summary optimized for vector embedding"
  },
  "named_entities": {
    "type": "Array",
    "description": "People, organizations, or locations mentioned",
    "items": { "type": "String", "description": "Entity name" }
  },
  "factual_claims": {
    "type": "Array",
    "description": "Verifiable factual statements made",
    "items": { "type": "String", "description": "Claim" }
  },
  "topic_labels": {
    "type": "Array",
    "description": "High-level topic categories",
    "items": { "type": "String", "description": "Topic" }
  },
  "language_code": {
    "type": "String",
    "description": "ISO 639-1 language code of the primary language"
  },
  "sentiment": {
    "type": "Enum",
    "description": "Overall sentiment of the content",
    "values": ["Positive", "Neutral", "Negative"]
  }
}
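As a sketch of how this template plugs into a RAG pipeline: the extracted fields map naturally onto an embedding-ready document record — the summary becomes the text you embed, everything else becomes filterable metadata. The record shape and id scheme below are illustrative choices, not prescribed by the API.

```python
import hashlib

def to_rag_document(video_url: str, extraction: dict) -> dict:
    """Flatten an extraction result into one record for a vector store."""
    return {
        "id": hashlib.sha256(video_url.encode()).hexdigest()[:16],  # stable per-video id
        "text": extraction["summary_vector_ready"],   # the string you embed
        "metadata": {                                 # fields you filter/facet on
            "source": video_url,
            "entities": extraction["named_entities"],
            "topics": extraction["topic_labels"],
            "language": extraction["language_code"],
            "sentiment": extraction["sentiment"],
        },
    }

result = {  # shaped like a response to the schema above
    "summary_vector_ready": "A 10-minute review of open-source vector databases.",
    "named_entities": ["FAISS", "Milvus"],
    "factual_claims": ["FAISS was open-sourced by Meta"],
    "topic_labels": ["databases", "machine learning"],
    "language_code": "en",
    "sentiment": "Positive",
}
doc = to_rag_document("https://youtube.com/watch?v=demo", result)
```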

5. Brand & E-Commerce Monitoring

Track brand mentions, promotional codes, creator recommendations, audience demographic cues, and purchase-intent signals.
{
  "brand_mentions": {
    "type": "Array",
    "description": "Specific brands or products mentioned",
    "items": { "type": "String", "description": "Brand name" }
  },
  "promotional_codes": {
    "type": "Array",
    "description": "Discount or promo codes shared",
    "items": { "type": "String", "description": "Promo code" }
  },
  "creator_recommendations": {
    "type": "Array",
    "description": "Products explicitly recommended by the creator",
    "items": { "type": "String", "description": "Recommendation" }
  },
  "demographic_cues": {
    "type": "Array",
    "description": "Hints about the intended target audience",
    "items": { "type": "String", "description": "Demographic insight" }
  },
  "purchase_intent_signals": {
    "type": "Array",
    "description": "Statements indicating intent to buy or suggesting others buy",
    "items": { "type": "String", "description": "Purchase signal" }
  }
}

Online Videos vs. Uploaded Files

Extract from Online Videos

Use the /v1/extract/video endpoint to extract data directly from public video URLs (YouTube, TikTok, Instagram, X, etc.). Note: The /v1/extract/video endpoint requires the video to already have a transcript. If the video doesn’t have native captions, call /v1/transcribe first to generate a transcript via speech-to-text.

Extract from Uploaded Files

The /v1/extract/file endpoint works identically to /v1/extract/video but takes a file_id instead of a URL. The file must be uploaded and transcribed first via the file upload endpoints.

Schema Rules

To ensure high accuracy and strict adherence, your JSON schemas must follow these rules:
  • Max 10 root fields
  • Max 3 nesting levels (level 3 must be primitive types only)
  • Max 10 subfields per Object
  • Supported types: String, Number, Boolean, Integer, Array, Object, Enum
  • Every field requires both type and description
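These rules are easy to check client-side before you ever hit the endpoint. The stdlib sketch below walks a schema and flags violations; it assumes Array children live under items (as in the templates above) and Object subfields under a properties key, which is an assumption worth verifying against the API reference.

```python
SUPPORTED = {"String", "Number", "Boolean", "Integer", "Array", "Object", "Enum"}
PRIMITIVE = {"String", "Number", "Boolean", "Integer", "Enum"}

def check_field(name: str, spec: dict, depth: int, errors: list) -> None:
    """Recursively enforce the schema rules listed above on one field."""
    if "type" not in spec or "description" not in spec:
        errors.append(f"{name}: every field requires both type and description")
        return
    ftype = spec["type"]
    if ftype not in SUPPORTED:
        errors.append(f"{name}: unsupported type {ftype!r}")
    if depth >= 3 and ftype not in PRIMITIVE:
        errors.append(f"{name}: level 3 must be a primitive type")
        return
    if ftype == "Array" and "items" in spec:
        check_field(f"{name}.items", spec["items"], depth + 1, errors)
    elif ftype == "Object":
        subfields = spec.get("properties", {})  # subfield key name is an assumption
        if len(subfields) > 10:
            errors.append(f"{name}: max 10 subfields per Object")
        for sub, subspec in subfields.items():
            check_field(f"{name}.{sub}", subspec, depth + 1, errors)

def check_schema(schema: dict) -> list:
    """Return all rule violations; an empty list means the schema is acceptable."""
    errors = []
    if len(schema) > 10:
        errors.append("max 10 root fields")
    for name, spec in schema.items():
        check_field(name, spec, 1, errors)
    return errors
```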

Prompt Caching & Performance

Every extraction schema you send is fingerprinted. The resulting hash is used to look up a previously compiled prompt plan.
  • The first call with a new schema has ~2–3s of compilation overhead.
  • All subsequent calls with the same schema skip compilation entirely.
  • Plans are cached for 2 hours (TTL) and are automatically recompiled when they expire.
This means your pipeline gets faster the more you use it. Once a schema is compiled, every subsequent video processed with that schema benefits from instant prompt reuse.
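Conceptually, the cache behaves like a fingerprint-keyed dictionary with a TTL. Here is a minimal Python sketch of the idea — the hashing scheme and plan contents are illustrative, since the real compiled plans live server-side.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 2 * 60 * 60  # plans are cached for 2 hours
_plan_cache = {}                 # fingerprint -> (compiled_at, plan)

def schema_fingerprint(schema: dict) -> str:
    """Hash the canonical JSON form so key order does not change the fingerprint."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def get_plan(schema: dict) -> str:
    """Return a compiled plan, recompiling only on a miss or an expired entry."""
    key = schema_fingerprint(schema)
    entry = _plan_cache.get(key)
    now = time.time()
    if entry and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]                    # cache hit: skip compilation entirely
    plan = f"compiled-plan-{key[:8]}"      # placeholder for the ~2-3s compile step
    _plan_cache[key] = (now, plan)
    return plan
```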

Best Practices

  • Write specific field descriptions: The better your descriptions, the more accurate the extraction. Instead of “topic”, write “Primary topic discussed in the video, in 5–10 words”.
  • Use Enum types for classification fields instead of free-text Strings. Enums constrain the AI output to your predefined values, eliminating inconsistency.
  • Start with a simple schema and add fields iteratively. Test with 2–3 fields first, verify accuracy, then expand.
  • Use what_to_extract to guide the AI’s focus. This optional instruction steers the model toward specific parts of the transcript, improving relevance.
  • Write descriptions in your target language. The output is returned in the same language as your schema descriptions (99+ languages supported).

Pricing: Built for Scale

Each extraction counts as 1 video analysis. With VidNavigator:
  • 1 credit = 100 video extractions/analyses
  • Instant (0s) compilation on cached schemas.
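The per-video arithmetic works out simply, as this quick sanity check shows (using the Voyager Plan numbers from this section: $300 for 1,200 credits):

```python
def cost_per_video(plan_price_usd: float, credits: int, videos_per_credit: int = 100) -> float:
    """Effective price per extraction: plan price spread over every covered video."""
    return plan_price_usd / (credits * videos_per_credit)

voyager = cost_per_video(300, 1_200)  # 1,200 credits x 100 videos = 120,000 videos
```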
This pricing model is designed to scale. For example, our Voyager Plan offers 1,200 credits for $300. Because each credit covers 100 videos, that means you can process up to 120,000 videos — as little as $0.0025 per video extraction. Compare that to the cost of a research analyst manually watching and cataloging video content, or the engineering time to build and maintain a custom pipeline. Ready to build? Head over to the Extract Video API Reference to see code examples and start extracting structured data today!