> ## Documentation Index
> Fetch the complete documentation index at: https://docs.vidnavigator.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract Data from File

> Extract structured data from an uploaded file's transcript using a custom schema.

Provide a `file_id` and a JSON `schema` describing the fields to extract. The file must be processed and have a transcript available. Optionally include `what_to_extract` to guide the extraction.

**Schema format:** Each field must have `type` and `description`. Supported types: `String`, `Number`, `Boolean`, `Integer`, `Object`, `Array`, `Enum`. Max 10 root fields, max 3 nesting levels.

**Content-Type:** Accepts `application/json` or YAML (`application/x-yaml`, `text/yaml`).

**Token usage:** Set `include_usage=true` to include prompt/completion token counts in the response.

**Billing:** Each extraction consumes at least 1 analysis credit. For longer transcripts, billing scales as `ceil(total_tokens / 15000)` credits. All charges are reverted if the request fails.

Extract structured data from an uploaded file's transcript using a custom schema you define.

## Overview

Extract structured data from files you have previously uploaded to your VidNavigator library. The file must be fully processed and have a transcript available.

### How It Works

1. Provide a `file_id` and a `schema` describing the structured fields you want
2. VidNavigator reads the transcript from the uploaded file
3. VidNavigator runs AI extraction against your schema
4. You receive structured JSON matching your schema definition, plus `file_info`

### Schema Rules

* Each field must have `type` and `description`
* Supported types: `String`, `Number`, `Boolean`, `Integer`, `Object`, `Array`, `Enum`
* Maximum **10 root-level fields**
* Maximum **3 nesting levels**

<Tip>
  You can also send the request body as YAML by setting `Content-Type` to `application/x-yaml` or `text/yaml`.
</Tip>

<Note>
  This endpoint does **not** support a `transcribe` parameter. It works only on files that already have a transcript available.
</Note>

## Example Usage

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST "https://api.vidnavigator.com/v1/extract/file" \
    -H "X-API-Key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "file_id": "file_abc123",
      "schema": {
        "action_items": {
          "type": "Array",
          "description": "Action items and tasks mentioned in the meeting",
          "items": {
            "type": "Object",
            "description": "An action item",
            "properties": {
              "task": { "type": "String", "description": "The task description" },
              "assignee": { "type": "String", "description": "Person assigned" },
              "deadline": { "type": "String", "description": "Deadline if mentioned" }
            }
          }
        },
        "decisions_made": {
          "type": "Array",
          "description": "Key decisions made during the meeting",
          "items": { "type": "String", "description": "A decision" }
        }
      },
      "what_to_extract": "Extract action items and deadlines from this meeting",
      "include_usage": true
    }'
  ```

  ```python Python theme={null}
  import requests

  url = "https://api.vidnavigator.com/v1/extract/file"
  headers = {
      "X-API-Key": "YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "file_id": "file_abc123",
      "schema": {
          "action_items": {
              "type": "Array",
              "description": "Action items and tasks mentioned in the meeting",
              "items": {
                  "type": "Object",
                  "description": "An action item",
                  "properties": {
                      "task": {"type": "String", "description": "The task description"},
                      "assignee": {"type": "String", "description": "Person assigned"},
                      "deadline": {"type": "String", "description": "Deadline if mentioned"}
                  }
              }
          },
          "decisions_made": {
              "type": "Array",
              "description": "Key decisions made during the meeting",
              "items": {"type": "String", "description": "A decision"}
          }
      },
      "what_to_extract": "Extract action items and deadlines from this meeting"
  }

  response = requests.post(url, headers=headers, json=data)
  result = response.json()
  ```

  ```javascript JavaScript theme={null}
  const response = await fetch('https://api.vidnavigator.com/v1/extract/file', {
    method: 'POST',
    headers: {
      'X-API-Key': 'YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      file_id: 'file_abc123',
      schema: {
        action_items: {
          type: 'Array',
          description: 'Action items and tasks mentioned in the meeting',
          items: {
            type: 'Object',
            description: 'An action item',
            properties: {
              task: { type: 'String', description: 'The task description' },
              assignee: { type: 'String', description: 'Person assigned' },
              deadline: { type: 'String', description: 'Deadline if mentioned' }
            }
          }
        },
        decisions_made: {
          type: 'Array',
          description: 'Key decisions made during the meeting',
          items: { type: 'String', description: 'A decision' }
        }
      },
      what_to_extract: 'Extract action items and deadlines from this meeting'
    })
  });

  const result = await response.json();
  ```
</CodeGroup>

## Response Example

```json theme={null}
{
  "status": "success",
  "data": {
    "action_items": [
      {
        "task": "Update the product roadmap with Q2 priorities",
        "assignee": "Sarah",
        "deadline": "Friday"
      },
      {
        "task": "Schedule customer interviews for user research",
        "assignee": "Mike",
        "deadline": "next week"
      }
    ],
    "decisions_made": [
      "Proceed with option B for the pricing model",
      "Delay the mobile app launch to Q3"
    ]
  },
  "file_info": {
    "id": "file_abc123",
    "name": "weekly-product-meeting.mp4",
    "size": 248517632,
    "type": "video/mp4",
    "duration": 3720.0,
    "status": "completed",
    "created_at": "2026-04-10T09:30:00Z",
    "updated_at": "2026-04-10T09:36:12Z",
    "has_transcript": true,
    "namespace_ids": ["ns_team_notes"],
    "namespaces": [
      {
        "id": "ns_team_notes",
        "name": "Team Notes"
      }
    ]
  },
  "usage": {
    "prompt_tokens": 3200,
    "completion_tokens": 120,
    "total_tokens": 3320
  }
}
```

<Note>
  The `usage` field is only included when `include_usage=true` in the request. `file_info` is included in the response metadata.
</Note>

## Billing

* AI extraction consumes `analysis_request` units in blocks of **15,000 total tokens**
* Formula: `ceil(total_tokens / 15000)`
* Examples:
  * `14,000` tokens -> `1` `analysis_request` unit
  * `17,000` tokens -> `2` `analysis_request` units
  * `31,000` tokens -> `3` `analysis_request` units
* There is no speech-to-text billing on this endpoint
* Failed requests are not charged


## OpenAPI

````yaml POST /extract/file
openapi: 3.0.3
info:
  title: VidNavigator Developer API
  description: >-
    The VidNavigator Developer API provides programmatic access to video
    analysis, transcription, and search capabilities.


    ## Authentication

    All endpoints require API key authentication via the `X-API-Key` header:

    ```

    X-API-Key: YOUR_API_KEY

    ```


    ## Rate Limits

    Check the documentation for the rate limits for each endpoint.


    ## Error Handling

    The API uses standard HTTP status codes and returns error responses in JSON
    format:

    ```json

    {
      "status": "error",
      "error": "error_type",
      "message": "Human readable error message"
    }

    ```
  version: 1.0.0
  contact:
    name: VidNavigator Support
    url: https://vidnavigator.com/support
    email: support@vidnavigator.com
  license:
    name: Proprietary
    url: https://vidnavigator.com/terms
servers:
  - url: https://api.vidnavigator.com/v1
    description: Production server
security:
  - ApiKeyAuth: []
tags:
  - name: Transcripts
    description: Extract transcripts from online videos
  - name: TikTok
    description: Scrape TikTok profile metadata and per-video stats
  - name: Files
    description: Manage uploaded audio/video files
  - name: Analysis
    description: AI-powered content analysis
  - name: Extraction
    description: >-
      Extract structured data from video and file transcripts using custom
      schemas
  - name: Namespaces
    description: Organize uploaded files into namespaces (folders)
  - name: Search
    description: Search videos and files using AI
  - name: System
    description: System health and information
  - name: Tweet Analysis
    description: Extract structured claims and metadata from X/Twitter tweets
paths:
  /extract/file:
    post:
      tags:
        - Extraction
      summary: Extract structured data from uploaded file
      description: >-
        Extract structured data from an uploaded file's transcript using a
        custom schema.


        Provide a `file_id` and a JSON `schema` describing the fields to
        extract. The file must be processed and have a transcript available.
        Optionally include `what_to_extract` to guide the extraction.


        **Schema format:** Each field must have `type` and `description`.
        Supported types: `String`, `Number`, `Boolean`, `Integer`, `Object`,
        `Array`, `Enum`. Max 10 root fields, max 3 nesting levels.


        **Content-Type:** Accepts `application/json` or YAML
        (`application/x-yaml`, `text/yaml`).


        **Token usage:** Set `include_usage=true` to include prompt/completion
        token counts in the response.


        **Billing:** Each extraction consumes at least 1 analysis credit. For
        longer transcripts, billing scales as `ceil(total_tokens / 15000)`
        credits. All charges are reverted if the request fails.
      operationId: extractFileData
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - file_id
                - schema
              properties:
                file_id:
                  type: string
                  description: ID of the uploaded file to extract data from
                schema:
                  $ref: '#/components/schemas/ExtractionSchema'
                what_to_extract:
                  type: string
                  description: Optional guidance for what to extract from the transcript
                  example: Extract action items and deadlines from this meeting
                include_usage:
                  type: boolean
                  default: false
                  description: When true, includes token usage statistics in the response
      responses:
        '200':
          description: Data extracted successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionResponse'
        '400':
          description: Bad request - missing parameters, invalid schema, or input too large
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum:
                      - error
                  error:
                    type: string
                    enum:
                      - missing_parameter
                      - request_body_required
                      - invalid_schema
                      - input_too_large
                  message:
                    type: string
        '402':
          $ref: '#/components/responses/PaymentRequired'
        '403':
          $ref: '#/components/responses/AccessDenied'
        '404':
          description: File not found or transcript not available
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum:
                      - error
                  error:
                    type: string
                    enum:
                      - file_not_found
                      - transcript_not_available
                  message:
                    type: string
        '500':
          $ref: '#/components/responses/InternalServerError'
        '503':
          $ref: '#/components/responses/SystemOverload'
components:
  schemas:
    ExtractionSchema:
      type: object
      description: >-
        Custom extraction schema defining the fields to extract. Max 10
        root-level fields, max 3 nesting levels. Each field must have `type` and
        `description`.
      additionalProperties:
        type: object
        required:
          - type
          - description
        properties:
          type:
            type: string
            enum:
              - String
              - Number
              - Boolean
              - Integer
              - Object
              - Array
              - Enum
            description: The data type of the field
          description:
            type: string
            description: Description of what this field should contain
          properties:
            type: object
            description: Sub-fields for Object type
          items:
            type: object
            description: Item schema for Array type
          enum:
            type: array
            items:
              type: string
            description: Allowed values for Enum type
      example:
        main_topics:
          type: Array
          description: List of main topics discussed
          items:
            type: String
            description: A topic
        sentiment:
          type: Enum
          description: Overall sentiment of the video
          enum:
            - positive
            - negative
            - neutral
        key_takeaway:
          type: String
          description: The single most important takeaway
    ExtractionResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - success
        data:
          type: object
          description: >-
            Extracted data matching the provided schema. The shape of this
            object mirrors the input schema fields.
        video_info:
          allOf:
            - $ref: '#/components/schemas/VideoInfo'
          description: >-
            Video metadata (title, channel, duration, views, etc.). Only present
            for /extract/video requests.
        file_info:
          allOf:
            - $ref: '#/components/schemas/FileInfo'
          description: >-
            File metadata (name, size, type, duration, etc.). Only present for
            /extract/file requests.
        usage:
          type: object
          description: Token usage statistics. Only present when include_usage=true.
          properties:
            prompt_tokens:
              type: integer
            completion_tokens:
              type: integer
            total_tokens:
              type: integer
    VideoInfo:
      type: object
      properties:
        title:
          type: string
          description: Video title
        description:
          type: string
          description: Video description
        thumbnail:
          type: string
          format: uri
          description: Video thumbnail URL
        url:
          type: string
          format: uri
          description: Video URL
        channel:
          type: string
          description: Channel name
        channel_url:
          type: string
          format: uri
          description: Channel URL
        duration:
          type: number
          description: Duration in seconds
        views:
          type: integer
          description: View count
        likes:
          type: integer
          description: Like count
        published_date:
          type: string
          description: Publication date
        keywords:
          type: array
          items:
            type: string
          description: Video keywords/tags
        category:
          type: string
          description: Video category
        available_languages:
          type: array
          items:
            type: string
          description: Available transcript languages
        selected_language:
          type: string
          description: Selected transcript language
        carousel_info:
          $ref: '#/components/schemas/VideoCarouselInfo'
          description: >-
            Present only for carousel posts (e.g., Instagram posts with multiple
            videos)
    FileInfo:
      type: object
      properties:
        id:
          type: string
          description: Unique file identifier
        name:
          type: string
          description: File name
        size:
          type: integer
          description: File size in bytes
        type:
          type: string
          description: MIME type
        duration:
          type: number
          description: Duration in seconds
        status:
          type: string
          enum:
            - pending
            - processing
            - completed
            - failed
            - cancelled
          description: Processing status
        created_at:
          type: string
          format: date-time
          description: Upload timestamp
        updated_at:
          type: string
          format: date-time
          description: Last update timestamp
        original_file_date:
          type: string
          format: date-time
          description: Original file creation date
        has_transcript:
          type: boolean
          description: Whether transcript is available
        error_message:
          type: string
          description: Error message if processing failed
        namespace_ids:
          type: array
          items:
            type: string
          description: IDs of the namespaces this file belongs to
        namespaces:
          type: array
          items:
            $ref: '#/components/schemas/NamespaceRef'
          description: Resolved namespaces this file belongs to (with names)
    VideoCarouselInfo:
      type: object
      description: Carousel information for a single video within a carousel post
      properties:
        total_items:
          type: integer
          description: Total number of items in the carousel (videos + images)
        video_count:
          type: integer
          description: Number of videos in the carousel
        image_count:
          type: integer
          description: Number of images in the carousel
        selected_index:
          type: integer
          description: 1-based index of the selected video
    NamespaceRef:
      type: object
      description: Resolved namespace reference with ID and display name
      properties:
        id:
          type: string
          description: Namespace identifier
        name:
          type: string
          description: Namespace display name
  responses:
    PaymentRequired:
      description: Usage limit exceeded - upgrade required
      content:
        application/json:
          schema:
            type: object
            properties:
              status:
                type: string
                enum:
                  - error
              error:
                type: string
                enum:
                  - limit_exceeded
              message:
                type: string
    AccessDenied:
      description: Access denied - insufficient permissions
      content:
        application/json:
          schema:
            type: object
            properties:
              status:
                type: string
                enum:
                  - error
              error:
                type: string
                enum:
                  - access_denied
              message:
                type: string
    InternalServerError:
      description: Internal server error
      content:
        application/json:
          schema:
            type: object
            properties:
              status:
                type: string
                enum:
                  - error
              error:
                type: string
                enum:
                  - internal_server_error
              message:
                type: string
    SystemOverload:
      description: System is temporarily overloaded
      content:
        application/json:
          schema:
            type: object
            properties:
              status:
                type: string
                enum:
                  - error
              error:
                type: string
                enum:
                  - system_overload
              message:
                type: string
              retry_after_seconds:
                type: integer
                description: Recommended number of seconds to wait before retrying
              charge_user:
                type: boolean
                description: Always false for overload errors — the request was not billed
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
      description: >-
        API key authentication. Include your VidNavigator API key in the
        X-API-Key header.

````