POST /extract/file
Extract structured data from uploaded file
curl --request POST \
  --url https://api.vidnavigator.com/v1/extract/file \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <api-key>' \
  --data '
{
  "file_id": "<string>",
  "schema": {
    "main_topics": {
      "type": "Array",
      "description": "List of main topics discussed",
      "items": {
        "type": "String",
        "description": "A topic"
      }
    },
    "sentiment": {
      "type": "Enum",
      "description": "Overall sentiment of the video",
      "enum": [
        "positive",
        "negative",
        "neutral"
      ]
    },
    "key_takeaway": {
      "type": "String",
      "description": "The single most important takeaway"
    }
  },
  "what_to_extract": "Extract action items and deadlines from this meeting",
  "include_usage": false
}
'
{
  "status": "success",
  "data": {},
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  }
}
Extract structured data from an uploaded file’s transcript using a custom schema you define.

Overview

This endpoint works the same as Extract Data from Video, but it operates on files you have previously uploaded to your VidNavigator library. The file must be fully processed and have a transcript available.

Schema Rules

  • Each field must have type and description
  • Supported types: String, Number, Boolean, Integer, Object, Array, Enum
  • Maximum 10 root-level fields
  • Maximum 3 nesting levels
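The rules above can be checked locally before a request is sent. The following is a minimal Python sketch, not part of the VidNavigator API: `validate_schema` and `check_field` are hypothetical helper names, and the way nesting is counted (root fields are level 1, and each `items` or `properties` entry sits one level deeper) is an assumption, since the page does not define how levels are counted.

```python
from typing import Any

SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Object", "Array", "Enum"}
MAX_ROOT_FIELDS = 10
MAX_NESTING_LEVELS = 3


def check_field(name: str, field: Any, level: int, errors: list[str]) -> None:
    """Check one field definition, then recurse into its children."""
    if level > MAX_NESTING_LEVELS:
        errors.append(f"{name}: deeper than {MAX_NESTING_LEVELS} nesting levels")
        return
    if not isinstance(field, dict):
        errors.append(f"{name}: field definition must be an object")
        return
    # Every field must carry both "type" and "description".
    for key in ("type", "description"):
        if key not in field:
            errors.append(f"{name}: missing '{key}'")
    field_type = field.get("type")
    if field_type is not None and field_type not in SUPPORTED_TYPES:
        errors.append(f"{name}: unsupported type {field_type!r}")
    # Assumption: "items" and each "properties" entry count as one level deeper.
    if "items" in field:
        check_field(f"{name}[]", field["items"], level + 1, errors)
    properties = field.get("properties", {})
    if isinstance(properties, dict):
        for child_name, child in properties.items():
            check_field(f"{name}.{child_name}", child, level + 1, errors)


def validate_schema(schema: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the schema passed."""
    errors: list[str] = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"schema has {len(schema)} root-level fields, maximum is {MAX_ROOT_FIELDS}")
    for name, field in schema.items():
        check_field(name, field, 1, errors)
    return errors


# A field with an unsupported type and no description produces two errors.
print(validate_schema({"title": {"type": "Text"}}))
```

Under this counting, an Array of Objects with String properties uses exactly three levels. The server remains the authority on what is accepted, so a check like this is only an early warning.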

You can also send the request body as YAML by setting Content-Type to application/x-yaml or text/yaml.
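As an illustration of the YAML option, here is a small Python sketch that builds (but does not send) such a request with the standard library. It assumes the YAML body uses the same field names and nesting as the JSON body, which the page implies but does not show. `build_yaml_request` is a hypothetical helper name, and `file_abc123` and `YOUR_API_KEY` are placeholders.

```python
import urllib.request

API_URL = "https://api.vidnavigator.com/v1/extract/file"

# Hand-written YAML, so the sketch needs no third-party YAML library.
# Assumption: the keys mirror the JSON request body one-to-one.
YAML_BODY = """\
file_id: file_abc123
schema:
  decisions_made:
    type: Array
    description: Key decisions made during the meeting
    items:
      type: String
      description: A decision
what_to_extract: Extract the key decisions from this meeting
include_usage: true
"""


def build_yaml_request(api_key: str, yaml_body: str) -> urllib.request.Request:
    """Build a POST request whose body is YAML instead of JSON."""
    return urllib.request.Request(
        API_URL,
        data=yaml_body.encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/x-yaml"},
        method="POST",
    )


request = build_yaml_request("YOUR_API_KEY", YAML_BODY)

# Sending is left commented out because it needs a real key and file ID:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```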

Example Usage

curl -X POST "https://api.vidnavigator.com/v1/extract/file" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "file_abc123",
    "schema": {
      "action_items": {
        "type": "Array",
        "description": "Action items and tasks mentioned in the meeting",
        "items": {
          "type": "Object",
          "description": "An action item",
          "properties": {
            "task": { "type": "String", "description": "The task description" },
            "assignee": { "type": "String", "description": "Person assigned" },
            "deadline": { "type": "String", "description": "Deadline if mentioned" }
          }
        }
      },
      "decisions_made": {
        "type": "Array",
        "description": "Key decisions made during the meeting",
        "items": { "type": "String", "description": "A decision" }
      }
    },
    "what_to_extract": "Extract action items and deadlines from this meeting",
    "include_usage": true
  }'

Response Example

{
  "status": "success",
  "data": {
    "action_items": [
      {
        "task": "Update the product roadmap with Q2 priorities",
        "assignee": "Sarah",
        "deadline": "Friday"
      },
      {
        "task": "Schedule customer interviews for user research",
        "assignee": "Mike",
        "deadline": "next week"
      }
    ],
    "decisions_made": [
      "Proceed with option B for the pricing model",
      "Delay the mobile app launch to Q3"
    ]
  },
  "usage": {
    "prompt_tokens": 3200,
    "completion_tokens": 120,
    "total_tokens": 3320
  }
}

The usage field is included only when the request sets include_usage to true.
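Because usage is optional, client code should not assume it is present. A minimal Python sketch of reading a response follows. `parse_extract_response` is a hypothetical helper, and treating any status other than "success" as a failure is an assumption, since error responses are not described here. The check for missing fields is defensive only: the page says data mirrors the schema fields but does not say whether every field is always returned.

```python
import json
from typing import Any, Optional


def parse_extract_response(
    raw: str, schema: dict[str, Any]
) -> tuple[dict[str, Any], Optional[int], list[str]]:
    """Return (data, total_tokens or None, schema fields absent from data)."""
    body = json.loads(raw)
    # Assumption: anything other than "success" is treated as an error.
    if body.get("status") != "success":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    data = body.get("data", {})
    # "usage" is absent unless the request set include_usage to true.
    usage = body.get("usage")
    total_tokens = usage.get("total_tokens") if usage else None
    # Root-level schema fields that did not come back in "data".
    missing = [name for name in schema if name not in data]
    return data, total_tokens, missing


sample = json.dumps({
    "status": "success",
    "data": {"decisions_made": ["Delay the mobile app launch to Q3"]},
})
schema = {
    "action_items": {"type": "Array", "description": "Action items"},
    "decisions_made": {"type": "Array", "description": "Key decisions"},
}
data, total_tokens, missing = parse_extract_response(sample, schema)
print(data["decisions_made"], total_tokens, missing)
```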

Authorizations

X-API-Key (string, header, required)

API key authentication. Include your VidNavigator API key in the X-API-Key header.

Body

application/json

file_id (string, required)

ID of the uploaded file to extract data from

schema (object, required)

Custom extraction schema defining the fields to extract. Max 10 root-level fields, max 3 nesting levels. Each field must have type and description.

Example:

{
  "main_topics": {
    "type": "Array",
    "description": "List of main topics discussed",
    "items": {
      "type": "String",
      "description": "A topic"
    }
  },
  "sentiment": {
    "type": "Enum",
    "description": "Overall sentiment of the video",
    "enum": ["positive", "negative", "neutral"]
  },
  "key_takeaway": {
    "type": "String",
    "description": "The single most important takeaway"
  }
}

what_to_extract (string)

Optional guidance for what to extract from the transcript

Example:

"Extract action items and deadlines from this meeting"

include_usage (boolean, default: false)

When true, includes token usage statistics in the response

Response

Data extracted successfully

status (enum<string>)

Available options: success

data (object)

Extracted data matching the provided schema. The shape of this object mirrors the input schema fields.

usage (object)

Token usage statistics. Only present when include_usage=true.