GET /collections/{collection_id}/videos/{file_id}/media-descriptions
Retrieve media description data for a specific file in a collection
curl --request GET \
  --url https://api.cloudglue.dev/v1/collections/{collection_id}/videos/{file_id}/media-descriptions \
  --header 'Authorization: Bearer <token>'
{
  "collection_id": "<string>",
  "file_id": "<string>",
  "thumbnail_url": "<string>",
  "content": "<string>",
  "title": "<string>",
  "summary": "<string>",
  "duration_seconds": 123,
  "segment_summary": [
    {
      "title": "<string>",
      "summary": "<string>",
      "start_time": 123,
      "end_time": 123,
      "thumbnail_url": "<string>"
    }
  ],
  "chapters": [
    {
      "index": 1,
      "start_time": 1,
      "end_time": 1,
      "description": "<string>"
    }
  ],
  "shots": [
    {
      "index": 1,
      "start_time": 1,
      "end_time": 1
    }
  ],
  "total_chapters": 1,
  "total_shots": 1,
  "visual_scene_description": [
    {
      "text": "<string>",
      "start_time": 123,
      "end_time": 123
    }
  ],
  "scene_text": [
    {
      "text": "<string>",
      "start_time": 123,
      "end_time": 123
    }
  ],
  "speech": [
    {
      "speaker": "<string>",
      "text": "<string>",
      "start_time": 123,
      "end_time": 123,
      "words": [
        {
          "word": "<string>",
          "start_time": 123,
          "end_time": 123
        }
      ]
    }
  ],
  "audio_description": [
    {
      "text": "<string>",
      "start_time": 123,
      "end_time": 123
    }
  ]
}
For details on how to create a video collection, see Create Collection.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
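As a minimal sketch of the header described above, the following builds (but does not send) the request with Python's standard library. The base URL is taken from the curl sample on this page; the helper name `build_request` and the placeholder IDs and token are illustrative assumptions, not part of the API.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.cloudglue.dev/v1"  # from the curl sample on this page


def build_request(collection_id: str, file_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for this endpoint."""
    # Percent-encode the path parameters in case an ID contains reserved characters.
    url = (
        f"{BASE_URL}/collections/{quote(collection_id, safe='')}"
        f"/videos/{quote(file_id, safe='')}/media-descriptions"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values, for illustration only.
req = build_request("my-collection-id", "my-file-id", "my-token")
```

Passing `req` to `urllib.request.urlopen` would send it; that step is omitted here so the sketch runs without network access.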

Path Parameters

collection_id
string
required

The ID of the collection

file_id
string
required

The ID of the file

Query Parameters

response_format
enum<string>

Format for the response

Available options: json, markdown

start_time_seconds
number

Start time, in seconds, used to filter results

end_time_seconds
number

End time, in seconds, used to filter results

modalities
enum<string>[]

The modalities to include in the response; use this to return a smaller data set. Provide a comma-separated list of strings. Defaults to all modalities that are available (previously extracted). Accepted values are speech, visual_scene_description, scene_text, audio_description, summary, segment_summary, and title.

Available options: speech, visual_scene_description, scene_text, audio_description, summary, segment_summary, title

include_thumbnails
boolean

When true, include a file-level thumbnail_url on the response and a per-segment thumbnail_url on each segment_summary entry.

include_word_timestamps
boolean
default:false

When true, include a words array on each speech entry with word-level start_time and end_time. Not available for YouTube sources. Only applies when response_format=json.
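To illustrate the shape described above, here is a small sketch that flattens the words arrays of speech entries into rows. The sample data is hypothetical and only mirrors the field names in the example response; `flatten_words` is an illustrative helper, and entries without a words array are simply skipped.

```python
# Hypothetical sample shaped like the "speech" array in the example response.
speech = [
    {
        "speaker": "A",
        "text": "hello world",
        "start_time": 0.0,
        "end_time": 1.2,
        "words": [
            {"word": "hello", "start_time": 0.0, "end_time": 0.5},
            {"word": "world", "start_time": 0.6, "end_time": 1.2},
        ],
    },
    # An entry with no "words" key, as when word timestamps are not included.
    {"speaker": "B", "text": "hi", "start_time": 1.5, "end_time": 1.8},
]


def flatten_words(speech_entries):
    """Return (speaker, word, start_time, end_time) tuples for entries that carry words."""
    rows = []
    for entry in speech_entries:
        for w in entry.get("words", []):
            rows.append((entry.get("speaker"), w["word"], w["start_time"], w["end_time"]))
    return rows


word_rows = flatten_words(speech)
```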

include_chapters
boolean
default:false

Include narrative chapters in the response (when segmentation strategy is 'narrative')

include_shots
boolean
default:false

Include shot boundaries in the response (when segmentation strategy is 'shot-detector')
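The query parameters above can be assembled into a query string roughly as follows. This is a sketch under stated assumptions: the helper `build_query` is hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption that this page does not confirm. The comma-separated form for modalities and the accepted values come from the parameter descriptions above.

```python
from urllib.parse import urlencode

# Accepted values as listed for the modalities parameter.
ALLOWED_MODALITIES = {
    "speech", "visual_scene_description", "scene_text",
    "audio_description", "summary", "segment_summary", "title",
}


def build_query(*, response_format=None, start_time_seconds=None,
                end_time_seconds=None, modalities=None,
                include_thumbnails=None, include_word_timestamps=None,
                include_chapters=None, include_shots=None) -> str:
    """Build a query string containing only the parameters that were set."""
    params = {}
    if response_format is not None:
        if response_format not in ("json", "markdown"):
            raise ValueError(f"unsupported response_format: {response_format!r}")
        params["response_format"] = response_format
    if start_time_seconds is not None:
        params["start_time_seconds"] = start_time_seconds
    if end_time_seconds is not None:
        params["end_time_seconds"] = end_time_seconds
    if modalities:
        unknown = set(modalities) - ALLOWED_MODALITIES
        if unknown:
            raise ValueError(f"unknown modalities: {sorted(unknown)}")
        params["modalities"] = ",".join(modalities)  # comma-separated list
    flags = {
        "include_thumbnails": include_thumbnails,
        "include_word_timestamps": include_word_timestamps,
        "include_chapters": include_chapters,
        "include_shots": include_shots,
    }
    for name, value in flags.items():
        if value is not None:
            # Assumption: booleans are sent as lowercase "true"/"false".
            params[name] = "true" if value else "false"
    # safe="," keeps the modalities list readable instead of encoding commas as %2C.
    return urlencode(params, safe=",")


query = build_query(
    start_time_seconds=30,
    end_time_seconds=90,
    modalities=["speech", "scene_text"],
    include_word_timestamps=True,
)
```

The resulting string can be appended to the endpoint URL after a `?`. Leaving a parameter unset omits it from the query, so the server-side defaults apply.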

Response

Media description data

collection_id
string
required

Unique identifier for the collection

file_id
string
required

Unique identifier for the file

thumbnail_url
string<uri>

URL of the file-level thumbnail for the video. Only present when include_thumbnails=true.

content
string

Content string formatted according to response_format; for example, Markdown text when response_format=markdown is requested.
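A client might read the content field along these lines. This sketch assumes the response body is a JSON object in both formats, with the formatted text carried in content, which is how the schema on this page reads but is not stated outright. The helper `readable_text` and the sample bodies are hypothetical.

```python
import json


def readable_text(body: str) -> str:
    """Prefer the formatted `content` string; otherwise fall back to title and summary."""
    data = json.loads(body)
    content = data.get("content")
    if content:
        return content
    parts = [data.get("title"), data.get("summary")]
    return "\n\n".join(p for p in parts if p)


# Hypothetical response bodies, for illustration only.
markdown_body = json.dumps(
    {"collection_id": "c1", "file_id": "f1", "content": "# Demo\n\nSome text"}
)
json_body = json.dumps(
    {"collection_id": "c1", "file_id": "f1", "title": "Demo", "summary": "A short clip."}
)
```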

title
string

Generated title of the video

summary
string

Generated video-level summary

duration_seconds
number

Duration of the video in seconds

segment_summary
object[]

Array of summary information for each segment of the video

chapters
object[]

Array of narrative chapters (only present when include_chapters=true and segmentation strategy is 'narrative')

shots
object[]

Array of shot boundaries (only present when include_shots=true and segmentation strategy is 'shot-detector')

total_chapters
integer

Total number of chapters (only present when include_chapters=true and segmentation strategy is 'narrative')

Required range: x >= 0

total_shots
integer

Total number of shots (only present when include_shots=true and segmentation strategy is 'shot-detector')

Required range: x >= 0

visual_scene_description
object[]

Array of visual descriptions

scene_text
object[]

Array of scene text extractions

speech
object[]

Array of speech transcriptions

audio_description
object[]

Array of audio descriptions
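Because the four time-stamped arrays share the text, start_time, and end_time fields in the example response, they can be merged into a single timeline. The following is a minimal sketch on hypothetical sample data; `build_timeline` is an illustrative helper, and arrays missing from the response (for example, when modalities narrows the output) are treated as empty.

```python
TIMED_MODALITIES = (
    "visual_scene_description",
    "scene_text",
    "speech",
    "audio_description",
)


def build_timeline(response: dict) -> list:
    """Merge the time-stamped arrays into one list ordered by start, then end time."""
    events = []
    for modality in TIMED_MODALITIES:
        for item in response.get(modality) or []:
            events.append((item["start_time"], item["end_time"], modality, item["text"]))
    events.sort(key=lambda e: (e[0], e[1]))
    return events


# Hypothetical response fragment, for illustration only.
response = {
    "speech": [{"speaker": "A", "text": "Welcome back.", "start_time": 2, "end_time": 4}],
    "scene_text": [{"text": "EPISODE 1", "start_time": 0, "end_time": 3}],
    "audio_description": [{"text": "Upbeat music", "start_time": 0, "end_time": 10}],
}

timeline = build_timeline(response)
```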