Chat completions work with both locally uploaded videos and YouTube content,
with multimodal understanding available for uploaded files.
Understanding Rich Transcript Collections
Rich Transcript Collections are specialized collections that combine multiple layers of video understanding:
- Speech transcription: What’s being said in the video
- Visual scene descriptions: What’s happening visually
- On-screen text: Text and captions visible in the video
- Contextual understanding: How all these elements work together
When to Use Chat Completions
Chat completions with Rich Transcript Collections are ideal when you need to:
- Answer specific questions about video content without watching the entire video
- Extract insights that span multiple segments or videos
- Generate summaries with specific focus areas or perspectives
- Find precise information using natural language rather than keyword search
- Build conversational interfaces that can discuss video content intelligently
Core Chat Completion Parameters
Essential Parameters
Messages Array
The messages array follows the standard chat completion format:
- role: “user”, “assistant”, or “system”
- content: The message content
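For example, a conversation about a cooking collection might look like the sketch below; it assumes nothing beyond the role and content fields described above.

```typescript
// Illustrative messages array following the standard chat completion format.
// The system message is optional; user and assistant turns alternate.
const messages = [
  {
    role: "system",
    content: "You are a helpful assistant that answers questions about cooking videos.",
  },
  { role: "user", content: "What pasta shapes are demonstrated in this collection?" },
];
```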
Model: nimbus-001
Cloudglue’s nimbus-001 is a specialized model optimized for:
- Multimodal understanding: Processes speech, visual, and text content together
- Grounded responses: Answers are based on actual video content, not training data
- Citation support: Can provide specific timestamps and sources
- Conversational context: Maintains context across multiple exchanges
Force Search
The force_search parameter controls whether the system searches your collections:
- true: Always searches collections before responding (recommended)
- false: May respond from general knowledge without searching
Include Citations
Citations provide transparency and verifiability:
- true: Returns timestamps, file references, and content snippets
- false: Returns only the response text
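Putting these parameters together, a request might look like the sketch below. The endpoint URL, authentication header, collections field, and response shape are illustrative assumptions rather than the definitive API surface; check the Chat Completion API reference for the exact format.

```typescript
// Illustrative request; fields beyond those documented above (notably the
// collections field, endpoint URL, and response shape) are assumptions.
const CHAT_URL =
  process.env.CLOUDGLUE_API_URL ?? "https://api.cloudglue.example/v1/chat/completions"; // placeholder

const response = await fetch(CHAT_URL, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CLOUDGLUE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "nimbus-001",
    messages: [
      { role: "user", content: "Which knife techniques are shown for dicing onions?" },
    ],
    collections: ["<your-rich-transcript-collection-id>"], // assumed field name
    force_search: true, // always ground the answer in collection content
    include_citations: true, // return timestamps and source references
  }),
});

const data = await response.json();
console.log(data.choices?.[0]?.message?.content); // assumed standard completion shape
```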
Advanced Search Filters
For precise control over what content is searched, you can use metadata filters to target specific videos in your collection:
Supported Filter Operations
The filter parameter allows you to constrain searches using file metadata:
- path: JSON path to the metadata field (e.g., “metadata.custom_field” or “video_info.has_audio”)
- operator: Comparison operator to apply
  - Equal / NotEqual: Exact match comparison
  - LessThan / GreaterThan: Numeric comparison
  - In: Check if value is in a comma-separated list
  - ContainsAny / ContainsAll: Array operations (use valueTextArray)
- valueText: Single value for scalar comparisons
- valueTextArray: Array of values for array operations
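As a sketch, two illustrative filter conditions using these fields might look like this; the metadata paths and values are invented for the example, and how multiple conditions are combined in a single request may differ by API version.

```typescript
// Scalar comparison: only search videos that contain an audio track.
const hasAudioFilter = {
  path: "video_info.has_audio",
  operator: "Equal",
  valueText: "true",
};

// Array operation: only search videos tagged with any of these topics.
const cuisineFilter = {
  path: "metadata.tags", // illustrative custom metadata field
  operator: "ContainsAny",
  valueTextArray: ["italian", "pasta"],
};
```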
Practical Example: Cooking Video Chat Bot
Let’s build a comprehensive example using cooking videos to demonstrate multi-turn conversations and advanced features.
Setting Up the Collection
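The sketch below outlines one way the setup could look, assuming a REST-style API for creating a collection and adding previously uploaded videos. The endpoint paths, field names, and response shapes here are assumptions made for illustration; use the actual CloudGlue SDK or API reference for the real calls.

```typescript
// Placeholder base URL and endpoint paths; these are illustrative assumptions,
// not the documented CloudGlue API surface.
const BASE_URL = process.env.CLOUDGLUE_API_URL ?? "https://api.cloudglue.example/v1";
const headers = {
  Authorization: `Bearer ${process.env.CLOUDGLUE_API_KEY}`,
  "Content-Type": "application/json",
};

// 1. Create a Rich Transcript Collection for the cooking videos.
const collection = await fetch(`${BASE_URL}/collections`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "cooking-videos",
    collection_type: "rich-transcripts", // assumed field and value
  }),
}).then((r) => r.json());

// 2. Add previously uploaded videos (or YouTube URLs) to the collection.
for (const fileId of ["<pasta-video-file-id>", "<risotto-video-file-id>"]) {
  await fetch(`${BASE_URL}/collections/${collection.id}/videos`, {
    method: "POST",
    headers,
    body: JSON.stringify({ file_id: fileId }), // assumed field name
  });
}
```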
Multi-Turn Conversation Example
Now let’s demonstrate a realistic conversation about pasta recipes. This example shows how to build a stateful chatbot that maintains conversation history and can handle follow-up questions. The implementation demonstrates proper conversation state management, error handling, and how to structure questions for optimal results from the nimbus-001 model; a minimal sketch follows the feature list below.
- Conversation State Management: The chatbot maintains conversation history across multiple questions
- TypeScript Integration: Full type safety with proper CloudGlue SDK types
- Multiple Question Types: Shows different query patterns (techniques, ingredients, visual cues, structured output)
- Citation Handling: Demonstrates how to access and count citation sources
- Environment Configuration: Proper setup with environment variables and command-line arguments
- Error Handling: Safe access to response properties with optional chaining
- Production-Ready Structure: Modular design that can be extended for real applications
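Below is a condensed sketch of that pattern: conversation state is a plain array that grows with each turn and is replayed on every request. The fetch endpoint, the collections field, and the response shape are assumptions used for illustration; in a real application you would call the CloudGlue SDK client instead.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Placeholder endpoint; swap in the real API URL or SDK client.
const CHAT_URL =
  process.env.CLOUDGLUE_API_URL ?? "https://api.cloudglue.example/v1/chat/completions";

// Conversation state: every turn is appended so follow-up questions keep context.
const history: ChatMessage[] = [
  {
    role: "system",
    content: "You answer questions about a cooking video collection and cite your sources.",
  },
];

async function ask(question: string, collectionId: string): Promise<string> {
  history.push({ role: "user", content: question });

  const res = await fetch(CHAT_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CLOUDGLUE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "nimbus-001",
      messages: history,
      collections: [collectionId], // assumed field name
      force_search: true,
      include_citations: true,
    }),
  });

  const data = await res.json();
  // Optional chaining guards against missing fields; the response shape is
  // assumed to follow the standard chat completion format.
  const answer: string = data.choices?.[0]?.message?.content ?? "";
  history.push({ role: "assistant", content: answer });
  return answer;
}

// Follow-up questions rely on the accumulated history.
await ask("What pasta dishes are covered in this collection?", "<collection-id>");
await ask("For the carbonara, what visual cues signal the sauce is done?", "<collection-id>");
```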
Example Conversation Output
Here’s what a realistic conversation might look like:
Advanced Techniques
Using System Messages for Specialized Responses
You can guide the model’s behavior using system messages:
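For example, a system message can pin down persona, scope, and output format (a sketch; only the messages array is shown):

```typescript
// A system message that steers the model toward concise, citation-backed answers.
const messages = [
  {
    role: "system",
    content:
      "You are a culinary instructor. Answer only from the video collection, " +
      "keep answers under 150 words, and mention the relevant timestamp for each claim.",
  },
  { role: "user", content: "How should the pasta water be seasoned?" },
];
```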
Search Optimization with Filters
For complex queries, use metadata filters to target specific videos in your collection:
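A hedged sketch of a filtered request, reusing the filter fields described earlier; the metadata path, values, and the exact placement of the filter field are illustrative assumptions.

```typescript
// Restrict the search to videos whose metadata marks them as Italian recipes.
const requestBody = {
  model: "nimbus-001",
  messages: [
    { role: "user", content: "Compare how the two chefs finish their pasta sauces." },
  ],
  collections: ["<collection-id>"], // assumed field name
  force_search: true,
  include_citations: true,
  filter: {
    path: "metadata.cuisine", // illustrative custom metadata field
    operator: "Equal",
    valueText: "italian",
  },
};
```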
Understanding Citations
When you set include_citations: true, the response includes detailed references to the specific video segments that informed the answer. This provides transparency and allows users to verify information or explore the original content.
Example Citation Response
Let’s examine what a real citation response looks like for the question “What cooking techniques are used for pasta dishes?”:
Complete JSON Response with Citations
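The mock-up below shows the general shape of a single citation, assembled from the fields listed under Key Citation Fields; the values and the inner structure of the speech entries are placeholders, not real API output.

```typescript
// Illustrative citation object; field values are invented for the example.
const exampleCitation = {
  collection_id: "col_123",
  file_id: "file_abc",
  segment_id: "seg_42",
  start_time: 215.0,
  end_time: 248.5,
  text: "Chef demonstrates emulsifying the carbonara sauce off the heat.",
  speech: [
    {
      speaker: "speaker_1", // assumed speaker-identification format
      start_time: 216.2,
      end_time: 224.9,
      text: "Take the pan off the heat before you add the egg mixture.",
    },
  ],
  visual_scene_description: "Close-up of a pan being tossed as the sauce thickens.",
  scene_text: "Step 4: Emulsify off the heat",
};
```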
Key Citation Fields
Each citation provides detailed information about the source:
- collection_id: ID of the collection containing the video
- file_id: Unique identifier for the specific video file
- segment_id: ID of the specific segment within the video
- start_time / end_time: Precise timestamps in seconds
- text: Brief description of the segment’s relevance
- speech: Array of transcribed speech with speaker identification and timestamps
- visual_scene_description: Visual content descriptions (when available)
- scene_text: On-screen text detected in the segment
Using Citation Data
Citations enable powerful functionality in your applications:
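As one illustration, the sketch below turns citations into timestamped jump links for a player UI. The Citation interface mirrors the fields listed above, while where the citations array appears in the response is an assumption.

```typescript
// Minimal view of a citation, using the fields documented above.
interface Citation {
  file_id: string;
  start_time: number;
  end_time: number;
  text: string;
}

// Build human-readable "jump to moment" labels from a response's citations.
function toJumpLinks(citations: Citation[]): string[] {
  return citations.map((c) => {
    const minutes = Math.floor(c.start_time / 60);
    const seconds = Math.floor(c.start_time % 60)
      .toString()
      .padStart(2, "0");
    return `${minutes}:${seconds} - ${c.text} (video ${c.file_id})`;
  });
}
```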
Best Practices
1. Design Effective Conversations
- Start broad, then narrow: Begin with overview questions, then drill into specifics
- Maintain context: Include relevant conversation history for follow-up questions
- Use specific terminology: Culinary terms, technique names, and ingredient specifics yield better results
2. Optimize Search Parameters
- Use force_search: true for accuracy when you need video-specific information
- Apply metadata filters when you need to target specific subsets of your video collection
- Set appropriate temperature: Lower (0.1-0.3) for factual responses, higher (0.5-0.7) for creative interpretations
3. Structure Complex Requests
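One illustrative way to structure such a request is to spell out scope, output format, and constraints in a single user message (the wording here is only an example):

```typescript
// A structured request that pins down scope, format, and constraints.
const structuredQuestion = [
  "Using only the pasta videos, create a recipe for the carbonara shown on screen.",
  "Format the answer as markdown with sections: Ingredients, Steps, Timing.",
  "Include the timestamp where each step is demonstrated.",
].join(" ");
```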
4. Handle Multi-Video Collections
- Reference specific videos when asking comparative questions
- Use metadata filters to focus on relevant subsets of your collection
- Aggregate insights by asking for patterns across multiple sources
5. Leverage Citation Information
- Verify key facts by checking citation timestamps
- Cross-reference information across multiple cited segments
- Guide users to specific video moments for deeper learning
Try it out
Ready to start building conversational video experiences? Check out our Chat Completion API to get started with Rich Transcript Collections. Experiment with different question types:
- Factual queries: “What ingredients are used?”
- Technique analysis: “How do the chefs differ in their approach?”
- Structured requests: “Create a recipe in markdown format”
- Comparative questions: “Which method is more traditional?”
Advanced Integration Patterns
For production applications, consider implementing:
- Conversation persistence: Store and resume chat sessions
- Response caching: Cache common queries for better performance
- Citation indexing: Build searchable citation databases for content discovery
- Multi-collection queries: Search across different video collections simultaneously
- Response validation: Implement confidence scoring based on citation quality