Rich Transcripts
Understanding rich transcripts and their capabilities
What are Rich Transcripts?
Rich Transcripts are structured pieces of information that describe a video and its content.
The most familiar kind of rich transcript is a speech transcript, which consists of the spoken words from a video. Speech transcripts are common on platforms like YouTube, and many meeting tools now offer them as a feature.
While speech transcripts are a great source of rich and deep information, they are not the only kind of rich transcript that Cloudglue offers.
Cloudglue also exposes other kinds of rich transcripts, including:
- Visual scene descriptions: A description of the visual scene in a video.
- Scene text: The text that is visible on the screen in a video.
Combining speech with these other rich transcript types allows Cloudglue to turn your videos into a rich, structured source of information.
What can you use it for?
Speech
Speech transcripts are the most intuitive kind of rich transcript, and are the most commonly used. They are a great source of information for a variety of use cases, including:
- Searching for specific words or phrases
- Finding key takeaways or action items from a meeting
- Generating a summary of a conversation
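As an illustration of the first use case, here is a minimal sketch of a phrase search over timestamped speech segments. The `SpeechSegment` shape, the `find_phrase` helper, and the sample data are all hypothetical and chosen for illustration; they are not Cloudglue's actual response format or API.

```python
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    # Hypothetical shape: a timestamped piece of spoken text.
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def find_phrase(segments, phrase):
    """Return the segments whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [s for s in segments if needle in s.text.casefold()]


# Made-up sample data standing in for a real speech transcript.
segments = [
    SpeechSegment(0.0, 4.2, "Welcome everyone to the quarterly review."),
    SpeechSegment(4.2, 9.8, "First action item: update the pricing page."),
    SpeechSegment(9.8, 15.0, "The second Action Item is to email the vendor."),
]

hits = find_phrase(segments, "action item")
for s in hits:
    print(f"{s.start:6.1f}s  {s.text}")
```

Because each match keeps its start time, a result can be used to jump straight to the moment in the video where the phrase was spoken.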
Visual scene descriptions
Visual scene descriptions tell you what is physically happening in a video. They help you understand the context and setting of a video and identify key visual elements or objects in a scene.
Use cases for visual scene descriptions include:
- Detecting important visual cues or actions
- Monitoring product placement or branding
- Identifying people or objects in the scene
- Understanding visual transitions between scenes
- Analyzing visual composition and framing
Scene text
Scene text is the text that is visible on the screen in a video. Commonly this is closed captioning, but it can be any other on-screen text.
Examples of scene text include:
- Closed captioning
- Text from a presentation
- Text from a document or in the background
- Menu text or product labels
Putting it all together
Each of these rich transcript types is useful on its own, but combined they provide a methodical, in-depth way to understand the scenes throughout a video. Speech tells you what is being said at a given time, while visual scene descriptions and scene text help you understand who and what is in a scene, and when and where it takes place.
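One way to picture the combination is to line up all three transcript types on a shared timeline and ask what was said, seen, and shown at a given moment. The sketch below assumes a hypothetical `Entry` record with a `kind` field and start/end times in seconds; the names and sample data are illustrative only and do not reflect Cloudglue's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical shape shared by all three transcript types.
    start: float  # seconds
    end: float
    kind: str     # "speech", "visual", or "scene_text"
    text: str


def entries_at(entries, t):
    """Group the text of every entry whose time range covers t by its kind."""
    grouped = {}
    for e in sorted(entries, key=lambda e: e.start):
        if e.start <= t < e.end:
            grouped.setdefault(e.kind, []).append(e.text)
    return grouped


# Made-up sample data mixing the three transcript types.
entries = [
    Entry(0.0, 10.0, "visual", "A presenter stands next to a projected slide."),
    Entry(2.0, 8.0, "scene_text", "Q3 Revenue Overview"),
    Entry(3.0, 7.5, "speech", "Revenue grew in the third quarter."),
    Entry(10.0, 20.0, "visual", "Close-up of a laptop screen."),
]

snapshot = entries_at(entries, 5.0)
for kind, texts in snapshot.items():
    print(f"{kind}: {texts}")
```

At the five-second mark in this sample, the snapshot pairs the spoken sentence with the slide title and the description of the presenter, which is the kind of per-scene context a single transcript type cannot give by itself.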
Next Steps
To learn more about how to use rich transcripts, including step-by-step examples, you can read our Transcription Guide.
For information about how to run on-demand transcriptions, you can read our On-Demand Transcription Guide.