What are Rich Transcripts?
Rich Transcripts are structured pieces of information that describe a video and its content. The most familiar kind is a speech transcript, which consists of the words spoken in a video. Speech transcripts are common on platforms like YouTube, and many meeting tools now offer them as a feature. While speech transcripts are a rich source of information, they are not the only kind of rich transcript that Cloudglue offers. Cloudglue also exposes other kinds, including:
- Visual scene descriptions: A description of the visual scene in a video.
- Scene text: The text that is visible on the screen in a video.
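To make the three kinds concrete, here is a minimal sketch of how a rich transcript might be modeled in Python. The class names, field names, and sample values are hypothetical and chosen only for illustration; they are not Cloudglue's actual response schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One timestamped entry in a transcript (hypothetical shape)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


@dataclass
class RichTranscript:
    """Hypothetical container grouping the three kinds of rich transcript."""
    speech: List[Segment] = field(default_factory=list)
    visual_scene_descriptions: List[Segment] = field(default_factory=list)
    scene_text: List[Segment] = field(default_factory=list)


# Made-up sample data covering the same four seconds of a video.
transcript = RichTranscript(
    speech=[Segment(0.0, 4.2, "Welcome to the quarterly review.")],
    visual_scene_descriptions=[Segment(0.0, 4.2, "A presenter stands beside a slide.")],
    scene_text=[Segment(0.0, 4.2, "Q3 Results")],
)
```

Keeping each kind in its own list of timestamped segments is one possible design; it lets the same time range be described by what was said, what was shown, and what was written on screen.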
What can you use them for?
Speech
Speech transcripts are the most intuitive and most commonly used kind of rich transcript. They support a variety of use cases, including:
- Searching for specific words or phrases
- Finding key takeaways or action items from a meeting
- Generating a summary of a conversation
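As an illustration of the first use case, the following sketch searches speech segments for a phrase. It assumes the transcript is already available as a list of dictionaries with `start` and `text` keys; that shape, the `find_phrase` helper, and the sample data are hypothetical and not part of Cloudglue's API.

```python
def find_phrase(segments, phrase):
    """Return (start, text) for each segment whose text contains the phrase.

    Matching is case-insensitive and looks for a plain substring.
    """
    needle = phrase.casefold()
    return [
        (segment["start"], segment["text"])
        for segment in segments
        if needle in segment["text"].casefold()
    ]


# Made-up speech segments; "start" is in seconds from the start of the video.
speech_segments = [
    {"start": 0.0, "text": "Welcome everyone to the planning meeting."},
    {"start": 12.5, "text": "The first Action Item is to update the roadmap."},
    {"start": 30.1, "text": "Let's wrap up."},
]

matches = find_phrase(speech_segments, "action item")
for start, text in matches:
    print(f"{start:>6.1f}s  {text}")
```

Because each match carries its start time, a result can be used to jump straight to the relevant moment in the video.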
Visual scene descriptions
Visual scene descriptions capture what is physically happening in a video. They help you understand the context and setting of a video and identify key visual elements or objects in a scene. Use cases include:
- Detecting important visual cues or actions
- Monitoring product placement or branding
- Identifying people or objects in the scene
- Understanding visual transitions between scenes
- Analyzing visual composition and framing
Scene text
Scene text is the text that is visible on the screen in a video. Commonly this is closed captioning, but it can be any other on-screen text. Examples of scene text include:
- Closed captioning
- Text from a presentation
- Text from a document or in the background
- Menu text or product labels