What are Rich Transcripts?

Rich Transcripts are structured pieces of information that describe a video and its content.

The most familiar kind of rich transcript is a speech transcript, which consists of the words spoken in a video. We commonly see this on platforms like YouTube, and many meeting tools now offer it as a feature as well.

While speech transcripts are a great source of rich and deep information, they are not the only kind of rich transcript that Cloudglue offers.

Cloudglue also exposes other kinds of rich transcripts, including:

  • Visual scene descriptions: A description of the visual scene in a video.
  • Scene text: The text that is visible on the screen in a video.

Combining speech with these other rich transcript types allows Cloudglue to turn your videos into a rich, structured source of information.

What can you use them for?

Speech

Speech transcripts are the most intuitive kind of rich transcript, and are the most commonly used. They are a great source of information for a variety of use cases, including:

  • Searching for specific words or phrases
  • Finding key takeaways or action items from a meeting
  • Generating a summary of a conversation
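
As a rough illustration of the first use case, the sketch below searches timestamped speech segments for a phrase. The `SpeechSegment` shape, the `find_phrase` helper, and the sample data are assumptions made for this example; they are not Cloudglue's actual response format or API.

```python
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    """Hypothetical shape for one piece of a speech transcript."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # words spoken during this span


def find_phrase(segments, phrase):
    """Return the segments whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [s for s in segments if needle in s.text.casefold()]


# Made-up sample data, for illustration only.
segments = [
    SpeechSegment(0.0, 4.2, "Welcome everyone to the quarterly review."),
    SpeechSegment(4.2, 9.8, "First action item: update the pricing page."),
    SpeechSegment(9.8, 15.0, "Second Action Item: schedule a follow-up."),
]

hits = find_phrase(segments, "action item")
for seg in hits:
    print(f"{seg.start:.1f}s-{seg.end:.1f}s: {seg.text}")
```

Because each match keeps its start and end time, a result can point straight to the moment in the video where the phrase was spoken.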

Visual scene descriptions

Visual scene descriptions tell you what is physically happening in a video. They help you understand the context and setting of a scene and identify its key visual elements or objects.

Use cases for visual scene descriptions include:

  • Detecting important visual cues or actions
  • Monitoring product placement or branding
  • Identifying people or objects in the scene
  • Understanding visual transitions between scenes
  • Analyzing visual composition and framing

Scene text

Scene text is the text that is visible on screen in a video. Often this is closed captioning, but it can be any other text that appears in the frame.

Examples of scene text include:

  • Closed captioning
  • Text from a presentation
  • Text from a document or in the background
  • Menu text or product labels

Putting it all together

Each of these rich transcript types is useful on its own, but combined they provide a methodical, in-depth way to understand the scenes throughout a video. Speech tells you what is being said at a given moment, while visual scene descriptions and scene text help you understand who and what appear in a scene, and when and where it takes place.
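
One way to picture this combination is to line up all three transcript types on a shared timeline and ask what was said, shown, and written during a given window. The sketch below does that under stated assumptions: the `Entry` shape, the `kind` labels, and both helper functions are hypothetical and only illustrate the idea, not how Cloudglue structures or returns its data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical timestamped entry from any rich transcript type."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    kind: str     # assumed labels: "speech", "visual", or "scene_text"
    text: str


def overlapping(entries, start, end):
    """Return entries whose time span intersects the window [start, end)."""
    return [e for e in entries if e.start < end and e.end > start]


def describe_window(entries, start, end):
    """Group the text of overlapping entries by transcript type, in time order."""
    grouped = {"speech": [], "visual": [], "scene_text": []}
    for e in sorted(overlapping(entries, start, end), key=lambda e: e.start):
        grouped.setdefault(e.kind, []).append(e.text)
    return grouped


# Made-up sample data, for illustration only.
entries = [
    Entry(0.0, 5.0, "speech", "Let's look at the new dashboard."),
    Entry(0.0, 8.0, "visual", "A presenter stands next to a projected slide."),
    Entry(2.0, 8.0, "scene_text", "Q3 Dashboard Overview"),
    Entry(8.0, 12.0, "speech", "Any questions?"),
]

window = describe_window(entries, 0.0, 5.0)
for kind, texts in window.items():
    print(kind, "->", texts)
```

Grouping by time window rather than by transcript type keeps the spoken context, the visual setting, and the on-screen text for the same moment side by side.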

Next Steps

To learn more about how to use rich transcripts, including step-by-step examples, read our Transcription Guide.

For information about running on-demand transcriptions, read our On-Demand Transcription Guide.