Extracting Structured Data from Cooking Videos: A Hands-On Guide
Introduction
Video content contains a wealth of information that, when properly structured, can power advanced search, analytics, and insights. In this guide, we’ll walk through extracting structured data from cooking videos using Cloudglue’s entity extraction capabilities. By the end, you’ll learn how to:
- Create an entity collection for cooking videos
- Define a structured schema for recipe information
- Extract detailed data from YouTube cooking videos
- Analyze and visualize patterns across multiple videos
Prerequisites
Creating a Recipe Entity Collection
First, we’ll create a collection specifically for cooking videos with a schema that captures recipe details, equipment, actions, ingredients, and cooking phases.
Adding YouTube Cooking Videos
Now, let’s add 10 different cooking videos to our collection. We’ll choose a variety of cuisines and meal types.
Waiting for Processing and Retrieving Entities
The extraction process runs asynchronously. Let’s monitor and wait for the extraction to complete.
Extracting and Aggregating the Data
Now that our videos are processed, let’s extract the entities and organize them for analysis.
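As a minimal sketch of this aggregation step, the snippet below flattens per-video entity records into one row per video and counts ingredient usage across the collection. The record shape (`title`, `cuisine`, `ingredients`, `equipment`) and the sample values are illustrative assumptions, not Cloudglue’s actual response format; adapt the keys to whatever your schema returns.

```python
from collections import Counter

# Hypothetical per-video entity records. The keys and values are
# assumptions for illustration, not Cloudglue's actual response format.
videos = [
    {
        "title": "Weeknight Carbonara",
        "cuisine": "Italian",
        "ingredients": ["spaghetti", "eggs", "pecorino", "guanciale"],
        "equipment": ["pot", "skillet"],
    },
    {
        "title": "Quick Pad Thai",
        "cuisine": "Thai",
        "ingredients": ["rice noodles", "eggs", "tofu", "peanuts", "lime"],
        "equipment": ["wok"],
    },
]


def summarize(video):
    """Collapse one entity record into a flat row suitable for a table."""
    return {
        "title": video["title"],
        "cuisine": video.get("cuisine", "unknown"),
        "ingredient_count": len(video.get("ingredients", [])),
        "equipment_count": len(video.get("equipment", [])),
    }


rows = [summarize(v) for v in videos]

# Count how often each ingredient appears across the whole collection.
ingredient_usage = Counter(
    ingredient for v in videos for ingredient in v.get("ingredients", [])
)

for row in rows:
    print(row)
print(ingredient_usage.most_common(3))
```

A list of flat dicts like `rows` can be passed directly to `pandas.DataFrame(rows)` if you prefer to continue the analysis in pandas.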

Analyzing the Data
1. Ingredient Count Comparison by Cuisine Type
Let’s compare the average number of ingredients used in each cuisine type.
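One way to compute this comparison is sketched below using only the standard library. The `rows` list and its values are made-up placeholders standing in for the aggregated data; the field names `cuisine` and `ingredient_count` are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flat rows, one per video. The values are made up.
rows = [
    {"title": "Video A", "cuisine": "Italian", "ingredient_count": 8},
    {"title": "Video B", "cuisine": "Italian", "ingredient_count": 12},
    {"title": "Video C", "cuisine": "Thai", "ingredient_count": 15},
]


def average_ingredients_by_cuisine(rows):
    """Group ingredient counts by cuisine and average each group."""
    counts = defaultdict(list)
    for row in rows:
        counts[row["cuisine"]].append(row["ingredient_count"])
    return {cuisine: mean(values) for cuisine, values in counts.items()}


averages = average_ingredients_by_cuisine(rows)
for cuisine, avg in sorted(averages.items()):
    print(f"{cuisine}: {avg:.1f} ingredients on average")
```

The resulting mapping of cuisine to average can then be drawn as a bar chart, for example with matplotlib’s `plt.bar`.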
2. Phase Duration Analysis
Let’s analyze how much time each video spends in different cooking phases.
Across the processed videos, a few patterns stand out:
- Active cooking takes up a significant portion (17-38%) of each video
- Videos vary considerably in how they balance explanation (16-21%) and prep (17-33%)
- Some videos allocate a small portion (around 8%) to the tasting phase
- The distribution of phases can indicate the style and target audience of a cooking video:
  - Videos with more prep time may be better for beginners
  - Videos with more active cooking focus on the technical aspects
  - Videos with more explanation time provide more context and background
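The percentages above are each phase’s share of a video’s running time. A minimal sketch of that calculation follows, assuming each extracted phase segment carries a `phase` label plus `start` and `end` times in seconds. These field names and the sample timings are hypothetical, so the output will not match the figures quoted above.

```python
from collections import defaultdict

# Hypothetical phase segments for one video, with times in seconds.
segments = [
    {"phase": "explanation", "start": 0, "end": 60},
    {"phase": "prep", "start": 60, "end": 150},
    {"phase": "active cooking", "start": 150, "end": 270},
    {"phase": "tasting", "start": 270, "end": 300},
]


def phase_shares(segments):
    """Return each phase's share of the total segmented time, in percent."""
    durations = defaultdict(float)
    for seg in segments:
        durations[seg["phase"]] += seg["end"] - seg["start"]
    total = sum(durations.values())
    if total == 0:
        return {}
    return {phase: 100 * d / total for phase, d in durations.items()}


shares = phase_shares(segments)
for phase, pct in shares.items():
    print(f"{phase}: {pct:.0f}%")
```

Summing durations per phase before dividing means a phase that recurs several times in a video (for example, prep interleaved with cooking) is counted once in the final breakdown.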
3. Action Complexity Timeline for a Single Video
Let’s select one representative video and chart how the complexity (measured by actions per segment) changes over time.
The resulting timeline shows:
- The peak complexity moments in the video (segments with a higher number of distinct actions)
- How cooking phases relate to action complexity (with notable activity in both the prep and active cooking phases)
- The rhythm of the recipe: multiple spikes of 4-5 actions throughout the video, interspersed with less complex segments
- Potential points where viewers might need to pause or rewatch to follow along
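The complexity measure described here, distinct actions per segment, can be sketched as follows. The segment fields (`start`, `phase`, `actions`), the sample data, and the peak threshold of 4 are assumptions chosen for illustration.

```python
# Hypothetical segments for a single video: start time in seconds,
# the phase label, and the actions observed in that segment.
segments = [
    {"start": 0, "phase": "explanation", "actions": ["introduce"]},
    {"start": 30, "phase": "prep", "actions": ["chop", "peel", "measure", "rinse"]},
    {"start": 60, "phase": "prep", "actions": ["chop", "chop", "mix"]},
    {"start": 90, "phase": "active cooking",
     "actions": ["saute", "stir", "season", "flip", "deglaze"]},
    {"start": 120, "phase": "active cooking", "actions": ["simmer"]},
]


def complexity_timeline(segments):
    """Pair each segment's start time with its number of distinct actions."""
    return [(seg["start"], len(set(seg["actions"]))) for seg in segments]


def peaks(timeline, threshold=4):
    """Return start times of segments with at least `threshold` distinct actions."""
    return [start for start, count in timeline if count >= threshold]


timeline = complexity_timeline(segments)
print(timeline)         # (start_seconds, distinct_action_count) pairs
print(peaks(timeline))  # start times of the high-complexity segments
```

Counting distinct actions with `set` keeps a repeated action within one segment from inflating its score. The flagged start times are candidates for the pause-or-rewatch points mentioned above.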
Exploring Advanced Queries
Besides the visualizations above, you can perform more targeted analyses with simple pandas queries:
Conclusion
In this guide, we’ve demonstrated how to:
- Create a structured extraction schema for cooking videos
- Process multiple YouTube videos with Cloudglue’s entity extraction
- Analyze the extracted data to uncover insights about ingredients, cooking phases, and action complexity
- Visualize the results using matplotlib
Next Steps
- Try different extraction schemas for other video domains (e.g., sports analysis, product reviews)
- Implement real-time dashboards using extracted data
- Build a search interface that lets users find specific segments across videos
- Create a recommendation system based on extracted entities