Azure AI services Blog

Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding

John_Carroll (Microsoft)
Dec 22, 2024

In today’s fast-paced industries, maintaining worker efficiency and safety is more than a priority—it’s a necessity. Azure AI Foundry’s Content Understanding service, specifically the Video Shot Analysis template, represents a significant advancement in workplace analytics. By leveraging Generative AI to analyze video data, businesses can transform their operations, ensuring not only productivity but also the well-being of their workforce.

What is Azure AI Foundry’s Content Understanding?

Azure AI Foundry’s Content Understanding service is a cutting-edge AI platform that can process and analyze multimodal data formats, including text, audio, images, documents, and video. This capability enables businesses to extract actionable insights from diverse data types without requiring specialized AI skills. Whether analyzing customer feedback, automating workflows, or enhancing video content understanding, Azure AI Foundry makes it easy to derive value from complex datasets. Learn more about Azure AI Foundry’s Content Understanding service.

Revolutionizing Video Analytics with Content Understanding

Azure AI Foundry’s Content Understanding service enables organizations to extract actionable insights from videos using the public preview Video Shot Analysis template. This capability breaks down video footage into segments (e.g., one-minute intervals) and analyzes them against a user-defined schema, delivering structured data to inform decision-making. The service can also analyze environmental audio, providing insights into noise levels that might impact safety or productivity, and it generates transcriptions so spoken content can be reviewed and acted on.

Key features of Video Shot Analysis include:

  • Schema Customization: Users can define fields to capture specific metrics such as worker actions, posture, and safety risks.
  • High Accuracy: Advanced AI models ensure precise detection and classification of actions.
  • Validated Outputs: Structured JSON outputs facilitate integration with workflows and applications.

 

Custom schema definition interface in Azure AI Foundry, enabling tailored analysis of workplace video data.

Learn more about defining schemas in Azure AI Foundry Content Understanding - Getting Started.

Analyzing Worker Efficiency and Safety: A Closer Look

The Video Shot Analysis template allows users to define up to 10 fields for detailed analysis. For worker safety and efficiency, our schema included fields like:

  • Dominant Action: Identifies the most frequent activities (e.g., lifting, walking) within each video segment.
  • Worker Posture Analysis: Highlights ergonomic behaviors, such as bending and standing upright.
  • Safety Risks: Detects hazards like improper lifting techniques.
  • Load Weights: Tracks weights handled by workers for insights into workload distribution.
  • Environmental Noise Levels: Monitors workplace noise to ensure compliance with safety standards.

Each field captures essential data to evaluate workplace conditions, helping organizations enhance both safety and performance.
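
To make this concrete, the snippet below sketches how such a schema could be represented as a set of field definitions. It is written as a plain Python dictionary for illustration only; the field names come from the JSON output later in this post, but the exact structure the service expects should be taken from the Content Understanding documentation rather than from this sketch.

# Hypothetical sketch of a Video Shot Analysis field schema for worker safety.
# The property layout is illustrative; consult the Content Understanding docs
# for the exact analyzer and schema contract.
video_safety_schema = {
    "fields": {
        "dominantAction": {
            "type": "array",
            "description": "Most frequent worker activities in the segment, with approximate percentages."
        },
        "workerPostureAnalysis": {
            "type": "array",
            "description": "Ergonomic observations, e.g. upright posture vs. bending while lifting."
        },
        "safetyRisks": {
            "type": "array",
            "description": "Potential hazards such as improper lifting technique."
        },
        "loadWeights": {
            "type": "array",
            "description": "Approximate weights of loads handled by the worker."
        },
        "environmentalNoiseLevel": {
            "type": "string",
            "description": "Qualitative summary of ambient noise and notable peaks."
        }
    }
}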

Visualizing Results: Insights in Action

Analysis results showcasing dominant actions, safety risks, and environmental insights extracted from video footage.

Once the schema is defined, the analyzer processes video footage to extract insights. For instance, in a warehouse environment:

  • The analyzer identified dominant actions such as lifting (60%) and walking (30%).
  • Load weights were categorized into approximate ranges (e.g., 25 lbs, 30 lbs).
  • No significant fatigue indicators were detected, and the noise levels were deemed typical for the setting.

Explore more about the Video Shot Analysis template.

JSON Output: Structured Data for Seamless Integration

{
	"id": "b45c3ee9-c239-4df5-b1dc-fc8d7c3ecffa",
	"status": "Succeeded",
	"result": {
		"analyzerId": "auto-labeling-model-1734748393327-893",
		"apiVersion": "2024-12-01-preview",
		"createdAt": "2024-12-21T02:34:07Z",
		"warnings": [],
		"contents": [
			{
				"markdown": "# Shot 0:0.0 => 1:1.772\n## Transcript\n```\nWEBVTT\n\n```\n## Key Frames\n- 0:2.934 ![](keyFrame.2934.jpg)\n- 0:5.867 ![](keyFrame.5867.jpg)\n- 0:8.801 ![](keyFrame.8801.jpg)\n- 0:11.734 ![](keyFrame.11734.jpg)\n- 0:14.668 ![](keyFrame.14668.jpg)\n- 0:17.602 ![](keyFrame.17602.jpg)\n- 0:20.535 ![](keyFrame.20535.jpg)\n- 0:23.469 ![](keyFrame.23469.jpg)\n- 0:26.402 ![](keyFrame.26402.jpg)\n- 0:29.336 ![](keyFrame.29336.jpg)\n- 0:32.270 ![](keyFrame.32270.jpg)\n- 0:35.203 ![](keyFrame.35203.jpg)\n- 0:38.137 ![](keyFrame.38137.jpg)\n- 0:41.71 ![](keyFrame.41071.jpg)\n- 0:44.4 ![](keyFrame.44004.jpg)\n- 0:46.938 ![](keyFrame.46938.jpg)\n- 0:49.871 ![](keyFrame.49871.jpg)\n- 0:52.805 ![](keyFrame.52805.jpg)\n- 0:55.739 ![](keyFrame.55739.jpg)\n- 0:58.672 ![](keyFrame.58672.jpg)",
				"fields": {
					"dominantAction": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "lifting box (60%)"
							},
							{
								"type": "string",
								"valueString": "scanning items (20%)"
							},
							{
								"type": "string",
								"valueString": "walking (20%)"
							}
						]
					},
					"workerPostureAnalysis": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "Worker maintained upright posture for 70% of the time, with 30% observed bending while lifting."
							}
						]
					},
					"actionScore": {
						"type": "number",
						"valueNumber": 85
					},
					"equipmentUsage": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "Equipment_Used_Properly: Handheld scanner used for logging items."
							},
							{
								"type": "string",
								"valueString": "Manual_Handling_Detected: Lifting boxes manually."
							}
						]
					},
					"actionContext": {
						"type": "string",
						"valueString": "The worker operated in a warehouse environment with shelves nearby, handling medium-sized boxes. Lighting was adequate, and the floor appeared clean and unobstructed. The worker used a handheld scanner to log items, occasionally bending to lift boxes from a pallet."
					},
					"loadWeights": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "30 lbs"
							},
							{
								"type": "string",
								"valueString": "30 lbs"
							},
							{
								"type": "string",
								"valueString": "30 lbs"
							},
							{
								"type": "string",
								"valueString": "30 lbs"
							}
						]
					},
					"safetyRisks": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "Improper_Lifting: Observed bending without knee support."
							}
						]
					},
					"workerActionsSummary": {
						"type": "array",
						"valueArray": [
							{
								"type": "string",
								"valueString": "lifting box: { \"instances\": [ { \"duration\": 5, \"efficiency\": \"Within_Benchmark\" }, { \"duration\": 7, \"efficiency\": \"Within_Benchmark\" }, { \"duration\": 6, \"efficiency\": \"Within_Benchmark\" }, { \"duration\": 5, \"efficiency\": \"Within_Benchmark\" } ], \"total_duration\": 23, \"count\": 4, \"efficiency\": \"Within_Benchmark\" }"
							},
							{
								"type": "string",
								"valueString": "scanning items: { \"instances\": [ { \"duration\": 3, \"efficiency\": \"Within_Benchmark\" }, { \"duration\": 3, \"efficiency\": \"Within_Benchmark\" } ], \"total_duration\": 6, \"count\": 2, \"efficiency\": \"Within_Benchmark\" }"
							},
							{
								"type": "string",
								"valueString": "walking: { \"instances\": [ { \"duration\": 4, \"efficiency\": \"Within_Benchmark\" }, { \"duration\": 5, \"efficiency\": \"Within_Benchmark\" } ], \"total_duration\": 9, \"count\": 2, \"efficiency\": \"Within_Benchmark\" }"
							}
						]
					},
					"environmentalNoiseLevel": {
						"type": "string",
						"valueString": "Average noise level: 70 dB, typical for warehouse operations with occasional peaks due to equipment use."
					},
					"workerFatigueIndicators": {
						"type": "string",
						"valueString": "No significant signs of fatigue detected. Actions maintained consistent speed and efficiency."
					}
				},
				"kind": "audioVisual",
				"startTimeMs": 0,
				"endTimeMs": 61772,
				"width": 1920,
				"height": 1080
			}
		]
	}
}

The JSON output generated by Video Shot Analysis provides structured, machine-readable data. For example:

{
  "dominantAction": ["Lifting (60%)", "Walking (30%)"],
  "workerPostureAnalysis": ["Upright posture: 70%"],
  "actionScore": 85,
  "loadWeights": ["Approx. 30 lbs", "Approx. 25 lbs"],
  "safetyRisks": ["None"],
  "environmentalNoiseLevel": "Moderate",
  "workerFatigueIndicators": "No significant fatigue detected"
}

This output simplifies the integration process for developers, enabling seamless workflows and real-time monitoring.

Learn how to integrate JSON outputs into your workflows in Structured Outputs.
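
As a quick illustration of that integration, the following sketch (plain Python, no SDK required) flattens a result like the full JSON shown above into per-segment summaries that could feed a dashboard or an alerting job. The field names match the sample output; error handling is omitted for brevity.

import json

def summarize_segments(result_json: str) -> list:
    """Flatten Video Shot Analysis results into per-segment summaries."""
    result = json.loads(result_json)
    summaries = []
    for segment in result["result"]["contents"]:
        fields = segment.get("fields", {})

        def value_of(name):
            # Array fields carry items under "valueArray"; scalar fields use
            # "valueString" or "valueNumber", as in the sample output above.
            field = fields.get(name, {})
            if field.get("type") == "array":
                return [item.get("valueString") for item in field.get("valueArray", [])]
            return field.get("valueString", field.get("valueNumber"))

        summaries.append({
            "start_ms": segment.get("startTimeMs"),
            "end_ms": segment.get("endTimeMs"),
            "dominant_actions": value_of("dominantAction"),
            "safety_risks": value_of("safetyRisks"),
            "noise_level": value_of("environmentalNoiseLevel"),
            "fatigue": value_of("workerFatigueIndicators"),
        })
    return summaries

Running the full JSON above through summarize_segments, for example, would yield a single entry covering 0 to 61,772 ms with the lifting, scanning, and walking actions listed under dominant_actions.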

Video Analysis Applications Across Industries

Detailed breakdown of worker actions and benchmarks from the Video Shot Analysis template.

The versatility of Video Shot Analysis extends far beyond warehouses. Its potential applications span multiple industries:

  1. Manufacturing: Optimize assembly lines by analyzing worker efficiency and identifying ergonomic risks.
  2. Logistics and Warehousing: Improve safety and streamline operations by monitoring task performance and load handling.
  3. Healthcare: Assess hospital staff movements to enhance patient care and minimize fatigue risks.
  4. Retail: Monitor stockroom operations and ensure adherence to safety protocols.
  5. Construction: Evaluate worker posture and movements to mitigate risks in physically demanding tasks.

Discover more use cases in Azure AI Foundry's applications.

Driving Industry Transformation

Schema field configurations, highlighting customization capabilities for workplace analytics.

With Azure AI Foundry’s Video Shot Analysis, organizations gain the ability to:

  • Enhance Safety: Proactively identify hazards like improper lifting or excessive fatigue.
  • Boost Productivity: Monitor efficiency benchmarks and improve task performance.
  • Streamline Training: Use AI-driven insights to create tailored training programs.

Realizing Value with Azure AI Foundry

Azure AI Foundry’s Content Understanding service is a powerful tool for organizations striving to achieve more through data-driven insights. The public preview Video Shot Analysis template showcases how Generative AI can transform video content into actionable intelligence, driving operational excellence across industries.

Content Understanding API: Unlocking Automation at Scale

Azure AI Foundry’s Content Understanding API empowers developers to harness its capabilities programmatically, enabling seamless integration into existing applications and workflows. Through the API, you can automate content analysis tasks such as:

  • Video Shot Analysis: Extract actionable insights from video segments.
  • Audio Transcription and Analysis: Generate transcriptions and assess audio environments.
  • Image and Document Insights: Derive value from a range of media formats.

Developers can define custom parameters and interact with the API via REST endpoints, providing full control over how insights are generated and utilized. Whether building solutions for real-time monitoring or batch analysis, the API offers flexibility and scalability.

To get started with the Content Understanding API, check out the Official API Quickstart Guide.
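
As a rough sketch of that request pattern, the Python below submits a video URL to a custom analyzer and polls for the result. It assumes the submit-then-poll convention common to Azure AI preview REST APIs (an ':analyze' route plus an Operation-Location header) and reuses the api-version string that appears in the sample output above; confirm the exact paths, payloads, and authentication options in the quickstart before relying on them.

import time
import requests

# Placeholder values: replace with your resource endpoint, key, and analyzer ID.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_VERSION = "2024-12-01-preview"  # version shown in the sample output above
ANALYZER_ID = "worker-safety-video-analyzer"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}

def analyze_video(video_url: str) -> dict:
    # Submit the video for analysis; the ':analyze' route is assumed here and
    # should be checked against the quickstart for the current preview.
    submit = requests.post(
        f"{ENDPOINT}/contentunderstanding/analyzers/{ANALYZER_ID}:analyze",
        params={"api-version": API_VERSION},
        headers=HEADERS,
        json={"url": video_url},
    )
    submit.raise_for_status()
    operation_url = submit.headers["Operation-Location"]

    # Poll until the long-running operation finishes.
    while True:
        poll = requests.get(operation_url, headers=HEADERS)
        poll.raise_for_status()
        body = poll.json()
        if body.get("status") in ("Succeeded", "Failed"):
            return body
        time.sleep(5)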

Build Fully Customized Video Analysis Solutions

For those seeking even greater flexibility and customization beyond Azure AI Foundry’s portal and Content Understanding service, Microsoft offers an Azure Samples GitHub repository showcasing how to build a solution using GPT-4o. This demo enables users to analyze and extract insights from video files or video URLs (e.g., YouTube).

Key Steps to Process a Video:

  1. Split the video into segments of specified duration.
  2. Extract video frames at 1 frame per second.
  3. Transcribe audio using Whisper, if enabled.
  4. Analyze frames and audio (optional) to generate descriptions, summaries, or custom insights based on a given prompt.

This approach offers a fully customizable solution for those with application development expertise. Explore the GitHub repository here: Video Analysis with GPT-4o.
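
To give a feel for what step 2 involves, here is a minimal frame-extraction sketch using OpenCV. It is not code from the sample repository; it simply illustrates the kind of preprocessing the demo performs before frames (and, optionally, a Whisper transcript) are passed to GPT-4o along with your prompt.

import os
import cv2  # pip install opencv-python

def extract_frames(video_path: str, output_dir: str = "frames") -> list:
    """Save roughly one frame per second of the video and return the file paths."""
    os.makedirs(output_dir, exist_ok=True)

    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    step = max(1, int(round(fps)))

    saved, frame_index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_index % step == 0:  # keep one frame per second of footage
            path = os.path.join(output_dir, f"frame_{frame_index // step:05d}.jpg")
            cv2.imwrite(path, frame)
            saved.append(path)
        frame_index += 1

    capture.release()
    return saved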

Conclusion

Azure AI Foundry’s Video Shot Analysis empowers businesses to address workplace safety and efficiency challenges with unparalleled precision. From real-time monitoring to detailed post-analysis, this innovative service enables industries to optimize their operations and foster safer work environments.

With its flexible schema design, validated JSON outputs, and powerful analytics capabilities, Azure AI Foundry is setting the standard for workplace intelligence. As industries continue to embrace AI-driven transformation, the potential applications of Video Shot Analysis are limitless.

Learn how you can get started today with Azure AI Foundry’s Content Understanding.

Updated Jan 14, 2025
Version 4.0