Video content is becoming increasingly central to business operations, from training materials to safety monitoring. As part of Azure's comprehensive video analysis capabilities, we're excited to discuss Azure Video Retrieval, a powerful service that enables natural language search across your video and image content. This service makes it easier than ever to locate exactly what you need within your media assets.
What is Azure Video Retrieval?
Azure Video Retrieval allows you to create a search index and populate it with both videos and images. Using natural language queries, you can search through this content to identify visual elements (like objects and safety events) and speech content without requiring manual transcription or specialized expertise. The service is also customizable: developers can define a metadata schema for each index, ingest custom metadata, and specify which features (vision, speech) to extract at indexing time and filter on at search time. Whether you're looking for specific spoken phrases or visual occurrences, the service returns the timestamps where your search criteria appear.
Key Features
- Multimodal Search: Search across both visual and audio content using natural language
- Custom Metadata Support: Define and ingest metadata schemas for enhanced retrieval
- Flexible Feature Extraction: Specify which features (vision, speech) to extract and search
- Precise Timestamp Matching: Get exact frame locations where your search criteria appear
- Multiple Content Types: Index and search both videos and images
- Simple Integration: Easy implementation with Azure Blob Storage
- Comprehensive API: Full REST API support for custom implementations
Getting Started
Prerequisites
Before you begin, you'll need:
- An Azure Cognitive Services multi-service account
- An Azure Blob Storage Account for video content
Setting Up Video Indexing
The indexing process is straightforward. The snippet below walks a blob container, builds a SAS URL for each video, and adds each video to the index along with custom metadata:
import datetime
import uuid

import requests

# Request body; each entry in "videos" describes one video to add
payload = {"videos": []}

# Iterate through the blobs in the container and add each one to the payload
for blob in blob_service_client.get_container_client(az_storage_container_name).list_blobs():
    blob_name = blob.name
    blob_url = f"https://{az_storage_account_name}.blob.core.windows.net/{az_storage_container_name}/{blob_name}"
    # Append the SAS token so the service can read the blob
    sas_url = blob_url + "?" + sas_token
    # Add video to index
    payload["videos"].append({
        "mode": "add",
        "documentId": str(uuid.uuid4()),
        "documentUrl": sas_url,
        "metadata": {
            "cameraId": "video-indexer-demo-camera1",
            "timestamp": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        }
    })

# Send the request (url and headers are set up earlier in the sample)
response = requests.put(url, headers=headers, json=payload)
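The index that receives these videos needs a metadata schema and a list of features to extract. As a minimal sketch of what that definition might look like, the helper below builds a request body with `metadataSchema` and `features` keys. The helper name `build_index_definition`, the `searchable`/`filterable` defaults, and the exact body layout are assumptions for illustration, so check them against the REST reference for your API version before use.

```python
import json


def build_index_definition(metadata_fields, features):
    """Build an index-definition body (assumed shape; verify against the REST reference).

    metadata_fields: list of (name, type) tuples, e.g. ("cameraId", "string")
    features: list of feature names, e.g. ["vision", "speech"]
    """
    allowed = {"vision", "speech"}
    unknown = set(features) - allowed
    if unknown:
        raise ValueError(f"unsupported features: {sorted(unknown)}")
    return {
        "metadataSchema": {
            "fields": [
                # Assumed defaults: metadata is used for filtering, not free-text search
                {"name": name, "type": field_type, "searchable": False, "filterable": True}
                for name, field_type in metadata_fields
            ]
        },
        "features": [{"name": name} for name in features],
    }


# The two metadata fields match the ones attached to each video above
index_definition = build_index_definition(
    [("cameraId", "string"), ("timestamp", "datetime")],
    ["vision", "speech"],
)
print(json.dumps(index_definition, indent=2))
```

Keeping the definition in a small function makes it easy to reuse the same field names when you attach metadata to each video, since a mismatch between the schema and the ingested metadata is an easy mistake to make.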
Searching Videos
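The service supports two primary search modes, selected through the featureFilters field: vision for visual content and speech for spoken content.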
# Query templates for searching by text or speech
query_by_text = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["vision"],
    },
}
query_by_speech = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["speech"],
    },
}
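Because custom metadata is ingested with each video, a query can also be narrowed by that metadata. The sketch below builds a query body for either mode and optionally restricts it to specific cameras. The `build_query` helper is hypothetical, and the `stringFilters` layout is an assumption about the request format that should be confirmed in the REST reference.

```python
def build_query(query_text, feature, camera_ids=None):
    """Return a query body for one feature, optionally narrowed by cameraId metadata."""
    if feature not in ("vision", "speech"):
        raise ValueError("feature must be 'vision' or 'speech'")
    query = {
        "queryText": query_text,
        "filters": {"featureFilters": [feature]},
    }
    if camera_ids:
        # Assumed filter shape: match any of the listed values of a filterable string field
        query["filters"]["stringFilters"] = [
            {"fieldName": "cameraId", "values": list(camera_ids)}
        ]
    return query


# Example: a speech search limited to the demo camera used during ingestion
speech_query = build_query(
    "hard hat reminder", "speech", ["video-indexer-demo-camera1"]
)
print(speech_query)
```

Building a fresh dictionary per request avoids sharing state between searches, which matters if several queries run concurrently.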
The query template that matches the chosen mode is filled in with the user's query and sent to the REST API:
import requests

# Search for video frames matching the user's query with the Azure Video Retrieval service
def search_videos(query, query_type):
    url = f"https://{az_video_indexer_endpoint}/computervision/retrieval/indexes/{az_video_indexer_index_name}:queryByText?api-version={az_video_indexer_api_version}"
    headers = {
        "Ocp-Apim-Subscription-Key": az_video_indexer_key,
        "Content-Type": "application/json",
    }
    # Both modes call the same endpoint; only the feature filter differs
    template = query_by_speech if query_type == "Speech" else query_by_text
    # Copy the template so the module-level dictionaries are not mutated
    input_query = {**template, "queryText": query}
    try:
        response = requests.post(url, headers=headers, json=input_query)
        response.raise_for_status()
        print("search response \n", response.json())
        return response.json()
    except (requests.RequestException, ValueError) as e:
        # Covers network and HTTP errors as well as a non-JSON response body
        print("error", e)
        return None
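Once a search returns, the matches usually need to be ranked and turned into seek positions for a player. The sketch below assumes the response carries a `value` list whose items have `documentId`, `best`, and `relevance` fields, with timestamps formatted as `HH:MM:SS` and optional fractional seconds. That shape is an assumption, and `sample_response` is made-up data used only to exercise the helpers.

```python
def to_seconds(timestamp):
    """Convert an 'HH:MM:SS' or 'HH:MM:SS.fff' string to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def top_matches(search_response, limit=3):
    """Return the highest-relevance hits as (documentId, best_seconds, relevance) tuples."""
    if not search_response:
        return []  # a failed search yields None, so treat it as no matches
    hits = search_response.get("value", [])
    ranked = sorted(hits, key=lambda hit: hit.get("relevance", 0.0), reverse=True)
    return [
        (hit["documentId"], to_seconds(hit["best"]), hit.get("relevance", 0.0))
        for hit in ranked[:limit]
    ]


# Hypothetical response, not real service output
sample_response = {
    "value": [
        {"documentId": "doc-1", "start": "00:00:10", "end": "00:00:15",
         "best": "00:00:12.5", "relevance": 0.21},
        {"documentId": "doc-2", "start": "00:01:00", "end": "00:01:05",
         "best": "00:01:03", "relevance": 0.34},
    ]
}
for document_id, seconds, relevance in top_matches(sample_response):
    print(f"{document_id}: best match at {seconds:.1f}s (relevance {relevance:.2f})")
```

In an application, the seconds value can be passed straight to a video player's seek function so the user lands on the matching frame.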
The REST APIs that are required to complete the steps in this process are covered here
Use Cases
Azure Video Retrieval can transform how organizations work with video content across various scenarios:
- Training and Education: Quickly locate specific topics or demonstrations within training videos
- Content Management: Efficiently organize and retrieve media assets
- Safety and Compliance: Find specific safety-related content or incidents
- Media Production: Locate specific scenes or dialogue across video libraries
Demo
Watch this sample application, which uses Azure Video Retrieval to let users search frames across multiple videos in an index.
The source code of the sample application can be accessed here
Resources