Multimodal video search powered by Video Retrieval in Azure

Microsoft

Dec 16, 2024

Video content is becoming increasingly central to business operations, from training materials to safety monitoring. As part of Azure's comprehensive video analysis capabilities, we're excited to discuss Azure Video Retrieval, a powerful service that enables natural language search across your video and image content. This service makes it easier than ever to locate exactly what you need within your media assets.

What is Azure Video Retrieval?

Azure Video Retrieval allows you to create a search index and populate it with both videos and images. Using natural language queries, you can search through this content to identify visual elements (like objects and safety events) and speech content without requiring manual transcription or specialized expertise. The service is also customizable: developers can define a metadata schema for each index, ingest custom metadata, and specify which features (vision, speech) to extract and which to filter on during search operations. Whether you're looking for specific spoken phrases or visual occurrences, the service pinpoints the timestamps where your search criteria appear.

Key Features

Multimodal Search: Search across both visual and audio content using natural language
Custom Metadata Support: Define and ingest metadata schemas for enhanced retrieval
Flexible Feature Extraction: Specify which features (vision, speech) to extract and search
Precise Timestamp Matching: Get exact frame locations where your search criteria appear
Multiple Content Types: Index and search both videos and images
Simple Integration: Easy implementation with Azure Blob Storage
Comprehensive API: Full REST API support for custom implementations

Getting Started

Prerequisites

Before you begin, you'll need:

An Azure Cognitive Services multi-service account
An Azure Blob Storage Account for video content
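
The snippets in this post refer to a handful of settings (endpoint, key, index name, API version, storage account, and container) without showing where they come from. One possible way to supply them is through environment variables, sketched below. The environment variable names and the `load_settings` helper are hypothetical and not part of any Azure SDK; only the lowercase setting names are taken from the code in this post.

```python
import os

# Maps the setting names used in this post to HYPOTHETICAL environment
# variable names. Rename the right-hand side to match your deployment.
REQUIRED_VARS = {
    "az_video_indexer_endpoint": "AZ_VIDEO_INDEXER_ENDPOINT",
    "az_video_indexer_key": "AZ_VIDEO_INDEXER_KEY",
    "az_video_indexer_index_name": "AZ_VIDEO_INDEXER_INDEX_NAME",
    "az_video_indexer_api_version": "AZ_VIDEO_INDEXER_API_VERSION",
    "az_storage_account_name": "AZ_STORAGE_ACCOUNT_NAME",
    "az_storage_container_name": "AZ_STORAGE_CONTAINER_NAME",
}


def load_settings(env=os.environ):
    """Return a dict of settings, failing early if any variable is unset or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[var] for name, var in REQUIRED_VARS.items()}
```

Failing early on a missing value tends to be easier to debug than a 401 or 404 from the service later on.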

Setting Up Video Indexing

The indexing process is straightforward. Here's how to add the videos in a Blob Storage container to an index:

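import datetime
import uuid
from urllib.parse import quote

import requests

# Assumes blob_service_client, sas_token, url, and headers are defined elsewhere
payload = {"videos": []}

# Iterate through the blobs in the container and build the ingestion payload
container_client = blob_service_client.get_container_client(az_storage_container_name)
for blob in container_client.list_blobs():
    # URL-encode the blob name so spaces and special characters survive
    blob_url = (
        f"https://{az_storage_account_name}.blob.core.windows.net/"
        f"{az_storage_container_name}/{quote(blob.name)}"
    )

    # Append the SAS token so the service can read the blob
    sas_url = f"{blob_url}?{sas_token}"

    # Add the video to the payload
    payload["videos"].append({
        "mode": "add",
        "documentId": str(uuid.uuid4()),
        "documentUrl": sas_url,
        "metadata": {
            "cameraId": "video-indexer-demo-camera1",
            "timestamp": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        },
    })

# Submit the payload (url is assumed to be the index's ingestion endpoint)
response = requests.put(url, headers=headers, json=payload)
response.raise_for_status()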
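
The payload above attaches `cameraId` and `timestamp` metadata to each video, so the target index needs a schema that declares those fields. Here is a minimal sketch of what such an index definition might look like. The body shape (`metadataSchema`, `fields`, `features`) and the `surveillance` domain are assumptions based on the preview REST API and should be checked against the current API reference. The `index_url` helper is hypothetical and simply mirrors the query URL pattern used later in this post.

```python
import json


def build_index_definition(language="en"):
    """Build a request body for creating an index.

    ASSUMPTION: the key names follow the preview REST API as the author of
    this sketch recalls it; verify them against the official reference.
    """
    return {
        "metadataSchema": {
            "language": language,
            "fields": [
                # Fields matching the metadata attached during ingestion
                {"name": "cameraId", "searchable": False, "filterable": True, "type": "string"},
                {"name": "timestamp", "searchable": False, "filterable": True, "type": "datetime"},
            ],
        },
        # Features to extract at ingestion time and filter on at query time
        "features": [
            {"name": "vision", "domain": "surveillance"},
            {"name": "speech"},
        ],
    }


def index_url(endpoint, index_name, api_version):
    """Hypothetical helper: URL of the index resource, mirroring the query URL pattern."""
    return (
        f"https://{endpoint}/computervision/retrieval/indexes/"
        f"{index_name}?api-version={api_version}"
    )


if __name__ == "__main__":
    print(json.dumps(build_index_definition(), indent=2))
```

Under these assumptions, the body would be sent with a `PUT` request to the index URL, with the same subscription-key header used elsewhere in this post, before any videos are ingested.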

Searching Videos

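The service supports two primary search modes, selected through the featureFilters field: vision (what appears on screen) and speech (what is said):

# Query templates for searching by visual content ("vision") or spoken content ("speech")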
query_by_text = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["vision"],
    },
}

query_by_speech = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["speech"],
    },
}

The template that matches the chosen mode is sent as the request body to the queryByText REST endpoint:

import requests

# Search the Azure Video Retrieval index for content matching the user's query
def search_videos(query, query_type):
    url = f"https://{az_video_indexer_endpoint}/computervision/retrieval/indexes/{az_video_indexer_index_name}:queryByText?api-version={az_video_indexer_api_version}"
    headers = {
        "Ocp-Apim-Subscription-Key": az_video_indexer_key,
        "Content-Type": "application/json",
    }

    # Copy the matching template so the module-level dict is not mutated
    template = query_by_speech if query_type == "Speech" else query_by_text
    input_query = {**template, "queryText": query}
    try:
        response = requests.post(url, headers=headers, json=input_query, timeout=30)
        response.raise_for_status()
        results = response.json()
        print("search response \n", results)
        return results
    except requests.exceptions.RequestException as e:
        # Covers connection errors, timeouts, and non-2xx responses
        print("error", e)
        return None
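
Once `search_videos` returns, the matches still need to be turned into something a user can act on. Here is a minimal sketch that ranks hits by relevance and formats their time ranges. It assumes the response carries a `value` list whose items have `documentId`, `start`, `end`, `best`, and `relevance` keys; that shape is an assumption to verify against the API reference, and `summarize_hits` is a hypothetical helper.

```python
def summarize_hits(response_json, min_relevance=0.0):
    """Return one formatted line per hit, highest relevance first.

    ASSUMPTION: hits live under "value" and expose documentId, start, end,
    best, and relevance. Accepts None, which search_videos returns on failure.
    """
    hits = (response_json or {}).get("value", [])
    kept = [h for h in hits if h.get("relevance", 0.0) >= min_relevance]
    kept.sort(key=lambda h: h.get("relevance", 0.0), reverse=True)
    return [
        f"{h.get('documentId')} [{h.get('start')} - {h.get('end')}] "
        f"best={h.get('best')} relevance={h.get('relevance', 0.0):.2f}"
        for h in kept
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    sample = {
        "value": [
            {"documentId": "a", "start": "00:00:10", "end": "00:00:20", "best": "00:00:15", "relevance": 0.25},
            {"documentId": "b", "start": "00:01:00", "end": "00:01:05", "best": "00:01:02", "relevance": 0.75},
        ]
    }
    for line in summarize_hits(sample, min_relevance=0.1):
        print(line)
```

A relevance threshold is one simple way to hide weak matches. A sensible cutoff depends on your content and would need tuning.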

The REST APIs required to complete these steps are covered in the service's REST API documentation.

Use Cases

Azure Video Retrieval can transform how organizations work with video content across various scenarios:

Training and Education: Quickly locate specific topics or demonstrations within training videos
Content Management: Efficiently organize and retrieve media assets
Safety and Compliance: Find specific safety-related content or incidents
Media Production: Locate specific scenes or dialogue across video libraries