Educator Developer Blog

Building with Azure OpenAI Sora: A Complete Guide to AI Video Generation

suzarilshah
Feb 04, 2026

In this comprehensive guide, we'll explore how to integrate both Sora 1 and Sora 2 models from Azure OpenAI Service into a production web application. We'll cover API integration, request body parameters, cost analysis, limitations, and the key differences between using Azure AI Foundry endpoints versus OpenAI's native API.

Table of Contents

  1. Introduction to Sora Models
  2. Azure AI Foundry vs. OpenAI API Structure
  3. API Integration: Request Body Parameters
  4. Video Generation Modes
  5. Cost Analysis per Generation
  6. Technical Limitations & Constraints
  7. Resolution & Duration Support
  8. Implementation Best Practices

Introduction to Sora Models

Sora is OpenAI's groundbreaking text-to-video model that generates realistic videos from natural language descriptions. Azure AI Foundry provides access to two versions:

  • Sora 1: The original model focused primarily on text-to-video generation with extensive resolution options (480p to 1080p) and flexible duration (1-20 seconds)
  • Sora 2: The enhanced version with native audio generation, multiple generation modes (text-to-video, image-to-video, video-to-video remix), but more constrained resolution options (720p only in public preview)

Azure AI Foundry vs. OpenAI API Structure

Key Architectural Differences

Sora 1 uses Azure's traditional deployment-based API structure:

  • Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/...
  • Parameters: Uses Azure-specific naming like n_seconds, n_variants, separate width/height fields
  • Job Management: Uses /jobs/{id} for status polling
  • Content Download: Uses /video/generations/{generation_id}/content/video

Sora 2 adapts OpenAI's v1 API format while still being hosted on Azure:

  • Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/videos
  • Parameters: Uses OpenAI-style naming like seconds (string), size (combined dimension string like "1280x720")
  • Job Management: Uses /videos/{video_id} for status polling
  • Content Download: Uses /videos/{video_id}/content

Why This Matters

This architectural difference requires conditional request formatting in your code:

const isSora2 = deployment.toLowerCase().includes('sora-2');

if (isSora2) {
  requestBody = {
    model: deployment,
    prompt,
    size: `${width}x${height}`,  // Combined format
    seconds: duration.toString(), // String type
  };
} else {
  requestBody = {
    model: deployment,
    prompt,
    height,                       // Separate dimensions
    width,
    n_seconds: duration.toString(), // Azure naming
    n_variants: variants,
  };
}

 

API Integration: Request Body Parameters

Sora 1 API Parameters

Standard Text-to-Video Request:

{
  "model": "sora-1",
  "prompt": "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.",
  "height": "720",
  "width": "1280",
  "n_seconds": "12",
  "n_variants": "2"
}

 

Parameter Details:

  • model (String, Required): Your Azure deployment name
  • prompt (String, Required): Natural language description of the video (max 32000 chars)
  • height (String, Required): Video height in pixels
  • width (String, Required): Video width in pixels
  • n_seconds (String, Required): Duration (1-20 seconds)
  • n_variants (String, Optional): Number of variations to generate (1-4, constrained by resolution)
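Putting these parameters together, a small request-body builder can enforce the string typing and duration range described above. This is a sketch, not an official SDK helper; the function name and option shape are our own.

```javascript
// Sketch of a Sora 1 request-body builder based on the parameters above.
// Note the Azure-style naming and that numeric values are sent as strings.
function buildSora1Body({ deployment, prompt, width, height, seconds, variants = 1 }) {
  if (seconds < 1 || seconds > 20) {
    throw new Error('Sora 1 duration must be between 1 and 20 seconds');
  }
  return {
    model: deployment,
    prompt,
    height: String(height),    // separate dimension fields
    width: String(width),
    n_seconds: String(seconds),
    n_variants: String(variants),
  };
}
```

Validating before the request is cheaper than discovering a 400 response after the round trip.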

Sora 2 API Parameters

Text-to-Video Request:

{
  "model": "sora-2",
  "prompt": "A serene mountain landscape with cascading waterfalls, cinematic drone shot",
  "size": "1280x720",
  "seconds": "12"
}

 

Image-to-Video Request (uses FormData):

const formData = new FormData();
formData.append('model', 'sora-2');
formData.append('prompt', 'Animate this image with gentle wind movement');
formData.append('size', '1280x720');
formData.append('seconds', '8');
formData.append('input_reference', imageFile); // JPEG/PNG/WebP

 

Video-to-Video Remix Request:

  • Endpoint: POST .../videos/{video_id}/remix
  • Body: Only { "prompt": "your new description" }
  • The original video's structure, motion, and framing are reused while applying the new prompt

Parameter Details:

  • model (String, Optional): Your deployment name
  • prompt (String, Required): Video description
  • size (String, Optional): Either "720x1280" or "1280x720" (defaults to "720x1280")
  • seconds (String, Optional): "4", "8", or "12" (defaults to "4")
  • input_reference (File, Optional): Reference image for image-to-video mode
  • remix_video_id (String, URL parameter): ID of video to remix
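The public-preview constraints above (two sizes, three durations, string defaults) are easy to check client-side. A minimal builder, with names of our own choosing, might look like this:

```javascript
// Sketch: validate Sora 2 parameters against the public-preview constraints
// listed above before building the request body.
const SORA2_SIZES = ['720x1280', '1280x720'];
const SORA2_SECONDS = ['4', '8', '12'];

function buildSora2Body({ deployment, prompt, size = '720x1280', seconds = '4' }) {
  if (!SORA2_SIZES.includes(size)) {
    throw new Error(`Unsupported size for Sora 2 preview: ${size}`);
  }
  if (!SORA2_SECONDS.includes(String(seconds))) {
    throw new Error(`Unsupported duration for Sora 2 preview: ${seconds}`);
  }
  return { model: deployment, prompt, size, seconds: String(seconds) };
}
```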

Video Generation Modes

1. Text-to-Video (Both Models)

The foundational mode where you provide a text prompt describing the desired video.

Implementation:

const response = await fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'api-key': apiKey,
  },
  body: JSON.stringify({
    model: deployment,
    prompt: "A train journey through mountains with dramatic lighting",
    size: "1280x720",
    seconds: "12",
  }),
});

Best Practices:

  • Include shot type (wide, close-up, aerial)
  • Describe subject, action, and environment
  • Specify lighting conditions (golden hour, dramatic, soft)
  • Add camera movement if desired (pans, tilts, tracking shots)
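These elements can be composed mechanically. The helper below is purely illustrative (the field names are ours, not API parameters), but it keeps prompts consistent across generations:

```javascript
// Assemble a prompt from the elements recommended above: shot type, subject
// and action, environment, lighting, and camera movement. Omitted fields are
// simply skipped.
function composePrompt({ shot, subject, action, environment, lighting, camera }) {
  return [shot, `${subject} ${action}`, environment, lighting, camera]
    .filter(Boolean)
    .join(', ');
}
```

For example, the Sora 1 sample prompt earlier in this post decomposes into `shot: 'Wide shot'`, `subject: 'a child'`, `action: 'flying a red kite'`, `environment: 'in a grassy park'`, and so on.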

2. Image-to-Video (Sora 2 Only)

Generate a video anchored to or starting from a reference image.

Key Requirements:

  • Supported formats: JPEG, PNG, WebP
  • Image dimensions must exactly match the selected video resolution
  • Our implementation automatically resizes uploaded images to match

Implementation Detail:

// Resize image to match video dimensions
const targetWidth = parseInt(width);
const targetHeight = parseInt(height);
const resizedImage = await resizeImage(inputReference, targetWidth, targetHeight);

// Send as multipart/form-data
formData.append('input_reference', resizedImage);

3. Video-to-Video Remix (Sora 2 Only)

Create variations of existing videos while preserving their structure and motion.

Use Cases:

  • Change weather conditions in the same scene
  • Modify time of day while keeping camera movement
  • Swap subjects while maintaining composition
  • Adjust artistic style or color grading

Endpoint Structure:

POST {base_url}/videos/{original_video_id}/remix?api-version=2024-08-01-preview

Implementation:

let requestEndpoint = endpoint;
if (isSora2 && remixVideoId) {
  const [baseUrl, queryParams] = endpoint.split('?');
  const root = baseUrl.replace(/\/videos$/, '');
  requestEndpoint = `${root}/videos/${remixVideoId}/remix${queryParams ? '?' + queryParams : ''}`;
}

 

Cost Analysis per Generation

Sora 1 Pricing Model

Base Rate: ~$0.05 per second per variant at 720p 

Resolution Scaling: Cost scales linearly with pixel count

Formula:

const basePrice = 0.05;
const basePixels = 1280 * 720; // Reference resolution
const currentPixels = width * height;
const resolutionMultiplier = currentPixels / basePixels;
const totalCost = basePrice * duration * variants * resolutionMultiplier;

Examples:

  • 720p (1280×720), 12 seconds, 1 variant: $0.60
  • 1080p (1920×1080), 12 seconds, 1 variant: $1.35
  • 720p, 12 seconds, 2 variants: $1.20

Sora 2 Pricing Model

Flat Rate: $0.10 per second per variant (no resolution scaling in public preview)

Formula:

const totalCost = 0.10 * duration * variants;

Examples:

  • 720p (1280×720), 4 seconds: $0.40
  • 720p (1280×720), 12 seconds: $1.20
  • 720p (720×1280), 8 seconds: $0.80

 

Note: Since Sora 2 currently only supports 720p in public preview, resolution doesn't affect cost; only duration matters.
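The two pricing formulas can be folded into one estimator. The rates below mirror the figures quoted in this post and may change; treat the output as an estimate, not a bill.

```javascript
// Combined cost estimator mirroring the two formulas above.
// Rates: ~$0.05/s/variant at 720p-equivalent for Sora 1 (scaled by pixel
// count), flat $0.10/s/variant for Sora 2 in public preview.
function estimateCost({ soraVersion, width, height, duration, variants = 1 }) {
  if (soraVersion === 'sora-2') {
    return 0.10 * duration * variants; // resolution ignored in preview
  }
  const resolutionMultiplier = (width * height) / (1280 * 720);
  return 0.05 * duration * variants * resolutionMultiplier;
}
```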

Cost Comparison

| Scenario       | Sora 1 (720p)  | Sora 2 (720p) | Winner          |
|----------------|----------------|---------------|-----------------|
| 4s video       | $0.20          | $0.40         | Sora 1          |
| 12s video      | $0.60          | $1.20         | Sora 1          |
| 12s + audio    | N/A (no audio) | $1.20         | Sora 2 (unique) |
| Image-to-video | N/A            | $0.40-$1.20   | Sora 2 (unique) |

 

Recommendation: Use Sora 1 for cost-effective silent videos at various resolutions. Use Sora 2 when you need audio, image/video inputs, or remix capabilities.

Technical Limitations & Constraints

Sora 1 Limitations

Resolution Options:

  • 9 supported resolutions from 480×480 to 1920×1080
  • Includes square, portrait, and landscape formats
  • Full list: 480×480, 480×854, 854×480, 720×720, 720×1280, 1280×720, 1080×1080, 1080×1920, 1920×1080

Duration:

  • Flexible: 1 to 20 seconds
  • Any integer value within range

Variants:

  • Depends on resolution:
    • 1080p: Variants disabled (n_variants must be 1)
    • 720p: Max 2 variants
    • Other resolutions: Max 4 variants
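These tiers can be encoded in a small helper so the UI never offers more variants than the API accepts. This is our own derivation of the tiers listed above, not an official lookup:

```javascript
// Variant cap per resolution tier, as described above:
// 1080p-class resolutions allow 1, 720x1280/1280x720 allow 2, the rest allow 4.
function maxVariants(width, height) {
  const longSide = Math.max(width, height);
  if (longSide >= 1920 || Math.min(width, height) >= 1080) return 1; // 1080p tier
  if (longSide === 1280) return 2;                                   // 720p tier
  return 4;                                                          // lower resolutions
}
```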

Concurrent Jobs: Maximum 2 jobs running simultaneously

Job Expiration: Videos expire 24 hours after generation

Audio: No audio generation (silent videos only)

Sora 2 Limitations

Resolution Options (Public Preview):

  • Only 2 options: 720×1280 (portrait) or 1280×720 (landscape)
  • No square formats
  • No 1080p support in current preview

Duration:

  • Fixed options only: 4, 8, or 12 seconds
  • No custom durations
  • Defaults to 4 seconds if not specified

Variants:

  • Not prominently supported in current API documentation
  • Focus is on single high-quality generations with audio

Concurrent Jobs: Maximum 2 jobs (same as Sora 1)

Job Expiration: 24 hours (same as Sora 1)

Audio: Native audio generation included (dialogue, sound effects, ambience)

Shared Constraints

Concurrent Processing: Both models enforce a limit of 2 concurrent video jobs per Azure resource. You must wait for one job to complete before starting a third.

Job Lifecycle:

queued → preprocessing → processing/running → completed
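Only the happy path is shown above; a polling loop also needs to recognize when a job is finished for any reason. A minimal sketch, assuming `failed` and `cancelled` as terminal failure states (they are not listed in the lifecycle above) and accounting for Sora 1 reporting `succeeded` where Sora 2 reports `completed`:

```javascript
// States after which polling should stop. 'succeeded' is Sora 1's success
// status, 'completed' is Sora 2's; 'failed' and 'cancelled' are assumed
// terminal states not shown in the happy-path lifecycle.
const TERMINAL_STATES = new Set(['completed', 'succeeded', 'failed', 'cancelled']);

function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}
```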

Download Window: Videos are available for 24 hours after completion. After expiration, you must regenerate the video.

Generation Time:

  • Typical: 1-5 minutes depending on resolution, duration, and API load
  • Can occasionally take longer during high demand

Resolution & Duration Support Matrix

Sora 1 Support Matrix

| Resolution | Aspect Ratio | Max Variants | Duration Range | Use Case          |
|------------|--------------|--------------|----------------|-------------------|
| 480×480    | Square       | 4            | 1-20s          | Social thumbnails |
| 480×854    | Portrait     | 4            | 1-20s          | Mobile stories    |
| 854×480    | Landscape    | 4            | 1-20s          | Quick previews    |
| 720×720    | Square       | 4            | 1-20s          | Instagram posts   |
| 720×1280   | Portrait     | 2            | 1-20s          | TikTok/Reels      |
| 1280×720   | Landscape    | 2            | 1-20s          | YouTube shorts    |
| 1080×1080  | Square       | 1            | 1-20s          | Premium social    |
| 1080×1920  | Portrait     | 1            | 1-20s          | Premium vertical  |
| 1920×1080  | Landscape    | 1            | 1-20s          | Full HD content   |

Sora 2 Support Matrix

| Resolution | Aspect Ratio | Duration Options | Audio  | Generation Modes         |
|------------|--------------|------------------|--------|--------------------------|
| 720×1280   | Portrait     | 4s, 8s, 12s      | ✅ Yes | Text, Image, Video Remix |
| 1280×720   | Landscape    | 4s, 8s, 12s      | ✅ Yes | Text, Image, Video Remix |

Note: Sora 2's limited resolution options in public preview are expected to expand in future releases.

Implementation Best Practices

1. Job Status Polling Strategy

Implement adaptive backoff to avoid overwhelming the API:

const maxAttempts = 180; // cap on polling attempts (~45 minutes with the capped backoff below)
let attempts = 0;
const baseDelayMs = 3000; // Start with 3 seconds

while (attempts < maxAttempts) {
  const response = await fetch(statusUrl, {
    headers: { 'api-key': apiKey },
  });

  if (response.status === 404) {
    // Job not ready yet, wait longer
    const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
    await new Promise(r => setTimeout(r, delayMs));
    attempts++;
    continue;
  }

  const job = await response.json();
  
  // Check completion (different status values for Sora 1 vs 2)
  const isCompleted = isSora2 
    ? job.status === 'completed' 
    : job.status === 'succeeded';
  
  if (isCompleted) break;
  
  // Adaptive backoff
  const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
  await new Promise(r => setTimeout(r, delayMs));
  attempts++;
}

2. Handling Different Response Structures

Sora 1 Video Download:

const generations = Array.isArray(job.generations) ? job.generations : [];
const genId = generations[0]?.id;
const videoUrl = `${root}/${genId}/content/video`;

Sora 2 Video Download:

const videoUrl = `${root}/videos/${jobId}/content`;
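The two download paths can be unified behind one helper. In this sketch, `root` stands for the appropriate base URL for each model (for Sora 1, the generations base used in the snippet above; for Sora 2, the endpoint with any trailing `/videos` segment removed, as in the remix example earlier):

```javascript
// Select the content-download URL for either model family, following the
// two path shapes shown above. Caller supplies the correct base URL.
function buildDownloadUrl(root, isSora2, jobId, generationId) {
  return isSora2
    ? `${root}/videos/${jobId}/content`          // Sora 2: keyed by video/job id
    : `${root}/${generationId}/content/video`;   // Sora 1: keyed by generation id
}
```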

3. Error Handling

try {
  const response = await fetch(endpoint, fetchOptions);
  
  if (!response.ok) {
    const error = await response.text();
    throw new Error(`Video generation failed: ${error}`);
  }
  
  // ... handle successful response
} catch (error) {
  console.error('[VideoGen] Error:', error);
  // Implement retry logic or user notification
}

 

4. Image Preprocessing for Image-to-Video

Always resize images to match the target video resolution:

async function resizeImage(file: File, targetWidth: number, targetHeight: number): Promise<File> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    if (!ctx) {
      reject(new Error('Canvas 2D context unavailable'));
      return;
    }

    const objectUrl = URL.createObjectURL(file);

    img.onload = () => {
      canvas.width = targetWidth;
      canvas.height = targetHeight;
      ctx.drawImage(img, 0, 0, targetWidth, targetHeight);

      canvas.toBlob((blob) => {
        URL.revokeObjectURL(objectUrl); // release the temporary blob URL
        if (blob) {
          resolve(new File([blob], file.name, { type: file.type }));
        } else {
          reject(new Error('Failed to create resized image blob'));
        }
      }, file.type);
    };

    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Failed to load image'));
    };
    img.src = objectUrl;
  });
}

5. Cost Tracking

Implement cost estimation before generation and tracking after:

// Pre-generation estimate
const estimatedCost = calculateCost(width, height, duration, variants, soraVersion);

// Save generation record
await saveGenerationRecord({
  prompt,
  soraModel: soraVersion,
  duration: parseInt(duration),
  resolution: `${width}x${height}`,
  variants: parseInt(variants),
  generationMode: mode,
  estimatedCost,
  status: 'queued',
  jobId: job.id,
});

// Update after completion
await updateGenerationStatus(jobId, 'completed', { videoId: finalVideoId });

6. Progressive User Feedback

Provide detailed status updates during the generation process:

const statusMessages: Record<string, string> = {
  'preprocessing': 'Preprocessing your request...',
  'running': 'Generating video...',
  'processing': 'Processing video...',
  'queued': 'Job queued...',
  'in_progress': 'Generating video...',
};

onProgress?.(statusMessages[job.status] || `Status: ${job.status}`);

Conclusion

Building with Azure OpenAI's Sora models requires understanding the nuanced differences between Sora 1 and Sora 2, both in API structure and capabilities. Key takeaways:

  1. Choose the right model: Sora 1 for resolution flexibility and cost-effectiveness; Sora 2 for audio, image inputs, and remix capabilities
  2. Handle API differences: Implement conditional logic for parameter formatting and status polling based on model version
  3. Respect limitations: Plan around concurrent job limits, resolution constraints, and 24-hour expiration windows
  4. Optimize costs: Calculate estimates upfront and track actual usage for better budget management
  5. Provide great UX: Implement adaptive polling, progressive status updates, and clear error messages

The future of AI video generation is exciting, and Azure AI Foundry provides production-ready access to these powerful models. As Sora 2 matures and limitations are lifted (especially resolution options), we'll see even more creative applications emerge.

Resources:

This blog post is based on real-world implementation experience building LemonGrab, my AI video generation platform that integrates both Sora 1 and Sora 2 through Azure AI Foundry. The code examples are extracted from production usage.

 

Updated Jan 26, 2026
Version 1.0