Educator Developer Blog

Building with Azure OpenAI Sora: A Complete Guide to AI Video Generation

suzarilshah
Feb 04, 2026

In this comprehensive guide, we'll explore how to integrate both Sora 1 and Sora 2 models from Azure OpenAI Service into a production web application. We'll cover API integration, request body parameters, cost analysis, limitations, and the key differences between using Azure AI Foundry endpoints versus OpenAI's native API.

Table of Contents

  1. Introduction to Sora Models
  2. Azure AI Foundry vs. OpenAI API Structure
  3. API Integration: Request Body Parameters
  4. Video Generation Modes
  5. Cost Analysis per Generation
  6. Technical Limitations & Constraints
  7. Resolution & Duration Support
  8. Implementation Best Practices

Introduction to Sora Models

Sora is OpenAI's groundbreaking text-to-video model that generates realistic videos from natural language descriptions. Azure AI Foundry provides access to two versions:

  • Sora 1: The original model focused primarily on text-to-video generation with extensive resolution options (480p to 1080p) and flexible duration (1-20 seconds)
  • Sora 2: The enhanced version with native audio generation, multiple generation modes (text-to-video, image-to-video, video-to-video remix), but more constrained resolution options (720p only in public preview)

Azure AI Foundry vs. OpenAI API Structure

Key Architectural Differences

Sora 1 uses Azure's traditional deployment-based API structure:

  • Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/...
  • Parameters: Uses Azure-specific naming like n_seconds, n_variants, separate width/height fields
  • Job Management: Uses /jobs/{id} for status polling
  • Content Download: Uses /video/generations/{generation_id}/content/video

Sora 2 adapts OpenAI's v1 API format while still being hosted on Azure:

  • Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/videos
  • Parameters: Uses OpenAI-style naming like seconds (string), size (combined dimension string like "1280x720")
  • Job Management: Uses /videos/{video_id} for status polling
  • Content Download: Uses /videos/{video_id}/content

Why This Matters

This architectural difference requires conditional request formatting in your code:

const isSora2 = deployment.toLowerCase().includes('sora-2');

if (isSora2) {
  requestBody = {
    model: deployment,
    prompt,
    size: `${width}x${height}`,  // Combined format
    seconds: duration.toString(), // String type
  };
} else {
  requestBody = {
    model: deployment,
    prompt,
    height,                       // Separate dimensions
    width,
    n_seconds: duration.toString(), // Azure naming
    n_variants: variants,
  };
}

 

API Integration: Request Body Parameters

Sora 1 API Parameters

Standard Text-to-Video Request:

{
  "model": "sora-1",
  "prompt": "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.",
  "height": "720",
  "width": "1280",
  "n_seconds": "12",
  "n_variants": "2"
}

 

Parameter Details:

  • model (String, Required): Your Azure deployment name
  • prompt (String, Required): Natural language description of the video (max 32000 chars)
  • height (String, Required): Video height in pixels
  • width (String, Required): Video width in pixels
  • n_seconds (String, Required): Duration (1-20 seconds)
  • n_variants (String, Optional): Number of variations to generate (1-4, constrained by resolution)
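Putting these parameters together, a small request-body builder can enforce the string typing and duration range described above. This is a sketch, not an official SDK helper; the function name and option shape are our own.

```javascript
// Sketch of a Sora 1 request-body builder based on the parameters above.
// Note the Azure-style naming and that numeric values are sent as strings.
function buildSora1Body({ deployment, prompt, width, height, seconds, variants = 1 }) {
  if (seconds < 1 || seconds > 20) {
    throw new Error('Sora 1 duration must be between 1 and 20 seconds');
  }
  return {
    model: deployment,
    prompt,
    height: String(height),    // separate dimension fields
    width: String(width),
    n_seconds: String(seconds),
    n_variants: String(variants),
  };
}
```

Validating before the request is cheaper than discovering a 400 response after the round trip.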

Sora 2 API Parameters

Text-to-Video Request:

{
  "model": "sora-2",
  "prompt": "A serene mountain landscape with cascading waterfalls, cinematic drone shot",
  "size": "1280x720",
  "seconds": "12"
}

 

Image-to-Video Request (uses FormData):

const formData = new FormData();
formData.append('model', 'sora-2');
formData.append('prompt', 'Animate this image with gentle wind movement');
formData.append('size', '1280x720');
formData.append('seconds', '8');
formData.append('input_reference', imageFile); // JPEG/PNG/WebP

 

Video-to-Video Remix Request:

  • Endpoint: POST .../videos/{video_id}/remix
  • Body: Only { "prompt": "your new description" }
  • The original video's structure, motion, and framing are reused while applying the new prompt

Parameter Details:

  • model (String, Optional): Your deployment name
  • prompt (String, Required): Video description
  • size (String, Optional): Either "720x1280" or "1280x720" (defaults to "720x1280")
  • seconds (String, Optional): "4", "8", or "12" (defaults to "4")
  • input_reference (File, Optional): Reference image for image-to-video mode
  • remix_video_id (String, URL parameter): ID of video to remix
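The public-preview constraints above (two sizes, three durations, string defaults) are easy to check client-side. A minimal builder, with names of our own choosing, might look like this:

```javascript
// Sketch: validate Sora 2 parameters against the public-preview constraints
// listed above before building the request body.
const SORA2_SIZES = ['720x1280', '1280x720'];
const SORA2_SECONDS = ['4', '8', '12'];

function buildSora2Body({ deployment, prompt, size = '720x1280', seconds = '4' }) {
  if (!SORA2_SIZES.includes(size)) {
    throw new Error(`Unsupported size for Sora 2 preview: ${size}`);
  }
  if (!SORA2_SECONDS.includes(String(seconds))) {
    throw new Error(`Unsupported duration for Sora 2 preview: ${seconds}`);
  }
  return { model: deployment, prompt, size, seconds: String(seconds) };
}
```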

Video Generation Modes

1. Text-to-Video (Both Models)

The foundational mode where you provide a text prompt describing the desired video.

Implementation:

const response = await fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'api-key': apiKey,
  },
  body: JSON.stringify({
    model: deployment,
    prompt: "A train journey through mountains with dramatic lighting",
    size: "1280x720",
    seconds: "12",
  }),
});

Best Practices:

  • Include shot type (wide, close-up, aerial)
  • Describe subject, action, and environment
  • Specify lighting conditions (golden hour, dramatic, soft)
  • Add camera movement if desired (pans, tilts, tracking shots)
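These elements can be composed mechanically. The helper below is purely illustrative (the field names are ours, not API parameters), but it keeps prompts consistent across generations:

```javascript
// Assemble a prompt from the elements recommended above: shot type, subject
// and action, environment, lighting, and camera movement. Omitted fields are
// simply skipped.
function composePrompt({ shot, subject, action, environment, lighting, camera }) {
  return [shot, `${subject} ${action}`, environment, lighting, camera]
    .filter(Boolean)
    .join(', ');
}
```

For example, the Sora 1 sample prompt earlier in this post decomposes into `shot: 'Wide shot'`, `subject: 'a child'`, `action: 'flying a red kite'`, `environment: 'in a grassy park'`, and so on.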

2. Image-to-Video (Sora 2 Only)

Generate a video anchored to or starting from a reference image.

Key Requirements:

  • Supported formats: JPEG, PNG, WebP
  • Image dimensions must exactly match the selected video resolution
  • Our implementation automatically resizes uploaded images to match

Implementation Detail:

// Resize image to match video dimensions
const targetWidth = parseInt(width);
const targetHeight = parseInt(height);
const resizedImage = await resizeImage(inputReference, targetWidth, targetHeight);

// Send as multipart/form-data
formData.append('input_reference', resizedImage);

3. Video-to-Video Remix (Sora 2 Only)

Create variations of existing videos while preserving their structure and motion.

Use Cases:

  • Change weather conditions in the same scene
  • Modify time of day while keeping camera movement
  • Swap subjects while maintaining composition
  • Adjust artistic style or color grading

Endpoint Structure:

POST {base_url}/videos/{original_video_id}/remix?api-version=2024-08-01-preview

Implementation:

let requestEndpoint = endpoint;
if (isSora2 && remixVideoId) {
  const [baseUrl, queryParams] = endpoint.split('?');
  const root = baseUrl.replace(/\/videos$/, '');
  requestEndpoint = `${root}/videos/${remixVideoId}/remix${queryParams ? '?' + queryParams : ''}`;
}

 

Cost Analysis per Generation

Sora 1 Pricing Model

Base Rate: ~$0.05 per second per variant at 720p 

Resolution Scaling: Cost scales linearly with pixel count

Formula:

const basePrice = 0.05;
const basePixels = 1280 * 720; // Reference resolution
const currentPixels = width * height;
const resolutionMultiplier = currentPixels / basePixels;
const totalCost = basePrice * duration * variants * resolutionMultiplier;

Examples:

  • 720p (1280×720), 12 seconds, 1 variant: $0.60
  • 1080p (1920×1080), 12 seconds, 1 variant: $1.35
  • 720p, 12 seconds, 2 variants: $1.20

Sora 2 Pricing Model

Flat Rate: $0.10 per second per variant (no resolution scaling in public preview)

Formula:

const totalCost = 0.10 * duration * variants;

Examples:

  • 720p (1280×720), 4 seconds: $0.40
  • 720p (1280×720), 12 seconds: $1.20
  • 720p (720×1280), 8 seconds: $0.80

 

Note: Since Sora 2 currently only supports 720p in public preview, resolution doesn't affect cost; only duration matters.
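The two pricing formulas can be folded into one estimator. The rates below mirror the figures quoted in this post and may change; treat the output as an estimate, not a bill.

```javascript
// Combined cost estimator mirroring the two formulas above.
// Rates: ~$0.05/s/variant at 720p-equivalent for Sora 1 (scaled by pixel
// count), flat $0.10/s/variant for Sora 2 in public preview.
function estimateCost({ soraVersion, width, height, duration, variants = 1 }) {
  if (soraVersion === 'sora-2') {
    return 0.10 * duration * variants; // resolution ignored in preview
  }
  const resolutionMultiplier = (width * height) / (1280 * 720);
  return 0.05 * duration * variants * resolutionMultiplier;
}
```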

Cost Comparison

| Scenario       | Sora 1 (720p)  | Sora 2 (720p) | Winner          |
|----------------|----------------|---------------|-----------------|
| 4s video       | $0.20          | $0.40         | Sora 1          |
| 12s video      | $0.60          | $1.20         | Sora 1          |
| 12s + audio    | N/A (no audio) | $1.20         | Sora 2 (unique) |
| Image-to-video | N/A            | $0.40-$1.20   | Sora 2 (unique) |

 

Recommendation: Use Sora 1 for cost-effective silent videos at various resolutions. Use Sora 2 when you need audio, image/video inputs, or remix capabilities.

Technical Limitations & Constraints

Sora 1 Limitations

Resolution Options:

  • 9 supported resolutions from 480×480 to 1920×1080
  • Includes square, portrait, and landscape formats
  • Full list: 480×480, 480×854, 854×480, 720×720, 720×1280, 1280×720, 1080×1080, 1080×1920, 1920×1080

Duration:

  • Flexible: 1 to 20 seconds
  • Any integer value within range

Variants:

  • Depends on resolution:
    • 1080p: Variants disabled (n_variants must be 1)
    • 720p: Max 2 variants
    • Other resolutions: Max 4 variants
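These tiers can be encoded in a small helper so the UI never offers more variants than the API accepts. This is our own derivation of the tiers listed above, not an official lookup:

```javascript
// Variant cap per resolution tier, as described above:
// 1080p-class resolutions allow 1, 720x1280/1280x720 allow 2, the rest allow 4.
function maxVariants(width, height) {
  const longSide = Math.max(width, height);
  if (longSide >= 1920 || Math.min(width, height) >= 1080) return 1; // 1080p tier
  if (longSide === 1280) return 2;                                   // 720p tier
  return 4;                                                          // lower resolutions
}
```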

Concurrent Jobs: Maximum 2 jobs running simultaneously

Job Expiration: Videos expire 24 hours after generation

Audio: No audio generation (silent videos only)

Sora 2 Limitations

Resolution Options (Public Preview):

  • Only 2 options: 720×1280 (portrait) or 1280×720 (landscape)
  • No square formats
  • No 1080p support in current preview

Duration:

  • Fixed options only: 4, 8, or 12 seconds
  • No custom durations
  • Defaults to 4 seconds if not specified

Variants:

  • Not prominently supported in current API documentation
  • Focus is on single high-quality generations with audio

Concurrent Jobs: Maximum 2 jobs (same as Sora 1)

Job Expiration: 24 hours (same as Sora 1)

Audio: Native audio generation included (dialogue, sound effects, ambience)

Shared Constraints

Concurrent Processing: Both models enforce a limit of 2 concurrent video jobs per Azure resource. You must wait for one job to complete before starting a third.

Job Lifecycle:

queued → preprocessing → processing/running → completed
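Only the happy path is shown above; a polling loop also needs to recognize when a job is finished for any reason. A minimal sketch, assuming `failed` and `cancelled` as terminal failure states (they are not listed in the lifecycle above) and accounting for Sora 1 reporting `succeeded` where Sora 2 reports `completed`:

```javascript
// States after which polling should stop. 'succeeded' is Sora 1's success
// status, 'completed' is Sora 2's; 'failed' and 'cancelled' are assumed
// terminal states not shown in the happy-path lifecycle.
const TERMINAL_STATES = new Set(['completed', 'succeeded', 'failed', 'cancelled']);

function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}
```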

Download Window: Videos are available for 24 hours after completion. After expiration, you must regenerate the video.

Generation Time:

  • Typical: 1-5 minutes depending on resolution, duration, and API load
  • Can occasionally take longer during high demand

Resolution & Duration Support Matrix

Sora 1 Support Matrix

| Resolution | Aspect Ratio | Max Variants | Duration Range | Use Case          |
|------------|--------------|--------------|----------------|-------------------|
| 480×480    | Square       | 4            | 1-20s          | Social thumbnails |
| 480×854    | Portrait     | 4            | 1-20s          | Mobile stories    |
| 854×480    | Landscape    | 4            | 1-20s          | Quick previews    |
| 720×720    | Square       | 4            | 1-20s          | Instagram posts   |
| 720×1280   | Portrait     | 2            | 1-20s          | TikTok/Reels      |
| 1280×720   | Landscape    | 2            | 1-20s          | YouTube shorts    |
| 1080×1080  | Square       | 1            | 1-20s          | Premium social    |
| 1080×1920  | Portrait     | 1            | 1-20s          | Premium vertical  |
| 1920×1080  | Landscape    | 1            | 1-20s          | Full HD content   |

Sora 2 Support Matrix

| Resolution | Aspect Ratio | Duration Options | Audio  | Generation Modes         |
|------------|--------------|------------------|--------|--------------------------|
| 720×1280   | Portrait     | 4s, 8s, 12s      | ✅ Yes | Text, Image, Video Remix |
| 1280×720   | Landscape    | 4s, 8s, 12s      | ✅ Yes | Text, Image, Video Remix |

Note: Sora 2's limited resolution options in public preview are expected to expand in future releases.

Implementation Best Practices

1. Job Status Polling Strategy

Implement adaptive backoff to avoid overwhelming the API:

const maxAttempts = 180; // cap on polling attempts (~45 minutes with the capped backoff below)
let attempts = 0;
const baseDelayMs = 3000; // Start with 3 seconds

while (attempts < maxAttempts) {
  const response = await fetch(statusUrl, {
    headers: { 'api-key': apiKey },
  });

  if (response.status === 404) {
    // Job not ready yet, wait longer
    const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
    await new Promise(r => setTimeout(r, delayMs));
    attempts++;
    continue;
  }

  const job = await response.json();
  
  // Check completion (different status values for Sora 1 vs 2)
  const isCompleted = isSora2 
    ? job.status === 'completed' 
    : job.status === 'succeeded';
  
  if (isCompleted) break;
  
  // Adaptive backoff
  const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
  await new Promise(r => setTimeout(r, delayMs));
  attempts++;
}

2. Handling Different Response Structures

Sora 1 Video Download:

const generations = Array.isArray(job.generations) ? job.generations : [];
const genId = generations[0]?.id;
const videoUrl = `${root}/${genId}/content/video`;

Sora 2 Video Download:

const videoUrl = `${root}/videos/${jobId}/content`;
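The two download paths can be unified behind one helper. In this sketch, `root` stands for the appropriate base URL for each model (for Sora 1, the generations base used in the snippet above; for Sora 2, the endpoint with any trailing `/videos` segment removed, as in the remix example earlier):

```javascript
// Select the content-download URL for either model family, following the
// two path shapes shown above. Caller supplies the correct base URL.
function buildDownloadUrl(root, isSora2, jobId, generationId) {
  return isSora2
    ? `${root}/videos/${jobId}/content`          // Sora 2: keyed by video/job id
    : `${root}/${generationId}/content/video`;   // Sora 1: keyed by generation id
}
```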

3. Error Handling

try {
  const response = await fetch(endpoint, fetchOptions);
  
  if (!response.ok) {
    const error = await response.text();
    throw new Error(`Video generation failed: ${error}`);
  }
  
  // ... handle successful response
} catch (error) {
  console.error('[VideoGen] Error:', error);
  // Implement retry logic or user notification
}

 

4. Image Preprocessing for Image-to-Video

Always resize images to match the target video resolution:

async function resizeImage(file: File, targetWidth: number, targetHeight: number): Promise<File> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    if (!ctx) {
      reject(new Error('Canvas 2D context unavailable'));
      return;
    }

    const objectUrl = URL.createObjectURL(file);

    img.onload = () => {
      canvas.width = targetWidth;
      canvas.height = targetHeight;
      ctx.drawImage(img, 0, 0, targetWidth, targetHeight);

      canvas.toBlob((blob) => {
        URL.revokeObjectURL(objectUrl); // release the temporary blob URL
        if (blob) {
          resolve(new File([blob], file.name, { type: file.type }));
        } else {
          reject(new Error('Failed to create resized image blob'));
        }
      }, file.type);
    };

    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Failed to load image'));
    };
    img.src = objectUrl;
  });
}

5. Cost Tracking

Implement cost estimation before generation and tracking after:

// Pre-generation estimate
const estimatedCost = calculateCost(width, height, duration, variants, soraVersion);

// Save generation record
await saveGenerationRecord({
  prompt,
  soraModel: soraVersion,
  duration: parseInt(duration),
  resolution: `${width}x${height}`,
  variants: parseInt(variants),
  generationMode: mode,
  estimatedCost,
  status: 'queued',
  jobId: job.id,
});

// Update after completion
await updateGenerationStatus(jobId, 'completed', { videoId: finalVideoId });

6. Progressive User Feedback

Provide detailed status updates during the generation process:

const statusMessages: Record<string, string> = {
  'preprocessing': 'Preprocessing your request...',
  'running': 'Generating video...',
  'processing': 'Processing video...',
  'queued': 'Job queued...',
  'in_progress': 'Generating video...',
};

onProgress?.(statusMessages[job.status] || `Status: ${job.status}`);

Conclusion

Building with Azure OpenAI's Sora models requires understanding the nuanced differences between Sora 1 and Sora 2, both in API structure and capabilities. Key takeaways:

  1. Choose the right model: Sora 1 for resolution flexibility and cost-effectiveness; Sora 2 for audio, image inputs, and remix capabilities
  2. Handle API differences: Implement conditional logic for parameter formatting and status polling based on model version
  3. Respect limitations: Plan around concurrent job limits, resolution constraints, and 24-hour expiration windows
  4. Optimize costs: Calculate estimates upfront and track actual usage for better budget management
  5. Provide great UX: Implement adaptive polling, progressive status updates, and clear error messages

The future of AI video generation is exciting, and Azure AI Foundry provides production-ready access to these powerful models. As Sora 2 matures and limitations are lifted (especially resolution options), we'll see even more creative applications emerge.

Resources:

This blog post is based on real-world implementation experience building LemonGrab, my AI video generation platform that integrates both Sora 1 and Sora 2 through Azure AI Foundry. The code examples are extracted from production usage.

 

Updated Jan 26, 2026
Version 1.0