In this comprehensive guide, we'll explore how to integrate both Sora 1 and Sora 2 models from Azure OpenAI Service into a production web application. We'll cover API integration, request body parameters, cost analysis, limitations, and the key differences between using Azure AI Foundry endpoints versus OpenAI's native API.
Table of Contents
- Introduction to Sora Models
- Azure AI Foundry vs. OpenAI API Structure
- API Integration: Request Body Parameters
- Video Generation Modes
- Cost Analysis per Generation
- Technical Limitations & Constraints
- Resolution & Duration Support Matrix
- Implementation Best Practices
Introduction to Sora Models
Sora is OpenAI's groundbreaking text-to-video model that generates realistic videos from natural language descriptions. Azure AI Foundry provides access to two versions:
- Sora 1: The original model focused primarily on text-to-video generation with extensive resolution options (480p to 1080p) and flexible duration (1-20 seconds)
- Sora 2: The enhanced version with native audio generation, multiple generation modes (text-to-video, image-to-video, video-to-video remix), but more constrained resolution options (720p only in public preview)
Azure AI Foundry vs. OpenAI API Structure
Key Architectural Differences
Sora 1 uses Azure's traditional deployment-based API structure:
- Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/...
- Parameters: Uses Azure-specific naming like n_seconds, n_variants, separate width/height fields
- Job Management: Uses /jobs/{id} for status polling
- Content Download: Uses /video/generations/{generation_id}/content/video
Sora 2 adapts OpenAI's v1 API format while still being hosted on Azure:
- Endpoint Pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/videos
- Parameters: Uses OpenAI-style naming like seconds (string), size (combined dimension string like "1280x720")
- Job Management: Uses /videos/{video_id} for status polling
- Content Download: Uses /videos/{video_id}/content
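The path differences listed above can be captured in two small helpers. This is a minimal sketch rather than the production code: `SoraVersion`, `statusPath`, and `contentPath` are hypothetical names, and the functions return only the path suffix, on the assumption that the caller prepends whatever base URL and query string its deployment uses.

```typescript
type SoraVersion = 'sora-1' | 'sora-2';

// Hypothetical helper: path suffix used to poll a job's status.
// Sora 1 polls /jobs/{id}; Sora 2 polls /videos/{video_id}.
function statusPath(version: SoraVersion, jobId: string): string {
  const id = encodeURIComponent(jobId);
  return version === 'sora-2' ? `/videos/${id}` : `/jobs/${id}`;
}

// Hypothetical helper: path suffix used to download the finished video.
// Sora 1 expects a generation ID; Sora 2 reuses the video ID.
function contentPath(version: SoraVersion, id: string): string {
  const safeId = encodeURIComponent(id);
  return version === 'sora-2'
    ? `/videos/${safeId}/content`
    : `/video/generations/${safeId}/content/video`;
}
```

Keeping the version check in one place means the rest of the client can stay unaware of which path scheme is in use.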
Why This Matters
This architectural difference requires conditional request formatting in your code:
const isSora2 = deployment.toLowerCase().includes('sora-2');
if (isSora2) {
  requestBody = {
    model: deployment,
    prompt,
    size: `${width}x${height}`, // Combined format
    seconds: duration.toString(), // String type
  };
} else {
  requestBody = {
    model: deployment,
    prompt,
    height, // Separate dimensions
    width,
    n_seconds: duration.toString(), // Azure naming
    n_variants: variants,
  };
}
API Integration: Request Body Parameters
Sora 1 API Parameters
Standard Text-to-Video Request:
{
  "model": "sora-1",
  "prompt": "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.",
  "height": "720",
  "width": "1280",
  "n_seconds": "12",
  "n_variants": "2"
}
Parameter Details:
- model (String, Required): Your Azure deployment name
- prompt (String, Required): Natural language description of the video (max 32000 chars)
- height (String, Required): Video height in pixels
- width (String, Required): Video width in pixels
- n_seconds (String, Required): Duration (1-20 seconds)
- n_variants (String, Optional): Number of variations to generate (1-4, constrained by resolution)
Sora 2 API Parameters
Text-to-Video Request:
{
  "model": "sora-2",
  "prompt": "A serene mountain landscape with cascading waterfalls, cinematic drone shot",
  "size": "1280x720",
  "seconds": "12"
}
Image-to-Video Request (uses FormData):
const formData = new FormData();
formData.append('model', 'sora-2');
formData.append('prompt', 'Animate this image with gentle wind movement');
formData.append('size', '1280x720');
formData.append('seconds', '8');
formData.append('input_reference', imageFile); // JPEG/PNG/WebP
Video-to-Video Remix Request:
- Endpoint: POST .../videos/{video_id}/remix
- Body: Only { "prompt": "your new description" }
- The original video's structure, motion, and framing are reused while applying the new prompt
Parameter Details:
- model (String, Optional): Your deployment name
- prompt (String, Required): Video description
- size (String, Optional): Either "720x1280" or "1280x720" (defaults to "720x1280")
- seconds (String, Optional): "4", "8", or "12" (defaults to "4")
- input_reference (File, Optional): Reference image for image-to-video mode
- remix_video_id (String, URL parameter): ID of video to remix
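Because `size` and `seconds` accept only a few string values, it can help to apply the defaults and reject unsupported values before sending the request. The following is a sketch under that assumption; `normalizeSora2Params` and the type names are hypothetical, not part of any SDK, and the allowed values are simply those listed above.

```typescript
type Sora2Size = '720x1280' | '1280x720';
type Sora2Seconds = '4' | '8' | '12';

interface Sora2Params {
  prompt: string;
  size: Sora2Size;
  seconds: Sora2Seconds;
}

const SORA2_SIZES: readonly string[] = ['720x1280', '1280x720'];
const SORA2_SECONDS: readonly string[] = ['4', '8', '12'];

// Hypothetical helper: fill in the defaults listed above and validate the values.
function normalizeSora2Params(input: { prompt: string; size?: string; seconds?: string }): Sora2Params {
  const size = input.size ?? '720x1280'; // default size per the parameter list
  const seconds = input.seconds ?? '4';  // default duration per the parameter list
  if (input.prompt.trim().length === 0) {
    throw new Error('prompt is required');
  }
  if (SORA2_SIZES.indexOf(size) === -1) {
    throw new Error(`Unsupported size: ${size}`);
  }
  if (SORA2_SECONDS.indexOf(seconds) === -1) {
    throw new Error(`Unsupported seconds: ${seconds}`);
  }
  return { prompt: input.prompt, size: size as Sora2Size, seconds: seconds as Sora2Seconds };
}
```

Failing locally on a value like `"10"` avoids spending a round trip on a request the service would reject.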
Video Generation Modes
1. Text-to-Video (Both Models)
The foundational mode where you provide a text prompt describing the desired video.
Implementation:
const response = await fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'api-key': apiKey,
  },
  // Sora 2 body shown; for Sora 1, send width, height, n_seconds, and n_variants instead
  body: JSON.stringify({
    model: deployment,
    prompt: "A train journey through mountains with dramatic lighting",
    size: "1280x720",
    seconds: "12",
  }),
});
Best Practices:
- Include shot type (wide, close-up, aerial)
- Describe subject, action, and environment
- Specify lighting conditions (golden hour, dramatic, soft)
- Add camera movement if desired (pans, tilts, tracking shots)
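One way to apply these practices consistently is to assemble prompts from named parts instead of free text. The sketch below is only an illustration: `ShotDescription` and `composePrompt` are hypothetical names, and the sentence template is an assumption about what reads well, not a format the API requires.

```typescript
interface ShotDescription {
  shotType: string;          // e.g. "Wide shot", "Close-up", "Aerial shot"
  subjectAndAction: string;  // who or what, and what they are doing
  environment: string;       // where the scene takes place
  lighting?: string;         // e.g. "golden hour sunlight"
  cameraMovement?: string;   // e.g. "camera slowly pans upward"
}

// Hypothetical helper: join the parts into a single prompt sentence.
function composePrompt(shot: ShotDescription): string {
  const parts = [`${shot.shotType} of ${shot.subjectAndAction} in ${shot.environment}`];
  if (shot.lighting) parts.push(shot.lighting);
  if (shot.cameraMovement) parts.push(shot.cameraMovement);
  return parts.join(', ') + '.';
}

const kitePrompt = composePrompt({
  shotType: 'Wide shot',
  subjectAndAction: 'a child flying a red kite',
  environment: 'a grassy park',
  lighting: 'golden hour sunlight',
  cameraMovement: 'camera slowly pans upward',
});
```

With these inputs, `kitePrompt` reproduces the example prompt used in the Sora 1 request earlier in this post.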
2. Image-to-Video (Sora 2 Only)
Generate a video anchored to or starting from a reference image.
Key Requirements:
- Supported formats: JPEG, PNG, WebP
- Image dimensions must exactly match the selected video resolution
- Our implementation automatically resizes uploaded images to match
Implementation Detail:
// Resize image to match video dimensions
const targetWidth = parseInt(width, 10);
const targetHeight = parseInt(height, 10);
const resizedImage = await resizeImage(inputReference, targetWidth, targetHeight);
// Send as multipart/form-data
formData.append('input_reference', resizedImage);
3. Video-to-Video Remix (Sora 2 Only)
Create variations of existing videos while preserving their structure and motion.
Use Cases:
- Change weather conditions in the same scene
- Modify time of day while keeping camera movement
- Swap subjects while maintaining composition
- Adjust artistic style or color grading
Endpoint Structure:
POST {base_url}/videos/{original_video_id}/remix?api-version=2024-08-01-preview
Implementation:
let requestEndpoint = endpoint;
if (isSora2 && remixVideoId) {
  const [baseUrl, queryParams] = endpoint.split('?');
  const root = baseUrl.replace(/\/videos$/, '');
  requestEndpoint = `${root}/videos/${remixVideoId}/remix${queryParams ? '?' + queryParams : ''}`;
}
Cost Analysis per Generation
Sora 1 Pricing Model
Base Rate: ~$0.05 per second per variant at 720p
Resolution Scaling: Cost scales linearly with pixel count
Formula:
const basePrice = 0.05;
const basePixels = 1280 * 720; // Reference resolution
const currentPixels = width * height;
const resolutionMultiplier = currentPixels / basePixels;
const totalCost = basePrice * duration * variants * resolutionMultiplier;
Examples:
- 720p (1280×720), 12 seconds, 1 variant: $0.60
- 1080p (1920×1080), 12 seconds, 1 variant: $1.35
- 720p, 12 seconds, 2 variants: $1.20
Sora 2 Pricing Model
Flat Rate: $0.10 per second per variant (no resolution scaling in public preview)
Formula:
const totalCost = 0.10 * duration * variants;
Examples:
- 720p (1280×720), 4 seconds: $0.40
- 720p (1280×720), 12 seconds: $1.20
- 720p (720×1280), 8 seconds: $0.80
Note: Since Sora 2 currently supports only 720p in public preview, resolution doesn't affect cost; only duration matters.
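The two pricing formulas can be combined into a single estimator. This is a sketch, not the author's implementation: `estimateCostUsd` is a hypothetical name, the rates are the approximate figures quoted above, and rounding to whole cents is an added assumption to keep floating-point noise out of displayed prices.

```typescript
type SoraModel = 'sora-1' | 'sora-2';

// Rates taken from the figures quoted above; treat them as estimates.
const SORA1_RATE_PER_SECOND_720P = 0.05;
const SORA2_RATE_PER_SECOND = 0.10;
const REFERENCE_PIXELS = 1280 * 720;

// Hypothetical helper: estimated cost in US dollars, rounded to whole cents.
function estimateCostUsd(
  width: number,
  height: number,
  seconds: number,
  variants: number,
  model: SoraModel,
): number {
  const raw = model === 'sora-2'
    // Flat per-second rate; resolution is ignored
    ? SORA2_RATE_PER_SECOND * seconds * variants
    // Per-second rate scaled linearly by pixel count relative to 1280x720
    : SORA1_RATE_PER_SECOND_720P * seconds * variants * ((width * height) / REFERENCE_PIXELS);
  return Math.round(raw * 100) / 100;
}
```

For a 12-second, single-variant clip this gives 0.60 for Sora 1 at 1280×720, 1.35 for Sora 1 at 1920×1080, and 1.20 for Sora 2, matching the examples above.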
Cost Comparison
| Scenario | Sora 1 (720p) | Sora 2 (720p) | Winner |
|---|---|---|---|
| 4s video | $0.20 | $0.40 | Sora 1 |
| 12s video | $0.60 | $1.20 | Sora 1 |
| 12s + audio | N/A (no audio) | $1.20 | Sora 2 (unique) |
| Image-to-video | N/A | $0.40-$1.20 | Sora 2 (unique) |
Recommendation: Use Sora 1 for cost-effective silent videos at various resolutions. Use Sora 2 when you need audio, image/video inputs, or remix capabilities.
Technical Limitations & Constraints
Sora 1 Limitations
Resolution Options:
- 9 supported resolutions from 480×480 to 1920×1080
- Includes square, portrait, and landscape formats
- Full list: 480×480, 480×854, 854×480, 720×720, 720×1280, 1280×720, 1080×1080, 1080×1920, 1920×1080
Duration:
- Flexible: 1 to 20 seconds
- Any integer value within range
Variants:
- Depends on resolution:
- 1080p: Variants disabled (n_variants must be 1)
- 720p: Max 2 variants
- Other resolutions: Max 4 variants
Concurrent Jobs: Maximum 2 jobs running simultaneously
Job Expiration: Videos expire 24 hours after generation
Audio: No audio generation (silent videos only)
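The resolution, duration, and variant limits above lend themselves to a pre-flight check. The sketch below assumes the limits exactly as listed; `maxSora1Variants` and `validateSora1Request` are hypothetical names, and 720×720 is treated as an "other" resolution (up to 4 variants), consistent with the support matrix later in this post.

```typescript
const SORA1_RESOLUTIONS: readonly string[] = [
  '480x480', '480x854', '854x480',
  '720x720', '720x1280', '1280x720',
  '1080x1080', '1080x1920', '1920x1080',
];

// Hypothetical helper: variant cap for a given resolution.
function maxSora1Variants(width: number, height: number): number {
  const shortSide = Math.min(width, height);
  const longSide = Math.max(width, height);
  if (shortSide === 1080) return 1;                     // 1080x1080, 1080x1920, 1920x1080
  if (shortSide === 720 && longSide === 1280) return 2; // 720x1280, 1280x720
  return 4;                                             // remaining resolutions
}

// Hypothetical helper: returns a list of problems; an empty list means the request looks valid.
function validateSora1Request(width: number, height: number, seconds: number, variants: number): string[] {
  const problems: string[] = [];
  if (SORA1_RESOLUTIONS.indexOf(`${width}x${height}`) === -1) {
    problems.push(`Unsupported resolution ${width}x${height}`);
  }
  if (Math.floor(seconds) !== seconds || seconds < 1 || seconds > 20) {
    problems.push('Duration must be an integer from 1 to 20 seconds');
  }
  const maxVariants = maxSora1Variants(width, height);
  if (Math.floor(variants) !== variants || variants < 1 || variants > maxVariants) {
    problems.push(`Variants must be an integer from 1 to ${maxVariants} at ${width}x${height}`);
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a form show all validation messages together.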
Sora 2 Limitations
Resolution Options (Public Preview):
- Only 2 options: 720×1280 (portrait) or 1280×720 (landscape)
- No square formats
- No 1080p support in current preview
Duration:
- Fixed options only: 4, 8, or 12 seconds
- No custom durations
- Defaults to 4 seconds if not specified
Variants:
- Not prominently supported in current API documentation
- Focus is on single high-quality generations with audio
Concurrent Jobs: Maximum 2 jobs (same as Sora 1)
Job Expiration: 24 hours (same as Sora 1)
Audio: Native audio generation included (dialogue, sound effects, ambience)
Shared Constraints
Concurrent Processing: Both models enforce a limit of 2 concurrent video jobs per Azure resource. You must wait for one job to complete before starting a third.
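A client can avoid submitting a third job by tracking its own in-flight jobs. The class below is a minimal sketch of that idea; `JobSlotTracker` is a hypothetical name, and it only knows about jobs started by the current process, so it cannot account for jobs submitted elsewhere against the same Azure resource.

```typescript
// Hypothetical helper: tracks locally started jobs against the 2-job limit.
class JobSlotTracker {
  private activeJobIds: string[] = [];

  constructor(private readonly maxConcurrent: number = 2) {}

  // Returns true and records the job if a slot is free; returns false otherwise.
  tryStart(jobId: string): boolean {
    if (this.activeJobIds.length >= this.maxConcurrent) return false;
    this.activeJobIds.push(jobId);
    return true;
  }

  // Call when a job reaches a terminal state so its slot is released.
  finish(jobId: string): void {
    this.activeJobIds = this.activeJobIds.filter((id) => id !== jobId);
  }

  // Number of jobs that could still be started right now.
  freeSlots(): number {
    return this.maxConcurrent - this.activeJobIds.length;
  }
}
```

When `tryStart` returns false, the request can be queued in the application and retried after `finish` is called for a running job.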
Job Lifecycle:
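queued → preprocessing → processing/running → completed
The final status string differs by model: Sora 1 reports succeeded, while Sora 2 reports completed.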
Download Window: Videos are available for 24 hours after completion. After expiration, you must regenerate the video.
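If completion timestamps are stored with each generation record, the download window can be checked before offering a link. This sketch assumes the 24-hour clock starts at the completion time you recorded; `downloadExpiresAt` and `isDownloadExpired` are hypothetical helper names.

```typescript
const DOWNLOAD_WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours in milliseconds

// Hypothetical helper: the moment the video is assumed to stop being downloadable.
function downloadExpiresAt(completedAt: Date): Date {
  return new Date(completedAt.getTime() + DOWNLOAD_WINDOW_MS);
}

// Hypothetical helper: true once the 24-hour window has elapsed.
function isDownloadExpired(completedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= downloadExpiresAt(completedAt).getTime();
}
```

Copying the file to your own storage soon after completion avoids depending on this window at all.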
Generation Time:
- Typical: 1-5 minutes depending on resolution, duration, and API load
- Can occasionally take longer during high demand
Resolution & Duration Support Matrix
Sora 1 Support Matrix
| Resolution | Aspect Ratio | Max Variants | Duration Range | Use Case |
|---|---|---|---|---|
| 480×480 | Square | 4 | 1-20s | Social thumbnails |
| 480×854 | Portrait | 4 | 1-20s | Mobile stories |
| 854×480 | Landscape | 4 | 1-20s | Quick previews |
| 720×720 | Square | 4 | 1-20s | Instagram posts |
| 720×1280 | Portrait | 2 | 1-20s | TikTok/Reels |
| 1280×720 | Landscape | 2 | 1-20s | YouTube videos |
| 1080×1080 | Square | 1 | 1-20s | Premium social |
| 1080×1920 | Portrait | 1 | 1-20s | Premium vertical |
| 1920×1080 | Landscape | 1 | 1-20s | Full HD content |
Sora 2 Support Matrix
| Resolution | Aspect Ratio | Duration Options | Audio | Generation Modes |
|---|---|---|---|---|
| 720×1280 | Portrait | 4s, 8s, 12s | ✅ Yes | Text, Image, Video Remix |
| 1280×720 | Landscape | 4s, 8s, 12s | ✅ Yes | Text, Image, Video Remix |
Note: Sora 2's limited resolution options in public preview are expected to expand in future releases.
Implementation Best Practices
1. Job Status Polling Strategy
Implement adaptive backoff to avoid overwhelming the API:
const maxAttempts = 180; // With the 3-15 s delays below, this allows roughly 44 minutes
let attempts = 0;
const baseDelayMs = 3000; // Start with 3 seconds
while (attempts < maxAttempts) {
  const response = await fetch(statusUrl, {
    headers: { 'api-key': apiKey },
  });
  // A 404 means the job is not ready yet; skip parsing and wait
  if (response.status !== 404) {
    const job = await response.json();
    // Check completion (different status values for Sora 1 vs 2)
    const isCompleted = isSora2
      ? job.status === 'completed'
      : job.status === 'succeeded';
    if (isCompleted) break;
    // Stop polling on failure ('failed' is assumed here; check the response schema for your API version)
    if (job.status === 'failed') {
      throw new Error(`Video generation failed: ${JSON.stringify(job)}`);
    }
  }
  // Adaptive backoff: the delay grows by 1 s per attempt, capped at 15 s
  const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
  await new Promise(r => setTimeout(r, delayMs));
  attempts++;
}
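When choosing `maxAttempts`, it is worth computing the worst-case wait that the delay schedule implies. The sketch below reuses the constants from the loop above (3 s base, 1 s increment, 15 s cap); `pollDelayMs` and `worstCaseWaitMs` are hypothetical names, and the total counts only the sleep time, not the time spent on the requests themselves.

```typescript
const BASE_DELAY_MS = 3000;
const DELAY_STEP_MS = 1000;
const MAX_DELAY_MS = 15000;

// Delay before the next poll, given how many polls have already been made.
function pollDelayMs(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS + attempt * DELAY_STEP_MS);
}

// Total sleep time if every attempt up to maxAttempts is used.
function worstCaseWaitMs(maxAttempts: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    total += pollDelayMs(attempt);
  }
  return total;
}
```

With these constants, `worstCaseWaitMs(180)` is 2,622,000 ms (about 43.7 minutes), while 65 attempts stay just under 15 minutes at 897,000 ms.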
2. Handling Different Response Structures
Sora 1 Video Download:
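const generations = Array.isArray(job.generations) ? job.generations : [];
const genId = generations[0]?.id;
if (!genId) throw new Error('Job finished without a generation ID');
// root is assumed to already end with the /video/generations path segment
const videoUrl = `${root}/${genId}/content/video`;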
Sora 2 Video Download:
const videoUrl = `${root}/videos/${jobId}/content`;
3. Error Handling
try {
  const response = await fetch(endpoint, fetchOptions);
  if (!response.ok) {
    const error = await response.text();
    throw new Error(`Video generation failed: ${error}`);
  }
  // ... handle successful response
} catch (error) {
  console.error('[VideoGen] Error:', error);
  // Implement retry logic or user notification
}
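For the retry logic mentioned in the comment above, a common convention is to retry only responses that are likely to be transient. The sketch below assumes that HTTP 429 and 5xx responses are transient and that other 4xx responses are permanent; this post does not confirm which codes the service returns, and `isRetryableStatus` and `retryDelayMs` are hypothetical names.

```typescript
// Assumption: 429 (Too Many Requests) and 5xx are treated as transient.
function isRetryableStatus(httpStatus: number): boolean {
  return httpStatus === 429 || (httpStatus >= 500 && httpStatus <= 599);
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at capMs.
function retryDelayMs(retryNumber: number, baseMs: number = 1000, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, retryNumber));
}
```

A caller would check `isRetryableStatus(response.status)` before rethrowing, sleep for `retryDelayMs(n)` on the n-th retry, and surface the error to the user once a small retry budget is exhausted.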
4. Image Preprocessing for Image-to-Video
Always resize images to match the target video resolution:
async function resizeImage(file: File, targetWidth: number, targetHeight: number): Promise<File> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    // getContext can return null, so bail out early if it does
    if (!ctx) {
      reject(new Error('2D canvas context is not available'));
      return;
    }
    const objectUrl = URL.createObjectURL(file);
    img.onload = () => {
      URL.revokeObjectURL(objectUrl); // Release the temporary blob URL
      canvas.width = targetWidth;
      canvas.height = targetHeight;
      // Stretches the image to the target size; the aspect ratio is not preserved
      ctx.drawImage(img, 0, 0, targetWidth, targetHeight);
      canvas.toBlob((blob) => {
        if (blob) {
          const resizedFile = new File([blob], file.name, { type: file.type });
          resolve(resizedFile);
        } else {
          reject(new Error('Failed to create resized image blob'));
        }
      }, file.type);
    };
    img.onerror = () => {
      URL.revokeObjectURL(objectUrl);
      reject(new Error('Failed to load image'));
    };
    img.src = objectUrl;
  });
}
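Drawing the whole image into the target canvas stretches it whenever the aspect ratios differ. One alternative is a centered "cover" crop, where the largest region with the target aspect ratio is cut from the source and then scaled. The helper below is a sketch of that calculation only; `coverCrop` and `CropRect` are hypothetical names and not part of the original implementation.

```typescript
interface CropRect {
  sx: number; // left edge of the source region
  sy: number; // top edge of the source region
  sw: number; // width of the source region
  sh: number; // height of the source region
}

// Largest centered region of the source that has the target aspect ratio.
function coverCrop(srcWidth: number, srcHeight: number, targetWidth: number, targetHeight: number): CropRect {
  const targetRatio = targetWidth / targetHeight;
  const srcRatio = srcWidth / srcHeight;
  if (srcRatio > targetRatio) {
    // Source is wider than the target: trim the left and right edges
    const sw = srcHeight * targetRatio;
    return { sx: (srcWidth - sw) / 2, sy: 0, sw, sh: srcHeight };
  }
  // Source is taller than (or matches) the target: trim the top and bottom edges
  const sh = srcWidth / targetRatio;
  return { sx: 0, sy: (srcHeight - sh) / 2, sw: srcWidth, sh };
}
```

The result maps onto the nine-argument form of the canvas call: `ctx.drawImage(img, sx, sy, sw, sh, 0, 0, targetWidth, targetHeight)`.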
5. Cost Tracking
Implement cost estimation before generation and tracking after:
// Pre-generation estimate
const estimatedCost = calculateCost(width, height, duration, variants, soraVersion);
// Save generation record
await saveGenerationRecord({
  prompt,
  soraModel: soraVersion,
  duration: parseInt(duration, 10),
  resolution: `${width}x${height}`,
  variants: parseInt(variants, 10),
  generationMode: mode,
  estimatedCost,
  status: 'queued',
  jobId: job.id,
});
// Update after completion
await updateGenerationStatus(jobId, 'completed', { videoId: finalVideoId });
6. Progressive User Feedback
Provide detailed status updates during the generation process:
const statusMessages: Record<string, string> = {
  'preprocessing': 'Preprocessing your request...',
  'running': 'Generating video...',
  'processing': 'Processing video...',
  'queued': 'Job queued...',
  'in_progress': 'Generating video...',
};
onProgress?.(statusMessages[job.status] || `Status: ${job.status}`);
Conclusion
Building with Azure OpenAI's Sora models requires understanding the nuanced differences between Sora 1 and Sora 2, both in API structure and capabilities. Key takeaways:
- Choose the right model: Sora 1 for resolution flexibility and cost-effectiveness; Sora 2 for audio, image inputs, and remix capabilities
- Handle API differences: Implement conditional logic for parameter formatting and status polling based on model version
- Respect limitations: Plan around concurrent job limits, resolution constraints, and 24-hour expiration windows
- Optimize costs: Calculate estimates upfront and track actual usage for better budget management
- Provide great UX: Implement adaptive polling, progressive status updates, and clear error messages
The future of AI video generation is exciting, and Azure AI Foundry provides production-ready access to these powerful models. As Sora 2 matures and limitations are lifted (especially resolution options), we'll see even more creative applications emerge.
This blog post is based on real-world implementation experience building LemonGrab, my AI video generation platform that integrates both Sora 1 and Sora 2 through Azure AI Foundry. The code examples are extracted from production usage.