Azure AI Foundry Blog

The Future of AI: An Intern’s Adventure Turning Hours of Video into Minutes of Meaning

kavinjindel
Microsoft
Sep 25, 2025

The Future of AI blog series is an evolving collection of posts from the AI Futures team in collaboration with subject matter experts across Microsoft. In this series, we explore tools and technologies that will drive the next generation of AI. Explore more in Collections on Microsoft Learn.

The moment that started it

On day one of my internship, I sat with a video editor scrubbing through a 90-minute event recording to find “the best 90 seconds.” She knew exactly what the story should feel like—setup → reveal → payoff—but the tools in front of her only knew where scenes changed or audio spiked. That gap between what matters and what machines notice is where this project began.

AutoHighlight is the result: a sample app that applies a moment-aware highlight pattern to turn long-form video into concise, purpose-built stories. It is guided by user intent, grounded in Azure AI Content Understanding, and planned by an OpenAI reasoning model, with humans in control of taste.

Why highlights feel hard (and why now)

Across industries, the needs are consistent—even if the audiences aren’t:

  • Broadcasters & streamers need same-day recaps that carry a game or show’s narrative, plus variants for social and archives—without starting from scratch for each cut.
  • Sports organizations want reels conditioned on players, tactics, or sponsorship moments, paced differently for TikTok vs. YouTube, while preserving match context.
  • Enterprise events & marketing teams need snackable recaps of keynotes and panels that feel like a mini story, not a collage of loud moments.
  • Learning & enablement teams want topic-centric summaries that follow problem → method → result so learners retain what matters.

Traditional auto-cutters optimize for boundaries (scene changes, loudness spikes). Editors optimize for beats (setup, escalation, payoff). AutoHighlight treats highlights as a reasoning problem, not just a trimming task.

What we built

AutoHighlight converts long-form video into coherent highlight reels by combining:

  1. Intent capture — Describe audience/outcome, pick duration and clip density, optionally name entities (players, speakers, topics).
  2. Azure AI Content Understanding (CU) — Produces a semantic index (chapters, transcript, speaker turns, on-screen cues) to surface meaningful candidate segments.
  3. Reasoned planning — A model proposes an Edit Decision List (EDL) in a three-act arc with a short “why this belongs” note per segment.
  4. Stitch & score — Assemble the reel, apply simple transitions, and run A/V checks (frozen frames, dupes, runtime) so the output is watchable and auditable.
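
To make the four steps above concrete, here is a minimal Python sketch of the data that could flow between them: a brief (step 1) and an EDL whose entries carry an act label and a short rationale (step 3). Every name here (`Brief`, `EdlEntry`, `why_chosen`, the act labels) is hypothetical and chosen for illustration. This is not AutoHighlight's actual schema, and it calls no Azure or OpenAI API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical labels for the three-act arc described in step 3.
ACTS = ("setup", "reveal", "payoff")


@dataclass
class Brief:
    """Step 1: what the user asked for (all fields are assumptions)."""
    audience: str
    target_seconds: float
    max_clips: int
    entities: List[str] = field(default_factory=list)


@dataclass
class EdlEntry:
    """One line of the Edit Decision List, with its rationale."""
    start: float      # seconds into the source video
    end: float        # seconds into the source video
    act: str          # one of ACTS
    why_chosen: str   # the short "why this belongs" note

    @property
    def duration(self) -> float:
        return self.end - self.start


def total_runtime(edl: List[EdlEntry]) -> float:
    """Sum of segment durations, in seconds."""
    return sum(entry.duration for entry in edl)


def render_edl(edl: List[EdlEntry]) -> str:
    """Format the EDL as reviewable text, one line per segment."""
    return "\n".join(
        f"[{e.act}] {e.start:.1f}s-{e.end:.1f}s: {e.why_chosen}" for e in edl
    )


# Illustrative values only; not taken from a real recording.
brief = Brief(audience="social", target_seconds=90, max_clips=3,
              entities=["keynote speaker"])
edl = [
    EdlEntry(120.0, 150.0, "setup", "Speaker frames the problem"),
    EdlEntry(1800.0, 1835.0, "reveal", "Live demo of the product"),
    EdlEntry(5100.0, 5120.0, "payoff", "Closing call to action"),
]

print(render_edl(edl))
print(f"runtime: {total_runtime(edl):.1f}s of {brief.target_seconds:.0f}s target")
```

Keeping the rationale on the same record as the timestamps means a reviewer sees the cut and the reason for it in one place, which is the point of an auditable EDL.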


Lessons from the journey

| Challenge I hit | What I changed | What that unlocked |
| --- | --- | --- |
| Reels felt like clip soup: exciting moments, no story. | Gave every reel a spine (intro → peak → close). | Narrative > novelty. Engagement rose when clips served a mini story. |
| One cut couldn’t satisfy fans vs. analysts vs. social. | Put brief + audience first; tuned density and length. | Intent drives the edit. Right reel for the right channel. |
| LLM plans drifted (duplicates, odd order, overruns). | Added light guardrails + self-check: count, chronology, diversity, runtime. | Small constraints raise quality without slowing teams down. |
| Reviews stalled on “why this clip?” | Added short WhyChosen notes to each EDL timestamp. | Explainable EDLs turned review from debate into decisions. |
| CU candidates missed domain specifics. | Brief-conditioned schema gen (video type, density, duration) with optional human review. | Context lifts recall and reduces re-gens. |
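
The guardrail row above (count, chronology, diversity, runtime) can be sketched as a small validator that runs over a proposed EDL before stitching. This is a minimal sketch under stated assumptions, not the project's actual checks: the function name `check_edl`, the 10% runtime tolerance, and the reading of "diversity" as "no two segments overlap" are all my own choices for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds of source video


def check_edl(segments: List[Segment], max_clips: int,
              target_seconds: float, tolerance: float = 0.1) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems: List[str] = []

    # Count: the plan must be non-empty and within the clip budget.
    if not segments:
        problems.append("count: EDL is empty")
    elif len(segments) > max_clips:
        problems.append(f"count: {len(segments)} clips exceeds limit of {max_clips}")

    # Chronology: each segment is well-formed and starts no earlier than the previous one.
    for i, (start, end) in enumerate(segments):
        if end <= start:
            problems.append(f"chronology: segment {i} has non-positive length")
        if i > 0 and start < segments[i - 1][0]:
            problems.append(f"chronology: segment {i} is out of order")

    # Diversity (assumed meaning): no two segments cover the same footage.
    ordered = sorted(segments)
    for (s1, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(f"diversity: segments starting at {s1}s and {s2}s overlap")

    # Runtime: total length must not overrun the target by more than the tolerance.
    runtime = sum(end - start for start, end in segments)
    if runtime > target_seconds * (1 + tolerance):
        problems.append(f"runtime: {runtime}s overruns target of {target_seconds}s")

    return problems


# Illustrative plans only.
good_plan = [(120, 150), (1800, 1835), (5100, 5120)]
bad_plan = [(1800, 1835), (120, 150), (125, 150), (5100, 5200)]

print(check_edl(good_plan, max_clips=3, target_seconds=90))
for problem in check_edl(bad_plan, max_clips=3, target_seconds=90):
    print(problem)
```

Returning a list of problems rather than a single pass/fail flag makes it possible to hand the failures back to the planning model as feedback for a retry, or to surface them to a human reviewer.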

What's next

We're focused on improving how Content Understanding processes unstructured content. It is showing a lot of promise with customers for multimodal scenarios like this one, and we're pushing to stabilize and scale it so it can become generally available soon.

Now it’s your turn to create with Azure AI Foundry

Updated Sep 25, 2025
Version 2.0