This tiny app just does one thing: upload an image → get a natural one-line caption. Under the hood:
- Azure AI Vision extracts high-confidence tags from the image.
- Azure OpenAI (GPT-4o-mini) turns those tags into a fluent caption.
- Streamlit provides a lightweight, Python-native UI so you can ship fast.
All code and infra templates live in image_caption_app in the App Service AI Samples repo: https://github.com/Azure-Samples/appservice-ai-samples/tree/main/image_caption_app
What are these components?
- What is Streamlit? An open-source Python framework to build interactive data/AI apps with just a few lines of code—perfect for quick, clean UIs.
- What is Azure AI Vision (Vision API)? A cloud service that analyzes images and returns rich signals like tags with confidence scores, which we use as grounded inputs for captioning.
How it works (at a glance)
- User uploads a photo in Streamlit.
- The app calls Azure AI Vision → gets a list of tags (keeps only high-confidence ones).
- The app sends those tags to GPT-4o-mini → generates a one-line caption.
- Caption is shown instantly in the browser.
Prerequisites
- Azure subscription — https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account
- Azure CLI — https://learn.microsoft.com/cli/azure/install-azure-cli
- Azure Developer CLI (azd) — https://learn.microsoft.com/azure/developer/azure-developer-cli/install-azd
- Python 3.10+ — https://www.python.org/downloads/
- Visual Studio Code (optional) — https://code.visualstudio.com/download
- Streamlit (optional for local runs) — https://docs.streamlit.io/get-started/installation
- Managed Identity on App Service (recommended) — https://learn.microsoft.com/azure/app-service/overview-managed-identity
Resources you’ll deploy
You can create everything manually or with the provided azd template.
What you need
- Azure App Service (Linux) to host the Streamlit app.
- Azure AI Foundry/OpenAI with a gpt-4o-mini deployment for caption generation.
- Azure AI Vision (Computer Vision) for image tagging.
- Managed Identity enabled on the Web App, with RBAC grants so the app can call Vision and OpenAI without secrets.
One-command deploy with azd (recommended)
The sample includes infra under image_caption_app/infra so azd up can provision + deploy in one go.
# 1) Clone and move into the sample
git clone https://github.com/Azure-Samples/appservice-ai-samples
cd appservice-ai-samples/image_caption_app
# 2) Log in and provision + deploy
azd auth login
azd up
Manual path (if you prefer doing it yourself)
- Create Azure AI Vision, note the endpoint (custom subdomain).
- Create Azure AI Foundry/OpenAI and deploy gpt-4o-mini.
- Create App Service (Linux, Python) and enable System-Assigned Managed Identity.
- Assign roles to the Web App’s Managed Identity (a CLI sketch follows this list):
- Cognitive Services OpenAI User on your OpenAI resource.
- Cognitive Services User on your Vision resource.
- Add app settings for endpoints and deployment names (see repo), deploy the code, and run.
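Here’s a hedged CLI sketch of those role assignments. The resource IDs and names are placeholders, and the provided Bicep template does the equivalent for you:
# Look up the Web App's system-assigned identity
PRINCIPAL_ID=$(az webapp identity show \
  --name <your-webapp-name> \
  --resource-group <your-rg> \
  --query principalId -o tsv)

# Grant it access to the OpenAI and Vision resources
az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Cognitive Services OpenAI User" \
  --scope <openai-resource-id>

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Cognitive Services User" \
  --scope <vision-resource-id>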
Startup command (manual setting):
If you’re configuring the Web App yourself (instead of using the Bicep template), set the Startup Command to:
streamlit run app.py --server.port 8000 --server.address 0.0.0.0
Portal path: App Service → Configuration → General settings → Startup Command.
CLI example:
az webapp config set \
  --name <your-webapp-name> \
  --resource-group <your-rg> \
  --startup-file "streamlit run app.py --server.port 8000 --server.address 0.0.0.0"
(The provided Bicep template already sets this for you.)
Code tour (the important bits)
Top-level flow (app.py)
First we get tags from Vision, then ask GPT-4o-mini for a one-liner:
tags = extract_tags(image_bytes)
caption = generate_caption(tags)
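For orientation, here’s a minimal sketch of how the Streamlit UI could wire those two helpers together. The exact app.py in the repo may differ; the widget labels here are illustrative:
import streamlit as st
from utils.vision import extract_tags
from utils.openai_caption import generate_caption

st.title("Image Caption Generator")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    image_bytes = uploaded.read()           # raw bytes for the Vision REST call
    st.image(image_bytes)                   # echo the upload back to the user
    with st.spinner("Generating caption..."):
        tags = extract_tags(image_bytes)    # high-confidence Vision tags
        caption = generate_caption(tags)    # one-line caption from GPT-4o-mini
    st.success(caption)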
Vision call (utils/vision.py)
Call the Vision REST API, parse JSON, and keep high-confidence tags (> 0.6):
response = requests.post(
    VISION_API_URL,
    headers=headers,
    params=PARAMS,
    data=image_bytes,
    timeout=30,
)
response.raise_for_status()
analysis = response.json()

tags = [
    t.get('name')
    for t in analysis.get('tags', [])
    if t.get('name') and t.get('confidence', 0) > 0.6
]
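VISION_API_URL, headers, and PARAMS are set up earlier in the helper. A minimal sketch of what that setup might look like, where the environment variable name and API version are assumptions (utils/vision.py is the source of truth, and the auth header is covered under Security & auth below):
import os

# Assumed env var -- e.g. https://<your-resource>.cognitiveservices.azure.com
VISION_ENDPOINT = os.environ["VISION_ENDPOINT"].rstrip("/")
VISION_API_URL = f"{VISION_ENDPOINT}/vision/v3.2/analyze"
PARAMS = {"visualFeatures": "Tags"}  # request only tags

# The response JSON looks roughly like:
# {"tags": [{"name": "dog", "confidence": 0.99},
#           {"name": "grass", "confidence": 0.87}, ...]}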
Caption generation (utils/openai_caption.py)
Join tags and ask GPT-4o-mini for a natural caption:
tag_text = ", ".join(tags)
prompt = f"""
You are an assistant that generates vivid, natural-sounding captions for images.
Create a one-line caption for an image that contains the following: {tag_text}.
"""

response = client.chat.completions.create(
    model=DEPLOYMENT_NAME,
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": prompt.strip()},
    ],
    max_tokens=60,
    temperature=0.7,
)
return response.choices[0].message.content.strip()
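client and DEPLOYMENT_NAME come from the helper’s setup. One common keyless way to build that client with Managed Identity looks like this; the api_version and variable names are assumptions, so check the repo for the exact configuration:
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Token provider that works with Managed Identity in Azure (or az login locally)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

DEPLOYMENT_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]  # e.g. "gpt-4o-mini"
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)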
Security & auth: Managed Identity by default (recommended)
This sample is configured to use Managed Identity on App Service by default, so no keys live in config.
- The Web App’s Managed Identity authenticates to Vision and Azure OpenAI via Microsoft Entra ID.
- Prefer Managed Identity in production; if you need to test locally, you can switch to key-based auth by supplying the service keys in your environment (a sketch of that switch follows).
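For example, a Vision helper could pick its auth header based on whether a key is present. This is an illustrative pattern, not the repo’s exact code, and VISION_KEY is a hypothetical variable name:
import os
from azure.identity import DefaultAzureCredential

def vision_headers() -> dict:
    headers = {"Content-Type": "application/octet-stream"}
    key = os.getenv("VISION_KEY")  # hypothetical env var for local key-based testing
    if key:
        headers["Ocp-Apim-Subscription-Key"] = key
    else:
        # Managed Identity in Azure; falls back to your `az login` session locally
        token = DefaultAzureCredential().get_token(
            "https://cognitiveservices.azure.com/.default"
        )
        headers["Authorization"] = f"Bearer {token.token}"
    return headers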
Run it locally (optional)
# From the sample folder
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Set env vars for endpoints + deployment (and keys if not using MI locally)
streamlit run app.py
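The exact variable names live in the repo’s README and Bicep; as a hypothetical example, your local environment might look like:
# Hypothetical names -- match them to the repo's app settings
export VISION_ENDPOINT="https://<your-vision>.cognitiveservices.azure.com"
export AZURE_OPENAI_ENDPOINT="https://<your-openai>.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o-mini"
# Only needed if you're not using `az login` / Managed Identity:
export VISION_KEY="<vision-key>"
export AZURE_OPENAI_API_KEY="<openai-key>"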
Repo map
- App + Streamlit UI + helpers: image_caption_app/
- Bicep infrastructure (used by azd up): image_caption_app/infra/
What’s next — ways to extend this sample
- Richer vision signals: Add object detection, OCR, or brand detection; blend those into the prompt for sharper captions.
- Persistence & gallery: Save images to Blob Storage and captions/metadata to Cosmos DB or SQLite; add a Streamlit gallery.
- Performance & cost: Cache tags by image hash (see the sketch after this list); cap image size; track tokens/latency.
- Observability: Wire up Application Insights with custom events (e.g., caption_generated).
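As a starting point for the caching idea, here’s a hedged sketch that memoizes Vision tags by image hash. extract_tags is the repo helper; the in-process dict cache is illustrative (a shared cache like Redis would suit multiple instances):
import hashlib
from utils.vision import extract_tags

_tag_cache: dict[str, list[str]] = {}

def extract_tags_cached(image_bytes: bytes) -> list[str]:
    """Skip the Vision API call when the same image is uploaded again."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _tag_cache:
        _tag_cache[key] = extract_tags(image_bytes)
    return _tag_cache[key]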
Looking for more Python samples? Check out the repo: https://github.com/Azure-Samples/appservice-ai-samples/tree/main
For more Azure App Service AI samples and best practices, see the Azure App Service AI integration documentation.