Forum Discussion
Index data from SharePoint document libraries => Visioning / Image Analysis
Hi namor38, yes, what you’re seeing is expected behavior. The built-in SharePoint Online indexer does not enable vision/image analysis on its own. By default it only extracts text and metadata from supported document formats; images are ignored. OCR and anything beyond it (image understanding, tagging, captions, object detection, etc.) will not run unless you explicitly add a skillset.
That’s why you don’t see “visioning” working out of the box.
What you must do to enable Vision / Image Analysis
You must use a custom skillset with Vision cognitive skills.
Required components
- Azure AI Search service (Standard tier or higher)
- Azure AI Services / Vision resource (same region recommended)
- Custom skillset attached to your SharePoint indexer
Minimum working approach
1. Create a skillset with Vision skills
Typical skills used:
- OcrSkill (text extraction from images)
- ImageAnalysisSkill (tags, captions, objects)
Example (simplified):
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "name": "ocrSkill",
      "context": "/document/normalized_images/*",
      "inputs": [
        { "name": "image", "source": "/document/normalized_images/*" }
      ],
      "outputs": [
        { "name": "text", "targetName": "ocrText" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
      "name": "imageAnalysis",
      "context": "/document/normalized_images/*",
      "visualFeatures": ["tags", "description"],
      "inputs": [
        { "name": "image", "source": "/document/normalized_images/*" }
      ],
      "outputs": [
        { "name": "tags", "targetName": "imageTags" },
        { "name": "description", "targetName": "imageDescription" }
      ]
    }
  ]
}
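One possible way to deploy a skillset like this is the REST API. The sketch below only builds the PUT request and does not send it. The service name, skillset name, both keys, and the api-version value are placeholders I picked for illustration, not values from this thread, so adjust them to your environment.

```python
import json
import urllib.request


def with_ai_services(skillset: dict, ai_services_key: str) -> dict:
    """Return a copy of the skillset body with a billable AI Services key attached."""
    body = dict(skillset)
    body["cognitiveServices"] = {
        "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
        "key": ai_services_key,  # placeholder, key of your Azure AI Services resource
    }
    return body


def build_skillset_request(service, name, skillset, admin_key, api_version="2024-07-01"):
    """Build (but do not send) a create-or-update request for the skillset."""
    url = f"https://{service}.search.windows.net/skillsets/{name}?api-version={api_version}"
    body = dict(skillset, name=name)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )
```

Passing the returned request to urllib.request.urlopen would send it. Keeping the build step separate makes it easy to inspect the payload first.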
2. Attach the skillset to the SharePoint indexer
When creating or updating the indexer:
- Reference the skillset
- Set imageAction in the indexer configuration (e.g., generateNormalizedImages) so that /document/normalized_images is populated for the skills to read
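As a rough illustration of those two points, here is a minimal sketch of an indexer body that references a skillset and turns on image extraction. All four names (sp-indexer, sp-datasource, sp-index, sp-vision) are hypothetical placeholders, and a real SharePoint indexer will usually carry more settings than shown here.

```python
import json


def build_indexer(name, data_source, index, skillset):
    """Indexer body that references a skillset and enables image extraction."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,  # attaches the skillset to this indexer
        "parameters": {
            "configuration": {
                # Without this, no normalized images are produced for the skills
                "imageAction": "generateNormalizedImages",
            }
        },
    }


# Placeholder names, replace with your own resources
indexer = build_indexer("sp-indexer", "sp-datasource", "sp-index", "sp-vision")
print(json.dumps(indexer, indent=2))
```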
3. Map outputs to index fields
Your index must have fields like:
- ocrText
- imageTags
- imageDescription
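Add an outputFieldMappings entry in the indexer for each of them. Otherwise the enrichments are generated but never written to the index, so they are not searchable.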
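A sketch of what the index fields and the matching outputFieldMappings could look like. It assumes the skills run with context /document/normalized_images/*, so each output lands under that path once per image. The sub-paths used for tags and captions are my assumption about the shape of the Image Analysis output, so check the enriched document (for example in a debug session) before relying on them.

```python
CONTEXT = "/document/normalized_images/*"  # assumed skill context


def index_fields():
    """One string collection per enrichment, since a document can hold many images."""
    names = ["ocrText", "imageTags", "imageDescription"]
    return [{"name": n, "type": "Collection(Edm.String)", "searchable": True} for n in names]


def output_field_mappings(context=CONTEXT):
    """Map enrichment paths to index fields (sub-paths are assumptions, verify them)."""
    sources = {
        "ocrText": f"{context}/ocrText",
        "imageTags": f"{context}/imageTags/*/name",
        "imageDescription": f"{context}/imageDescription/captions/*/text",
    }
    return [
        {"sourceFieldName": source, "targetFieldName": target}
        for target, source in sources.items()
    ]
```

The fields list goes into the index definition and the mappings list goes into the indexer definition under outputFieldMappings.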
Limitations
- SharePoint indexer does not support advanced vision automatically
- Vision skills add cost (billed through the attached Azure AI Services resource)
- Processing large image libraries can be slow
- No layout-aware image understanding like Copilot; this is raw vision AI