Connect Azure SRE Agent to NeuBird Hawkeye for AI-powered autonomous incident investigation and root cause analysis across your cloud providers and monitoring tools.
Integrate NeuBird Hawkeye MCP with Azure SRE Agent
TL;DR
If your infrastructure spans multiple clouds (say, Azure and GCP, or Azure alongside any other cloud provider), investigating incidents means jumping between completely separate consoles, log systems, and monitoring stacks. Azure SRE Agent now integrates with NeuBird Hawkeye via Model Context Protocol (MCP), so you can investigate incidents across all of your clouds and monitoring tools from a single conversation.
Key benefits:
- 90-second investigations vs 3-4 hours of manual dashboard-hopping
- Multi-cloud support - Azure, GCP, and other cloud providers investigated from a single conversation
- 42 MCP tools across 7 categories for investigation, analysis, and remediation
- Real-time streaming progress - watch investigations unfold step-by-step (v2.0+)
- MTTR tracking and continuous improvement metrics
The problem: incidents don't stay in one cloud
When an alert fires at 3 AM, your on-call engineer doesn't just need to find the problem — they need to figure out which cloud it's in. A single incident can involve an Azure Function calling a GCP Cloud Run service, with logs split across Azure Monitor and GCP Cloud Logging. Here's what that looks like:
| Challenge | Time Cost |
|---|---|
| Correlate signals across multiple monitoring tools | 30-45 minutes |
| Query logs and metrics from multiple clouds | 45-60 minutes |
| Piece together the chain of events | 30-45 minutes |
| Identify root cause and develop fixes | 60-90 minutes |
| Total | 3-4 hours |
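The total row is the sum of the four ranges above. A quick check using only the table's own numbers shows where the "3-4 hours" figure comes from:

```python
# Per-step time ranges from the table above, in minutes: (low, high).
STEPS = {
    "Correlate signals across multiple monitoring tools": (30, 45),
    "Query logs and metrics from multiple clouds": (45, 60),
    "Piece together the chain of events": (30, 45),
    "Identify root cause and develop fixes": (60, 90),
}

def total_range_hours(steps):
    """Sum the low and high ends separately, then convert minutes to hours."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low / 60, high / 60

low_h, high_h = total_range_hours(STEPS)
print(f"{low_h:.2f}-{high_h:.2f} hours")  # prints "2.75-4.00 hours"
```

The low end is 165 minutes (about 2.75 hours), so "3-4 hours" is a rounded figure.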
Sound familiar?
"Is it the database? The cache? The load balancer? Let me check the GCP console... now Azure Monitor... now the other logging stack... wait, what time zone is this in?"
What NeuBird Hawkeye does
NeuBird Hawkeye is an autonomous incident investigation platform that connects to your cloud providers and uses AI to investigate incidents.
Core capabilities:
- Investigate alerts from your monitoring tools automatically
- Query multiple data sources across cloud providers and observability platforms
- Generate detailed RCAs with incident timelines
- Provide corrective actions with ready-to-execute scripts
- Learn from your architecture through customizable instructions
Supported Integrations:
| Category | Platforms |
|---|---|
| Cloud Providers | Azure, Google Cloud Platform, AWS |
| Monitoring Tools | Datadog, Grafana, Dynatrace, New Relic |
| Incident Management | PagerDuty, ServiceNow, FireHydrant, Incident.io |
| Log Aggregation | CloudWatch, Azure Monitor, Google Cloud Logging |
How the integration works
With the new Hawkeye MCP server integration, Azure SRE Agent leverages Hawkeye's autonomous investigation capabilities through natural language conversation.
What is Model Context Protocol (MCP)?
Model Context Protocol is an open standard that enables AI agents to securely connect to external tools and data sources. It's like a universal adapter for connecting LLMs to the real world.
How it connects:
Your SRE Agent can now talk to Hawkeye, which talks to Azure, GCP, and other cloud platforms—all through a single conversational interface.
42 MCP tools across 7 categories
| Category | Tool Count | Examples |
|---|---|---|
| Projects | 5 | List, create, update, delete projects |
| Connections | 10 | Azure, GCP, and other cloud provider connections |
| Investigations | 11 | List alerts, investigate, monitor progress, RCA, follow-ups |
| Instructions | 7 | Create, test, validate, apply investigation rules |
| Analytics | 4 | MTTR, time saved, quality scores, session reports |
| Discovery | 2 | Explore resources across all project connections |
| Help | 1 | Interactive guidance system |
Remote MCP server
Azure SRE Agent connects to the hosted Hawkeye MCP server via Streamable-HTTP transport. No local installation or Node.js required.
| Mode | Best For | How It Works |
|---|---|---|
| Streamable-HTTP | All users, teams, enterprise | Connect to hosted MCP endpoint at https://<your-deployment-name>.app.neubird.ai/mcp |
Supports secure authentication via email/password headers or bearer tokens (JWT).
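SRE Agent's connector handles the protocol for you. Still, it helps to see what travels over the wire.

The following is a minimal sketch of the JSON-RPC 2.0 messages an MCP client sends over Streamable HTTP. It is based on the public MCP specification, not on Hawkeye-specific documentation:

- The header names come from this article.
- The protocol version string and the client name are assumptions.
- The function names are hypothetical.

```python
import itertools
import json

# Placeholder URL from the article; substitute your own deployment name.
MCP_URL = "https://<your-deployment-name>.app.neubird.ai/mcp"

_ids = itertools.count(1)

def jsonrpc(method, params=None):
    """Build a JSON-RPC 2.0 request object (MCP frames every message this way)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def mcp_headers(email=None, password=None, token=None, session_id=None):
    """Headers for a Streamable-HTTP POST to the Hawkeye MCP endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif email and password:
        headers["X-Hawkeye-Email"] = email
        headers["X-Hawkeye-Password"] = password
    else:
        raise ValueError("provide a bearer token or email + password")
    if session_id:
        # Echo the session ID the server returned in its initialize response.
        headers["Mcp-Session-Id"] = session_id
    return headers

# 1. Open a session (protocol version is an assumption; use one your server supports).
initialize = jsonrpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
})
# 2. After initialize (and a notifications/initialized notification), list tools.
list_tools = jsonrpc("tools/list")
# 3. Invoke a tool by name with its arguments.
call = jsonrpc("tools/call", {"name": "hawkeye_list_projects", "arguments": {}})

print(json.dumps(list_tools))  # {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
```

Each message becomes the body of an HTTP POST to `MCP_URL`, sent with the headers from `mcp_headers`.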
Example investigation walkthrough
Here's an example investigation with Azure SRE Agent + Hawkeye:
Step 1: List Uninvestigated Alerts
User: Show me uninvestigated alerts from the last 24 hours
Hawkeye response:
Hawkeye Projects:
- Production (UUID: b4b902d4-a540-466b-96cf-4ec2f734b010)
Uninvestigated Alerts (Production):
1. [P1] High CPU usage on api-service pods - 3 instances affected
2. [P2] Database connection pool exhausted - payment-service
3. [P3] Increased error rate in user-authentication endpoint
Step 2: Investigate an Alert
User: Investigate the high CPU alert
The agent calls hawkeye_investigate_alert and streams real-time progress (v2.0+):
Preparing telemetry sources [2s]
Loading investigation context [3s]
Classifying datasources (Step 1) [18s]
Fine-tuning datasources [26s]
Consulting telemetry (Query 7) [1m 6s]
Analyzing telemetry data (Analysis 12) [1m 13s]
Summarizing results [1m 29s]
Investigation completed! Here's what I found...
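Clients that don't consume the progress stream can poll `hawkeye_get_investigation_status` instead, since that tool reports step-by-step progress. The following is a polling sketch under these assumptions:

- `call_tool` is a hypothetical wrapper around an MCP `tools/call` round trip.
- The `session_id` argument and the `status`/`step` result fields are guessed names, not Hawkeye's documented schema.

```python
import time

def wait_for_investigation(call_tool, session_id, poll_seconds=5.0,
                           timeout_seconds=600.0, sleep=time.sleep):
    """Poll the status tool until the investigation reports completion.

    call_tool(name, arguments) -> dict is a hypothetical MCP tools/call wrapper.
    "session_id", "status", "step" and "COMPLETED" are assumed names.
    """
    waited = 0.0
    last_step = None
    while waited <= timeout_seconds:
        status = call_tool("hawkeye_get_investigation_status",
                           {"session_id": session_id})
        step = status.get("step")
        if step and step != last_step:
            # Print each new step once, similar to the streamed progress above.
            print(f"{step} [{int(waited)}s]")
            last_step = step
        if status.get("status") == "COMPLETED":
            return status
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"investigation {session_id} still running after {timeout_seconds}s")
```

Injecting `sleep` keeps the loop testable without real waiting.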
Step 3: Get Root Cause Analysis
User: Show me the root cause analysis
Hawkeye provides a comprehensive RCA:
Incident summary
| Field | Value |
|---|---|
| Severity | P1 |
| Duration | 45 minutes |
| Status | Resolved |
| Affected Services | api-service (3 pods) |
Root cause
A memory leak in the api-service v2.3.1 deployment caused garbage collection pressure, leading to CPU exhaustion. The leak originated from unclosed database connections in the /users/search endpoint.
Timeline
| Time | Event |
|---|---|
| 14:00 UTC | Deployment of api-service v2.3.1 completed |
| 14:15 UTC | Memory usage begins climbing (500MB to 1.2GB) |
| 14:30 UTC | GC pressure increases, CPU spikes to 95%+ |
| 14:35 UTC | Alert fired: High CPU usage |
| 14:45 UTC | Auto-scaling triggered (3 to 6 pods) |
| 14:50 UTC | Investigation identified unclosed connections |
| 14:55 UTC | Rollback initiated to v2.3.0 |
Corrective actions
Immediate Fix:
```bash
# Roll back to the previous stable version
kubectl set image deployment/api-service api-service=api-service:v2.3.0
# Force restart to clear leaked connections
kubectl rollout restart deployment/api-service
```
Preventive measures:
- Add connection pool monitoring to your cloud monitoring dashboard
- Implement database connection lifecycle tests in CI pipeline
- Add memory leak detection to staging environment tests
Time saved
| Metric | Value |
|---|---|
| Manual investigation estimate | 3-4 hours |
| Hawkeye investigation time | 90 seconds |
| Time saved | ~3.5 hours |
Step 4: Follow-Up Questions
The conversation doesn't stop at the RCA. You can ask intelligent follow-ups:
User: Has this happened before?
Agent: Checking investigation history...
Yes, a similar pattern was detected 2 weeks ago in the same service.
The previous incident was also caused by unclosed database connections,
but in a different endpoint (/orders/list).
Recommendation: Implement connection lifecycle testing across all endpoints.
User: What tests should we add to catch this?
Agent: Based on the RCA, I recommend:
1. Connection pool exhaustion test:
- Simulate high request volume to database-heavy endpoints
- Monitor for unclosed connections over 1000 requests
2. Memory leak detection:
- Run load tests with memory profiling enabled
- Alert if memory growth exceeds 20% over 10-minute baseline
3. Integration test for connection lifecycle:
- Verify all database connections are properly closed
- Check connection pool statistics after each test run
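The memory-growth rule in recommendation 2 can be turned into a simple check. This is a hedged sketch under one possible reading of "20% over a 10-minute baseline":

- The baseline is the mean of all samples taken in the first 10 minutes of the load test.
- The check fails if any later sample exceeds that baseline by more than 20%.
- The function name is hypothetical.

```python
from statistics import mean

def memory_growth_alert(samples, baseline_minutes=10, threshold=0.20):
    """Return True if memory grows more than `threshold` above the baseline.

    samples: (minutes_since_start, memory_mb) tuples from a load test.
    Baseline: mean of every sample taken before `baseline_minutes`.
    """
    baseline = [mb for t, mb in samples if t < baseline_minutes]
    if not baseline:
        raise ValueError("no samples inside the baseline window")
    limit = mean(baseline) * (1 + threshold)
    return any(mb > limit for t, mb in samples if t >= baseline_minutes)

# Shaped like the incident above: roughly 500MB climbing toward 1.2GB.
run = [(0, 500), (5, 510), (10, 640), (15, 900), (20, 1200)]
print(memory_growth_alert(run))  # True (limit is 505 * 1.2 = 606MB)
```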
Key capabilities
1. Multi-Cloud Investigation
Hawkeye unifies investigation across all your connected infrastructure.
Example scenario:
An alert fires about an Azure Function returning errors. Hawkeye can:
- Query Azure Monitor for function execution metrics
- Check GCP Cloud Logging for upstream API errors
- Review GCP Cloud Monitoring metrics for the dependent Cloud Run service
- Correlate with recent deployments in GitHub Actions or Azure DevOps
"Finally, one place to investigate instead of 7 browser tabs!"
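Part of what makes cross-cloud correlation painful is the "what time zone is this in?" problem. At its core, building one timeline means normalizing every event to UTC before ordering it.

The following sketch illustrates that idea only; it is not Hawkeye's implementation. The sample events are made up, and the function names are hypothetical.

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Parse an ISO 8601 timestamp with an offset (or a trailing Z) into UTC."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no timezone: {ts}")
    return parsed.astimezone(timezone.utc)

def merge_timeline(*sources):
    """Merge (source, timestamp, message) events from several tools, ordered by UTC time."""
    events = [(to_utc(ts), name, msg) for src in sources for (name, ts, msg) in src]
    return sorted(events)

# Made-up events, each recorded in a different timezone convention.
azure = [("Azure Monitor", "2024-05-01T14:35:00Z", "Function error rate spikes")]
gcp = [("GCP Cloud Logging", "2024-05-01T14:33:10+00:00", "Cloud Run returns 503s")]
deploys = [("GitHub Actions", "2024-05-01T07:30:00-07:00", "Upstream API deploy finishes")]

for when, name, msg in merge_timeline(azure, gcp, deploys):
    print(f"{when:%H:%M:%S} UTC  {name}: {msg}")
```

Once normalized, the deploy logged at 07:30 Pacific (14:30 UTC) correctly sorts before both error signals.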
2. Instruction Management
Customize how Hawkeye investigates incidents by creating instructions:
| Instruction Type | Purpose | Example |
|---|---|---|
| SYSTEM | Provide architecture context | "We use microservices on Kubernetes with PostgreSQL and Redis" |
| FILTER | Reduce investigation noise | "Only investigate P1 and P2 incidents" |
| RCA | Guide investigation steps | "For database issues, check slow queries and connection pools first" |
| GROUPING | Group related alerts | "Group alerts from the same service within 5 minutes" |
Instruction testing workflow
Before deploying instructions to production, test them on past investigations:
| Step | Action | Tool |
|---|---|---|
| 1 | Validate content | hawkeye_validate_instruction |
| 2 | Apply to test session | hawkeye_apply_session_instruction |
| 3 | Rerun investigation | hawkeye_rerun_session |
| 4 | Compare RCAs | Manual review |
| 5 | Measure improvement | Check quality score |
| 6 | Deploy if better | hawkeye_create_project_instruction |
Note:
Test instruction changes on historical data before applying them to live investigations. No more "oops, that filter was too aggressive!"
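The six-step workflow can also be scripted. The following is a hedged sketch with these assumptions:

- `call_tool` is a hypothetical wrapper around MCP `tools/call`.
- Every argument key and result field (`session_id`, `valid`, `score`, and so on) is a guess rather than Hawkeye's documented schema.
- `hawkeye_rerun_session` is assumed to block until the rerun finishes.
- The manual RCA review from step 4 stays manual, via an `approve` callback.

```python
def trial_instruction(call_tool, project_id, session_id, instruction, approve):
    """Validate, trial on a past session, and conditionally deploy an instruction.

    call_tool(name, arguments) -> dict: hypothetical MCP tools/call wrapper.
    approve(before_rca, after_rca) -> bool: stands in for the step 4 human review.
    """
    # Step 1: validate the content before touching any session.
    if not call_tool("hawkeye_validate_instruction", {"instruction": instruction}).get("valid"):
        return {"deployed": False, "reason": "validation failed"}

    args = {"session_id": session_id}
    before_rca = call_tool("hawkeye_get_rca", args)
    before = call_tool("hawkeye_get_rca_score", args)["score"]

    # Steps 2-3: apply to a historical session and rerun it.
    call_tool("hawkeye_apply_session_instruction", {**args, "instruction": instruction})
    call_tool("hawkeye_rerun_session", args)

    # Steps 4-5: human comparison plus quality-score check.
    after_rca = call_tool("hawkeye_get_rca", args)
    after = call_tool("hawkeye_get_rca_score", args)["score"]
    if after <= before or not approve(before_rca, after_rca):
        return {"deployed": False, "reason": f"score {before} -> {after}"}

    # Step 6: promote to a project-wide instruction only if it helped.
    call_tool("hawkeye_create_project_instruction",
              {"project_id": project_id, "instruction": instruction})
    return {"deployed": True, "reason": f"score {before} -> {after}"}
```

Keeping `approve` as an explicit callback means no instruction reaches live investigations without a person signing off.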
3. Analytics and Continuous Improvement
Track the effectiveness of your incident response process:
| Metric | What It Measures |
|---|---|
| MTTR | Mean Time to Resolution |
| Time Saved | Efficiency gains vs manual investigation |
| Quality Score | Accuracy and completeness of RCAs |
| Noise Reduction | Percentage of duplicate/grouped alerts |
Use cases for analytics:
- Justify investment in SRE tooling to leadership
- Demonstrate continuous improvement over time
- Identify patterns in recurring incidents
- Measure impact of instruction changes
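Hawkeye computes these numbers for you through `hawkeye_get_incident_report`. For reference, here is a sketch of two of the metrics using common definitions. Both definitions are assumptions; Hawkeye's exact formulas may differ.

- MTTR is taken as the average of (resolved - detected) over resolved incidents.
- Noise reduction is taken as the share of alerts absorbed by grouping or deduplication.

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean Time to Resolution over incidents that have a resolved timestamp."""
    durations = [i["resolved"] - i["detected"] for i in incidents if i.get("resolved")]
    if not durations:
        raise ValueError("no resolved incidents")
    return sum(durations, timedelta()) / len(durations)

def noise_reduction_pct(total_alerts, investigations):
    """Percentage of alerts that did not need their own investigation."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    return 100.0 * (total_alerts - investigations) / total_alerts

incidents = [  # illustrative data
    {"detected": datetime(2024, 5, 1, 14, 35), "resolved": datetime(2024, 5, 1, 15, 20)},
    {"detected": datetime(2024, 5, 2, 9, 0), "resolved": datetime(2024, 5, 2, 9, 15)},
    {"detected": datetime(2024, 5, 3, 11, 0), "resolved": None},  # still open, excluded
]
print(mttr(incidents))               # 0:30:00
print(noise_reduction_pct(100, 45))  # 55.0
```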
4. Proactive Investigation
You don't need an alert to investigate. Create manual investigations for proactive analysis:
User: Investigate potential memory leak in user-api pods.
Memory usage increased from 500MB to 1.2GB between 8am-10am UTC today.
Hawkeye will:
- Query metrics for the specified time range
- Correlate with deployment events
- Check for similar patterns in the past
- Provide root cause analysis and recommendations
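Phrases like "8am-10am UTC today" are easy for a person to read but easy to get wrong in automation. When triggering a proactive investigation from a script, stating the window explicitly in UTC avoids ambiguity.

The following sketch builds MCP `tools/call` params for `hawkeye_create_manual_investigation`. Keep in mind:

- The `project_id` and `prompt` argument keys are hypothetical; read the real input schema from `tools/list`.
- The date is arbitrary and stands in for "today".

```python
from datetime import datetime, timezone

def manual_investigation_call(project_id, symptom, start, end):
    """Build tools/call params for a proactive investigation with an explicit UTC window.

    The "project_id" and "prompt" argument names are assumptions for illustration.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    fmt = "%Y-%m-%d %H:%M"
    s = start.astimezone(timezone.utc).strftime(fmt)
    e = end.astimezone(timezone.utc).strftime(fmt)
    return {
        "name": "hawkeye_create_manual_investigation",
        "arguments": {"project_id": project_id,
                      "prompt": f"{symptom} Time range: {s} to {e} UTC."},
    }

call = manual_investigation_call(
    "b4b902d4-a540-466b-96cf-4ec2f734b010",
    "Investigate potential memory leak in user-api pods. "
    "Memory usage increased from 500MB to 1.2GB.",
    datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
)
print(call["arguments"]["prompt"])
```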
When to use proactive investigation:
| Use Case | Example |
|---|---|
| Pre-production testing | "Investigate performance regression in staging" |
| Performance analysis | "Why did latency increase after the last deploy?" |
| Capacity planning | "Analyze memory growth trends over the past month" |
| Post-incident deep dive | "What else happened during that outage?" |
Setup guide
Prerequisites
- Azure SRE Agent resource
- Active Hawkeye account (contact NeuBird to get started)
- At least one connected cloud provider in Hawkeye (Azure, GCP, etc.)
Step 1: Add the Remote MCP Connector
- Navigate to your SRE Agent at sre.azure.com (e.g., https://sre.azure.com/agents/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.App/agents/<agent-name>)
- Go to Builder > Connectors
- Click Add connector > MCP server (User provided connector)
| Field | Value |
|---|---|
| Name | hawkeye-mcp |
| Connection type | Streamable-HTTP |
| URL | https://<your-deployment-name>.app.neubird.ai/mcp |
| Authentication | Custom headers |
Authentication headers:
| Header | Value |
|---|---|
| X-Hawkeye-Email | Your Hawkeye email |
| X-Hawkeye-Password | Your Hawkeye password |
Or use bearer token (JWT) for CI/CD:
| Header | Value |
|---|---|
| Authorization | Bearer <your-jwt-token> |
To obtain a bearer token:
```bash
curl -s -X POST "https://<your-deployment-name>.app.neubird.ai/api/v1/user/login" \
  -H "Content-Type: application/json" \
  -d '{"email": "your@email.com", "password": "your-password"}' \
  | jq -r '.access_token'
```
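JWTs normally expire, so a CI/CD job that caches the token should check it before reuse. The standard place to look is the `exp` claim defined in RFC 7519. Whether Hawkeye's tokens carry `exp` is an assumption.

The following sketch decodes the payload without verifying the signature. That is fine for deciding when to log in again, but not for trusting the claims. The function names are hypothetical.

```python
import base64
import json
import time

def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    Decodes the payload only; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    # base64url segments drop their "=" padding; restore it before decoding.
    payload = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(payload)).get("exp")

def needs_refresh(token, skew_seconds=60, now=None):
    """True if the token expires within `skew_seconds`; tokens without `exp` never do."""
    exp = jwt_expiry(token)
    now = time.time() if now is None else now
    return exp is not None and exp - skew_seconds <= now
```

A pipeline step can call `needs_refresh(token)` and rerun the login `curl` only when it returns True.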
Step 2: Create a Hawkeye skill
After adding the connector, create a skill that knows how to use the Hawkeye tools. The skill has a system prompt tuned for incident investigation and a reference to your MCP connector.
- In the left navigation, select Builder > Skills
- Click Add skill
- Paste the following YAML configuration (see below)
- Click Save
```yaml
api_version: azuresre.ai/v1
kind: AgentConfiguration
metadata:
  owner: your-team@contoso.com
  version: "1.0.0"
spec:
  name: HawkeyeInvestigator
  display_name: Hawkeye Incident Investigator
  system_prompt: |
    You are an incident investigation specialist with access to
    NeuBird Hawkeye's autonomous investigation platform.
    ## Capabilities
    ### Finding alerts
    - List uninvestigated alerts from the last N hours/days
    - Filter alerts by severity (P1, P2, P3, P4)
    - Search alerts by keyword or service name
    ### Running investigations
    - Investigate existing alerts by alert ID
    - Create manual investigations for proactive analysis
    - Monitor investigation progress in real-time
    ### Root cause analysis
    - Retrieve detailed RCA reports with incident timelines
    - View chain of thought and reasoning
    - Get data sources and queries consulted
    - Ask follow-up questions about incidents
    ### Remediation
    - Execute corrective action scripts
    - Implement preventive measures
    - Generate post-mortem documentation
    ### Project management
    - List and switch between Hawkeye projects
    - View connected data sources and sync status
    - Create and manage investigation instructions
    - Get organization-wide incident analytics (MTTR, time saved)
    ## Best practices
    - Start with uninvestigated alerts from the last 24 hours
    - Investigations typically complete in 30-90 seconds
    - First investigation may take 5-10 minutes while connections sync
    - Review corrective actions before executing
    ## Permissions
    All investigations use the connected data sources in your
    Hawkeye project. Ensure connections are properly synced
    before investigating.
  mcp_connectors:
    - hawkeye-mcp
  handoffs: []
```
The mcp_connectors field references the connector name from Step 1. This gives the skill access to all 42 Hawkeye tools.
Customizing the skill: Edit the system prompt to match your team's workflow. For example, add instructions like "Always check P1 alerts first" or "Include deployment history in every investigation." The YAML above is a starting point.
Step 3: Test the Integration
- Open a chat session with your SRE Agent
- Type /agent and select HawkeyeInvestigator
- Try these prompts:
Show me uninvestigated alerts from the last 24 hours
List all Hawkeye projects and their connections
Investigate the first P1 alert
Show me the root cause analysis
What corrective actions are recommended?
Has this happened before?
Security
Authentication methods
| Method | Headers | Best For |
|---|---|---|
| Email/Password | X-Hawkeye-Email + X-Hawkeye-Password | Simple setup, most use cases |
| Bearer Token (JWT) | Authorization: Bearer <token> | CI/CD pipelines, OAuth, enterprise |
Data security
- Encrypted traffic: HTTPS with TLS 1.2+
- Read-only access to cloud providers and monitoring tools
- SOC 2 compliant: secure data processing environment
- RBAC support: role-based access at project level
Access controls
- Each user authenticates with their own Hawkeye credentials
- Investigations scoped to connected data sources in your project
- Respects existing IAM and RBAC policies
Security note:
Store credentials in environment variables, never in config files. Hawkeye only needs read access to investigate.
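The security note above can be followed from any script that talks to Hawkeye: read credentials from the environment at runtime and build the headers described in Authentication methods. In the following sketch:

- The header names come from this article.
- The variable names `HAWKEYE_TOKEN`, `HAWKEYE_EMAIL`, and `HAWKEYE_PASSWORD` are hypothetical choices.

```python
import os

def hawkeye_auth_headers(env=os.environ):
    """Build Hawkeye auth headers from environment variables.

    HAWKEYE_TOKEN / HAWKEYE_EMAIL / HAWKEYE_PASSWORD are hypothetical names;
    a bearer token takes precedence over email/password.
    """
    token = env.get("HAWKEYE_TOKEN")
    if token:
        return {"Authorization": f"Bearer {token}"}
    email, password = env.get("HAWKEYE_EMAIL"), env.get("HAWKEYE_PASSWORD")
    if email and password:
        return {"X-Hawkeye-Email": email, "X-Hawkeye-Password": password}
    # Fail loudly instead of falling back to a credential stored in a file.
    raise RuntimeError("set HAWKEYE_TOKEN, or both HAWKEYE_EMAIL and HAWKEYE_PASSWORD")
```

Passing `env` explicitly, for example as a plain dict, makes the function easy to test without touching the real environment.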
Available MCP tools (42)
Project tools (5)
| Tool | Description |
|---|---|
| hawkeye_list_projects | List all Hawkeye projects |
| hawkeye_create_project | Create a new project |
| hawkeye_get_project_details | Get project configuration |
| hawkeye_update_project | Update project name or description |
| hawkeye_delete_project | Delete a project (requires confirmation) |
Connection tools (10)
| Tool | Description |
|---|---|
| hawkeye_list_connections | List all available connections |
| hawkeye_create_aws_connection | Create AWS connection with IAM role |
| hawkeye_create_datadog_connection | Create Datadog connection with API keys |
| hawkeye_wait_for_connection_sync | Wait for connection to reach SYNCED state |
| hawkeye_add_connection_to_project | Link connections to a project |
| hawkeye_list_project_connections | List connections for a specific project |
| + 4 additional tools | Azure, GCP, and other connections |
Investigation tools (11)
| Tool | Description |
|---|---|
| hawkeye_list_sessions | List investigation sessions with filtering |
| hawkeye_investigate_alert | Investigate an alert (supports real-time streaming) |
| hawkeye_create_manual_investigation | Create investigation from custom prompt (supports streaming) |
| hawkeye_get_investigation_status | Get real-time progress with step-by-step breakdown |
| hawkeye_get_rca | Retrieve root cause analysis |
| hawkeye_continue_investigation | Ask follow-up questions on completed investigations |
| hawkeye_get_chain_of_thought | View investigation reasoning steps |
| hawkeye_get_investigation_sources | List data sources consulted |
| hawkeye_get_investigation_queries | List queries executed during investigation |
| hawkeye_get_follow_up_suggestions | Get suggested follow-up questions |
| hawkeye_get_rca_score | Get investigation quality score |
Instruction tools (7)
| Tool | Description |
|---|---|
| hawkeye_list_project_instructions | List project instructions with type/status filtering |
| hawkeye_create_project_instruction | Create SYSTEM/FILTER/RCA/GROUPING instruction |
| hawkeye_validate_instruction | Validate instruction content before applying |
| hawkeye_apply_session_instruction | Apply instruction to session for testing |
| hawkeye_rerun_session | Rerun investigation with updated instructions |
| + 2 additional tools | Update and delete instructions |
Analytics tools (4)
| Tool | Description |
|---|---|
| hawkeye_get_incident_report | Get organization-wide analytics (MTTR, time saved) |
| hawkeye_inspect_session | Get session metadata |
| hawkeye_get_session_report | Get summary reports for multiple sessions |
| hawkeye_get_session_summary | Get detailed analysis and scoring for a session |
Discovery tools (2)
| Tool | Description |
|---|---|
| hawkeye_discover_project_resources | Explore available resources across all project connections |
| hawkeye_list_connection_resource_types | Get resource types for connection type and telemetry type |
Help tools (1)
| Tool | Description |
|---|---|
| hawkeye_get_guidance | Interactive help system with embedded knowledge base |
Use cases
1. Faster Incident Response
| Phase | Before Hawkeye | After Hawkeye |
|---|---|---|
| Alert detection | Alert notification | Alert notification |
| Investigation | Log into multiple cloud consoles | Ask: "Investigate this alert" |
| Correlation | Manual log/metric analysis | Automated multi-source query |
| Root cause | 2-4 hours | 2-3 minutes |
| Remediation | Write runbook, execute | Copy/paste bash script, execute |
Result: roughly 95% reduction in MTTR for common incident types
2. Knowledge Retention
The problem:
A senior engineer leaves, their tribal knowledge is lost, and junior engineers struggle with the same issues.
The Hawkeye solution:
- Capture investigation patterns through instructions
- Preserve institutional knowledge in reusable rules
- Train new engineers with past investigation history
3. Reduced Toil
Common repetitive investigations:
| Issue Type | Manual Time | Hawkeye Time | Frequency |
|---|---|---|---|
| Database connection issues | 2 hours | 90 seconds | 3x/week |
| Pod restart loops | 1.5 hours | 60 seconds | 5x/week |
| Deployment failures | 3 hours | 2 minutes | 2x/week |
Result: engineers spend more time on prevention and architecture, less on firefighting
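The toil table can be turned into a weekly total with plain arithmetic on its own numbers:

```python
# (manual minutes, Hawkeye minutes, occurrences per week) from the table above.
TOIL = {
    "Database connection issues": (120, 1.5, 3),
    "Pod restart loops": (90, 1.0, 5),
    "Deployment failures": (180, 2.0, 2),
}

def weekly_hours_saved(rows):
    """Sum (manual - Hawkeye) time across all occurrences, in hours per week."""
    minutes = sum((manual - auto) * per_week for manual, auto, per_week in rows.values())
    return minutes / 60

print(f"{weekly_hours_saved(TOIL):.1f} engineer-hours saved per week")  # 19.3
```

That works out to 1,156.5 minutes, or roughly half of one engineer's 40-hour week, for these three issue types alone.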
4. Cross-Team Collaboration
Platform team provides:
- SYSTEM instructions describing architecture
- FILTER instructions for noise reduction
- RCA instructions for common patterns
Application team benefits:
- Investigations leverage platform context
- No need for deep infrastructure knowledge
- Consistent incident response across teams
5. Continuous Learning
Track and improve over time:
| Month | MTTR | Time Saved | Quality Score | Noise Reduction |
|---|---|---|---|---|
| Month 1 | 45 min | 15 hours | 7.2/10 | 20% |
| Month 3 | 12 min | 45 hours | 8.5/10 | 55% |
| Month 6 | 3 min | 90 hours | 9.1/10 | 78% |
Result: data-driven improvement of incident response processes
Next steps
The Hawkeye MCP integration is available now for all Azure SRE Agent customers.
Get started
- Contact NeuBird to set up a Hawkeye account
- Connect your cloud providers (Azure, GCP, etc.)
- Add the Hawkeye MCP connector to your SRE Agent
- Create a Hawkeye skill in Builder > Skills
- Start investigating!
Learn more
- Hawkeye MCP documentation
- Tool reference (all 42 tools)
- Advanced workflows
- hawkeye-mcp-server on npm
- NeuBird help documentation
- Azure SRE Agent MCP integration guide
- NeuBird AI
Need OAuth support? Contact NeuBird support: support@neubird.ai
Try it out
Ready to get started?
Quick start checklist:
- Sign up for Hawkeye at https://neubird.ai/contact-us/
- Connect your cloud infrastructure (Azure, GCP, etc.)
- Install the MCP connector in Azure SRE Agent
- Create a Hawkeye skill in Builder > Skills
- Test with "Show me uninvestigated alerts"
- Investigate your first incident in under 2 minutes!
Questions? Drop a comment below or reach out to the Azure SRE Agent team.
Want to see Hawkeye in action? Request a demo from NeuBird: https://neubird.ai/contact-us/
Azure SRE Agent helps SRE teams build automated incident response workflows. Learn more at aka.ms/sreagent.
Tags: #Azure #SREAgent #NeuBird #Hawkeye #MCP #IncidentResponse #DevOps #SRE #AI #Automation #CloudOps #MTTR #RootCauseAnalysis