Apps on Azure Blog

Get started with NeuBird Hawkeye MCP server in Azure SRE Agent

dbandaru (Microsoft)
Mar 23, 2026

Connect Azure SRE Agent to NeuBird Hawkeye for AI-powered autonomous incident investigation and root cause analysis across your cloud providers and monitoring tools.

Integrate NeuBird Hawkeye MCP with Azure SRE Agent


TL;DR

If your infrastructure spans multiple clouds (say, Azure and GCP, or Azure alongside any other cloud provider), investigating incidents means jumping between completely separate consoles, log systems, and monitoring stacks. Azure SRE Agent now integrates with NeuBird Hawkeye via Model Context Protocol (MCP), so you can investigate incidents across all of your clouds and monitoring tools from a single conversation.

Key benefits:

  • 90-second investigations vs 3-4 hours of manual dashboard-hopping
  • Multi-cloud support - Azure, GCP, and other cloud providers investigated from a single conversation
  • 42 MCP tools across 7 categories for investigation, analysis, and remediation
  • Real-time streaming progress - watch investigations unfold step-by-step (v2.0+)
  • MTTR tracking and continuous improvement metrics

The problem: incidents don't stay in one cloud

When an alert fires at 3 AM, your on-call engineer doesn't just need to find the problem — they need to figure out which cloud it's in. A single incident can involve an Azure Function calling a GCP Cloud Run service, with logs split across Azure Monitor and GCP Cloud Logging. Here's what that looks like:

| Challenge | Time cost |
|---|---|
| Correlate signals across multiple monitoring tools | 30-45 minutes |
| Query logs and metrics from multiple clouds | 45-60 minutes |
| Piece together the chain of events | 30-45 minutes |
| Identify root cause and develop fixes | 60-90 minutes |
| Total | 3-4 hours |

Sound familiar?
"Is it the database? The cache? The load balancer? Let me check the GCP console... now Azure Monitor... now the other logging stack... wait, what time zone is this in?"


What NeuBird Hawkeye does

NeuBird Hawkeye is an autonomous incident investigation platform that connects to your cloud providers and uses AI to:

Core capabilities:

Investigate alerts from your monitoring tools automatically
Query multiple data sources across cloud providers and observability platforms
Generate detailed RCAs with incident timelines
Provide corrective actions with ready-to-execute scripts
Learn from your architecture through customizable instructions

Supported Integrations:

| Category | Platforms |
|---|---|
| Cloud Providers | Azure, Google Cloud Platform, AWS |
| Monitoring Tools | Datadog, Grafana, Dynatrace, New Relic |
| Incident Management | PagerDuty, ServiceNow, FireHydrant, Incident.io |
| Log Aggregation | CloudWatch, Azure Monitor, Google Cloud Logging |

How the integration works

With the new Hawkeye MCP server integration, Azure SRE Agent leverages Hawkeye's autonomous investigation capabilities through natural language conversation.

What is Model Context Protocol (MCP)?

Model Context Protocol is an open standard that enables AI agents to securely connect to external tools and data sources. It's like a universal adapter for connecting LLMs to the real world.

How it connects:
Your SRE Agent can now talk to Hawkeye, which talks to Azure, GCP, and other cloud platforms—all through a single conversational interface.

42 MCP tools across 7 categories

| Category | Tool count | Examples |
|---|---|---|
| Projects | 5 | List, create, update, delete projects |
| Connections | 10 | Azure, GCP, and other cloud provider connections |
| Investigations | 11 | List alerts, investigate, monitor progress, RCA, follow-ups |
| Instructions | 7 | Create, test, validate, apply investigation rules |
| Analytics | 4 | MTTR, time saved, quality scores, session reports |
| Discovery | 2 | Explore resources across all project connections |
| Help | 1 | Interactive guidance system |

Remote MCP server

Azure SRE Agent connects to the hosted Hawkeye MCP server via the Streamable-HTTP transport. No local installation or Node.js is required.

| Mode | Best for | How it works |
|---|---|---|
| Streamable-HTTP | All users, teams, enterprise | Connect to the hosted MCP endpoint at https://<your-deployment-name>.app.neubird.ai/mcp |

Supports secure authentication via email/password headers or bearer tokens (JWT).
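Under the hood, the connector speaks JSON-RPC 2.0 over HTTP POST, as defined by the MCP Streamable-HTTP transport. The sketch below shows the opening exchange an MCP client performs (`initialize`, the `notifications/initialized` notification, then `tools/list`), which can be handy when debugging connectivity with curl. It is illustrative only: the environment variable names (`HAWKEYE_URL`, `HAWKEYE_EMAIL`, `HAWKEYE_PASSWORD`), the client name, and the protocol version string are assumptions, and whether the Hawkeye server issues an `Mcp-Session-Id` header that must be echoed on later requests depends on its implementation.

```shell
#!/bin/sh
# Sketch of an MCP Streamable-HTTP handshake using curl.
# Assumed (hypothetical) environment variables:
#   HAWKEYE_URL       e.g. https://<your-deployment-name>.app.neubird.ai/mcp
#   HAWKEYE_EMAIL     your Hawkeye email
#   HAWKEYE_PASSWORD  your Hawkeye password

# JSON-RPC "initialize" request. "2025-03-26" is one published MCP
# protocol revision; the server may negotiate a different one.
mcp_initialize_body() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"%s","version":"0.1.0"}}}' "$1" "$2"
}

# JSON-RPC request without params, e.g. "tools/list".
mcp_request_body() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"%s"}' "$1" "$2"
}

# JSON-RPC notification: no "id", so the server returns no result for it.
mcp_notification_body() {
  printf '{"jsonrpc":"2.0","method":"%s"}' "$1"
}

# POST one message. Streamable-HTTP clients must accept either a JSON body
# or an SSE stream. -i prints response headers so you can spot an
# Mcp-Session-Id; pass it as $2 on later calls if the server sets one.
mcp_post() {
  if [ -n "$2" ]; then
    set -- "$1" -H "Mcp-Session-Id: $2"
  else
    set -- "$1"
  fi
  body=$1
  shift
  curl -sS -i -X POST "$HAWKEYE_URL" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "X-Hawkeye-Email: $HAWKEYE_EMAIL" \
    -H "X-Hawkeye-Password: $HAWKEYE_PASSWORD" \
    "$@" \
    --data "$body"
}

# Usage against a live deployment (network calls, not run here):
#   mcp_post "$(mcp_initialize_body 1 sre-debug)"
#   mcp_post "$(mcp_notification_body notifications/initialized)" "<session-id>"
#   mcp_post "$(mcp_request_body 2 tools/list)" "<session-id>"
```

Building the bodies with small functions keeps the JSON-RPC framing visible; in practice the SRE Agent connector performs this handshake for you.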


Example investigation walkthrough

Here's an example investigation with Azure SRE Agent + Hawkeye:

Step 1: List Uninvestigated Alerts

User: Show me uninvestigated alerts from the last 24 hours

Hawkeye response:

Hawkeye Projects:
- Production (UUID: b4b902d4-a540-466b-96cf-4ec2f734b010)

Uninvestigated Alerts (Production):
1. [P1] High CPU usage on api-service pods - 3 instances affected
2. [P2] Database connection pool exhausted - payment-service
3. [P3] Increased error rate in user-authentication endpoint

Step 2: Investigate an Alert

User: Investigate the high CPU alert

The agent calls hawkeye_investigate_alert and streams real-time progress (v2.0+):

 Preparing telemetry sources [2s]
 Loading investigation context [3s]
 Classifying datasources (Step 1) [18s]
 Fine-tuning datasources [26s]
 Consulting telemetry (Query 7) [1m 6s]
 Analyzing telemetry data (Analysis 12) [1m 13s]
 Summarizing results [1m 29s]

Investigation completed! Here's what I found...
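In MCP terms, streamed progress like this is typically delivered as `notifications/progress` messages on an SSE response, which a server sends when the client includes a `progressToken` in the request's `_meta` field. Assuming Hawkeye follows that pattern (the post does not say how its v2.0 streaming is implemented), this sketch builds a `tools/call` request for hawkeye_investigate_alert and prints the progress messages from the stream. The `alert_id` argument name and the `HAWKEYE_URL`/`HAWKEYE_TOKEN` variables are hypothetical; read the real parameter names from the tool's `inputSchema` in a `tools/list` response.

```shell
#!/bin/sh
# Sketch: request an investigation with a progress token, then read progress
# messages from the Server-Sent Events (SSE) response.

# tools/call body. "alert_id" is a HYPOTHETICAL argument name.
investigate_body() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"hawkeye_investigate_alert","arguments":{"alert_id":"%s"},"_meta":{"progressToken":"%s"}}}' "$1" "$2" "$3"
}

# Print the payload of each SSE "data:" line (one optional space after the
# colon is dropped, as in the SSE format). Carriage returns are stripped.
sse_data() {
  tr -d '\r' | sed -n 's/^data: \{0,1\}//p'
}

# Print the "message" field of each notifications/progress payload.
# Naive extraction: does not handle escaped quotes inside the message.
progress_messages() {
  sse_data |
    grep '"notifications/progress"' |
    sed -n 's/.*"message"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a live deployment (network call, not run here):
#   curl -sS -N -X POST "$HAWKEYE_URL" \
#     -H "Content-Type: application/json" \
#     -H "Accept: application/json, text/event-stream" \
#     -H "Authorization: Bearer $HAWKEYE_TOKEN" \
#     --data "$(investigate_body 3 '<alert-id>' inv-1)" | progress_messages
```

curl's `-N` disables output buffering so each progress line appears as soon as the server sends it.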

Step 3: Get Root Cause Analysis

User: Show me the root cause analysis

Hawkeye provides a comprehensive RCA:


Incident summary

| Field | Value |
|---|---|
| Severity | P1 |
| Duration | 45 minutes |
| Status | Resolved |
| Affected Services | api-service (3 pods) |

Root cause

A memory leak in the api-service v2.3.1 deployment caused garbage collection pressure, leading to CPU exhaustion. The leak originated from unclosed database connections in the /users/search endpoint.

Timeline

| Time | Event |
|---|---|
| 14:00 UTC | Deployment of api-service v2.3.1 completed |
| 14:15 UTC | Memory usage begins climbing (500MB → 1.2GB) |
| 14:30 UTC | GC pressure increases, CPU spikes to 95%+ |
| 14:35 UTC | Alert fired: High CPU usage |
| 14:45 UTC | Auto-scaling triggered (3 → 6 pods) |
| 14:50 UTC | Investigation identified unclosed connections |
| 14:55 UTC | Rollback initiated to v2.3.0 |

Corrective actions

Immediate Fix:

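# Roll back to the previous stable image. Changing the image starts a rolling
# update that replaces every pod, which also releases the leaked connections,
# so a separate "kubectl rollout restart" is not needed.
kubectl set image deployment/api-service api-service=api-service:v2.3.0

# Wait for the rollback to finish before closing the incident
kubectl rollout status deployment/api-service --timeout=5m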

Preventive measures:

  1. Add connection pool monitoring to your cloud monitoring dashboard
  2. Implement database connection lifecycle tests in CI pipeline
  3. Add memory leak detection to staging environment tests

Time saved

| Metric | Value |
|---|---|
| Manual investigation estimate | 3-4 hours |
| Hawkeye investigation time | 90 seconds |
| Time saved | ~3.5 hours |

Step 4: Follow-Up Questions

The conversation doesn't stop at the RCA. You can ask intelligent follow-ups:

User: Has this happened before?

Agent: Checking investigation history...

Yes, a similar pattern was detected 2 weeks ago in the same service. 
The previous incident was also caused by unclosed database connections, 
but in a different endpoint (/orders/list).

 Recommendation: Implement connection lifecycle testing across all endpoints.
User: What tests should we add to catch this?

Agent: Based on the RCA, I recommend:

 1. Connection pool exhaustion test:
 - Simulate high request volume to database-heavy endpoints
 - Monitor for unclosed connections over 1000 requests

 2. Memory leak detection:
 - Run load tests with memory profiling enabled
 - Alert if memory growth exceeds 20% over 10-minute baseline

 3. Integration test for connection lifecycle:
 - Verify all database connections are properly closed
 - Check connection pool statistics after each test run
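The memory-growth rule in item 2 reduces to a simple comparison. A minimal sketch in POSIX shell, assuming you already collect a baseline sample and a current sample in the same integer unit (for example MiB) from your monitoring stack; the function name is illustrative, not part of Hawkeye.

```shell
#!/bin/sh
# Succeeds (exit 0) when current memory exceeds the baseline by more than
# threshold_pct percent (default 20). Integer math avoids floating point:
#   current / baseline > 1 + pct/100  <=>  current*100 > baseline*(100+pct)
memory_growth_exceeds() {
  baseline=$1
  current=$2
  threshold_pct=${3:-20}
  [ $((current * 100)) -gt $((baseline * (100 + threshold_pct))) ]
}

# Example: the 500MB -> 1200MB climb from the incident timeline trips the rule.
if memory_growth_exceeds 500 1200; then
  echo "memory growth above 20% of baseline"
fi
```

Growth of exactly 20% (500 to 600) does not trigger it, since the rule says "exceeds".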

Key capabilities

1. Multi-Cloud Investigation

Hawkeye unifies investigation across all your connected infrastructure.

Example scenario:

An alert fires about an Azure Function returning errors. Hawkeye can:

Query Azure Monitor for function execution metrics
Check GCP Cloud Logging for upstream API errors on the GCP side
Review GCP Cloud Monitoring metrics for the dependent Cloud Run service
Correlate with recent deployments in GitHub Actions or Azure DevOps


"Finally, one place to investigate instead of 7 browser tabs!"


2. Instruction Management

Customize how Hawkeye investigates incidents by creating instructions:

| Instruction type | Purpose | Example |
|---|---|---|
| SYSTEM | Provide architecture context | "We use microservices on Kubernetes with PostgreSQL and Redis" |
| FILTER | Reduce investigation noise | "Only investigate P1 and P2 incidents" |
| RCA | Guide investigation steps | "For database issues, check slow queries and connection pools first" |
| GROUPING | Group related alerts | "Group alerts from the same service within 5 minutes" |

Instruction testing workflow

Before deploying instructions to production, test them on past investigations:

| Step | Action | Tool |
|---|---|---|
| 1 | Validate content | hawkeye_validate_instruction |
| 2 | Apply to test session | hawkeye_apply_session_instruction |
| 3 | Rerun investigation | hawkeye_rerun_session |
| 4 | Compare RCAs | Manual review |
| 5 | Measure improvement | Check quality score |
| 6 | Deploy if better | hawkeye_create_project_instruction |

Note:
Test instruction changes on historical data before applying them to live investigations. No more "oops, that filter was too aggressive!"
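Steps 4 to 6 of the workflow come down to a comparison: deploy the instruction only if the rerun scored better. Here is a small gate you could put in a script, assuming you have already read the quality score for the original and rerun sessions (for example via hawkeye_get_rca_score) and that scores look like the x/10 values used in this post. The function name and the optional minimum-improvement margin are illustrative choices, not part of Hawkeye.

```shell
#!/bin/sh
# Succeeds only if the rerun score beats the original by more than min_delta.
# Accepts "8.5" or "8.5/10"; the "/10" suffix is stripped before comparing.
# awk is used because POSIX shell arithmetic is integer-only.
score_improved() {
  before=${1%%/*}
  after=${2%%/*}
  awk -v before="$before" -v after="$after" -v min_delta="${3:-0}" \
    'BEGIN { exit !(after - before > min_delta) }'
}

# Example: original session scored 7.2/10, rerun with the new instruction 8.5/10.
if score_improved 7.2/10 8.5/10 0.5; then
  echo "improved: create the project instruction"
else
  echo "no clear improvement: keep iterating"
fi
```

A non-zero margin guards against deploying on score differences small enough to be noise.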


3. Analytics and Continuous Improvement

Track the effectiveness of your incident response process:

| Metric | What it measures |
|---|---|
| MTTR | Mean Time to Resolution |
| Time Saved | Efficiency gains vs manual investigation |
| Quality Score | Accuracy and completeness of RCAs |
| Noise Reduction | Percentage of duplicate/grouped alerts |

Use cases for analytics:

Justify investment in SRE tooling to leadership
Demonstrate continuous improvement over time
Identify patterns in recurring incidents
Measure impact of instruction changes


4. Proactive Investigation

You don't need an alert to investigate. Create manual investigations for proactive analysis:

User: Investigate potential memory leak in user-api pods. 
Memory usage increased from 500MB to 1.2GB between 8am-10am UTC today.

Hawkeye will:

  1. Query metrics for the specified time range
  2. Correlate with deployment events
  3. Check for similar patterns in the past
  4. Provide root cause analysis and recommendations

When to use proactive investigation:

| Use case | Example |
|---|---|
| Pre-production testing | "Investigate performance regression in staging" |
| Performance analysis | "Why did latency increase after the last deploy?" |
| Capacity planning | "Analyze memory growth trends over the past month" |
| Post-incident deep dive | "What else happened during that outage?" |

Setup guide

Prerequisites

  • Azure SRE Agent resource
  • Active Hawkeye account (contact NeuBird to get started)
  • At least one connected cloud provider in Hawkeye (Azure, GCP, etc.)

Step 1: Add the Remote MCP Connector

  1. Navigate to your SRE Agent at sre.azure.com (e.g., https://sre.azure.com/agents/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.App/agents/<agent-name>)
  2. Go to Builder > Connectors
  3. Click Add connector > MCP server (User provided connector) and fill in the following fields:
| Field | Value |
|---|---|
| Name | hawkeye-mcp |
| Connection type | Streamable-HTTP |
| URL | https://<your-deployment-name>.app.neubird.ai/mcp |
| Authentication | Custom headers |

Authentication headers:

| Header | Value |
|---|---|
| X-Hawkeye-Email | Your Hawkeye email |
| X-Hawkeye-Password | Your Hawkeye password |

Or use bearer token (JWT) for CI/CD:

| Header | Value |
|---|---|
| Authorization | Bearer <your-jwt-token> |

To obtain a bearer token:

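# HAWKEYE_EMAIL and HAWKEYE_PASSWORD are example variable names; jq builds the JSON safely
jq -n --arg email "$HAWKEYE_EMAIL" --arg password "$HAWKEYE_PASSWORD" \
  '{email: $email, password: $password}' |
curl -sS --fail -X POST "https://<your-deployment-name>.app.neubird.ai/api/v1/user/login" \
  -H "Content-Type: application/json" \
  --data @- |
jq -r '.access_token'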
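Bearer tokens expire, so a JWT pasted into the connector's Authorization header will eventually stop working. The sketch below reads a token's `exp` claim locally so you can see how long it remains valid. It assumes the login endpoint returns a standard JWT with an `exp` claim (the post does not state the token lifetime), and it does not verify the signature; it only decodes the payload. It relies on `base64 -d` and `date +%s`, both available with GNU coreutils on typical Linux systems.

```shell
#!/bin/sh
# Decode the payload (second dot-separated segment) of a JWT.
# NOTE: no signature verification; this only reads the claims.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops "=" padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the numeric "exp" claim (Unix seconds); prints nothing if absent.
jwt_exp() {
  jwt_payload "$1" |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print seconds until expiry (negative once expired); fail if no exp claim.
jwt_seconds_left() {
  exp=$(jwt_exp "$1")
  [ -n "$exp" ] || return 1
  echo $(( exp - $(date +%s) ))
}

# Usage (TOKEN is a hypothetical variable holding the access_token):
#   jwt_seconds_left "$TOKEN"
```

If the remaining time is short, fetch a new token before updating the connector's header.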

Step 2: Create a Hawkeye skill

After adding the connector, create a skill that knows how to use the Hawkeye tools. The skill has a system prompt tuned for incident investigation and a reference to your MCP connector.

  1. In the left navigation, select Builder > Skills
  2. Click Add skill
  3. Paste the following YAML configuration (see below)
  4. Click Save
api_version: azuresre.ai/v1
kind: AgentConfiguration
metadata:
  owner: your-team@contoso.com
  version: "1.0.0"
spec:
  name: HawkeyeInvestigator
  display_name: Hawkeye Incident Investigator

  system_prompt: |
    You are an incident investigation specialist with access to
    NeuBird Hawkeye's autonomous investigation platform.

    ## Capabilities

    ### Finding alerts
    - List uninvestigated alerts from the last N hours/days
    - Filter alerts by severity (P1, P2, P3, P4)
    - Search alerts by keyword or service name

    ### Running investigations
    - Investigate existing alerts by alert ID
    - Create manual investigations for proactive analysis
    - Monitor investigation progress in real-time

    ### Root cause analysis
    - Retrieve detailed RCA reports with incident timelines
    - View chain of thought and reasoning
    - Get data sources and queries consulted
    - Ask follow-up questions about incidents

    ### Remediation
    - Execute corrective action scripts
    - Implement preventive measures
    - Generate post-mortem documentation

    ### Project management
    - List and switch between Hawkeye projects
    - View connected data sources and sync status
    - Create and manage investigation instructions
    - Get organization-wide incident analytics (MTTR, time saved)

    ## Best practices
    - Start with uninvestigated alerts from the last 24 hours
    - Investigations typically complete in 30-90 seconds
    - First investigation may take 5-10 minutes while connections sync
    - Review corrective actions before executing

    ## Permissions
    All investigations use the connected data sources in your
    Hawkeye project. Ensure connections are properly synced
    before investigating.

  mcp_connectors:
    - hawkeye-mcp

  handoffs: []

The mcp_connectors field references the connector name from Step 1. This gives the skill access to all 42 Hawkeye tools.

Customizing the skill: Edit the system prompt to match your team's workflow. For example, add instructions like "Always check P1 alerts first" or "Include deployment history in every investigation." The YAML above is a starting point.


Step 3: Test the Integration

  1. Open a chat session with your SRE Agent
  2. Type /agent and select HawkeyeInvestigator
  3. Try these prompts:
 Show me uninvestigated alerts from the last 24 hours

 List all Hawkeye projects and their connections

 Investigate the first P1 alert

 Show me the root cause analysis

 What corrective actions are recommended?

 Has this happened before?

Security

Authentication methods

| Method | Headers | Best for |
|---|---|---|
| Email/Password | X-Hawkeye-Email + X-Hawkeye-Password | Simple setup, most use cases |
| Bearer Token (JWT) | Authorization: Bearer <token> | CI/CD pipelines, OAuth, enterprise |

Data security

Encrypted traffic - HTTPS with TLS 1.2+
Read-only access to cloud providers and monitoring tools
SOC 2 compliant - Secure data processing environment
RBAC support - Role-based access at project level

Access controls

  • Each user authenticates with their own Hawkeye credentials
  • Investigations scoped to connected data sources in your project
  • Respects existing IAM and RBAC policies

Security note:
Store credentials in environment variables, never in config files. Hawkeye only needs read access to investigate.


Available MCP tools (42)

Project tools (5)

| Tool | Description |
|---|---|
| hawkeye_list_projects | List all Hawkeye projects |
| hawkeye_create_project | Create a new project |
| hawkeye_get_project_details | Get project configuration |
| hawkeye_update_project | Update project name or description |
| hawkeye_delete_project | Delete a project (requires confirmation) |

Connection tools (10)

| Tool | Description |
|---|---|
| hawkeye_list_connections | List all available connections |
| hawkeye_create_aws_connection | Create AWS connection with IAM role |
| hawkeye_create_datadog_connection | Create Datadog connection with API keys |
| hawkeye_wait_for_connection_sync | Wait for connection to reach SYNCED state |
| hawkeye_add_connection_to_project | Link connections to a project |
| hawkeye_list_project_connections | List connections for a specific project |
| + 4 additional tools | Azure, GCP, and other connections |

Investigation tools (11)

ToolDescription
hawkeye_list_sessionsList investigation sessions with filtering
hawkeye_investigate_alertInvestigate an alert (supports real-time streaming)
hawkeye_create_manual_investigationCreate investigation from custom prompt (supports streaming)
hawkeye_get_investigation_statusGet real-time progress with step-by-step breakdown
hawkeye_get_rcaRetrieve root cause analysis
hawkeye_continue_investigationAsk follow-up questions on completed investigations
hawkeye_get_chain_of_thoughtView investigation reasoning steps
hawkeye_get_investigation_sourcesList data sources consulted
hawkeye_get_investigation_queriesList queries executed during investigation
hawkeye_get_follow_up_suggestionsGet suggested follow-up questions
hawkeye_get_rca_scoreGet investigation quality score

Instruction tools (7)

ToolDescription
hawkeye_list_project_instructionsList project instructions with type/status filtering
hawkeye_create_project_instructionCreate SYSTEM/FILTER/RCA/GROUPING instruction
hawkeye_validate_instructionValidate instruction content before applying
hawkeye_apply_session_instructionApply instruction to session for testing
hawkeye_rerun_sessionRerun investigation with updated instructions
+ 2 additional toolsUpdate and delete instructions

Analytics tools (4)

| Tool | Description |
|---|---|
| hawkeye_get_incident_report | Get organization-wide analytics (MTTR, time saved) |
| hawkeye_inspect_session | Get session metadata |
| hawkeye_get_session_report | Get summary reports for multiple sessions |
| hawkeye_get_session_summary | Get detailed analysis and scoring for a session |

Discovery tools (2)

| Tool | Description |
|---|---|
| hawkeye_discover_project_resources | Explore available resources across all project connections |
| hawkeye_list_connection_resource_types | Get resource types for connection type and telemetry type |

Help tools (1)

| Tool | Description |
|---|---|
| hawkeye_get_guidance | Interactive help system with embedded knowledge base |
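Tool counts can change between Hawkeye releases, and the per-category counts listed in this section add up to 40 rather than the 42 in the section heading, so the reliable check is to ask your own deployment. The MCP `tools/list` method returns every tool with its name, description, and input schema; the sketch below extracts the distinct `hawkeye_*` names from a saved response. It assumes the response is plain JSON (if it arrives as an SSE stream, keep only the `data:` payloads first), and the file name in the usage comment is hypothetical. `grep -o` is supported by GNU and BSD grep but is not required by POSIX.

```shell
#!/bin/sh
# Print the distinct hawkeye_* tool names in a tools/list JSON response
# read from stdin, one per line, sorted.
hawkeye_tool_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"hawkeye_[A-Za-z0-9_]*"' |
    sed 's/.*"\(hawkeye_[A-Za-z0-9_]*\)"$/\1/' |
    sort -u
}

# Usage (tools-list.json saved from a tools/list call):
#   hawkeye_tool_names < tools-list.json | wc -l
```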

Use cases

1. Faster Incident Response

| Phase | Before Hawkeye | After Hawkeye |
|---|---|---|
| Alert detection | Alert notification | Alert notification |
| Investigation | Log into multiple cloud consoles | Ask: "Investigate this alert" |
| Correlation | Manual log/metric analysis | Automated multi-source query |
| Root cause | 2-4 hours | 2-3 minutes |
| Remediation | Write runbook, execute | Copy/paste bash script, execute |

Result: roughly 95% reduction in MTTR for common incident types


2. Knowledge Retention

The problem:

Senior engineer leaves → tribal knowledge is lost → junior engineers struggle with the same issues

The Hawkeye solution:

Capture investigation patterns through instructions
Preserve institutional knowledge in reusable rules
Train new engineers with past investigation history


3. Reduced Toil

Common repetitive investigations:

| Issue type | Manual time | Hawkeye time | Frequency |
|---|---|---|---|
| Database connection issues | 2 hours | 90 seconds | 3x/week |
| Pod restart loops | 1.5 hours | 60 seconds | 5x/week |
| Deployment failures | 3 hours | 2 minutes | 2x/week |

Result: engineers spend more time on prevention and architecture, less on firefighting
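Multiplying each row of the repetitive-investigations table by its weekly frequency shows what those figures add up to. This awk sketch uses only the numbers from that table (with Hawkeye times converted to seconds); the pipe-separated input format is just a convenient encoding for the example.

```shell
#!/bin/sh
# Input lines: issue|manual hours per incident|Hawkeye seconds per incident|incidents per week
# Weekly total per row = per-incident time x incidents per week.
toil_summary() {
  awk -F'|' '
    { manual_h += $2 * $4; hawkeye_s += $3 * $4 }
    END {
      printf "manual: %.1f h/week\n", manual_h
      printf "hawkeye: %.1f min/week\n", hawkeye_s / 60
      printf "saved: %.1f h/week\n", manual_h - hawkeye_s / 3600
    }'
}

toil_summary <<'EOF'
Database connection issues|2|90|3
Pod restart loops|1.5|60|5
Deployment failures|3|120|2
EOF
```

With the table's figures this prints 19.5 manual hours, 13.5 Hawkeye minutes, and about 19.3 hours saved per week for these three issue types.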


4. Cross-Team Collaboration

Platform team provides:

  • SYSTEM instructions describing architecture
  • FILTER instructions for noise reduction
  • RCA instructions for common patterns

Application team benefits:

  • Investigations leverage platform context
  • No need for deep infrastructure knowledge
  • Consistent incident response across teams

5. Continuous Learning

Track and improve over time:

| Month | MTTR | Time saved | Quality score | Noise reduction |
|---|---|---|---|---|
| Month 1 | 45 min | 15 hours | 7.2/10 | 20% |
| Month 3 | 12 min | 45 hours | 8.5/10 | 55% |
| Month 6 | 3 min | 90 hours | 9.1/10 | 78% |

Result: data-driven improvement of incident response processes


Next steps

The Hawkeye MCP integration is available now for all Azure SRE Agent customers.

Get started

  1. Contact NeuBird to set up a Hawkeye account
  2. Connect your cloud providers (Azure, GCP, etc.)
  3. Add the Hawkeye MCP connector to your SRE Agent
  4. Create a Hawkeye skill in Builder > Skills
  5. Start investigating!

Learn more

Need OAuth support? Contact NeuBird support: support@neubird.ai


Try it out

Ready to get started?

Quick start checklist:

  • Sign up for Hawkeye at https://neubird.ai/contact-us/
  • Connect your cloud infrastructure (Azure, GCP, etc.)
  • Install the MCP connector in Azure SRE Agent
  • Create a Hawkeye skill in Builder > Skills
  • Test with "Show me uninvestigated alerts"
  • Investigate your first incident in under 2 minutes!

Questions? Drop a comment below or reach out to the Azure SRE Agent team.

Want to see Hawkeye in action? Request a demo from NeuBird: https://neubird.ai/contact-us/


Azure SRE Agent helps SRE teams build automated incident response workflows. Learn more at aka.ms/sreagent.

Tags: #Azure #SREAgent #NeuBird #Hawkeye #MCP #IncidentResponse #DevOps #SRE #AI #Automation #CloudOps #MTTR #RootCauseAnalysis

Version 1.0