Event details
Modern distributed systems fail in complex, unpredictable ways, and traditional logging often can’t keep up. As engineers, we need more than dashboards; we need deep, connected telemetry that explains why things break, not just what broke. In this session, I’ll share real-world lessons from building the telemetry and observability platform behind Azure Container Apps and the Aspire Dashboard, used in thousands of customer environments.
We’ll walk through the practical challenges of capturing end-to-end visibility across microservices, containers, proxies, and control planes. I’ll demonstrate how OpenTelemetry (OTel) unified our traces, metrics, and logs across heterogeneous workloads; how distributed context propagation reveals root causes hidden behind multiple hops; and why cardinality explosions, poor sampling decisions, and inconsistent instrumentation are some of the most expensive mistakes teams make.
This talk is not theoretical: it’s a behind-the-scenes look at what worked, what didn’t, and what we learned while building an observability system for production-scale traffic. I’ll show common anti-patterns from the field, patterns that consistently improve developer velocity, and how to design telemetry that accelerates debugging rather than creating noise.
Live examples will walk through trace visualizations, anomaly detection patterns, and debugging scenarios that surfaced unexpected bottlenecks. Attendees will leave with a clear blueprint for building reliable, debuggable distributed systems with observability as a first-class concern.
If your team is scaling microservices, struggling with opaque failures, or trying to modernize its observability stack, this session will give you the mental model, patterns, and practices needed to move from "reactive firefighting" to "proactive, insight-driven engineering."
Join us as Sneha Parthasarathy (https://www.linkedin.com/in/sneha-parthasarathy/) talks to us about telemetry.