Everyone loves a good mystery, but not when it involves operating our services. When investigating a production issue, a software engineer can feel like a detective: digging through dashboards and combing log stores to gather evidence and reconstruct the scene of the crime. During this investigation, every second counts! Time spent researching an issue is costly, for engineers and customers alike.
At Netflix, we built a tool called Edgar to help engineers troubleshoot production issues quickly. Edgar starts with distributed tracing and supplements traces with additional context like logs, metadata, and analysis, giving a more detailed picture of what happened without hopping back and forth between log stores and dashboards.
You’ll leave this talk with an understanding of how we supplement distributed tracing with additional context to minimize footwork and speed up resolution. You’ll hear about ways Edgar has delivered value and learn about the challenges we’ve faced. Finally, we’ll show you how linking together your data sources can help you and your team solve mysteries faster.
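To make "supplementing traces with logs" concrete, here is a minimal sketch of the general idea, assuming that each log record carries the trace ID and span ID of the request that emitted it. This is not Edgar's implementation; the `Span`, `LogRecord`, and `enrich_trace` names and the service names are hypothetical and only illustrate joining two data sources on a shared key.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical span shape: one unit of work within a distributed trace.
    trace_id: str
    span_id: str
    service: str
    logs: list = field(default_factory=list)


@dataclass
class LogRecord:
    # Hypothetical log shape: assumes logs are tagged with trace and span IDs.
    trace_id: str
    span_id: str
    message: str


def enrich_trace(spans, log_records):
    """Attach each log message to the span it was emitted from.

    Logs are indexed by (trace_id, span_id) so each span can look up its
    own messages in one pass instead of searching a separate log store.
    """
    by_span = defaultdict(list)
    for rec in log_records:
        by_span[(rec.trace_id, rec.span_id)].append(rec.message)
    for span in spans:
        # .get() avoids inserting empty entries for spans with no logs.
        span.logs.extend(by_span.get((span.trace_id, span.span_id), []))
    return spans


if __name__ == "__main__":
    spans = [Span("t1", "a", "api-gateway"), Span("t1", "b", "playback")]
    logs = [
        LogRecord("t1", "b", "timeout calling license service"),
        LogRecord("t2", "x", "unrelated request"),
    ]
    for s in enrich_trace(spans, logs):
        print(s.service, s.logs)
```

The shared key is what turns two separate searches into one view: once logs are indexed by the IDs the tracing system already propagates, any other context keyed the same way (metadata, analysis results) can be attached to a span in the same manner.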
The video of this talk is available to CMG members.
Speaker bio: Elizabeth Carretto is a Senior Software Engineer at Netflix in Productivity Engineering, where she builds UIs for the observability space. Her work focuses on delivering value from observability data to service operators through products like Edgar, a troubleshooting tool built on top of distributed tracing. She enjoys building tools that help engineers quickly understand an issue when they get paged in the middle of the night.