Everyone loves a good mystery, but not when it involves operating our services. When investigating a production issue, a software engineer can feel like a detective, digging through dashboards and combing through log stores to gather evidence and reconstruct the scene of a crime. During this investigation, every second counts! Time spent researching an issue is costly, for engineers as well as for customers.
At Netflix, we built a tool called Edgar to empower engineers to troubleshoot production issues quickly. Edgar starts with distributed tracing and supplements traces with additional context like logs, metadata, and analysis, giving engineers a more detailed picture of what happened without hopping between log stores and dashboards.
You’ll leave this talk with an understanding of how we supplement distributed tracing with additional context to minimize footwork and speed up resolution. You’ll hear about ways Edgar has delivered value and learn about the challenges we’ve faced. Finally, we’ll show you how linking together your data sources can help you and your team solve mysteries faster.
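
To make the idea of linking data sources concrete, here is a minimal sketch of attaching log lines to trace spans through shared identifiers. It is not Edgar's implementation: the `Span` type, the `attach_logs` helper, and the `trace_id`, `span_id`, and `message` field names are all hypothetical, and it assumes your services already stamp each log record with the trace and span IDs of the request being handled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span record; real tracers carry more fields."""
    trace_id: str
    span_id: str
    service: str
    duration_ms: float
    logs: list = field(default_factory=list)


def attach_logs(spans, log_records):
    """Attach each log message to the span with the same (trace_id, span_id).

    Assumes every log record is a dict with 'trace_id', 'span_id', and
    'message' keys. Records that match no span are ignored.
    """
    by_key = defaultdict(list)
    for rec in log_records:
        by_key[(rec["trace_id"], rec["span_id"])].append(rec["message"])
    for span in spans:
        span.logs = by_key.get((span.trace_id, span.span_id), [])
    return spans


# Illustrative data only; the service names and messages are made up.
spans = [
    Span("t1", "s1", "api-gateway", 120.0),
    Span("t1", "s2", "playback", 95.5),
]
log_records = [
    {"trace_id": "t1", "span_id": "s2", "message": "upstream timeout"},
    {"trace_id": "t1", "span_id": "s2", "message": "retrying request"},
    {"trace_id": "t9", "span_id": "s1", "message": "unrelated request"},
]

enriched = attach_logs(spans, log_records)
for span in enriched:
    print(span.service, span.duration_ms, span.logs)
```

With the logs already attached to the span that produced them, an engineer reading the trace sees the slow call and its error messages in one place instead of searching a separate log store by timestamp.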