JOB ALERT! Senior Software Engineer, Observability

Senior Software Engineer, Observability

New York, NY

Job Description

About the Team

The Delivery Engineering team is an essential part of The New York Times’ engineering organization. Its responsibilities are profoundly technical and include system cloud architecture, developer tooling, observability, and development process, to name a few.

You will be part of the Observability Platform Team within Delivery Engineering, which is responsible for standardizing monitoring and observability practices across engineering teams at The New York Times. The team provides insights that drive decisions for a resilient and positive customer experience.

Role Description

As a senior engineer on the Observability Platform team, you will be responsible for building capabilities such as secure provisioning of tools, reliable data collection, correlation, and visualization, which help derive insights to improve the visibility and reliability of The New York Times products.

These observability systems include logs, metrics, tracing and profiling data management, open source standards (OpenTelemetry, OpenMetrics, etc.), and data correlation and visualization techniques. You will report to the Senior Manager of Observability Platforms and contribute to these capabilities, evaluate the current observability practices and tooling, and evolve them to be more efficient.

Responsibilities

Improve NYT’s observability landscape by allowing easy access to metrics, logs, tracing and profiling with a reliable set of tools.
Determine and promote the right observability patterns and instrumentation/correlation standards that suit the needs of NYT applications.
Build high-quality, reliable insights that provide visibility and standards for key indicators to understand the health of our most critical systems.
Work within multiple areas of focus (e.g. reliability, edge platform, cloud runtime, secrets management, deployment pipeline, containerization) and research, strategize, and propose solutions that meet requirements, reduce friction for product engineers, and consolidate existing solutions.
Promote Observability and SRE best practices through Architecture Reviews, blameless postmortems, technical talks, and tooling.
Document best practices, prescribed solutions, and production support playbooks.
Provide production support by participating in on-call rotations for the systems we build and by offering expertise to users of our solutions.
Contribute to our mission of reaching 10+ million paid subscribers by 2025.

Required Experience

8+ years of backend software development experience, including 5+ years of experience building high-quality secure provisioning automations, seamless self-service onboarding practices, and shared observability platforms for an entire organization, with a focus on reliability and operational visibility.
A high degree of passion and interest in Observability and Reliability practices and experience working with application teams to deliver their apps and services with high levels of observability standards.
A good grasp of multi-tier application architecture, concepts of reliable system engineering, and open source observability standards and practices (e.g. OpenTelemetry, OpenMetrics).
Solid programming and troubleshooting skills. You may be called upon to help with systems written in Go, Python, Java, Scala, PHP, and Ruby, among many other programming languages. We don’t expect you to know everything, but we do expect you to learn and adapt quickly to these needs.
An understanding of cloud-based design and deployments on Amazon Web Services and/or Google Cloud Platform.
A passion for automation and proficiency with cloud-native app development, e.g. 12-factor apps, container orchestration technologies, and immutable cloud provisioning.
A bias towards helping people. Many teams will rely upon you for help to build their systems.
A high degree of empathy for existing solutions and issues. The New York Times is modern in many ways but is also prone to the issues that a 165-year-old organization may have – including legacy systems. There are many things to fix.

Nice To Have Experience

Site Reliability Engineering (SRE) practices and blameless incident management for large, system-wide issues
Configuring and deploying systems and software in production
Infrastructure as Code (IaC) practices, specifically using open source projects like Terraform, Vault, Consul, OPA

Some of the tech we use:

Sumo Logic, Datadog, OpenTelemetry, Go, GCP, AWS, Docker, Kubernetes, Drone, Terraform, Vault, Consul, Fastly

For more info and to apply for this position, click here.
