Observability Engineer
Fairfax, VA
Full Time
IT
Experienced
DevOps Observability Engineer
Location: 100% Remote (U.S. Only)
Employment Type: Long-term Contract (36+ months)
Authorization: U.S. Citizen Only
About the Role:
We are seeking a highly skilled and dedicated DevOps Observability Engineer to join our evolving team. This is a critical role focused on building, enhancing, and maintaining robust monitoring, logging, and tracing solutions as we transition our infrastructure from on-premise environments to Microsoft Azure. You will be instrumental in ensuring the performance, reliability, and health of our systems, providing deep insights that drive operational excellence and proactive problem-solving.
Key Responsibilities:
- Design and Implement Observability Solutions: Architect, implement, and manage comprehensive monitoring, logging, and tracing systems for both existing on-premise infrastructure and new Azure cloud environments.
- Azure Migration Support: Play a key role in the migration to Azure, specifically designing and deploying observability tools and practices within the Azure ecosystem.
- Tooling Expertise: Use and optimize tools such as Dynatrace, the ELK Stack (Elasticsearch, Logstash, Kibana), and other relevant platforms to capture and visualize system metrics, logs, and traces.
- Automated Alerting & Reporting: Develop and configure automated alerts, dashboards, and reports to provide real-time insights into system health, performance bottlenecks, and potential issues.
- Performance Optimization: Analyze observability data to identify performance degradation, troubleshoot complex incidents, and recommend solutions for system optimization and stability.
- Scripting & Automation: Write and maintain automation scripts (primarily in Python) for integrating observability tools, automating data collection, and streamlining operational tasks.
- Incident Response & Root Cause Analysis: Support incident response efforts by providing critical data and analysis, facilitating rapid root cause identification and resolution.
- Collaboration & Best Practices: Collaborate closely with development, operations, and security teams to embed observability best practices throughout the software development lifecycle.
Qualifications:
- Experience: 5-7 years of progressive experience in DevOps roles.
- Dedicated Observability Experience: Minimum of 2 years focused specifically on DevOps observability, implementing and managing monitoring, logging, and tracing solutions.
- Cloud Proficiency: Strong hands-on experience with Microsoft Azure services, particularly those related to infrastructure, networking, and monitoring.
- Observability Tools: Expert-level proficiency with Dynatrace and the ELK Stack (Elasticsearch, Logstash, Kibana).
- Scripting: Strong programming and scripting skills, particularly in Python, for automation and data manipulation.
- Problem-Solving: Excellent troubleshooting, analytical, and problem-solving abilities.
- Communication: Strong communication skills, both written and verbal, with the ability to convey complex technical information to diverse audiences.
Nice to Have:
- Experience with other monitoring tools (e.g., Prometheus, Grafana, Splunk, Datadog).
- Familiarity with containerization technologies (Docker, Kubernetes).
- Experience with Infrastructure as Code (Terraform, Azure Resource Manager templates).
- Background working with hybrid cloud environments (on-premise to cloud migration).