SME – ELK Stack & Monitoring Solutions
Senior ELK Stack Engineer (SME)
Long-term contract (2+ years), 100% remote within the continental US
Job Description:
Our client's Enterprise Observability team is looking for a senior-level ELK Stack Subject Matter Expert (SME). The team is responsible for enterprise infrastructure, application, and network observability, with a primary focus on log management and metrics. The selected candidate will be joining a team of skilled engineers with a broad background in enterprise observability.
Your Impact:
As an ELK Stack Engineer, you will focus on maintaining the reliability, scalability, and availability of our enterprise Elastic Stack solution, which is used for log management, metrics, and observability. The role relies heavily on automation with tools like Terraform and Ansible, and you will be expected to maintain performance KPIs and define SLOs for the platform.
Responsibilities:
- Maintain and deploy monitoring and alerting systems within the ELK Stack.
- Design, configure, and maintain our large-scale log aggregation solution using Elasticsearch and Logstash.
- Set up and manage data ingestion pipelines and transformations using tools like Filebeat, Logstash, and/or Fluentd/Fluent Bit.
- Embrace the mindset of "automate any task" to improve efficiency.
- Build and maintain robust monitoring systems using Elasticsearch, Kibana, and Beats to proactively detect potential issues and trigger timely alerts.
- Maintain associated documentation as it applies to our audit and certification requirements.
- Participate in troubleshooting, capacity planning, and performance analysis activities related to the ELK Stack.
- Research new observability requirements and, in many cases, write code to implement them.
- Set up monitoring policies, rules, and templates, and write scripts to meet observability requirements.
What you need to succeed:
- BS/MS in CS/Engineering or equivalent, OR 5+ years of experience.
- 4+ years of experience working directly with the Elastic Stack as either an Admin, SME, or Architect.
- Hands-on experience designing data pipelines using Filebeat, Logstash, and/or Fluentd/Fluent Bit.
- Expert-level knowledge of the Elastic Stack (on-prem and cloud), including best practices related to performance, security, and component setup (Elasticsearch, Logstash, Kibana, Beats).
- Fluency in scripting with Python and either Bash or PowerShell to automate tasks.
- Experience with Terraform and Ansible, including syntax, best practices, and managing complex configurations to build and manage infrastructure and applications.
- Strong working knowledge of Linux.
- Highly self-motivated and directed.
- Good analytical and problem-solving/troubleshooting abilities.