SRE
Fort Worth, TX 76120 United States
Job Description
What you'll do
- Make monitoring and alerting notify on symptoms and not on outages.
- Document your findings so they turn into repeatable actions, and then into automation.
- Improve the deployment, change management, and release management processes to make them efficient and streamlined.
- Debug production issues across services and levels of the stack.
- Propose ideas and solutions within the product team to improve resiliency, availability, and security.
- Plan and execute configuration change operations both at the application and the infrastructure level.
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Complete Root Cause Analysis (RCA) investigations.
- Improve DevSecOps practices, accelerate delivery, and take a lead role in troubleshooting technical issues.
- Assist in providing inputs to develop strategic technology roadmaps.
- Respond to incidents and provide support for customer incidents.
Minimum Qualifications - Education & Prior Job Experience:
- Bachelor’s degree in Computer Engineering, Computer Science, Electrical Engineering or related field, and 8+ years of experience
- General knowledge of the following areas, with deep understanding of the areas highlighted below...
- Implement "Infrastructure as Code" using Terraform in Azure and on-prem infrastructure resources
- Implement GitHub, GitHub Actions CI/CD, and ADO cloud for automation
- Load balance the application, including proxies and CDN (automate)
- Implement monitoring and observability in AKS, Azure cloud, and Kubernetes
- Monitoring and Metrics in Dynatrace, Prometheus, Grafana and integrations with Moogsoft/xMatters
- Open-source logging infrastructure
- Able to script automated performance testing scenarios for APIs and web front ends and embed them in CI/CD pipelines
- Dashboarding/reporting query languages
- Master’s degree in Computer Engineering, Computer Science, Electrical Engineering or related field, and 3 years of experience
- Airline Industry experience helpful
Proficiency and demonstrated experience in the following technologies:
- 2 years of experience working in an environment with Node.js and GraphQL (GQL)
- Hands-on experience with Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) tools and platforms, and with containers and container orchestration platforms (e.g., Docker and Kubernetes)
- Expertise in one or more cloud native relational databases such as MySQL and PostgreSQL, and NoSQL databases such as Cassandra and MongoDB, highly desired
- Strong technical knowledge and skills that are broad and deep, covering various hardware, software, and technology platforms
- Node.js, TypeScript, JavaScript
- Experience with MongoDB schema design and the MongoDB aggregation framework
- Develop, implement, and maintain applications and systems that integrate MongoDB
- Web Services: GraphQL, REST/SOAP (JSON/WSDL/XML)
- DB Admin/SQL Server
- Terraform
- SysAdmin
- Troubleshooting Network Issues
- VM Management
- Dynatrace
- Mezmo
- Security Vulnerabilities (remediation/compliance)