We are looking for an SRE / DevOps Engineer to build and scale enterprise-grade cloud platforms. This is a balanced role (70% engineering, 30% operations) focused on:
- Building reliable, scalable infrastructure
- Driving automation and platform engineering
- Enabling system resilience through testing and chaos engineering
You will not just operate systems; you will design and build the platform itself.
- Build and operate Tier-0 / Tier-1 systems where reliability is critical
- Design infrastructure that scales predictably without runaway costs
- Develop automation frameworks for load testing, validation, and resilience
- Enable secure and compliant environments (FedRAMP-aligned systems)
- Contribute to internal platforms and developer tooling (IDP mindset)
- Modernize and refactor legacy systems without disrupting production
- Design and implement scalable AWS infrastructure for production systems
- Build Infrastructure-as-Code modules for consistent, reproducible environments
- Develop and maintain CI/CD pipelines for deployment, testing, and validation
- Load testing and system readiness
- Snapshot validation and recovery checks
- Smoke testing and health verification
- Monitoring, alerting, and observability systems
- Incident prevention (not just response)
- Collaborate across teams to build shared platform capabilities
- Contribute to architecture decisions and platform evolution
Requirements
- 2-4 years of experience in DevOps / SRE / Cloud Engineering roles
- Strong hands-on experience in: AWS production environments (enterprise scale preferred), Infrastructure-as-Code (Terraform or CloudFormation), CI/CD pipelines (Jenkins, GitHub Actions)
- Strong coding/scripting skills in Python or Bash (required)
- Proven experience with: designing and operating scalable, reliable systems; debugging production issues and improving system stability; automating infrastructure and workflows
- Solid understanding of: distributed systems and cloud architecture; performance, scalability, and cost optimization
- AWS (RDS, Lambda, EventBridge, ECS/Kubernetes, FIS)
- Terraform / CloudFormation (IaC)
- CI/CD: Jenkins, GitHub Actions
- Observability: CloudWatch, Prometheus, Grafana
- Scripting/Development: Python, Bash (Node.js a plus)
Good to Have
- Experience with chaos engineering (AWS FIS, Gremlin, etc.)
- Exposure to FedRAMP or regulated environments
- Experience with Kubernetes or ECS
- Background in database operations and disaster recovery
- Experience transitioning from backend engineering to SRE
Benefits
- Opportunity to work with a dynamic and fast-paced IT organization.
- Make a real impact on the company's success by shaping a positive and engaging work culture.
- Work with a talented and collaborative team.
- Be part of a company that is passionate about making a difference through technology.