Do you want to take the plunge from an Ops role into Site Reliability? We need a Junior Site Reliability Engineer to join the software team for one of our core partner products.
Since you will be working on a distributed SaaS application built on cutting-edge tech (fully serverless, CI/CD, infrastructure as code), you are ideally excited about the opportunity to greatly influence the design, deployment, operation and maintenance of our development and production systems.
As the Junior SRE, you will be working as a bridge between the Production Team, the nascent Service Team, and our startup partner on one of Code11’s most complex technical projects.
You likely have a mixture of ops and development skills, with a focus on providing a great customer experience to our end users.
Our app is in active development, so practicing sustainable incident response with our partners is just as important as building new features and services.
You will spend most of your time owning the incident management pipeline, using sound site reliability principles and observability to fix problems.
You will also get to develop tools & automations for the team to use, and build a resilient system for our team and partner.
SLA Management:
Troubleshoot priority incidents, fix bugs, enhance platform performance and implement feature requests where necessary
Work through (developer) service issues and communicate in a professional and timely manner
Facilitate blameless post-mortems and ensure permanent closure of incidents
Oversee system infrastructure, including implementing and managing monitoring, and fixing, testing and maintaining services, application software and system management tools
Fight the logging, monitoring and alerting spam by careful gardening and pruning
Complete coding assignments related to applications, testing, systems reliability, monitoring, alerting, builds and deployments, and analytics
Develop and deliver configuration and deployment automation
Work with the team, championing platform improvements and DevOps principles
Important! As we are still defining this role, it might require you to work afternoon and weekend shifts. We need you to be OK with that.
You Are Someone Who Has
2+ years of experience supporting complex applications (SaaS preferred) in the cloud
2+ years of experience in AWS-based serverless technologies and AWS CDK
3+ years of experience with IT automation tools, with a preference for cloud providers' native tools
3+ years of experience in container orchestration frameworks and container management: Kubernetes, Docker Swarm, or Amazon EKS
2+ years of experience with non-relational or NoSQL databases (e.g. MongoDB, DynamoDB, ArangoDB etc.)
Proficiency with infrastructure-as-code tools in a CI/CD environment
2+ years of experience with data aggregation, alerting, reporting, and supporting technologies
2+ years of experience in an on-call rotation or as part of an IT Service Management organisation
1+ years of experience developing and managing software reliability metrics (e.g. SLO/SLI/SLA), tracking the most important ones in curated SRE Dashboards
1+ years of experience with ticketing systems and source control (e.g. JIRA / Git repos)
code11 is a Norwegian-Romanian company built brick by brick with Agile core values and cutting-edge development technologies.
We power start-ups and enterprises, working in innovation programs as their tech partner. We believe that there is always a better way to do it, whether it’s methodology, technology or digital transformation.
Competitive compensation (depending on your experience level)
First class private medical insurance
300 RON / month flexible benefit budget for meal tickets, transportation, sports etc.
Discounts via our benefit provider (benefits.ro)
Work from home for the duration of the pandemic, then a flexible approach to WFH
Great hardware (say goodbye to 3 minutes for a simple npm install)
Regular team events (online as well)