Python (advanced)
Kubernetes (advanced)
Linux (master)
Who we are looking for
OpenX, a leading provider of digital and mobile advertising technology, seeks a Senior Cloud SRE (Site Reliability Engineer).
OpenX serves 100 billion ad requests per day and operates worldwide. We are currently in the process of migrating our entire infrastructure footprint into Google Cloud Platform (GCP).
You will be primarily responsible for the performance, uptime, and growth of various OpenX systems and services on GCP. Experience operating software at large web scale is desirable, though we are willing to train people with the right skills and attitude.
Similar to Google’s approach to SRE, you should adhere to the engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
Much of your software development focuses on optimizing cloud-native systems, orchestrating cloud infrastructure and eliminating manual work through automation.
The ideal candidate has extensive experience with software engineering practices, building and supporting on-demand, burstable, virtualized and cloud-native environments on major public cloud providers (GCP preferred).
Excellent communication skills are required to interact successfully with globally distributed OpenX teams operating 24x7.
Developing and supporting our infrastructure presents many interesting technical challenges. We especially desire candidates with a passion for open-source software and an interest in the latest technology trends.
What we offer
Working with the newest technologies such as Cloud Computing (GCP)
Experienced Team (50% of the company are senior developers!)
Challenges at work that are difficult to find anywhere else!
Solving important problems at scale
Joining a company that is growing and scaling
Flexible working hours & remote work option
Key responsibilities
Design, write and deliver software to implement and support large web-scale, highly-performant, highly-available infrastructure on GCP
Demonstrate and promote best practices for teams deploying and supporting our infrastructure on GCP
Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity
Support system deployments and product releases
Tune large-scale clusters for optimal performance and efficiency
Participate in on-call rotation
Work closely with engineering, project management, and operational peers to develop innovative technical tools and solutions
Identify tactical issues and react to emerging areas of concern
Adhere to the DevOps philosophy by evangelizing communication, collaboration, and integration with software development teams
Think long-term and be unsatisfied with band-aids
Identify unnecessary complexity and remove it
Required Qualifications
Extensive experience maintaining a large production infrastructure hosted on GCP, AWS, or equivalent public cloud providers
Extensive understanding of how to manage public cloud services and tasks, such as: VPC; load balancing; relational and non-relational datastores (e.g., Google Cloud SQL, Memorystore, AWS RDS); storage (e.g., GCS, AWS S3); monitoring (e.g., GCP Stackdriver, AWS CloudWatch, Prometheus); serverless computing (e.g., GCF, AWS Lambda); and auto-scaling
Solid experience with software development life cycle (SDLC) best practices, such as test-driven development (TDD), algorithms, data structures, complexity analysis, CI/CD, and software design
Solid understanding of programming languages, such as Java, Golang/Erlang, C/C++, or others
Automate tasks that are scalable, maintainable, and repeatable by utilizing APIs and practicing Infrastructure-as-Code through GitOps
Experience with managing large-scale Kubernetes clusters in a microservices and containerized environment using Docker and package management using Helm
Strong knowledge of core protocols and tech such as: TCP/IP, HTTP, DNS, load balancers, distributed file systems, relational and non-relational datastores
Automate tasks in at least one language (other than Bash), ideally Python or Golang
Demonstrated experience in network and large-scale *NIX system troubleshooting and maintenance practices
Desired Characteristics
Configure and manage security policies, resource auditing, compliance policies, and access controls to resources in GCP
Analyze performance bottlenecks of our platform hosted on GCP based on monitoring data
Solid experience with cloud orchestration platforms, such as Terraform
Solid experience building GCP big data platforms, such as DataProc, BigQuery, Pub/Sub, and other technology
Self-starter with the ability to independently identify and act on areas of improvement
Knowledge and interest in the latest system architecture trends
Ability to rapidly learn and understand new systems
Ability to communicate effectively and write accurate, clear documentation
Our Benefits
Annual performance bonus
Tax-deductible system due to copyright protection
Private health care for you and your family (covered by OpenX)
Private life and travel insurance (Covid insurance included)
MultiKafeteria program
Training: access to the LinkedIn Learning platform, tech workshops, English lessons
Holiday Allowance
Pension scheme (PPK from PZU)
Additional paid day off
Free parking lot
Sports activities: online yoga / stretching classes :)
Access to a peer-to-peer recognition platform
Possible trips to California once in a while
Company events (online during the pandemic time)
Monthly work from home allowance and one-time payment when you join us to help you set up your home office
We celebrate team members' important personal milestones (vouchers, gifts)