Site Reliability Developer
5 days ago
Site Reliability Developer-19000QLE

Preferred Qualifications

Oracle's Demonstration Services

Our mission is to continuously deploy, integrate, and manage Oracle's products to create rich content that demonstrates Oracle's Cloud platform to potential and current Oracle customers. To support this mission, we deliver applications and tooling, which we call our Demo Cloud Platform, to Oracle's Sales, Consulting and strategic partners so they can manage their demonstration environments. Our applications and tooling support billions (yes, billions) of dollars in sales annually. This is where you come in: the reliability of the applications and tools that support the demonstration environments is critical to Oracle's success as a leading cloud provider.

Site Reliability Engineering Team

The SRE team is responsible for the overall health, performance and reliability of our Demo Cloud Platform. Demo Services (DS) leverages Oracle Cloud services as well as open source components to deliver a full-featured, self-service and extensible demo platform for Oracle Sales, Consulting and strategic partners. For Demo Services to meet the goals of the business, the SRE team works alongside the DS Development and DS Architecture teams to rapidly deploy new functionality for the platform through CI/CD methodologies. We are looking for candidates who have a strong passion for automation and are enthusiastic about picking up new technologies, product stacks and current industry solutions.

As part of the SRE team you...

  • Will develop, enhance and maintain automation solutions that create infrastructure as code in Oracle Cloud, aimed at improving and supporting the productivity of the Demo ecosystem
  • Possess a contagious sense of ownership and are capable of using all available tools to solve any issues you encounter
  • Extend and improve ChatOps automation to speed up deployment, triaging, system tracking and metrics
  • Act as the primary point of contact for Production incidents, perform detailed root cause analysis, and identify and resolve underlying problem patterns while developing automated and self-healing solutions
  • Monitor, detect and troubleshoot issues during code deployments in Production. Analyze real-time data to determine impact and advise development on release GO/NO-GO decisions
  • Participate in the development of tools and processes that leverage observability best practices to proactively identify and resolve issues before they become incidents
  • Work hand-in-hand with the Development team and participate in the team rotation

Your Skills...

  • You have a Bachelor’s Degree in Computer Science, Software Engineering, Information Systems or equivalent and 5+ years of relevant work experience.
  • You have worked in an SRE/DevOps role and managed highly complex production environments at scale
  • You are fluent in writing code, with 3+ years of experience in developing in languages like JavaScript, NodeJS, Java, Python & Perl
  • You have developed tools and provided scalable, maintainable and autonomous solutions to support mission-critical applications
  • You have practical experience with continuous integration and continuous delivery methodologies
  • You have hands-on experience with orchestration and configuration management tools such as Ansible, Terraform, Puppet or others
  • You have a deep understanding of monitoring and observability best practices across distributed systems
  • You have a solid foundation in networking concepts: DNS, load balancing, VCN (Virtual Cloud Network), firewalls, proxy servers
  • You are intimately familiar with Linux and its administration life cycle - deployment, upgrading, compiling, and debugging
  • You have a solid foundation in database administration and are comfortable with the complete database life cycle, including provisioning, backup & recovery, cloning, performance tuning, maintenance and troubleshooting
  • You are able and willing to work in an on-call rotation that will include weekend coverage

Your Bonus Skills...

  • You have a Master’s Degree in Computer Science or related studies
  • You have experience in working with major cloud platform(s): Oracle Cloud, Microsoft Azure, Google Cloud Platform or AWS - any certification(s) a plus
  • You have experience with container and container management technologies such as Docker and Kubernetes
  • You are adept with SQL, PL/SQL and query performance tuning
  • You have worked with monitoring solutions such as Prometheus, Grafana, Nagios, Oracle Enterprise Manager/Management Cloud or similar software

Detailed Description and Job Requirements

    Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

    Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Be responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams to define and implement improvements in service architecture. Articulate the technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and its dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

    A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security and compliance. Understanding of load balancing technologies and experience developing with programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. At least 8 years of experience running large-scale, customer-facing web services.

    As part of Oracle's employment process candidates will be required to successfully complete a pre-employment screening process. This will involve identity and employment verification, professional references, education verification and professional qualifications and memberships (if applicable).


Job category: Product Development