Senior Site Reliability Engineer – New Zealand or Australia

We are seeking a seasoned SRE to join our team! Working within an engineering team, you’ll improve application reliability by using a software engineering approach to operations. You’ll develop internal tools and systems for all engineering teams to use. When things go wrong, you will use site reliability principles and a robust approach to observability not only to fix the problem but also to resolve the issues that contributed to it.

This position works closely with Release Engineering and other engineering teams in our Systems Zone to develop and maintain the tools and systems that support all of Zapier engineering. This role calls upon a broad range of experience and technologies. You’ll get to interact with every engineering team in the organization. Maintaining excellent relationships and communicating regularly and effectively with those teams is key to success.

Zapier is rapidly scaling and growing, and you will work directly on the applications that support over 5 million customers. When bad things happen, you’ll have the support of your team to solve contributing causes, to learn from failures, and to build a robust and resilient system for our customers.

Building new features and services is a big part of this role. We are continually developing and implementing new ways to support our teams, understanding our customers’ needs, and becoming experts in site reliability.

If you’re interested in taking your career to the next level at a fast-growing and profitable startup, then read on.

About You

We’re looking for an experienced engineer who is eager to apply software development approaches to operations. You should have a breadth of experience in software development and operations, and you should be actively practicing site reliability principles. We don’t expect you to know it all: there is a lot to learn, and we’re continually improving our approach to SRE.

Ideally, you’ll have several years of experience practicing infrastructure as code, including tools like Ansible and Terraform and platforms like Kubernetes. Well-honed experience with the fundamentals of software development goes a long way here. We work in both Python and Go. Generalists thrive in this role.

Writing is our primary means of communication, from pull requests and team chat to knowledge sharing and communicating changes. Excellent writing skills are crucial to success here at Zapier. We are 100% remote and commonly work asynchronously. We even wrote a book on it.

You should feel comfortable defaulting to action. Most decisions are changeable. It’s better to deliver something real today than something that might be better later. Sharing context, goals, objectives, and in-progress work in public helps us all achieve a common goal.

Things We’ve Done Recently

  • Developed new methods for retaining task history
  • Migrated applications and services from EC2 to Kubernetes
  • Wrote custom Kubernetes controllers to improve resilience
  • Created deployment pipelines in ArgoCD
  • Developed autoscaling strategies to handle bursts in workloads
  • Implemented OPA to enforce policies across our Kubernetes clusters
  • Deployed ProxySQL to pool connections to MySQL databases

To apply for this job please visit zapier.com.