How To Become an SRE

If you're interested in technology careers but looking for something more operationally focused, Site Reliability Engineering may be a good fit.

Site Reliability Engineering is a job category devised by Google that was quickly duplicated (with varying degrees of accuracy) across the tech industry. The general idea of the role is 'an engineer who treats operations as a software problem.'

This lofty objective may only be truly attained at top technology companies, but with the growth of cloud technologies and platforms, most companies now have access to tools similar to those these elite firms use.

suitability

If you're interested in working with computers and systems in general, but not being a full-fledged software engineer, SRE is worth a look.

Graduates fresh out of school with degrees in computer engineering or computer science usually have an easy time finding work. If you're coming from another background, it's usually necessary to spend time accumulating skills in other positions before advancing into SRE.

Some of the core skills that hiring managers look for:

  • Experience with system administration and *nix operating systems
  • Strong software skills in at least one programming language - Python/Ruby/Go are all popular choices.
  • Familiarity with cloud platforms and their offerings (AWS/Azure/GCP)
  • Familiarity with Continuous Integration/Deployment technologies (DevOps)

A candidate who can demonstrate proficiency in these areas should have little trouble finding an SRE role.

the role

The expectations of the SRE role vary heavily depending on where you end up. In startups and other less mature environments, SREs will likely wear many different hats. It's not uncommon to be doing networking and some security work while also assisting the development team, and the role often overlaps with that of "infrastructure engineers."

The role is typically very operations-oriented. You get to be on-call, and when something breaks, it's usually the SRE who takes responsibility for triaging the software/infrastructure problem and figuring out how to resolve or escalate it.

When not on-call, the SRE's responsibility is to garden and improve infrastructure.

This can entail:

  • Building or implementing a monitoring solution for the application and associated infrastructure
  • Creating automation to assist with routine elements of the job
  • Ensuring the continuous integration/deployment pipelines are working correctly
  • Planning for capacity increases as well as evaluating application health
  • Conducting security reviews
  • Providing guidance to the development team on cloud technologies/patterns
  • Whatever else devs don't feel like doing
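To make the capacity planning item concrete, here is a minimal sketch of the kind of small script an SRE might write. It assumes disk usage grows linearly, which real systems rarely do exactly; the function name `days_until_full` and all of the numbers are hypothetical and chosen only for illustration.

```python
from math import ceil
from typing import Optional


def days_until_full(used_gb: float, capacity_gb: float,
                    growth_gb_per_day: float) -> Optional[int]:
    """Estimate whole days until usage reaches capacity.

    Assumes linear growth (an illustrative simplification).
    Returns 0 if the volume is already full, and None if usage is
    flat or shrinking, since capacity would never be reached.
    """
    remaining = capacity_gb - used_gb
    if remaining <= 0:
        return 0
    if growth_gb_per_day <= 0:
        return None
    return ceil(remaining / growth_gb_per_day)


if __name__ == "__main__":
    # Hypothetical volume: 600 GB used of 1000 GB, growing 8 GB per day.
    print(days_until_full(used_gb=600, capacity_gb=1000, growth_gb_per_day=8))
```

In practice the inputs would come from whatever monitoring system is in place, and the estimate would feed an alert or a planning ticket rather than a print statement.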

the good

SRE is a highly in-demand position. Experienced SREs with a manicured LinkedIn profile will get a lot of attention from recruiters. Even more so if you happen to have a brand-name company in your work history.

SREs are typically compensated in line with SWEs - many salaries on levels.fyi for 'Software Engineer' map to identical compensation packages for SRE. However, at the higher levels of experience the two diverge: SWE compensation keeps climbing while SRE compensation tapers off.

SRE is a relatively "easier" way into top companies. Because of the variety of skills needed to be successful, many individuals find it easier to use SRE as the "soft entry" and then pivot within the organization into SWE/Product Management/Management/etc.

SRE doesn't take that many years of experience to launch into. Anecdotally, I know several engineers from non-traditional backgrounds who broke into big tech with less than two years of experience.

the bad

As stated, SRE eventually caps out if you're working W2 for an employer. You will learn a valuable skillset, but it is one that companies almost always acquire by hiring employees, not by contracting individuals.

SRE salaries don't continue to progress as aggressively as those of top SWEs. If you want to earn more, you have to switch between top tech companies regularly.

Remaining on-call gets a bit tiresome once you've been doing it for years. You will end up putting in hours fixing some problem and missing time with friends and family. It's part of the package.

Because SREs end up wearing so many different hats, one downside is that they often become well versed in a number of areas without truly mastering any of them. Working in SRE over the long term requires a lot of time spent learning tools that may not stay in use for much of one's career. Because the field changes so quickly, it's hard to invest years into learning a technology that may be replaced within four years - take a look at all the configuration management tools that have fallen by the wayside with the advent of containerization/k8s.

getting started (with no experience)

If you're starting out, you have to either get some experience in a lower-responsibility role or have a connection that allows you to get into a job you're not really qualified for.

Starting out in a data center technician role is a good start and allows you to interact with more senior people who you can learn from. You can learn a great deal of the foundational knowledge necessary from taking ~$100 certification courses, such as this one from Google.

As you're working on gaining some knowledge you can start applying for jobs. Since it's mostly a numbers game while you're getting started, apply for jobs regardless of whether you feel qualified or not. Many of the job requirements listed for entry-level positions are wish-lists and not at all indicative of the actual work you'll be doing.

If you find that you enjoy learning the topics, setting up your own stuff, writing your own programs - good! You may end up thriving in this type of career. If you're finding it hard to learn and you're not that interested, this may be a good sign to steer away.

getting into SRE (with some experience)

Getting into SRE from a technical background usually requires some experience or certification in a few SRE subject areas in order to convince a hiring committee that you'll be effective in the role.

Typically the strongest indicators are:

  • Experience with cloud infrastructure platforms and related technologies - you can obtain an AWS certification fairly quickly by studying through A Cloud Guru or another learning platform.
  • Proficiency with a popular "SRE" programming language - Python/Ruby/Go all remain popular. Most SRE interview loops will use Leetcode-style questions to winnow out candidates, so investing some time in practicing for these problems is worthwhile.
  • Experience with containerization - most of this comes from familiarizing yourself with Docker and k8s, which you can also do through educative.io or A Cloud Guru if you don't have a real-world environment to use at your current company.

For a motivated individual, some self-directed study over the course of a couple of months is sufficient to get a working understanding of most of these topics. After that, start applying to companies.
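As a rough idea of what a Leetcode-style practice problem with an operations flavor can look like, here is a small sketch: find the most frequent HTTP status codes in a set of log lines. The log format (`<method> <path> <status>`), the function name `top_status_codes`, and the sample data are all made up for illustration and are not taken from any particular interview loop.

```python
from collections import Counter


def top_status_codes(log_lines, k):
    """Return the k most frequent status codes as (code, count) pairs.

    Assumes each valid line has exactly three whitespace-separated
    fields: method, path, status. Other lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    # Hypothetical sample log, including one blank line.
    sample = [
        "GET /home 200",
        "GET /login 500",
        "POST /login 200",
        "",
        "GET /home 200",
        "GET /missing 404",
        "GET /x 500",
    ]
    print(top_status_codes(sample, 2))  # [('200', 3), ('500', 2)]
```

Interviewers often follow a problem like this with questions about malformed input or files too large to hold in memory, so it is worth practicing how you would explain those trade-offs.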

conclusion

Moving into SRE is a good choice for early-career software engineers who want to break into bigger companies, as well as mid-career IT professionals who are looking to move into a popular and growing job role.

An interest in the work is crucial for success in the role, as career progression slows down somewhere between five and eight years in. Moving between companies periodically is necessary to increase compensation as well as ensure one's skillset remains aligned with the market.