By Hijmen Fokker

How to manage GitOps environments at scale (a technical guide)

Many organizations are adopting GitOps, as it is becoming the standard for Kubernetes deployments. This article focuses on solving the technical GitOps environment problem and provides solutions that you can use to set up your GitOps deployment architecture. If you are unfamiliar with GitOps, I suggest reading this article.


When implementing GitOps in mid- to enterprise-sized companies, some challenges arise. One of those challenges is the GitOps environment problem. How do you manage different environments in GitOps without compromising on important factors like quality, stability, security, compliance and automation?

The GitOps environment challenge

Depending on your DevOps maturity, organization type or compliance requirements, your most common deployments on Kubernetes may be (partly) automated or manual, small or big, simple or complex. Generally, though, there are two different kinds of GitOps deployments:


1. Image version change (simple)

In modern DevOps teams, these simple deployments happen most often because of automation. When using a YAML templating tool like Helm or Kustomize (please use one of them!), such a deployment usually only touches an environment-specific configuration file (a custom folder or values.yaml file) in which an image tag is updated. There is a separate file for each environment, so updating them one environment at a time (development --> staging --> production) is fairly easy to do.

A simple change - version increment for production
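
For illustration, here is a minimal sketch of what such environment-specific values files might look like, assuming a Helm-based setup with one values.yaml per environment; the folder layout, registry and image name are hypothetical:

```yaml
# environments/staging/values.yaml (hypothetical path)
image:
  repository: registry.example.com/my-app
  tag: "1.4.2"          # staging already runs the new version
---
# environments/production/values.yaml (hypothetical path)
image:
  repository: registry.example.com/my-app
  tag: "1.4.1"          # promoting to production is a one-line change to "1.4.2"
```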

2. Manifest change (less simple)

When a change needs to be made to the Kubernetes YAML configuration (e.g. a changed ConfigMap structure, a volume mapping or a service type), it touches resources that are templated (reused across environments). So we need some mechanism in place if we wish to test our changes in a staging environment before rolling them out to production. Also, for compliance reasons (especially at enterprise level), we need to be sure that a change to staging does not affect production, even if it is an honest developer mistake. For this, we need a good separation of environments. This article will zoom in on this challenge and propose possible solutions.

A less simple change - manifest update
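
To make the problem concrete, here is a hedged sketch of a shared Helm template (the chart name and helper are hypothetical): because the same template is rendered for every environment, editing it changes development, staging and production at once, unless the change is driven by environment-specific values:

```yaml
# templates/service.yaml, shared by all environments
apiVersion: v1
kind: Service
metadata:
  name: {{ include "my-app.fullname" . }}
spec:
  # Changing this line directly would affect every environment;
  # reading it from values lets each environment opt in separately.
  type: {{ .Values.service.type | default "ClusterIP" }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
```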


Some GitOps Considerations

Before we dive into possible solutions, it is important to set some ground rules and best practices. Of course, this depends on your company standards and way of working, but in general these rules apply to most DevOps teams implementing GitOps.


1. All GitOps environments should use the mainline (master branch)

One proposed pattern for the GitOps environment challenge (especially popular during the early GitOps days) is the branch-per-environment setup: you create a branch for each environment and promote changes using Pull Requests. For some organization types this solution works, but there are quite a few downsides to this approach, especially when implementing automation. A good article on the downsides of this (anti-)pattern can be found here.


Modern DevOps teams practicing Continuous Deployment need a better solution. Branches are not meant to be permanent; we should use the mainline (master / main branch) for all permanent environments and separate environment logic using other mechanisms.


2. Code duplication should be avoided as much as possible

By minimizing code duplication we make sure that deployments are predictable. Manually changing duplicated code is time consuming and a recipe for disaster, especially when environment configurations diverge or when the number of environments grows. Code duplication can be avoided using templating tools like Helm or Kustomize.
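
As a minimal sketch of the Kustomize variant (folder names are hypothetical): a single base holds the manifests once, and each environment overlay only contains the differences:

```yaml
# base/kustomization.yaml - the only copy of the manifests
resources:
  - deployment.yaml
  - service.yaml
---
# overlays/production/kustomization.yaml - only the environment-specific differences
resources:
  - ../../base
images:
  - name: registry.example.com/my-app
    newTag: "1.4.2"
```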


3. Access Control and compliance rules should be evaluated

Depending on the type of organization, the roles and access control for teams, environments or company departments may differ. Some companies have strict compliance rules for production environments, for example:

  • Production should be completely isolated

  • A limited number of users (people or apps) have edit permissions for production

  • All source code (including Kubernetes manifests) should be reviewed by 2 peers before deploying to production

Access control in GitOps is done via Git, so you should consider splitting environments and teams into multiple Git repositories, if only for the sake of RBAC.
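
As one hedged example of what this can look like in practice (assuming a GitHub-style setup; the team name is hypothetical), a CODEOWNERS file in the production GitOps repository, combined with branch protection that requires code-owner review, restricts who can approve production changes:

```
# CODEOWNERS in the production GitOps repository
# Every change requires a review from the platform team (hypothetical team name)
*   @my-org/platform-team
```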


4. Keep your manifests close to the source code

I have seen many teams place Kubernetes manifests in separate repositories. Sometimes only senior developers, DevOps engineers, or (even worse) other teams know about these manifests, leaving developers unaccustomed to working with them. Keeping manifests close to your source code motivates developers to use them in their local development environments and to edit them, which will increase your DevOps maturity. Also, placing them with your source code makes it easier to enforce test automation on your Kubernetes manifests: you can simply make this part of your CI process, testing every single YAML change on each Pull Request to your mainline.
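
As a hedged sketch of such a CI step (assuming GitHub Actions and a chart living next to the source code in charts/my-app, both hypothetical), the pipeline can lint and render the chart on every Pull Request:

```yaml
# .github/workflows/validate-manifests.yaml (hypothetical path)
name: validate-manifests
on:
  pull_request:
    paths:
      - "charts/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint and render the Helm chart
        run: |
          helm lint charts/my-app
          helm template charts/my-app > /dev/null   # fails on template errors
```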


5. Deployments should be automated

Even if your organization is not ready for Continuous Deployment, you should focus on automating as much as possible. It will increase deployment frequency and reliability, shorten lead time and eventually improve the quality of your applications.


6. Auto-sync should be enabled for GitOps deployments

The power of GitOps comes from Git as your single source of truth. This is why we should configure our GitOps tool of choice (e.g. ArgoCD or Flux) to auto-sync. If we manually click the sync button, or if we let our pipeline trigger the sync using an API call, we lose the GitOps advantage of no configuration drift and can no longer depend on the single source of truth (Git). A deployment should be triggered by a git commit and nothing else. A more detailed explanation of why auto-sync should be used can be found here.
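
For example, with ArgoCD auto-sync is enabled through the syncPolicy of an Application; the repository URL, path and names below are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/gitops-production.git
    targetRevision: main
    path: my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual changes made directly on the cluster
```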


Possible solutions

Taking into consideration the GitOps best practices and ground rules above, we can design a solution framework that fits most use cases. For simplicity, I will use Helm as an example for all solutions, as it is the most popular templating method for deploying Kubernetes manifests.


Where to put Kubernetes manifests

As mentioned before, I would recommend building the Kubernetes manifests in the source code repository. However, I would not recommend using this repository as the GitOps repository linked to your ArgoCD or Flux for your stable environments (staging, production). Doing that has some downsides:

  • You cannot update manifests and source code at the same time, because your Docker build needs time to complete. GitOps will immediately sync the changes (with auto-sync), so if the image doesn't exist yet, this will cause issues.

  • With GitOps, your commit history is your deployment history. Using your source code repository for deployments will make it harder to see 'what happened when' from a deployment perspective.

  • Deploying your change to multiple environments will result in multiple commits in your mainline - in different folders. In the meantime, someone might have changed something else that conflicts with your deployment.

  • Rollbacks become more complex, because your mainline is polluted with unrelated source code updates.

  • Observability is another GitOps challenge, and it becomes a lot harder if you don't centralize your deployments into a limited number of repositories.

  • If you need clear separation of environments (for risk, compliance, security or other reasons), you will need some kind of RBAC on your production environment. With Git, this is only possible at branch or repository level. Accidentally (manually) changing the wrong folder can happen, which may be unacceptable at enterprise level.

This is why I recommend building your manifests into a package (a Helm chart) and using this chart in a separate GitOps repository. Environment-specific configuration can be stored in the GitOps repository.


Your pipeline will include something like this:

  1. Build your image and publish it to a Container Registry

  2. Build your Helm Chart and publish it to a Helm Chart Repository
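
A hedged sketch of those two steps (GitHub Actions syntax, assuming Helm 3.8+ and an OCI-capable registry; registry, chart name and versions are hypothetical):

```yaml
# Fragment of a CI job; registry, chart name and versions are hypothetical
steps:
  - uses: actions/checkout@v4
  - name: Build and push the container image
    run: |
      docker build -t registry.example.com/my-app:${GITHUB_SHA} .
      docker push registry.example.com/my-app:${GITHUB_SHA}
  - name: Package and push the Helm chart
    run: |
      helm package charts/my-app --version 1.4.2 --app-version ${GITHUB_SHA}
      helm push my-app-1.4.2.tgz oci://registry.example.com/helm-charts
```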

Source code repository folder structure


How many repositories should I have for my environments?

There are a lot of ways to structure your GitOps repositories, and the right one depends entirely on your organization. The easiest way to get started is to have one repository for all GitOps environments. I would recommend using as few repositories as possible: this makes the setup easier to implement, monitor and maintain. It also helps with maintaining a uniform standard across your organization, because all code is grouped together.


However, having one single GitOps repository for all deployments is almost impossible if you are serious about the stability of your production environments. The more GitOps repositories you create, the more flexibility you have to address requirements like the following:

  1. Do I implement RBAC for certain environments (production)?

  2. Do I need a manual (Pull Request) approval before going to production?

  3. Do I allow team A to modify deployments for team B?

  4. How many environments do I have? Do I want to group all development environments together into one GitOps repository?

  5. Should my staging environment be structured the same as production? Then I might create three GitOps repositories (development, staging, production).




The image above shows three examples of how you can structure your GitOps repositories. I generally find that the optimal structure is somewhere in the middle (one repository per environment): a separate repository for each application gives you the most flexibility, but quickly gets out of hand at scale. There are many variations in between these solutions, so choose carefully based on your organization structure. Some variations might include:

  • 1 repository per team per environment

  • 1 repository per technology stack (group of teams) per environment

  • 1 repository for all non-production environments, 1 repository for production


Helm dependency vs templates

Helm charts in GitOps repositories can be defined locally, with a templates folder containing all the Kubernetes manifests, or without a templates folder, using a dependency. This dependency points to an external Helm Chart repository where your chart is located. The pros and cons of each approach are explained below.


1. Without a dependency using the templates folder

When resources are defined in the templates folder of the GitOps repository, the Helm chart has to be copied from the app repository to the GitOps repository. This method is easier to use, but less scalable and less reliable in terms of versioning and stability.


Your resources are structured as follows (a sketch follows the list):

  • A local Helm chart is used without external dependencies in the Chart.yaml

  • The values.yaml for this particular environment contains all the environment-specific configuration and includes an image tag that points to a tag in the Container Registry

  • The resources are copied from the app repository to each environment's templates folder
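
A minimal sketch of one environment folder in this setup (all paths, names and values are hypothetical):

```yaml
# production/
#   Chart.yaml        - local chart, no external dependencies
#   values.yaml       - environment-specific configuration
#   templates/        - manifests copied from the app repository
#     deployment.yaml
#     service.yaml

# production/Chart.yaml
apiVersion: v2
name: my-app
version: 1.4.2
---
# production/values.yaml
image:
  repository: registry.example.com/my-app
  tag: "1.4.2"
replicaCount: 3
```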


Pros

  • Easy to debug for developers without Helm experience. There is no direct need to download the Helm CLI for debugging, and no external dependencies need to be downloaded

  • Easier observability. Everything is directly visible in Git, making it easier to see the single source of truth without the Helm CLI. Also, it is easier to see the deployment differences in history using git diff.

  • Easy to hot-fix a production issue by manually modifying a generated manifest with a simple git commit

Cons

  • Lots of duplicated manifests. Especially when you have many environments, this can quickly become chaotic. However, the resources are generated, so it may be acceptable.

  • Generated files should not be modified manually. This needs to be enforced using RBAC policies or your way of working, otherwise it will result in deployment issues. Manually hot-fixing a generated file is very easy, but has consequences if not treated carefully.

  • No chart versioning is done, so rolling back to a particular version is harder: it is a git revert, not just a version update. Also, there is no relation between the chart version and the Docker image version, making the deployment less predictable.


2. Using an external Chart in a Helm Chart repository

With this setup, the only configuration present in the GitOps repository is the environment-specific configuration and the dependencies, pinned to versions. This solution scales better and has better versioning capabilities, but requires some more Helm expertise from developers.


Your resources are structured as follows (a sketch follows the list):

  • For this particular environment, you define a Chart.yaml with an external dependency on a chart version. The Helm chart itself is located in a remote Helm Chart repository

  • The values.yaml for this particular environment contains all the environment-specific configuration and includes an image tag that points to a tag in the Container Registry
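
A minimal sketch of this variant (chart repository URL, names and versions are hypothetical); note that the environment values are nested under the dependency name:

```yaml
# production/Chart.yaml - the manifests live in the remote chart, not in this repository
apiVersion: v2
name: my-app-production
version: 1.0.0
dependencies:
  - name: my-app
    version: 1.4.2                        # the chart version built by the app pipeline
    repository: https://charts.example.com
---
# production/values.yaml - values for the dependency are nested under its name
my-app:
  image:
    repository: registry.example.com/my-app
    tag: "1.4.2"
```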


Pros

  • No duplication. Kubernetes manifests are stored centrally in a Helm Chart

  • Better versioning. The Helm Chart has a clear version, making the Kubernetes resources part of your release pipeline.

    • The Docker Image together with the Kubernetes resources can be versioned as one, making deployments to production more predictable.

    • It is easier to quickly test a particular version on a specific environment without copying many resources.

  • Standardization. With Helm dependencies, you can use more layers of Helm dependencies. This has advantages if you are a bigger organization where central teams develop Helm Charts for your whole organization. For example, you can build central Charts for each technology stack / department.

  • Centralization. No matter how many GitOps repositories you have, you will always have one single place for all the Kubernetes manifests that are running in staging / production. This has several advantages:

    • Easy to execute central vulnerability scans on Kubernetes manifests using the central Helm Chart repository. Vulnerabilities can be pinpointed to particular versions.

    • Better observability of the deployed versions and their respective Kubernetes resources.

Cons

  • A quick hot-fix forward in production is harder; it is only possible to restore an older version. In most scenarios, restoring an older version is the recommended rollback, but in more complex scenarios you need to run a pipeline or generate some custom code.

  • You need the Helm CLI when viewing the GitOps repository for debugging and observability purposes. In most scenarios this is as simple as running a 'helm template' command, but it adds a layer of complexity.



Conclusion

As with all things, structuring your GitOps repositories and environments depends on your use case. There is no one-size-fits-all, but with a good GitOps bootstrap platform it is easy to change your structure later. Here at Pionative, we look at the structure of the company, teams, departments, DevOps maturity, future goals and overall architecture while we implement GitOps solutions. If you would like to save yourself the hassle and implement a GitOps solution that has been battle-tested in production over many development iterations, plan an introduction meeting and we'll talk about your specific use case.

Subscribe to our mailing list for more content

