Your Darkest Secrets in a Git Repository

As in life, we all have secrets to keep - bigger or smaller - and we do our best to keep them well-protected and under our control. The IT world, with its closely guarded secrets of application and infrastructure development, is no exception. However, reality shows that keeping our secrets from accidentally leaking into a repository is not so easy. This article will help you understand how to protect yourself against secrets leaking into a Git repository.

What are secrets?

To better understand the problem of leaking secrets, let's start by outlining what a secret is. In short, a secret is sensitive data we want to keep private. In software development, secrets are most often used to authenticate or authorize actions in other applications, services, or systems. Simply put: secrets are all kinds of API keys, private keys, authorization tokens, username/password combinations, etc.

Why should I care about secrets in a private Git repository?

It is fairly obvious that public repositories hosted on platforms like GitHub or GitLab should not contain any secrets because of their public nature. But why is keeping secrets in a private repository not a good idea? After all, a private repository can only be accessed by those we authorize (at least in our perfect world)...

Well, keeping secrets in a private repository is a very slippery slope; once you slip up, things will get out of hand.

What can happen?

A repository can be cloned to a machine that is compromised or infected with malware.
A repository can be mistakenly published to a completely different location, or a .git directory can be leaked - no joke, automated scanners regularly bombard applications looking for these kinds of files!
A single compromised developer account with access to a private repository is enough to capture all the secrets stored there.
The repository may later become public, and with it the entire commit history.

As you can see, a lot can go wrong once we assume that a private repository is a safe place for our secrets.

Isn't it enough to simply catch the secret at the code review stage?

It might seem that the simplest solution to the problem of secrets leaking into repositories is to catch them at the code review stage. However, let's consider the following example:

During a code review in a pull request, teammates will see at a glance only the difference between Commit A and Commit C. If they don't manually go through the commit history on this branch, the intermediate Commit B, along with anything it contains, will be merged into the develop branch. It's very easy to overlook such a situation.
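The situation can be modelled in a few lines. The sketch below is a toy illustration, not real Git: commits are represented as plain dictionaries, and it assumes that the intermediate Commit B adds a key that Commit C then removes. The file name, variable names, and helper functions are all hypothetical, and the key is the placeholder value AWS uses in its documentation.

```python
# Toy model (not real Git): each commit is a snapshot of file path -> content.
# Assumption for illustration: Commit B adds a secret, Commit C removes it.
SECRET = "AKIAIOSFODNN7EXAMPLE"  # placeholder key from AWS docs, not a real credential

commit_a = {"config.py": "REGION = 'eu-west-1'\n"}
commit_b = {"config.py": "REGION = 'eu-west-1'\nAWS_KEY = '" + SECRET + "'\n"}
commit_c = {"config.py": "REGION = 'eu-west-1'\nAWS_KEY = os.environ['AWS_KEY']\n"}


def added_lines(old, new):
    """Return lines present in `new` but not in `old` (a crude stand-in for a diff)."""
    added = []
    for path, content in new.items():
        old_lines = set(old.get(path, "").splitlines())
        added += [line for line in content.splitlines() if line not in old_lines]
    return added


def contains_secret(lines):
    """Check whether any of the given lines contains the placeholder secret."""
    return any(SECRET in line for line in lines)


# What the reviewer sees: only the net change from A to C.
review_diff = added_lines(commit_a, commit_c)

# What actually lands in the history: every intermediate step.
history_diffs = [added_lines(commit_a, commit_b), added_lines(commit_b, commit_c)]

secret_in_review = contains_secret(review_diff)
secret_in_history = any(contains_secret(diff) for diff in history_diffs)

print("visible in review diff:", secret_in_review)
print("present in commit history:", secret_in_history)
```

The net diff is clean, yet the history still carries the key, which is why a scan has to look at every commit rather than only at the final state of a branch.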

Automation to the rescue!

To minimize the risk of human error, we need to base our strategy for fighting leaked secrets on automation. Before I move on to specific solutions, let's look at the points at which secret detection should run in order to help us effectively:

Pre-commit phase

Complete prevention is the best approach. With the pre-commit hooks mechanism, we can scan the code for secrets before a commit is created. Sounds great. Let's see how we can use this in practice.

In this example, we will use pre-commit, a framework that makes it easy to work with pre-commit hooks, but of course you can use any other solution, e.g., husky. For secret detection, we will use a tool called Gitleaks. Install the tool according to its instructions and configure your pre-commit hook. Now let's try to commit code that contains a secret.

Success, the secret was discovered! Unfortunately, there is a very simple way to bypass the hooks mechanism - just add the --no-verify option to git commit. So we still can't be sure that the secret won't end up in our remote repository. In fact, we can never fully guarantee that, but we can act quickly to detect it.
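To give a feel for what a hook of this kind does under the hood, here is a minimal sketch of pattern-based scanning. It is not how Gitleaks is implemented and does not use its rule set; the two rules, the rule names, and the scan function are assumptions made up for illustration, and the input is a hard-coded list standing in for staged lines.

```python
import re

# Hypothetical rules for illustration only; real tools ship far larger rule sets.
RULES = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Header line of a PEM-encoded private key.
    "private-key-header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan(lines):
    """Return (line_number, rule_id) for every line matching a rule."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, rule_id))
    return findings


# Stand-in for the lines staged for commit (placeholder key from AWS docs).
staged = [
    "REGION = 'eu-west-1'",
    "AWS_KEY = 'AKIAIOSFODNN7EXAMPLE'",
]

findings = scan(staged)
# A hook signals failure with a non-zero exit status, which aborts the commit.
exit_code = 1 if findings else 0

print(findings, exit_code)
```

Because the check runs on the developer's machine, skipping it is always possible, which is exactly the limitation described above.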

Post-receive phase

Once the secret has reached the remote repository, we operate in the post-receive phase. Here, the strategies for catching secrets vary. One option is to check for secrets in our CI build pipeline.

With the help of the TruffleHog tool, we can quickly check the changes in a CI build for secrets. TruffleHog contains over 700 detectors for various API keys and tokens for the most popular services such as AWS, GCP, Slack, etc. Its undoubted advantage is that it verifies each secret it finds (by querying the service with that secret) rather than relying on entropy, which very often produces false positives. It's essential to block only those CI builds that contain a real secret; that way, we don't degrade the developer experience.
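The weakness of entropy as a signal is easy to demonstrate. The sketch below computes Shannon entropy per character and applies a threshold; the threshold of 3.5 bits and both sample strings are arbitrary choices for illustration and are not taken from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


THRESHOLD = 3.5  # hypothetical cut-off; entropy-based scanners pick their own

# A harmless 40-character hex digest, the kind of string found all over a codebase.
harmless_hash = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b"
# A real (if weak) password with little character variety.
weak_password = "hunter2hunter2"

flagged_hash = shannon_entropy(harmless_hash) > THRESHOLD
flagged_password = shannon_entropy(weak_password) > THRESHOLD

print("hash flagged:", flagged_hash)
print("password flagged:", flagged_password)
```

With these inputs the harmless digest is flagged while the actual password slips through, so entropy alone errs in both directions. Verifying a candidate against the service it belongs to avoids that guesswork.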

The implementation of TruffleHog depends mainly on what CI solution you are using in your project. You can find all the necessary information on the TruffleHog repository page.

Above, we see the result of TruffleHog running in the CI pipeline. The build did not fail because the tool could not verify the secret (here, because the secret is just a dummy).
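The gating policy itself, failing the build only on verified findings, can be expressed as a small sketch. The shape of the findings below (a list of dictionaries with "detector" and "verified" keys) is a hypothetical structure chosen for this example and is not TruffleHog's actual output format.

```python
def should_fail_build(findings):
    """Fail the build only if at least one finding was confirmed as a live secret."""
    return any(finding["verified"] for finding in findings)


# Hypothetical scan results: a dummy key that the service rejected ...
dummy_only = [{"detector": "aws", "verified": False}]
# ... and a run in which one of the findings turned out to be live.
with_live_secret = [
    {"detector": "aws", "verified": False},
    {"detector": "slack", "verified": True},
]

print(should_fail_build(dummy_only))        # unverified findings are reported, build passes
print(should_fail_build(with_live_secret))  # a verified finding blocks the build
```

Unverified findings can still be logged for later review; they simply do not stop developers from shipping.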

Summary

Leaked secrets are a serious problem that expands our attack surface. To approach this topic comprehensively, we need to place security controls at several points where our code is checked, both locally and remotely, thus tightening our safety net. Recent research from Redhunt Labs shows that across the Alexa Top 1M domains list, nearly 400k secrets were found, mainly for services such as Stripe, Google Cloud, or AWS.

In the research mentioned above, only the front end of each application was scanned. Still, the conclusion is simple: if security controls that look for secrets had been used at the development stage, a significant number of these secrets would not have made it into the final version of the application. We should remember that a leaked secret opens the door to attackers, often with very serious consequences, so we should take care to detect leaks as soon as possible.