Unfortunately, secrets can live anywhere. During development, secrets are often plugged directly into code to make sure it works as intended. Left unreviewed, they stay in the code until another developer stumbles across them or, worse, they are exposed publicly. Because we don’t truly know where all of our secrets live, discovering them is a complicated problem. This post outlines common methods for discovering secrets that you can use to build your initial secrets inventory.
One of the best ways of discovering secrets is knowing where they typically live. Common places include:
- Configuration files (JSON, YAML, Properties, TOML, etc.)
- Databases (in plain text)
- Source code (files that drive connection to a service)
- Source control (in change history)
This is not an exhaustive list, but these locations tend to account for the overwhelming majority of secrets. Take a few minutes to identify these locations in your application and see whether you can find a plain text secret in any of them.
Would you record the username and password for your personal bank account in an unencrypted configuration file? Storing a secret this way is effectively the same thing, or worse. Secrets control access to other systems and can control the encryption and decryption of all data stored within the system. It’s no secret that secrets guard the most valuable information in our systems. When designing systems, we go out of our way to choose a fitting language, architecture, and deployment platform. We spend days, weeks, or even months opining on the best approach, only to chuck usernames and passwords into a JSON file so that our program can run. All the effort that went into a carefully designed system is undermined by a lack of follow-through on how the application handles its sensitive configuration details.
Configuration files are the most common place you will find secrets, and lots of them. The good news is that configuration files are typically stored in standard locations idiomatic to the chosen technology, so the discovery process should be relatively painless. The caveat is a deployment automation tool that writes out environment-specific configuration files: in that case, the secrets may not live in the repository at all, and you will need to check wherever the tool sources those values.
Start by locating your configuration files and making a list of every file that contains secrets. Even though discovering secrets in configuration files is straightforward, it can be time-consuming across multiple projects. Start with a single project and see how the process works for you.
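As a starting point for that list, a short script can walk a project and flag configuration files whose lines mention secret-like keys. The sketch below is a minimal Python example built on assumptions: the set of file extensions, the key names in SECRET_KEY_PATTERN, and the helper names (find_config_files, scan_config_file, build_inventory) are illustrative choices rather than any standard, and a keyword match only marks a line for manual review, not a confirmed secret.

```python
import re
from pathlib import Path

# Assumed config file types; extend or trim to match your stack.
CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".properties", ".toml", ".ini", ".env"}
# Dotfiles such as ".env" have no suffix in pathlib, so match them by name.
CONFIG_NAMES = {".env"}

# Heuristic key names that often hold secrets; a match means "review this line".
SECRET_KEY_PATTERN = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|access[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def find_config_files(root):
    """Yield every file under root that looks like a configuration file."""
    for path in Path(root).rglob("*"):
        if path.is_file() and (
            path.suffix.lower() in CONFIG_SUFFIXES or path.name in CONFIG_NAMES
        ):
            yield path


def scan_config_file(path):
    """Return (line_number, line) pairs that mention a secret-like key."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if SECRET_KEY_PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def build_inventory(root):
    """Map each config file containing candidate secrets to its flagged lines."""
    inventory = {}
    for path in find_config_files(root):
        hits = scan_config_file(path)
        if hits:
            inventory[str(path)] = hits
    return inventory
```

Calling build_inventory on a project root returns a dictionary from each matching file path to its flagged lines, which you can review by hand before adding the file to your inventory.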
From time to time, secrets are hard-coded. This may be done for expediency, because the source is a one-off script with no configuration file, or for myriad other reasons. Regardless of the reason, these secrets still need to be discovered and handled properly. We will discuss helpful discovery tools shortly, but there are some common places to look:
- Database connection code
- Third party service connection code (including internal third parties)
- Dependency resolution code (repository access secrets)
- All of those “scripts” that manage various parts of the system
While none of this is perfect, knowing where to look first can greatly reduce the effort required for discovery. Combine common locations with careful grepping and some automated tooling, and you should be able to build a source code secret inventory with confidence.
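Careful grepping can be as simple as a handful of regular expressions run over each source file. The sketch below assumes three illustrative rules: the AKIA prefix used by long-term AWS access key IDs, credentials embedded in a connection URL, and a secret-looking variable name assigned a quoted literal. The rule names and the scan_source helper are hypothetical; dedicated tools ship far larger and better-tuned rule sets, and these patterns will produce both false positives and misses.

```python
import re

# Illustrative rules only (an assumption, not any tool's real rule set).
PATTERNS = {
    # Long-term AWS access key IDs: "AKIA" followed by 16 upper-case letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Credentials embedded in a URL, such as a database connection string.
    "url_credentials": re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://[^/\s:@]+:[^/\s:@]+@"),
    # A secret-looking name assigned a quoted string literal.
    "hardcoded_assignment": re.compile(
        r"(?i)(password|passwd|secret|token|api_?key)\w*\s*[:=]\s*[\"'][^\"']+[\"']"
    ),
}


def scan_source(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

Running scan_source over each file in the common locations listed above gives you line-level leads to confirm by hand before they go into the inventory.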
Start by looking at the common locations listed above. Make a list of all files that have secrets. Remember to limit yourself to a single project for now to make sure the process works well for you.
Once you commit a secret to version control, a copy of it lives on for as long as the history that contains it. In git, the commit stays in the repository’s history until that history is rewritten, and even then the reflog and any existing clones may still hold a copy. While rewriting git history is strongly discouraged, it may be the most pragmatic way to preserve the important history of a long-lived project while destroying secrets that are hard to rotate. Either way, the best course of action for a secret written to git is always to consider it compromised and rotate it.
If you absolutely must rewrite history, tools like BFG will help clean your repository without exposing you to the full breadth of git’s command line options, some of which can create serious unintended side effects. Additionally, BFG seems to run quite a bit faster than git filter-branch on larger projects with lots of history.
The following projects are useful for secrets discovery. I have had the most success with trufflehog in the past, but all of these options should be reviewed to see which is best suited for your environment.
- yar https://github.com/Furduhlutur/yar
- trufflehog https://github.com/dxa4481/truffleHog
- detect-secrets https://github.com/Yelp/detect-secrets
- git-secrets https://github.com/awslabs/git-secrets
- repo-supervisor https://github.com/auth0/repo-supervisor
- git-hound https://github.com/ezekg/git-hound
- gitrob https://github.com/michenriksen/gitrob
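Several of these tools look beyond fixed patterns for strings that appear random; trufflehog, for instance, has historically flagged high-entropy strings. The sketch below illustrates the general idea with Shannon entropy computed over character frequencies. It is not how any of the listed tools is implemented: the 20-character minimum length, the candidate character set, and the 4.0 bits-per-character threshold are assumptions you would tune for your codebase.

```python
import math
import re
from collections import Counter

# Assumed candidate shape: 20+ characters from base64 / URL-safe alphabets.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(value):
    """Bits of entropy per character, from the character frequencies in value."""
    if not value:
        return 0.0
    length = len(value)
    counts = Counter(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def high_entropy_strings(line, threshold=4.0):
    """Return long tokens in line whose entropy exceeds the (assumed) threshold."""
    return [
        token for token in CANDIDATE.findall(line)
        if shannon_entropy(token) > threshold
    ]
```

Expect false positives from hashes and encoded assets. Also note that a hex string can never exceed 4 bits per character, so this single threshold misses hex-encoded secrets, which is why entropy-based scanners commonly apply a lower threshold to hex strings.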