The secrets about exposed secrets in code

June 9 2021

Finding exposed secrets in code sounds simple, doesn’t it? Just look for field names like “password”, “token”, or “API_Key”. Maybe dig a little deeper to search for commonly-used passwords or look for randomly-generated strings of specific lengths.

Unfortunately, there’s a lot more to it than that. Understanding the impact of exposed secrets in code, detecting them in the first place, and avoiding a flood of false positives all involve considerable nuance and complexity. Below are the questions we hear most often. We hope the answers help you understand why secrets in code are such an important issue and what you can do about it.

Why has this become such a critical issue?

Software development has changed! Engineers no longer write code in isolation on desktops or laptops, where an attacker compromising a device could only access locally-stored files. Cloud-based development has changed the security model so developers often have expanded access to the entire application. With the rise of DevOps, the same developers (and developer identities) have the ability to make changes to production environments. A single compromised identity can now have a catastrophic impact on the security of the entire application and infrastructure.

What are some notable examples?

Uber (2016) – Attackers gained unrestricted access to Uber’s private GitHub repositories, found exposed secrets in the source code, and used them to access millions of records in Amazon S3 buckets.
Scotiabank (2019) – Login credentials and access keys were left exposed in a public GitHub repo.
Amazon (2020) – Credentials including AWS private keys were accidentally posted to a public GitHub repository by an AWS engineer.
Symantec – Researchers examining hard-coded AWS keys in mobile apps discovered that the keys had an overly broad permissions scope, which led to significant data leakage.
GitHub – Over 100K public repositories on GitHub were found to contain access tokens.

Why do developers expose secrets in their code?

It’s easy to say that developers should be more careful and follow best practices, but the truth is that developers are under increasing pressure to deliver. Hard-coding a token or password may start as a temporary hack, meant to be replaced by a better solution later on, that conveniently gets forgotten as the next priority comes along.

In addition, developers don’t always have visibility into where their code is deployed, so they don’t have an end-to-end view of the risk. Old code can also be deployed in new ways that the original developer never anticipated. It is also common for stored secrets that were never intended to leave the development environment to make their way into production.

Where can we find exposed secrets?

You can find secrets in many places, including:

Source code
Configuration files
Infra-as-Code
Test code
Documentation
Package management files
Scripts
Project files

Secrets of each type can be in multiple environments, from staging to production. The challenge is to identify these places automatically and quantify risk for secrets in production source code vs. secrets in test code in staging and other environments.
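
As a rough illustration of identifying these places automatically, a scanner can start by classifying each file path before deciding how much weight a finding deserves. The sketch below is only an assumption about how that might look: the category names, the path heuristics, and the weights in RISK_WEIGHT are hypothetical and would need tuning for a real codebase.

```python
from pathlib import PurePosixPath

# Hypothetical weights for illustration only; not derived from real data.
RISK_WEIGHT = {
    "source": 1.0, "config": 1.0, "iac": 1.0,
    "script": 0.8, "package": 0.8, "docs": 0.5, "test": 0.4,
}

def classify_path(path: str) -> str:
    """Map a repository-relative path to one of the locations listed above."""
    p = PurePosixPath(path.lower())
    parent_dirs = set(p.parts[:-1])
    name = p.name
    if parent_dirs & {"test", "tests"} or name.startswith("test_"):
        return "test"
    if parent_dirs & {"doc", "docs"} or p.suffix in {".md", ".rst"}:
        return "docs"
    if p.suffix in {".tf", ".tfvars"} or parent_dirs & {"terraform", "helm"}:
        return "iac"
    if name in {"package.json", "requirements.txt", "pom.xml", ".npmrc"}:
        return "package"
    if name == ".env" or p.suffix in {".yml", ".yaml", ".ini", ".env", ".toml"}:
        return "config"
    if p.suffix in {".sh", ".ps1", ".bat"}:
        return "script"
    return "source"

def finding_weight(path: str) -> float:
    """Weight a finding by where it was found (a secret in test code still counts)."""
    return RISK_WEIGHT[classify_path(path)]
```

Note that a lower weight is not the same as "safe to ignore": as discussed later in this article, keys found in test code are sometimes the very same keys used in production.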

What types of secrets are there in code?

There are many types of “secrets” that your developers can put into your code. These can be:

User passwords. The simplest type of secret is a username and password combination that is stored in plain-text.
API keys. Application Programming Interface (API) keys can grant privileged access to key API settings.
Authentication tokens. Tokens can replace less secure usernames and passwords and are used in authentication mechanisms such as OAuth.
Private encryption keys. These can be keys used in symmetric encryption or the private half of a Public Key Infrastructure (PKI) key pair.
Digital certificates, and more. Digital certificates are used to authenticate and “prove” an identity.

How can an attacker get access to these secrets?

Don’t assume that an attacker has to have access to the source code in order to access the secret. A skilled attacker may compromise a server hosting your source code (yes, even in the cloud), but an even more skilled attacker will reverse engineer a facsimile of the source code from the binary. This can be done when your B2B software is deployed on-premises at a customer or when it’s consumer-facing, such as a Windows .exe or even an iOS or Android app.

What can an attacker do with the secrets they find?

The most sophisticated attacks are multi-step and a single secret can be a launch point to further command and control. If an attacker is able to find a hard-coded token, they can use it to gain whatever access that token grants. Using the right token, an attacker can impersonate a valid user or service and then use other means to escalate privileges or “jump” horizontally to other systems that use that token.

The main issue with having secrets in your code is that it short-circuits many of your defenses. For example, even if you have a SaaS product and your cloud infrastructure is secure, an attacker could use social engineering or other methods to gain access to a developer account and access the code to find and exploit the secret. And if you have 1,000 developers, all they need is access to one account!

Why is detecting exposed secrets so hard?

The main reason is that there are many types of secrets. Some data is “structured” (not in the database sense). Certificates and access tokens are generally formatted in standard ways and are easy to find, with few false positives or false negatives.
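
To make the “structured” case concrete, here is a minimal sketch that looks for two widely documented formats: PEM private key headers and AWS access key IDs (20 characters beginning with AKIA or ASIA). The function and dictionary names are ours, and a real scanner would maintain a far larger rule set than these two patterns.

```python
import re

# Two well-known, fixed formats. Real tools ship hundreds of such rules.
STRUCTURED_PATTERNS = {
    "pem_private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}

def find_structured_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a known format."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in STRUCTURED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule_name))
    return hits
```

Because the formats are rigid, a match is rarely accidental, which is why this class of secret produces few false positives.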

Other types of data are less structured: long strings of seemingly random characters that may be encrypted or hashed. The problem is that you can’t tell what you’re looking at if you don’t understand the code! If you find 100 seemingly “random” strings, some will come from test files and some from binary files. Others will be encrypted or hashed values, which are OK to have in code, and a tool should be able to distinguish them from real secrets.

Cryptographic keys are usually long enough that you can use statistics to determine whether a string is random enough to be a key, but for many types of files it isn’t clear-cut, and the problem becomes: how many false positives can you accept? GitHub recently changed the format of its authentication tokens so that scanners can detect them easily.
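
One common statistic for this is Shannon entropy, measured in bits per character. The sketch below is a minimal version of such a check; the minimum length of 20 and the threshold of 4.0 bits are assumptions chosen for illustration, and moving them is exactly the false-positive trade-off described above.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Estimate bits of entropy per character from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def looks_random(s: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag strings that are both long enough and high-entropy.

    Both defaults are illustrative assumptions, not recommended values.
    """
    return len(s) >= min_length and shannon_entropy(s) >= threshold
```

A string of one repeated character scores 0 bits, while a string in which 32 different characters each appear once scores 5 bits. Short strings are skipped because their frequency estimates are too noisy to be meaningful.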

The other challenge has become the speed of development. Any organization can perform an ad hoc code review and identify a good number of secrets from a detailed manual search, but this isn’t a scalable solution.

Ok, it’s hard. So what can I do about it?

There are multiple things you can do to identify secrets, remediate issues, and reduce your risk:

Follow Key Management Best Practices. Current security requirements call for passwords and tokens to be used only once, but that’s unfortunately not the reality. Developers will use API keys in test environments and then reuse the same keys for staging and production. Passwords that were always intended to be temporary get missed and left in production software in the rush to meet deadlines. We have identified keys stored in our customers’ test environments and gotten an earful about overly sensitive alerts, only to discover that those same keys were used in production. Security practices need to be followed even in test environments.
Use a third-party Secrets Detection solution. These tools will usually employ a combination of Regular Expressions (RegEx) to detect patterns in code that indicate a secret as well as entropy checks to detect randomness that may indicate a password or key.
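
As a small illustration of the key management point above, the sketch below reads a key from the process environment instead of hard-coding it, and uses a separate variable for each deployment environment so that a test key can never double as the production key. The variable naming scheme (PAYMENTS_API_KEY_...) and the helper names are hypothetical; in practice the values would typically be injected by whatever key management system you already use.

```python
import os

class MissingSecretError(RuntimeError):
    """Raised when an expected secret is not configured."""

def load_secret(name, env=None):
    """Fetch a secret by name, failing loudly rather than falling back to a default."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"{name} is not set")
    return value

def api_key_for(environment, env=None):
    # Hypothetical convention: one distinct variable per deployment environment.
    return load_secret(f"PAYMENTS_API_KEY_{environment.upper()}", env)
```

Failing when the variable is missing is a deliberate choice here: a silent fallback to a default value is how "temporary" hard-coded keys tend to come back.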

Is Secrets-in-Code detection and remediation a stand-alone function?

Understanding and remediating the risk of secrets-in-code cannot be done in isolation! There is a significant difference in risk between finding a secret in an application with low business impact that is deployed on-premises compared to finding a similar secret in a high business impact application that stores PII! Risk is multidimensional and secrets-in-code is only one part of the larger picture surrounding multidimensional application risk.

Where do existing solutions fall short and how can Apiiro help?

There is an entire industry around detecting exposed secrets in code but there are a few ways that many existing solutions fall short:

Code context. Having a deep understanding of the code is the essential element in not only detecting difficult secrets but in minimizing false positive rates. With an understanding of how the code functions, it is possible to test potential passwords, API keys, and more in a way similar to how the actual code would use them. This improves detection (and false positive rates) by orders of magnitude.
Developer Behavior. If a particular developer has added multiple secrets to code in the past, the certainty score for secrets is higher than for developers who do not have a history of mismanaging secrets in code.
History. Many solutions only evaluate the current source code but entirely ignore secrets that may be stored in previous versions of the application across all test environments. While secrets stored in the history of your source code manager cannot be captured by reverse engineering a binary, they still pose a risk in the event of a hacker gaining access to the source code repository, which holds all historic revisions as well.
Orchestration. Apiiro automates and orchestrates secrets discovery, remediation, and prevention. We comment on the pull request, create automated workflows, and send the appropriate alerts to make the process as simple as possible – and we use your existing tools, from Jira to Slack!
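
To illustrate the History point above, the following sketch scans the output of git log -p rather than the current files, so a secret that was added in one commit and deleted in a later one is still reported. This is a simplified assumption of how such a scan could work, not a description of Apiiro's implementation: the ASSIGNMENT pattern, the field names it looks for, and the function names are all hypothetical.

```python
import re
import subprocess

# Hypothetical rule: a secret-like field name assigned a quoted literal.
ASSIGNMENT = re.compile(
    r"""(?i)(password|passwd|secret|token|api_key)\b\s*[:=]\s*["'][^"']{8,}["']"""
)

def scan_patch(patch_text):
    """Return (commit, field_name) for each added line that hard-codes a value."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            match = ASSIGNMENT.search(line[1:])
            if match:
                findings.append((commit, match.group(1).lower()))
    return findings

def scan_repo_history(repo_dir):
    """Scan every commit on every ref of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-p", "--all"],
        capture_output=True, text=True, check=True,
    )
    return scan_patch(result.stdout)
```

Only added lines (those beginning with a single +) are inspected, because every secret that ever existed in the repository was added in some commit, even if it no longer appears in the latest revision.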

Apiiro uses a variety of techniques to identify exposed secrets in code. We use the latest algorithms for entropy detection of crypto keys and leverage our deep understanding of the code to look at the context. We also do this over the entire history of your code. In addition, Apiiro provides continuous detection of secrets, with automated workflows so you can manage your code and your risks as new secrets are introduced. Apiiro also understands which key management systems are already in place and can instruct the developers on how to remediate instead of only showing alerts.

If you’re interested in learning more, schedule a demo today! Also check out the Dependency Combobulator at combobulator.io!

Igal Kreichman

VP of Engineering