Scanning Millions of Publicly Exposed Docker Containers – Thousands of Secrets Leaked
Containers have been the rage. Containers allow developers to package the code, its dependencies, libraries, and more so they can run reliably on different computing environments. Containers are lightweight and portable.
One can deploy far more apps using containers on the same old servers than using VMs. Containers solve the dev-ops problem of “worked fine in dev, ops problem now” and also help enable CI/CD.
Docker is a popular tool that has become synonymous with containers. Docker can build images and run containers. The tool also allows its users to upload their docker images to Docker Hub – the container image registry from Docker Inc. that helps share images. Docker Hub allows free public repositories for images.
To share docker images, all one needs to do is create a user account in Docker Hub and execute `docker login` with the creds. Voila, now you can keep pushing as many docker images as you want.
During this research, we found more than 1.6 million unique user accounts on Docker Hub. There are more than 6.3 million public repositories at the time of writing this blog.
The Unspoken Treasure Trove of Secrets
When it comes to automated scanning of Docker images, a huge emphasis is placed on vulnerable dependencies, excessive runtime privileges, and more.
Even docker’s very own vulnerability scanner doesn’t scan for secrets. It scans for vulnerable dependencies. While it does secure your docker image, it doesn’t stop the user from copying the AWS credentials config file or any other secrets to the docker image (knowingly or unknowingly).
There’s a default Docker Hub setting that makes things worse: repositories are public by default.
This default setting makes the issue deadly.
In simple terms, if you hardcode credentials into your docker image and push it to your free Docker Hub account, anyone can download the image and look at the credentials.
So we assumed that even if an organization uses some tool to scan all its docker images, the tool might still not detect hardcoded credentials in those images, and employees might have made the images public.
To test this hypothesis (and see it for ourselves) we planned to search the entire Docker Hub to fetch secrets and other interesting issues (if any).
The idea of scanning containers for secrets is not new. There has been at least one public attempt to do the same – by Matías Sequeira. His method finds all the recently updated repositories and gets all their tags using Docker Hub APIs. It then does a docker pull of each tag and searches the files using Whispers to detect secrets. He got good results, but his research was limited to recently uploaded images.
We wanted to search across all the public Docker Hub repositories and look for secrets in them.
(We were quite surprised by what we found. If you are interested in the results and not the process to find them, then head over to the top 5 exposure issues we found in the scan.)
The “Hunt” begins
After playing with Docker Hub for a while, we found that:
- Docker Hub allows fetching information of users, their repositories, and the repositories’ tags with the help of APIs. Docker Hub APIs also allow getting the Dockerfile of each image.
- Docker Hub has two types of accounts: “Users” and “Organizations.” What makes “Organizations” different is that their profile information has a gravatar email address in most cases.
- Docker Hub APIs have an IP-based rate limit. You need not worry about it if you are doing a side project with those APIs. But if you are trying to query/scan all repos on Docker Hub, it becomes a bottleneck.
We understood that we can fetch the Dockerfiles for all the repos of an account with the above information. So the first step was to gather as many user/org accounts as possible. We spent a few days brainstorming different ways to get the user accounts – scraping, crawling, heuristics, etc.
Using all our techniques, we were able to gather around 1.6 million accounts.
We created a tool that does the following:
- Get the username, fetch all profile information and repositories
- For each repository, fetch all tags and metadata (like when the tag was created and which user updated it)
- For each tag, get the Dockerfile and store them for further analysis
- Analyze all Dockerfiles for secrets and other insights
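The first two steps of the list above can be sketched roughly as follows. This is not the actual tool: the `HUB` URL layout and the `results`/`next` pagination fields are assumptions about the Docker Hub web API that should be checked against its current documentation, and `fetch_json` is a hypothetical callable (for example a thin wrapper around an HTTP client) that returns the decoded JSON for a URL. Fetching the Dockerfile itself is left out because the source does not name the endpoint used.

```python
from typing import Callable, Iterator, Tuple

# Assumed URL layout for the public Docker Hub web API; verify before use.
HUB = "https://hub.docker.com/v2"

# Hypothetical fetcher type: takes a URL, returns the decoded JSON body.
FetchJson = Callable[[str], dict]


def paginate(fetch_json: FetchJson, url: str) -> Iterator[dict]:
    """Yield every item of a paginated listing by following its 'next' links."""
    while url:
        page = fetch_json(url)
        yield from page.get("results", [])
        url = page.get("next")


def crawl_account(fetch_json: FetchJson, account: str) -> Iterator[Tuple[str, str, dict]]:
    """Yield (repository, tag, tag_metadata) for each public tag of an account."""
    for repo in paginate(fetch_json, f"{HUB}/repositories/{account}/"):
        name = repo["name"]
        tags_url = f"{HUB}/repositories/{account}/{name}/tags/"
        for tag in paginate(fetch_json, tags_url):
            yield name, tag["name"], tag
```

Injecting the fetcher keeps the crawl logic separate from networking concerns such as retries and the IP-based rate limit mentioned earlier.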
Results from the Massive Docker Scan
Docker Hub Account Analysis
We were able to find around 1,684,600 unique Docker Hub accounts. Of these, 67,450 (about 4%) are organization accounts. Over 96% of the 1.6 million accounts have at least one public docker repository.
Vulnerable Base Images
A docker image created using a vulnerable base image will remain vulnerable in most cases. It’s primarily due to CVEs in the software packed with the base image.
We found more than 6.3 million unique public repositories in Docker Hub. Analyzing the latest tag in each repository showed that Alpine Linux is the most used base image. It’s lightweight, fast, and comparatively secure, making it the best choice to build your docker image.
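A base-image tally like this could be computed along the following lines. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and it assumes each Dockerfile is available as a string. References to earlier build stages in multi-stage builds are skipped so that only external base images are counted.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches: FROM [--flag=value ...] <image> [AS <stage>]
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)


def base_images(dockerfile: str) -> List[str]:
    """Return external base images, skipping references to earlier build stages."""
    stages = set()
    images = []
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        if image.lower() not in stages:
            images.append(image)
        if alias:
            stages.add(alias.lower())
    return images


def most_common_bases(dockerfiles: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count base images across many Dockerfiles and return the most frequent ones."""
    counts = Counter(img for df in dockerfiles for img in base_images(df))
    return counts.most_common(top)
```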
A surprising fact is that 6 out of the top 10 base images were built more than a year ago. Any vulnerability that got patched since the build would not have been patched in the original image.
For example, analyzing the most used base image, alpine:3.11.6, using Snyk shows that it has 12 vulnerabilities, of which two are critical and five are high.
Okay, let’s get back to how this experiment started – hardcoded secrets in Dockerfiles.
As expected, Dockerfiles contained a mind-blowing amount of hardcoded credentials. They included AWS and other cloud environment access keys, private keys, webhooks, and more. The most commonly found secret was the username and password used to clone git repositories.
This was true not only for public Git platforms like GitHub, GitLab, etc., but also for self-hosted instances.
Statistics of Leaks
We observed 46,076 docker images either hardcoding credentials or copying sensitive config files into the image.
A total of 10,181 repositories were identified to be leaking 15,541 hardcoded secrets. In addition to that, a total of 57,589 potentially sensitive configuration files were copied to docker images across 36,176 repositories.
Top 5 exposures in Docker images
After going through all the docker images, we found it hard to explain all security issues in a single blog post. Also, there is a whitepaper on critical vulnerabilities in docker images, so we will not talk about them here. Instead, let’s talk about the top five ways your docker images can expose sensitive information.
1. Hardcoded secrets
Hardcoded secrets remain the number one way docker images expose sensitive information. These secrets are hardcoded as environment variables (ENV), as build arguments (ARG), or even as part of run commands (RUN).
Apart from secrets, the environment variables of some docker images contained all sorts of information (including metadata), such as the runtime environment type (staging/prod), the subdomain/domain the app will serve, and internal telemetry endpoints.
The next most common place for hardcoded secrets is build arguments (ARG). These build-time arguments are not available to the running container; however, their default values (if any) are still hardcoded into the docker image.
The final place secrets were hardcoded is RUN commands, mostly to clone a repository using an access token for a code platform (GitHub, GitLab, Bitbucket, etc.). Other cases include credentials for fetching executables over FTP, for internal code repositories, for corporate proxies, for package artifactories, and more.
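To make the three cases concrete, here is a minimal sketch of a Dockerfile scanner that checks only ENV, ARG, and RUN instructions. The patterns are illustrative assumptions, not the rules used in this research; a real scanner needs far more rules and some handling of false positives. The AWS key in the usage example is the well-known documentation placeholder.

```python
import re
from typing import Iterator, List, Tuple

# Illustrative patterns only (assumptions, not the research rule set).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "credentials_in_url": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@", re.IGNORECASE
    ),
    # NAME=value or NAME value, where the value is not a $VARIABLE reference.
    "secret_like_assignment": re.compile(
        r"\b\w*(?:PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY)\w*[ =]+(?!\$)\S+",
        re.IGNORECASE,
    ),
}
INSTRUCTIONS = ("ENV", "ARG", "RUN")


def logical_lines(dockerfile: str) -> Iterator[str]:
    """Join backslash-continued lines so each instruction is scanned as a whole."""
    buffer = ""
    for raw in dockerfile.splitlines():
        stripped = raw.strip()
        if stripped.endswith("\\"):
            buffer += stripped[:-1] + " "
            continue
        yield buffer + stripped
        buffer = ""
    if buffer:
        yield buffer


def find_secrets(dockerfile: str) -> List[Tuple[str, str]]:
    """Return (instruction, rule_name) for every rule that fires on ENV/ARG/RUN."""
    findings = []
    for line in logical_lines(dockerfile):
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0].upper() not in INSTRUCTIONS:
            continue
        for rule, pattern in PATTERNS.items():
            if pattern.search(parts[1]):
                findings.append((parts[0].upper(), rule))
    return findings
```

Note that an `ARG` declared without a default value (for example `ARG NPM_TOKEN`) is not flagged by this sketch, matching the point above that it is the default values that end up in the image.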
2. Copying sensitive config files to the docker image
It’s like hardcoding secrets but as files – directly to the docker image. Using Dockerfile alone, one might not be able to guess if a sensitive config file is copied.
However, we found a good number of docker images that copy config files explicitly. Use cases include:
- copying credentials file to ~/.aws/credentials
- copying GCP service-account keys
- copying SSH keys to clone repositories
- copying maven config file to ~/.m2/settings.xml
- and much more
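A check for explicit copies like these could look roughly like the sketch below. The watch list is a hypothetical one derived from the use cases listed above, not the list used in the research, and the parser is deliberately simple: it handles the shell form of ADD/COPY only, not the JSON array form.

```python
import re
from typing import List, Tuple

# Hypothetical watch list based on the use cases above; extend as needed.
SENSITIVE_PATH = re.compile(
    r"(\.aws/credentials"
    r"|\.m2/settings\.xml"
    r"|(^|/)id_(rsa|dsa|ecdsa|ed25519)$"
    r"|\.pem$"
    r"|service[-_]?account.*\.json$)",
    re.IGNORECASE,
)


def sensitive_copies(dockerfile: str) -> List[Tuple[str, str]]:
    """Return (instruction, path) for ADD/COPY lines touching a sensitive-looking path."""
    hits = []
    for line in dockerfile.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].upper() not in ("ADD", "COPY"):
            continue
        # Ignore flags such as --chown=... and look at sources and destination.
        paths = [t for t in tokens[1:] if not t.startswith("--")]
        for path in paths:
            if SENSITIVE_PATH.search(path):
                hits.append((tokens[0].upper(), path))
                break  # one finding per instruction is enough
    return hits
```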
We found a common mistake with creating docker images: copying a private key in one layer and deleting the key in the next layer. One must understand that docker images are a collection of read-only layers. If a sensitive file is added in a docker layer, it can’t be removed in the following intermediate layers. You must remove it in the same layer.
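The copy-then-delete mistake can be spotted in a Dockerfile with a check along these lines. This is a simplified sketch with a hypothetical function name: it only matches when a later `RUN` instruction contains `rm` and the exact destination path of an earlier ADD/COPY, so it will miss globs, relative paths, and shell variables. Because the file was already committed in the ADD/COPY layer, anything it flags is still retrievable from that earlier layer.

```python
from typing import List


def deleted_after_copy(dockerfile: str) -> List[str]:
    """Return destinations that were ADD/COPY-ed and later removed by a separate RUN rm."""
    copied: List[str] = []
    flagged: List[str] = []
    for line in dockerfile.splitlines():
        tokens = line.split()
        if len(tokens) < 3:
            continue
        instruction = tokens[0].upper()
        if instruction in ("ADD", "COPY"):
            # The last token of a shell-form ADD/COPY is the destination.
            copied.append(tokens[-1])
        elif instruction == "RUN" and "rm" in tokens[1:]:
            for dest in copied:
                if dest in tokens[1:] and dest not in flagged:
                    flagged.append(dest)
    return flagged
```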
We found 36,176 repositories that explicitly copy a sensitive config file to the docker image using Docker’s ADD / COPY instruction. The most common config files were as follows:
3. Adding the entire git repo
Copying folders to docker images along with their .git directory is the same as exposing the git repository.
This issue usually occurs when you copy files using ADD/COPY instruction (mostly without a .dockerignore file). We saw the insecure practice of using `COPY . .` or `ADD . .` in Dockerfiles.
Once an attacker gets access to the image, they can find the emails of all committers, the source of the repository, code changes, and hardcoded secrets. Even if there’s no git directory where the Dockerfile exists, such an instruction might still copy dotfiles containing configuration/secrets.
Many of the docker images that we manually analyzed contained .git directories or other sensitive dotfiles.
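As a rough illustration, the sketch below flags `COPY . .` style instructions when the accompanying .dockerignore (if any) does not exclude `.git`. The function names are hypothetical and the .dockerignore handling is simplified: it looks only for a literal `.git` or `**/.git` entry and ignores negation rules such as `!.git`.

```python
from typing import List, Optional


def is_broad_copy(line: str) -> bool:
    """True for ADD/COPY instructions whose source is the whole build context."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0].upper() not in ("ADD", "COPY"):
        return False
    if any(t.startswith("--from=") for t in tokens[1:]):
        return False  # copies from another build stage, not from the context
    sources = [t for t in tokens[1:-1] if not t.startswith("--")]
    return any(src in (".", "./") for src in sources)


def git_is_ignored(dockerignore: Optional[str]) -> bool:
    """True if the .dockerignore content has an entry excluding the .git directory."""
    if dockerignore is None:
        return False
    entries = {line.strip().rstrip("/") for line in dockerignore.splitlines()}
    return bool(entries & {".git", "**/.git"})


def risky_context_copies(dockerfile: str, dockerignore: Optional[str]) -> List[str]:
    """Return broad ADD/COPY lines that may pull .git into the image."""
    if git_is_ignored(dockerignore):
        return []
    return [line.strip() for line in dockerfile.splitlines() if is_broad_copy(line)]
```

Pass `None` for `dockerignore` when the repository has no .dockerignore file at all.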
4. Paid/Proprietary software licenses
There are multiple repositories where software is activated within the container. The software license is obviously sensitive. Some of the software installers themselves are not even publicly available.
So both the proprietary software and the license to activate it were found in some images.
5. Setting default credentials for applications
Default credentials hardcoded in images are not a risk in themselves. They add risk when the same images are used without changing the default credentials. An attacker might be able to escalate privileges or even compromise the host using the default credentials.
These are the top 5 exposures in docker images.
Seeing all this, you might ask: enough, but...
How to proactively stop exposures in docker images?
Before we answer that, can you secure and remove exposures in all docker images created within your organization? Not only the images created in your pipelines but also those on your employee laptops.
Short answer: No
A few of the docker exposures that we reported via HackerOne were secrets exposed in repositories owned by employees’ personal accounts.
So it’s hard to have an in-house solution that checks for hardcoded credentials in all created docker images and monitors public docker registries like Docker Hub.
If you are concerned about the docker images built within your org’s CI pipelines, then follow these best practices during the build process:
- Don’t hardcode tokens/API keys in docker images if you need to access/authenticate/authorize something (internal or public). Pass them to containers as environment variables from the docker CLI tool.
- Do not clone/download the required files using credentials during the build. Instead, copy them to the image.
- Have a .dockerignore file that ignores .git directories, logs, and source code files.
- If you are creating a docker image to run a binary, just copy the required binary and its dependencies to the docker image. Not the source code. If you are creating the docker image to build a binary, do not publish the docker image (as it will contain source code).
- To share docker images, use a container registry that’s private by default – like AWS ECR or GCP Container Registry. If you want to use Docker Hub, then get a paid subscription and set the default repository privacy to “Private”.
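For the first practice in the list above, the application inside the container can read its secret from the environment at startup instead of from anything baked into the image. The following is a minimal sketch; `require_secret`, `MissingSecretError`, and the variable name `API_TOKEN` are hypothetical.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not supplied to the container."""


def require_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Read a secret from the environment and fail fast if it is absent or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is not set; pass it at run time, e.g. docker run -e {name} ..."
        )
    return value
```

With this in place, the token is supplied only when the container starts, for example with `docker run -e API_TOKEN my-image`, which forwards the value from the invoking shell, so it never appears in the Dockerfile or in any image layer.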
You also need to educate internal teams on the security best practices – like secure docker image creation, regularly updating base images, removing any potentially sensitive data from existing images, and more.
Docker containers are one of the most common technologies used by organizations. They contain a good amount of information about the organization’s infrastructure and environments. However, organizations quite often don’t keep track of their publicly exposed containers and the secrets leaking through them. Our research identified a huge number of secrets leaking this way.
Organizations need to continuously monitor their docker containers (and other asset types) for their potential security risks.
We at RedHunt Labs help organizations continuously discover their Attack Surface across subdomains, containers, git repos, etc., and help identify security risks on the External Attack Surface before attackers do.
Our Agentless ASM Platform NVADR has been able to identify critical data leaks across their publicly exposed docker containers for many of our customers.
If you would like to check out your organization’s Attack Surface, we offer a Free Scan. Request Free Scan here.