Millions of Secrets Exposed via Web Application Frontends - Wave 7

Introduction

Web applications are the cornerstone of anything on the publicly accessible internet. Because of the complexities of the software development life cycle, developers tend to embed secrets within application source code. As the codebase grows, they often fail to remove this sensitive data before deploying to production.

Leaked secrets in source code have plagued the security community for decades. We at RedHunt Labs therefore decided to perform an internet-scale study of secrets exposed via internet-facing web applications. This blog post describes our attempt to understand the security posture of the internet as a whole, detailing our idea, methodology, results, analysis, and insights from the research, followed by a tool release at the end.

Methodology

Our research initially focused on finding secrets in client-side source code. Most modern web applications use client-side JavaScript to provide better usability. Secrets embedded in those JS files, such as API keys or cryptographic secrets, are often used for authentication.
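As a rough illustration of what looking for secrets in JavaScript can mean in practice, the sketch below matches source text against regular expressions for two widely documented key formats. The pattern list, the `Finding` type, and the `scanSource` function are assumptions made for this example; they are not the rule set or code used in the study, and a real scanner would need many more patterns plus validation to weed out false positives.

```go
package main

import (
	"fmt"
	"regexp"
)

// Finding is one candidate secret located in a piece of source text.
type Finding struct {
	Kind  string
	Match string
}

// patterns pairs a label with a regular expression for a well-known key
// format. Both entries are illustrative assumptions, not the study's rules.
var patterns = []struct {
	kind string
	re   *regexp.Regexp
}{
	{"aws-access-key-id", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"google-api-key", regexp.MustCompile(`AIza[0-9A-Za-z_-]{35}`)},
}

// scanSource returns every substring of src that matches a known pattern.
func scanSource(src string) []Finding {
	var found []Finding
	for _, p := range patterns {
		for _, m := range p.re.FindAllString(src, -1) {
			found = append(found, Finding{Kind: p.kind, Match: m})
		}
	}
	return found
}

func main() {
	// The key below is a made-up placeholder in the AWS access key ID shape.
	js := `const cfg = { accessKey: "AKIAABCDEFGHIJKLMNOP" };`
	for _, f := range scanSource(js) {
		fmt.Println(f.Kind, f.Match)
	}
}
```

A match only says that a string looks like a key; whether it is live, restricted, or meant to be public (as some client-side keys are) has to be checked separately.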

However, during the research we found that the debug pages of popular software frameworks also leak secrets when exceptions occur in the web application. A few such examples are depicted in the picture below.

To analyze these leaks, we wrote a tool in Go capable of scanning and processing a huge number of hosts quickly yet reliably. Over the course of the research we added many features to the tool, including the ability to automatically (and carefully) fill in forms and trigger debug pages for specific technology stacks.
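One simple way to recognize a framework debug page is to look for text that the framework prints on its error screen. The sketch below does that with a small table of marker strings. The markers and the `detectDebugPage` function are assumptions chosen for illustration; they may not match every framework version and are not taken from the released tool.

```go
package main

import (
	"fmt"
	"strings"
)

// debugMarker ties a framework name to text its debug page is assumed to show.
type debugMarker struct {
	framework string
	marker    string // stored in lower case
}

// debugMarkers is a hypothetical, deliberately short marker table.
var debugMarkers = []debugMarker{
	{"django", "you're seeing this error because you have"},
	{"laravel", "whoops, looks like something went wrong"},
	{"rails", "action controller: exception caught"},
}

// detectDebugPage reports which framework's debug page the body resembles.
func detectDebugPage(body string) (string, bool) {
	lower := strings.ToLower(body)
	for _, m := range debugMarkers {
		if strings.Contains(lower, m.marker) {
			return m.framework, true
		}
	}
	return "", false
}

func main() {
	body := "<title>Action Controller: Exception caught</title>"
	if fw, ok := detectDebugPage(body); ok {
		fmt.Println("debug page detected:", fw)
	}
}
```

Once a debug page is identified, the same pattern matching used for JavaScript files can be run over its body, since these pages often print environment variables and configuration values.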

An architectural overview of the tool is depicted in the flowchart diagram below:

Hunting for secrets

Operational Methodology

To meet our objective, we conducted the research in two phases. To keep the study non-intrusive, we restricted the scan to the homepage of each domain and did not crawl sites invasively.
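A homepage-only scan can still reach the JavaScript a page loads by reading the script references in that single HTML response. The sketch below is a minimal version of that idea, assuming a regular expression is good enough to pull out `src` attributes; the `scriptURLs` function and the example URLs are hypothetical, and a production scanner would more likely use a proper HTML parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// scriptSrc captures the src attribute of <script> tags. This is a rough
// approximation of HTML parsing, assumed sufficient for a sketch.
var scriptSrc = regexp.MustCompile(`(?i)<script[^>]+src\s*=\s*["']([^"']+)["']`)

// scriptURLs returns absolute URLs for every script referenced by the page.
// It looks only at the given HTML and follows no links.
func scriptURLs(pageURL, html string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range scriptSrc.FindAllStringSubmatch(html, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip references that are not valid URLs
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	html := `<script src="/static/app.js"></script>
<script src="https://cdn.example.net/lib.js"></script>`
	urls, err := scriptURLs("https://example.com/", html)
	if err != nil {
		fmt.Println("bad page URL:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```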

Phase 1 – Alexa Top 1M Domains

The initial phase consisted of scanning the Alexa top 1M domains. We chose the Alexa top 1M domains because this is where the majority of internet traffic lands.

The entire scan took 5 days with the tool running on 1,000 threads. A single machine with a 4-core CPU was enough to complete this phase.
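In Go, this kind of fixed concurrency is usually expressed as a bounded pool of goroutines reading from a channel rather than as operating-system threads. The sketch below shows the general shape under that assumption; `scanAll` and the stub scan function are hypothetical and say nothing about how the actual tool is structured.

```go
package main

import (
	"fmt"
	"sync"
)

// scanAll runs scan on every host using at most `workers` goroutines and
// returns the per-host result (for example, a count of secrets found).
func scanAll(hosts []string, workers int, scan func(string) int) map[string]int {
	jobs := make(chan string)
	results := make(map[string]int, len(hosts))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for host := range jobs {
				n := scan(host)
				mu.Lock() // the map is shared, so writes are serialized
				results[host] = n
				mu.Unlock()
			}
		}()
	}

	for _, h := range hosts {
		jobs <- h
	}
	close(jobs) // lets the workers exit once the queue drains
	wg.Wait()
	return results
}

func main() {
	hosts := []string{"example.com", "example.org", "example.net"}
	// Stub scanner: a real one would fetch the homepage and count secrets.
	counts := scanAll(hosts, 2, func(host string) int { return 0 })
	fmt.Println(len(counts), "hosts scanned")
}
```

Because the workers spend most of their time waiting on the network, a pool far larger than the number of CPU cores is plausible, which fits the observation that a 4-core machine was sufficient.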

Insights & Analysis

We captured a total of 395,713 secrets from the top 1 million domains of the internet. The following visualization shows the top 5 exposed secret types.

A notable highlight is that Google services, namely Google reCAPTCHA, Google Cloud, and Google OAuth, made up a major portion, totaling almost 70% of the services with the highest secret exposure.

An eye-opening takeaway from Phase 1 was that secret exposure was massive even though every domain in scope belonged to the top 1 million domains of the internet.

Phase 2 – 500M Domains

Phase 2 consisted of scanning ~500M hosts covering domains from every top-level domain available. These domains are a small subset of the total collected by our bots, which tirelessly scan small parts of the internet every day.

Phase 2 took almost one and a half months to complete, with 2,000 sites being processed concurrently. The scan was distributed across 12 cloud instances. A lot of time and effort went into balancing speed and accuracy while the scan ran, and into making sure that target servers did not experience any issues because of our scans.
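The post does not say how hosts were divided among the 12 instances. One common approach, shown here purely as an assumption, is to hash each hostname and take the result modulo the number of instances, so that every instance can decide independently which hosts belong to it. The `shardFor` function is hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a hostname to an instance index in [0, shards).
// The same host always lands on the same index.
func shardFor(host string, shards int) int {
	h := fnv.New32a()
	h.Write([]byte(host)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(shards))
}

func main() {
	const instances = 12 // number of cloud instances, as stated in the post
	for _, host := range []string{"example.com", "example.org", "example.net"} {
		fmt.Printf("%s -> instance %d\n", host, shardFor(host, instances))
	}
}
```

A scheme like this keeps all requests for one host on a single instance, which also makes it easier to apply per-host throttling so that target servers are not overloaded.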

Insights & Analysis

For the second phase of the research, we were able to capture a total of 1,280,920 secrets.

From the secret exposure visualization above, interestingly, Stripe token exposures were the highest during this phase of the scan, followed by Google reCAPTCHA keys, Google Cloud API keys, AWS Access and Secret Keys, and Facebook tokens.

The infographic below shows the top TLDs exposing secrets.

Digging Deeper into the results

Overall, the total number of secrets that we captured during both of the phases combined was 1,676,634.

Since we focused mainly on the front end, we anticipated that a majority of the exposures would come through JavaScript files. Analyzing the results, we found that almost 77% of the exposures occurred through JavaScript files used in the frontend code.

Since most of the JavaScript was served through content delivery networks, we mapped the exposures to their sources to extract insights from our data. The highest number of exposures, over 197k, came from the Squarespace CDN.
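Mapping exposures to their sources can be approximated by grouping the URLs of the exposing files by hostname and ranking the groups. The sketch below assumes the input is simply a list of file URLs; the `countByHost` function, the `hostCount` type, and the example hostnames are hypothetical, and attributing a hostname to a specific CDN provider would need an additional lookup that is not shown.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// hostCount is the number of exposing files served from one hostname.
type hostCount struct {
	Host  string
	Count int
}

// countByHost groups file URLs by hostname, sorted by descending count
// and then by hostname so the order is stable.
func countByHost(fileURLs []string) []hostCount {
	counts := map[string]int{}
	for _, raw := range fileURLs {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue // ignore entries that are not absolute URLs
		}
		counts[u.Hostname()]++
	}
	out := make([]hostCount, 0, len(counts))
	for h, c := range counts {
		out = append(out, hostCount{Host: h, Count: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Host < out[j].Host
	})
	return out
}

func main() {
	files := []string{
		"https://cdn.example.net/a.js",
		"https://cdn.example.net/b.js",
		"https://static.example.org/c.js",
	}
	for _, hc := range countByHost(files) {
		fmt.Println(hc.Host, hc.Count)
	}
}
```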

Below is a graph of the top CDN providers hosting the files exposing secrets (sorted in descending order).

Country-wise, most of the exposed secrets were from servers located in the US, totaling 86,402 unique IP addresses.

Impact

The number of secrets exposed via the front end of hosts is alarmingly large. The research sheds light on how readily secrets leak through client-side source code, and how easy it is for attackers to abuse them for compromise. Once a valid secret is leaked, it opens a path for attackers to move laterally; they may abuse the business service account, leading to financial losses or total compromise.

Tool Release

We believe it is important for us to give back to the security industry. Therefore, we are releasing a community version of the tool along with this blog. A few highlights of the open-source version of the tool are:

  • Automatically crawl and scrape URLs asynchronously.
  • Check for leaked secrets in JavaScript files.
  • Find and fill out forms to trigger error/debug pages.
  • Automatically detect tech stack and try to extract secrets from debug pages.
  • Offers highly customizable flags to tweak tool usage.

To demonstrate the capabilities of the tool, we have added a video showing the tool used on a site running Laravel with debug mode enabled.

You can get the tool with the detailed installation and usage instructions from our GitHub.

Avoiding issues

Secret leakage via the front end is a crucial issue that can lead to critical security risks. Here are a few best practices to avoid such issues:

  • Set restrictions on access keys: Services that allow limiting key usage to specific URLs, IP addresses, etc. are great at preventing such issues.
  • Avoid embedding secrets in code: Centrally managing secrets in a restricted environment or config file prevents hard-coding them in source code, thereby avoiding secret leakage via source code.
  • Set up alerts: Alerting on leaked secrets is another way to detect them. Managed services like Amazon Macie do a great job at finding leaked secrets.
  • Regularly scan source code for secrets: Continuous monitoring for secret leakage via source code is yet another feasible option.
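To make the advice about not embedding secrets in code concrete, here is a minimal server-side sketch that reads a secret from the process environment instead of a string literal. The `loadSecret` function and the `PAYMENT_API_KEY` variable name are hypothetical; a dedicated secrets manager would follow the same shape, with the lookup replaced by a call to that service.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// loadSecret reads a secret from the environment and fails loudly if it is
// missing, so a misconfigured deployment is caught at startup.
func loadSecret(name string) (string, error) {
	v, ok := os.LookupEnv(name)
	if !ok || v == "" {
		return "", errors.New("missing required secret: " + name)
	}
	return v, nil
}

func main() {
	// PAYMENT_API_KEY is a placeholder name used only for this example.
	key, err := loadSecret("PAYMENT_API_KEY")
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	// Never log the secret itself; its length is enough to confirm loading.
	fmt.Println("loaded secret of length", len(key))
}
```

This keeps the value out of the repository and out of any bundle shipped to the browser, but it only helps if the secret is used on the server; anything the frontend needs directly must be treated as public and restricted at the provider.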

Conclusion

Our research shows that secret leakage via the front end is rampant and far from solved, placing developers and services at persistent risk of compromise and abuse. As researchers, we continue striving to raise awareness of these issues adversely affecting the security industry through our Project Resonance’s R&D program.

How can we help?

If you want to track your organization’s overall security posture and spot assets that should never be exposed publicly on the internet, our SaaS-based offering NVADR can help strengthen your organization’s external security continuously.