Thousands of Unauthenticated Databases Exposed on the Internet
Databases are the cornerstone of nearly every piece of technology in cyberspace. As more devices across the world get connected to the internet, the risk of sensitive data exposure also increases. Over the past few years, large-scale data breaches have become so commonplace that a leak of a few million records feels quite unremarkable. One of the most common causes, often buried beneath the headlines, is a poorly secured or unauthenticated database connected directly to the internet.
As we studied the stats from our Attack Surface Management (ASM) portal “NVADR”, we found that around 40% of the exposures involve untracked assets with sensitive unauthenticated content exposed on the internet. While these resources range from unauthenticated source code repos to internal documentation, querying systems/portals, dashboards, etc., unauthenticated databases came out on top. These exposed DBs are not only discovered very frequently, but are also critical in terms of impact and increase the attack surface manyfold.
Therefore, at RedHunt Labs we decided to do an internet-scale study on the security posture of databases exposed on the internet. This blog post details all the specifics as well as analysis of results obtained during our research.
Unearthing Unsecure Databases
What we did.
One of the first steps was choosing the databases to cover in our research. We selected a total of eight databases, each of which we studied carefully in our own environment:
- MongoDB
- Elasticsearch
- Redis
- Memcached
- Apache CouchDB
- Apache Cassandra
- RethinkDB
- Hadoop HBase
We created a scanner keeping in mind both speed and accuracy to cover the entire internet while respecting our exclusion lists. We chose to keep this very non-intrusive by making use of a uniform single packet scan across the entire IPv4 space. An architectural overview of the tool is depicted in the flowchart below:
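As a rough illustration of the exclusion-list step mentioned above (not the actual scanner, whose internals are not shown here), the filtering could look something like the following. The function names and the file format (one CIDR block per line, `#` comments) are assumptions made for this sketch.

```python
import ipaddress


def load_exclusions(lines):
    """Parse CIDR blocks, skipping blank lines and '#' comments (assumed format)."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries with host bits set, e.g. 10.0.0.1/8
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_excluded(address, networks):
    """Return True if the address falls inside any excluded network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


def targets(candidates, networks):
    """Keep only the candidate addresses that are not on the exclusion list."""
    return [c for c in candidates if not is_excluded(c, networks)]
```

Filtering before any packet is sent keeps excluded ranges out of the scan entirely, rather than discarding their responses afterwards.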
We will now examine the databases one by one in-depth as well as discuss our findings and their implications.
MongoDB is an open-source, cross-platform, document-oriented database and one of the most popular NoSQL databases; it stores data as JSON-like objects. Although recent MongoDB versions implement strict ACL policies, versions before 2.6.0 listened for connections on all interfaces by default. Naturally, any such default installation would be open to unauthenticated connections from the internet.
We found a total of 21,387 unauthenticated/exposed databases. [Interactive Map]
Thankfully, the latest versions of MongoDB listen for local connections only by default. However, our research suggests that insecure defaults are not the only cause of these exposures, since the majority of the unsecured MongoDB instances ran versions later than 2.6.0. A quick overview of the vulnerable database versions is depicted in the figure below:
Despite the MongoDB team’s efforts to publish security best practices, a major portion of the databases still sits unauthenticated and open to the internet.
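The version comparison behind that observation can be sketched as below. This is only an illustration: the helper names are hypothetical, the version strings in the usage note are made up, and the 2.6.0 threshold is the one quoted in the text.

```python
from collections import Counter


def parse_version(text):
    """Turn '3.4.10' into (3, 4, 10); suffixes such as '-rc1' are dropped."""
    parts = []
    for piece in text.split("-", 1)[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def split_by_threshold(versions, threshold="2.6.0"):
    """Count how many versions fall below, or at/above, the threshold."""
    limit = parse_version(threshold)
    counts = Counter()
    for version in versions:
        key = "at_or_above" if parse_version(version) >= limit else "below"
        counts[key] += 1
    return counts
```

For example, `split_by_threshold(["2.4.9", "3.4.10", "2.6.0"])` counts one version below the threshold and two at or above it. Comparing tuples of integers avoids the classic string-comparison trap where "2.10.0" sorts before "2.6.0".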
Elasticsearch is a NoSQL document-oriented database aimed at high-performance searching, analytics and visualization. Its ACL defaults differ from version to version and from license to license. According to the Elasticsearch docs, if “you use the free/basic license, the Elasticsearch security features are disabled by default.” For other enterprise licenses, authentication is turned on.
In our research, we identified a total of 20,098 exposed Elasticsearch instances. The vulnerable IPs were scattered across 104 countries and 982 cities. [Interactive Map]
Interestingly, a very old version, 1.4.1, took second place among the top Elasticsearch versions with 577 instances. A quick overview of the versions can be found in the graph below:
As a security measure, default installations of recent Elasticsearch versions include a Warning header stating that the “built-in security features are not enabled”:
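A check for that header can be sketched as follows. This is a minimal example that inspects an already-received set of response headers; the function name is hypothetical, and the matched phrase is the one quoted above rather than the full header value, which varies by version.

```python
# Phrase quoted in the text above; matched case-insensitively.
SECURITY_HINT = "built-in security features are not enabled"


def security_disabled_warning(headers):
    """Return True if a Warning header carries the security-disabled hint.

    `headers` is any mapping of header names to values, for example the
    headers of an HTTP response from the node's root endpoint.
    """
    for name, value in headers.items():
        if name.lower() == "warning" and SECURITY_HINT in value.lower():
            return True
    return False
```

Because the check only reads headers from a normal response, it needs no request beyond the one a client would already make.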
Redis is an in-memory data-structure store that can be used as a key-value database, a cache or a message broker. It was designed for use by trusted clients in a trusted environment, with no robust security features of its own. Although the docs clearly state “Access to the Redis port should be denied to everybody but trusted clients in the network”, we uncovered a huge number of exposed databases in the wild.
A total of 20,528 unsecured databases were unearthed during our research. [Interactive Map]
To counter the alarming number of users failing to secure their Redis instances, the developers introduced “Protected Mode” in version 3.2.0, whereby Redis only replies to queries from the loopback interfaces. Clients connecting from other addresses receive an error message explaining how to configure Redis properly. However, despite these remediation steps, a majority of the exposed Redis instances ran versions later than 3.2.
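The three situations described here (open, password-protected, protected mode) can be told apart from the first line of the reply to a single PING. The sketch below assumes the raw reply bytes have already been read from the socket; the function name and the returned labels are made up for illustration.

```python
def classify_ping_reply(raw: bytes) -> str:
    """Classify a Redis reply to PING by its first protocol line."""
    line = raw.split(b"\r\n", 1)[0]
    if line == b"+PONG":
        # The server answered the command without asking for credentials.
        return "open"
    if line.startswith(b"-NOAUTH"):
        # A password is configured and the client has not authenticated.
        return "auth-required"
    if line.startswith(b"-DENIED"):
        # Protected mode: the error text explains how to configure Redis.
        return "protected-mode"
    return "unknown"
```

Only the first case corresponds to an instance that answers unauthenticated queries from a remote address.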
Similar to Redis, Memcached is a general-purpose distributed memory-caching system often used to speed up database-backed applications. In terms of security, Memcached does not enable any authentication mechanism by default and listens on all interfaces. This behaviour has already proven dangerous in the past, when exposed servers were abused for denial-of-service amplification attacks.
Appallingly, we found a total of 25,575 Memcached servers exposed publicly in our research. [Interactive Map]
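Version information for a Memcached server is available from the text protocol’s `stats` command, whose reply is a series of `STAT <name> <value>` lines terminated by `END`. A minimal parser for such a reply might look like this; the function name is hypothetical and the sample reply in the usage note is invented.

```python
def parse_stats(raw: bytes) -> dict:
    """Parse a Memcached text-protocol `stats` reply into a dict of strings."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").split("\r\n"):
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats
```

For example, `parse_stats(b"STAT version 1.4.15\r\nEND\r\n")["version"]` yields the string `"1.4.15"`.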
The top versions for memcached are visualized below:
CouchDB is a very popular NoSQL database similar to MongoDB. Since its inception, CouchDB has followed an “open-by-default” approach that made the default installation vulnerable to attacks.
We found a total of 1,977 unsecured CouchDB instances. [Interactive Map]
Thankfully, with v3.0, the CouchDB developers switched from the “open-by-default” approach to a “secure-by-default” one, which requires setting up an administrative account before a database can be initialized. Notably, the majority of the exposed CouchDB instances run versions below 3.0.
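CouchDB reports its version in the JSON welcome document served at the root of its HTTP interface, so the split at 3.0 is easy to check. The sketch below works on an already-fetched response body; the function name is an assumption made for this example.

```python
import json


def couchdb_is_pre_3(body: str) -> bool:
    """Return True if a CouchDB welcome document reports a version below 3.0."""
    doc = json.loads(body)
    if doc.get("couchdb") != "Welcome":
        raise ValueError("not a CouchDB welcome document")
    major = int(doc["version"].split(".", 1)[0])
    return major < 3
```

A result of `True` only means the instance predates the “secure-by-default” change; whether it actually accepts unauthenticated requests still depends on how it was configured.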
Apache Cassandra is an open-source NoSQL distributed database built for scalability, high availability and performance. However, from a security perspective, a default installation may well be considered ripe for compromise. Quoting the Cassandra docs:
By default, these (security) features are disabled as Cassandra is configured to easily find and be found by other members of a cluster. In other words, an out-of-the-box Cassandra installation presents a large attack surface for a bad actor.
In our research, we found 3,340 databases exposed to the internet without any authentication. [Interactive Map]
Top vulnerable Cassandra versions are visualized in the graph below. Interestingly, v2.0.15 accounted for almost 70% of the vulnerable servers.
RethinkDB is yet another open-source database that makes use of JSON documents with dynamic schemas for real-time data processing. By default, admin is the built-in account that has all permissions at global scope but has no password. Now comes the fun part: the web administration interface always connects with admin privileges and without authentication. To make matters worse, there is no way to enable authentication on the web administration UI at all. The only way to secure it is to change the interface on which it listens for connections.
During our research, we found 570 such exposed databases left open to the internet. [Interactive Map]
Surprisingly, one of the top RethinkDB versions was the extremely old 1.16.2-1 (released in 2015). The graph below depicts the top versions from our scan:
Apache HBase, often referred to as the Hadoop database, is a distributed big data storage system. The Hadoop ecosystem is quite complex and has multiple components, including HDFS, YARN and Zookeeper. Unsurprisingly, the authentication process is also fairly complex. HBase implements authentication at the RPC level via SASL with Kerberos. Once again, the default installation doesn’t come with any requirement for authentication (hbase-default.xml):
A total of 1,846 unsecured HBase installations were discovered during our research study. [Interactive Map]
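Whether a given HBase configuration still relies on that default can be checked by reading the `hbase.security.authentication` property, which is `simple` (no authentication) unless overridden with `kerberos`. The following sketch parses the XML text of a site configuration; the helper names are hypothetical and the XML in the usage note is an invented example.

```python
import xml.etree.ElementTree as ET


def hbase_properties(xml_text):
    """Collect <property> name/value pairs from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def requires_kerberos(xml_text):
    """Return True only if authentication is explicitly set to kerberos."""
    props = hbase_properties(xml_text)
    # A missing property is treated as the default value, 'simple'.
    return props.get("hbase.security.authentication", "simple").lower() == "kerberos"
```

Calling `requires_kerberos("<configuration></configuration>")` returns `False`, mirroring an installation where nobody overrode the default.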
Hadoop also ships with a WebUI administration interface that allows easy access, including full read/write access to the filesystem (HDFS). We noticed that all the vulnerable databases also exposed an unauthenticated HTTP WebUI on port 50070 to the internet. This broadens the attack surface, since the presence of a WebUI simplifies the attacker’s job.
We also found that a majority of the unsecured databases had port 2181 (Zookeeper) open, indicating that the entire infrastructure is ripe for exploitation. The involvement of multiple components within the Hadoop ecosystem significantly increases the overall attack surface.
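Correlating these companion ports per host is straightforward once the open ports are known. The snippet below is an illustrative sketch only: the port numbers and labels come from the text above, while the function name and input shape (a set of open port numbers for one host) are assumptions.

```python
# Ports named in the text; labels follow the article's description.
COMPANION_PORTS = {50070: "Hadoop WebUI (HTTP)", 2181: "Zookeeper"}


def companion_exposures(open_ports):
    """Return labels of companion services found among a host's open ports."""
    return [
        label
        for port, label in sorted(COMPANION_PORTS.items())
        if port in open_ports
    ]
```

A host for which this returns both labels exposes the WebUI and Zookeeper in addition to the database itself.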
The top 10 vulnerable Hadoop versions are graphed below:
Let’s discuss all the probable factors that might have contributed to the exposure and takeaways:
One of the most striking similarities between the databases we studied was the insecure settings that come with default installations. People often argue that such configurations are an attempt to balance usability and security, but this is not a minor issue. Any presumption that databases will be deliberately installed with adequate security measures is a fallacy. Fortunately, a few database developers have stepped up to resolve this problem by adopting a “secure-by-default” strategy.
Lack of Awareness
Having uncovered the huge number of databases open to the internet, we believe there is not enough security awareness amongst developers. From the statistics above, it is clear that even databases that ship with secure defaults are being left open to the internet. Awareness of the security context is crucial when building an internet-facing product. Developers should be encouraged to go through the official documentation thoroughly (especially when configuring security) before setting up any infrastructure.
Untracked Assets including Shadow IT and Crown Jewels
Shadow IT, or infrastructure deployed without proper tracking, can lead to critical data theft if not discovered early. Mission-critical information assets, an organisation’s “crown jewels”, are the information assets of greatest value and would cause major business impact if compromised. Such high-value assets need to be continuously tracked and managed throughout the development lifecycle so that proper action can be taken before an exposure occurs.
The consequences of an exposed database can be devastating, ranging from the breach of existing data to privilege abuse and escalation and, ultimately, full compromise. A typical breach can badly damage a company’s reputation and severely erode customer trust.
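At its simplest, tracking comes down to comparing what is discovered on the internet with what the organisation believes it owns. The sketch below shows that comparison under simple assumptions: both inputs are plain lists of hostnames, and the function names are hypothetical.

```python
def normalize(host):
    """Lowercase a hostname and drop whitespace and any trailing dot."""
    return host.strip().lower().rstrip(".")


def untracked_assets(discovered, inventory):
    """Return discovered hosts that are missing from the asset inventory."""
    known = {normalize(h) for h in inventory}
    return sorted({normalize(h) for h in discovered} - known)
```

Anything this returns is a candidate for shadow IT and worth reviewing before it turns into an exposure.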
Final Thoughts and Conclusion
We have stepped into an age where everything is getting connected to the internet. From the statistics above, we can see that a major share of the databases on the internet remains prone to attacks. This also highlights a crucial point: asset management is not being done effectively. As security researchers, we continue to strive to raise awareness amongst organizations of such potential threats through our Project Resonance R&D program.
If you want to track your organization’s databases (and other such sensitive components) which should never be exposed publicly on the internet, our SaaS-based offering NVADR can help in strengthening your organization’s external security continuously.