The Current State of Security, Privacy and Attack Surface on Android: Scanning Apps for Secrets and More – Wave 8
With over 6.8 billion smartphone users worldwide, the mobile phone market is booming at an unprecedented rate, and so is the production and usage of applications on all sorts of platforms. One of these smartphone platforms is Android, and its primary method of distributing applications is through the pre-installed Google Play Store. With over 2.87 million apps available for download, the platform saw a record-breaking 218 billion-plus downloads (as of 2020), making it the most used application-distribution platform in the world. With the increased usage of Android applications, consumer demands are growing too, especially regarding security and privacy. Producing quality apps while maintaining user trust and not compromising security has become a significant challenge for most Android developers.
In our Attack Surface scans, we at RedHunt Labs have encountered several cases where many apps were exposed to security issues after they were distributed among users due to lacklustre security practices and human errors. Hence we decided to conduct a mass scan of more than 30,000 apps listed on the Google Play Store. This blog details the security posture of the apps we scanned and how the implications may impact developers and end users by highlighting our approach, findings, and conclusions.
Our first step was to create a dataset containing 30,000 Android apps to be scanned. For more realistic estimates, we specifically scanned popular apps, i.e., those with more than 100,000 downloads. Furthermore, we specifically scanned apps from the following categories:
- Health & Lifestyle
While the main objective was to scan these apps for hardcoded secrets, we also extracted the following data for some additional insights that we will talk about later in the blog:
- IP Addresses
- App Permissions
- Secret Tokens
Our in-house tools used to conduct this wave are written in Golang due to its simplicity and speed. These tools perform a combination of several operations, which include downloading the latest Android application archives (called “APKs”) based on their application ID, “decompiling” them and extracting various pieces of information, such as dex files, smali files, and XML files. These files are typically where sensitive information (such as API keys) sits in plain sight, where an attacker can find it and maliciously exploit it.
Now that we had everything in place to correlate assets and secret tokens to their respective apps, we proceeded with the analysis. Before the actual scan, we conducted a prototype scan to check for points of failure. The trial scan concluded with downloads of around 7,000 apps in 48 hours. Once we were ready with our scanner and had a working structure, we proceeded with our scan of 30,000 apps.
Below is an architectural overview of our process:
Our tools scanned all the target apps for hardcoded secrets with a list of 314 unique signatures. From a total of 30,000 apps, we discovered 2,772 apps that exposed at least one directly correlatable secret.
As illustrated above, Google Cloud Platform (GCP) API keys accounted for a significant portion of this list, with 2,749 exposed keys. 2,392 of these keys had a Firebase reference in the URLs extracted alongside the other assets from the apps. The AndroidManifest.xml file was a significant source of the exposed secrets, with 2,554 files revealing such secrets. Other files revealing secrets included common culprits such as public.xml, index.html, and strings.xml, with 100+ exposures each.
The above illustration highlights the distribution of secrets in different files. Although this may seem trivial, there is a noticeable trend in the files – such as a naming convention, extension, and hierarchy.
Additionally, we extracted the URLs and endpoints in the apps during our scan. Out of more than 532,000 URLs, we found 3,418 URLs belonging to popular cloud storage platforms. Amazon Web Services (AWS) contributed significantly to these URLs, with over 2,430 URLs containing “amazonaws.com”. Other platforms we found included Google Cloud Platform (firebasestorage.googleapis.com and storage.googleapis.com), Microsoft Azure (blob.core.windows.net), and DigitalOcean (digitaloceanspaces.com).
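The grouping of URLs by storage platform can be sketched as follows. The host suffixes come from the platforms named above; the `classify` function and the sample URLs are assumptions for illustration. Note that this sketch matches on the hostname suffix, which is stricter than checking whether a URL merely contains a string such as “amazonaws.com”.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// storageSuffixes maps hostname suffixes to the platform they belong to.
var storageSuffixes = []struct{ Suffix, Platform string }{
	{"amazonaws.com", "Amazon Web Services"},
	{"firebasestorage.googleapis.com", "Google Cloud Platform"},
	{"storage.googleapis.com", "Google Cloud Platform"},
	{"blob.core.windows.net", "Microsoft Azure"},
	{"digitaloceanspaces.com", "DigitalOcean"},
}

// classify returns the storage platform for a URL, or false if the
// hostname does not end in one of the known suffixes.
func classify(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return "", false
	}
	host := strings.ToLower(u.Hostname())
	for _, s := range storageSuffixes {
		if host == s.Suffix || strings.HasSuffix(host, "."+s.Suffix) {
			return s.Platform, true
		}
	}
	return "", false
}

func main() {
	// Made-up URLs standing in for the ones extracted from apps.
	urls := []string{
		"https://example-bucket.s3.amazonaws.com/file.png",
		"https://demo.blob.core.windows.net/container/report.pdf",
		"https://example.com/about",
	}
	for _, raw := range urls {
		if platform, ok := classify(raw); ok {
			fmt.Printf("%s -> %s\n", raw, platform)
		} else {
			fmt.Printf("%s -> not a known storage host\n", raw)
		}
	}
}
```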
The above graphic summarises the top cloud platforms used by developers for object storage that we discovered during the scan. Integrating a cloud platform doesn’t necessarily indicate a security risk for the application or the end-user. Still, resources such as storage buckets are often misconfigured and left open to the general public, which can introduce critical risks to an organization, especially to sensitive user information. If you have any S3 or open-directory-listed URLs that you would like to test for leaking PII (Personally Identifiable Information), you can try out and perhaps contribute to our very own scanner Octopii.
From more than 403,000 permissions extracted from the AndroidManifest.xml files, we found 9,784 unique permissions that these apps required. These permissions were not only native Android permissions but also included permissions related to other applications or services.
The illustration above breaks down the most common permissions requested by the apps. Unsurprisingly, android.permission.INTERNET, which apps use to connect to the Internet, accounted for a significant portion of this list with over 25,400 mentions. Apart from this permission, android.permission.ACCESS_NETWORK_STATE and android.permission.WAKE_LOCK also contributed significantly to this list.
The app with the most extensive set of permission requirements in our list was Alipay (com.eg.android.AlipayGphone) by Alipay (Hangzhou) Technology Co. Ltd, which had a whopping 244 permission requirements as of July 12, 2022.
While the permission requirement count itself doesn’t indicate a security risk, it may indicate a risk to a user’s privacy.
Some examples of such permissions are:
We recommend checking out this research paper for more information on how these permissions can be abused in malware campaigns.
Google Analytics Tracking IDs
Although Google Analytics Tracking IDs had a minor contribution to the scan results with a total of 41 IDs, we thought of mentioning them anyway, considering how crucial user engagement data is for organizations. Let us suppose an attacker gains access to a Google Analytics Tracking ID (format: UA-XXXXXX-X). The attacker can use that tracking ID to obscure all future analytics data by putting the same tracking ID on other websites or applications.
There is no direct in-code way for developers to obfuscate certain secrets, such as API keys and personal access tokens stored in apps – as the APIs directly use the stored/static values. Some forums and resources may recommend using ProGuard, despite ProGuard often leaving API keys in the compiled dex files and offering minimal obfuscation, such as changing variable names.
Considering the various secret tokens and IDs that these apps have exposed, we recommend app developers follow secure practices when storing such data in the source code. Some of these could be:
- Use the Secrets Gradle Plugin: The open-source Secrets Gradle Plugin for Android reads secrets and API keys from a properties file that is not checked into a version control system. The plugin then exposes those properties as variables in the Gradle-generated BuildConfig class and manifest file. This keeps secrets out of the repository, but the values still end up in the compiled app. Some Gradle plugins built specifically for hiding secrets go further: the secret is obfuscated using XOR and stored as data in an Android NDK binary as a hexadecimal array to make disassembly difficult, and the de-obfuscation is forced to happen at runtime so that the compiler cannot disclose the secret by optimizing that logic away.
- Leveraging the NDK: Developers can also use proprietary techniques with the Android NDK (Native Development Kit) and its C/C++ integration for Android applications to mask API keys, thus wasting an attacker’s time. NDK libraries and source code are difficult to decompile, making the process of finding secret tokens and IDs via code inspection more cumbersome than the traditional approach of going through decompiled source code. Click here to learn more about this approach and how it can help.
- Post-authorization: Instead of hard-coding secret tokens or IDs in the source code, developers can arrange dynamic loading of credentials during the runtime of those applications after successful end-user authentication. This eliminates the need to hardcode secrets but must be done over a trusted and secure TLS connection.
- Monitoring API usage: APIs can be rate-limited by actively monitoring them. This can be done by restricting keys, migrating to multiple API keys, or using separate API keys for each app. If unauthorized usage continues, regenerate or delete the affected keys.
Similar to some of our prior project resonance waves, we are releasing data from this study while ensuring we don’t reveal any sensitive/confidential/vulnerable details.
Our research shows that secret leakage is still an issue that developers must address with secure development practices. We recommend that developers ensure that any cloud computing solutions they integrate into their apps are correctly configured to avoid the leakage of sensitive information and prevent threats to both the organization and end-users. To combat predatory practices, we recommend that end-users pay attention to what permissions their favourite apps require for everyday usage and check whether those permissions are even needed.
RedHunt Labs is currently in the process of responsibly disclosing the exposed secrets to the affected developers. Once we get a response from the developers, we will dispose of the secrets we have discovered.
How can we help?
Our SaaS-based Attack Surface Management solution, NVADR, continuously keeps track of your organization’s external digital footprint by identifying and profiling the assets as they surface on the internet. The assets we identify are way beyond IP Addresses and Subdomains, and we cover a wide variety, including Docker Containers, Mobile Applications, Code Repositories, and much more. Once these assets are identified, we find security misconfigurations across all asset classes (including secret exposures and misconfigured cloud storage buckets).
To understand how NVADR can help your organization improve its external digital footprint and security posture, Request a Demo.