Dependency Confusion Attack – What, Why, and How?

Introduction

Software libraries have become such an integral part of software development that most popular programming languages come with their own ‘package manager’. Usually, the user only needs to specify the name or source of the library, and the package manager handles download and installation on its own. By abstracting away the complex logic of managing packages, these tools have become simpler to use, but that convenience has also opened the door to supply chain attacks. One such attack vector, recently discussed in a bug-bounty report, is the dependency confusion attack.

In a dependency confusion attack, a user is tricked into installing a malicious dependency/library instead of the one they intended to install. It can be as simple as creating a package named emailextract to infect any user who forgets the hyphen in the actual package name, email-extract.

This kind of malicious library installation can happen not only on end-user machines but also on the machines that run CI/CD pipelines, and hence can leave a large number of organizations vulnerable. Microsoft has already warned enterprises to guard against this ‘Dependency Confusion’ attack, which some call a ‘Substitution Attack’.

In this article, we are going to talk about some techniques that exploit interesting behaviors of package managers as well as some generic problems like Typosquatting.

Alright, let’s look at how this attack can pose a high risk to your organization’s attack surface.

Python

Python packages are installed via pip and the primary repository for them is pypi.org. The dependencies are usually kept in a file named requirements.txt in the root directory of the program.

Case #1: Different package & import names

In Python, if you run a program that depends on a library named requests and it is not installed on your system, you will get the following error:

ModuleNotFoundError: No module named 'requests'

Installing it is simple: you just need to run pip install requests. However, this is not always the case. Sometimes the name of a package and the name by which it is referenced in programs are different.

OpenCV, a popular image processing library, is installed under the name opencv-python but is imported into programs with import cv2. If you try to use it without having it installed, you will get the following message:

ModuleNotFoundError: No module named 'cv2'

But to use it, you need to install opencv-python, not cv2. What can go wrong? A lot, and Murphy’s Law.

There are 501 questions on StackOverflow and 230 issues on GitHub that mention pip install cv2 not working.
Around 400 developers on GitHub suggest that their end-users try pip install cv2 if their software doesn’t work.

A malicious actor can register a malicious package under the import name and hence infect every user who makes this mistake.
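One way to avoid guessing the install name from the import name is to ask the interpreter which installed distribution provides a given module. The following is a minimal sketch, assuming Python 3.10 or newer, where `importlib.metadata.packages_distributions()` is available. The helper name `install_names_for` and the sample mapping are hypothetical, and the lookup only knows about packages already installed in the current environment.

```python
from importlib import metadata


def install_names_for(import_name, mapping=None):
    """Return the distribution names that provide a top-level import name.

    `mapping` defaults to data about the packages installed in the current
    environment; pass a dict to try the function without touching it.
    """
    if mapping is None:
        # Maps top-level import names to the distributions providing them.
        mapping = metadata.packages_distributions()
    # Sort and de-duplicate so the result is stable.
    return sorted(set(mapping.get(import_name, [])))


if __name__ == "__main__":
    # Hand-written sample mapping (an assumption for illustration only).
    sample = {"cv2": ["opencv-python"], "bs4": ["beautifulsoup4"]}
    print(install_names_for("cv2", sample))
    print(install_names_for("requests", sample))
```

On a machine where the software already works, a lookup like this gives the name to put in the install instructions, instead of the import name.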

Case #2: Multiple indexes in CLI

The pip CLI provides two options to control installation sources and their priority:

--extra-index-url
--index-url

`--extra-index-url`

This option can be used to specify a fallback server for installing packages in case a package is not available publicly on PyPI.

So, if you have a private package named mycompany-stuff that is not available on PyPI, you can use --extra-index-url=https://your.company.com/index to tell pip to try to download it from your custom package index.

What can go wrong with this?

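If someone registers a public package with the same name as your private package mycompany-stuff, your attempts to install your package with the --extra-index-url option can result in downloading the public package instead. pip gives your server no priority: it gathers candidates from every configured index and picks the best match by version, so your index really is just one more source.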

`--index-url`

https://pypi.org/simple is the default package index used by pip for installing Python packages, but it can be overridden by specifying a custom index with the --index-url option. But then you won’t be able to install public PyPI packages.

To get around this, you can make https://pypi.org/simple a fallback index using the --extra-index-url. Yes, that works and is suggested by a lot of folks on internet forums.

Here comes the catch: pip will check both indexes for the package and will choose the one offering the latest version. So to hijack your mycompany-stuff 0.1.12 installation, a malicious actor can just create a PyPI package with the name mycompany-stuff and a much higher version, e.g. 100.1.1, so that their package always takes precedence.
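The version preference described above can be modelled in a few lines. This is a simplified sketch of the behaviour, not pip's actual code: the function names are hypothetical, versions are assumed to be plain dotted numbers, and the index URLs are the ones used in the examples above.

```python
def parse_version(text):
    """Parse a simple dotted numeric version such as '0.1.12' into a tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(candidates):
    """Pick the (index_url, version) pair with the highest version.

    Which index a candidate came from plays no role in the choice.
    """
    return max(candidates, key=lambda item: parse_version(item[1]))


if __name__ == "__main__":
    candidates = [
        ("https://your.company.com/index", "0.1.12"),  # the private package
        ("https://pypi.org/simple", "100.1.1"),  # the attacker's upload
    ]
    print(pick_candidate(candidates))
```

Because the index URL never enters the comparison, swapping which index is passed as --index-url and which as --extra-index-url does not change the outcome in this model.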

Ruby

Ruby packages are installed via gem or bundle command-line utilities and the primary repository for them is rubygems.org. The dependencies are kept in the root directory of the program in a file named Gemfile.

Case #3: Multiple Sources

The first line in a Gemfile is usually source "https://rubygems.org" which specifies the source for downloading the packages.

It is possible to define multiple sources as follows:

source "https://rubygems.org"
source "https://yourgems.com"
source "https://mygems.com"

The order in which these sources are prioritized for downloading packages is from bottom to top, i.e. the last source is preferred. Using multiple sources is a security concern, and Bundler warns users about such a configuration.

To address this issue and allow better source control, Ruby introduced support for specifying sources and versions for each dependency individually as follows:

gem 'rack', '2.1', git: 'https://github.com/rack/rack'
gem 'nokogiri', '1.7', git: 'https://github.com/sparklemotion/nokogiri'

Here’s an excerpt from bundle’s latest docs about its priority handling (gem means a ruby library btw):

When attempting to locate a gem to satisfy a gem requirement, the bundler uses the following priority order:

1. The source explicitly attached to the gem (using :git, :source or :path)
2. For implicit gems (dependencies of explicit gems), any source, git, or path repository declared on the parent. This results in bundler prioritizing the ActiveSupport gem from the Rails git repository over ones from rubygems.org
3. The sources specified via global source lines, searching each source in your Gemfile from last added to first added.

Despite this, the practice of defining multiple sources can still be seen in the wild. It is another problem that is addressed later in this article.
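The priority order quoted from the docs can be sketched as a small lookup routine. This is only a model of the three documented rules, written in Python for illustration. The function name, its parameters and the sample data are hypothetical and are not part of Bundler.

```python
def resolve_source(gem, explicit_sources, parent_sources, global_sources, available):
    """Return the source the documented priority order would pick for a gem.

    explicit_sources: gem name -> source attached directly to the gem
    parent_sources:   gem name -> source declared on the parent gem
    global_sources:   global sources in the order they appear in the Gemfile
    available:        source -> set of gem names it serves (made-up data)
    """
    # 1. A source explicitly attached to the gem wins.
    if gem in explicit_sources:
        return explicit_sources[gem]
    # 2. For implicit gems, a source declared on the parent comes next.
    if gem in parent_sources:
        return parent_sources[gem]
    # 3. Global sources are searched from last added to first added.
    for source in reversed(global_sources):
        if gem in available.get(source, set()):
            return source
    return None


if __name__ == "__main__":
    sources = ["https://rubygems.org", "https://yourgems.com", "https://mygems.com"]
    served = {
        "https://rubygems.org": {"rack", "internal-gem"},
        "https://mygems.com": {"internal-gem"},
    }
    print(resolve_source("internal-gem", {}, {}, sources, served))
    print(resolve_source("rack", {}, {}, sources, served))
```

In this model, a gem that is served by several global sources is taken from whichever of them appears last in the Gemfile, which is why the order of the source lines matters.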

PHP

PHP packages are installed via composer and the primary repository for them is packagist.org. The dependencies are kept in the root directory of the program in a file named composer.json.

Case #4: Composer’s global sources

In contrast to Ruby’s Bundler, PHP’s composer prioritizes sources from top to bottom, i.e. the first mentioned source gets the highest priority.

Apart from this quirk, composer’s config command has a command-line option --global which lets a user add a package source to composer’s global configuration for subsequent usage. Installing packages without removing the default package repository leaves the user vulnerable to previously discussed version preference-based attacks.

JavaScript

JavaScript packages are installed via npm and the primary repository for them is npmjs.com. The dependencies are kept in the root directory of the program in a file named package.json.

Priorities, again

As demonstrated by Alex in his blog, npm suffers from the same priority confusion as the package managers discussed earlier. If your internal dependency has a lower version than the public package of the same name, the public package will be preferred.

The typosquatted sources problem

Typos are something we programmers make all the time, and typosquatting, which exploits them, is one of the major issues affecting almost all kinds of development environments.

One very evident example at the time of writing is the popular Golang package logrus, where someone deliberately registered a likely typo of its import path, siruspen instead of sirupsen. To validate this issue, a quick search on GitHub yields around 16 results that appear to use the typosquatted package inadvertently.

Another extremely common yet crucial problem in the Golang development environment is automation when writing code. The popular VS Code Go extension, along with the Go language server, resolves imports automatically as you type. So if you mistype an import statement, the go.mod file (used for managing dependencies) is automatically updated as part of that automation.

In such cases, once a typosquatted malicious package gets introduced into the code, it can be virtually impossible to detect and might live there for a long time before it is discovered.
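A simple guard against this kind of slip is to compare every import path in a project against a list of paths you trust and flag the near-misses. The sketch below uses Python's standard `difflib` module. The function name `find_suspicious`, the similarity cutoff of 0.9 and the trusted list are assumptions made for illustration, and a real check would need tuning to avoid false positives.

```python
import difflib


def find_suspicious(imports, trusted, cutoff=0.9):
    """Flag import paths that are near-misses of a trusted path.

    Returns a dict mapping each suspicious import to the trusted paths it
    resembles. Exact matches are considered fine and are skipped.
    """
    trusted = list(trusted)
    flagged = {}
    for path in imports:
        if path in trusted:
            continue
        # get_close_matches returns the trusted paths whose similarity
        # ratio to `path` is at least `cutoff`.
        close = difflib.get_close_matches(path, trusted, n=3, cutoff=cutoff)
        if close:
            flagged[path] = close
    return flagged


if __name__ == "__main__":
    trusted = ["github.com/sirupsen/logrus"]
    imports = ["fmt", "github.com/siruspen/logrus", "github.com/sirupsen/logrus"]
    print(find_suspicious(imports, trusted))
```

Running such a check in CI, before dependencies are fetched, would surface a swapped pair of letters that is easy to miss in review.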

Case #5: The deleted sources

If the owner of a package deletes their account, or a domain serving dependency packages expires, the supply chain becomes prone to hijacking. It is a common problem that affects all package managers.

Taking a quick look at alternative ways of installing Python packages, we can find that pip allows package installs from git URLs, e.g.:

pip install "git+https://github.com/‹USER›/‹REPO-NAME›.git@‹TAG-OR-SHA›#egg=‹PKG-NAME›"

Now it is obvious that if the user has deleted their GitHub account entirely, the package can be hijacked simply by registering the same username on GitHub.

Even if it’s not possible to hijack the account/domain, making it unavailable or uninstallable may make a package manager use a default source.

For example, an attacker may inject faulty code into a seemingly helpful merge request to the original package which may cause it to throw an error on installation, forcing the package manager to download the package from another server.
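To audit a project for this risk, a first step is to list which hosts and accounts its git-based requirements depend on, so each one can be checked for whether it still exists. Below is a minimal sketch, assuming requirement lines in the `git+https://...` form shown above. The regular expression and the function name `git_owners` are hypothetical, and URLs that embed credentials (`user@host`) are not handled.

```python
import re
from urllib.parse import urlparse

# Captures the URL after "git+", stopping before any "@ref" or "#egg=..." part.
GIT_URL = re.compile(r"git\+(https?://[^\s@#]+)")


def git_owners(requirement_lines):
    """Return sorted (host, owner) pairs for git-based requirements."""
    owners = set()
    for line in requirement_lines:
        match = GIT_URL.search(line)
        if not match:
            continue  # a normal index-based requirement
        parsed = urlparse(match.group(1))
        parts = [p for p in parsed.path.split("/") if p]
        if parts:
            # The first path segment is the user or organization name.
            owners.add((parsed.netloc, parts[0]))
    return sorted(owners)


if __name__ == "__main__":
    lines = [
        "requests==2.31.0",
        "git+https://github.com/example-user/example-repo.git@v1.0#egg=example",
    ]
    print(git_owners(lines))
```

Each pair in the result names an account or domain whose disappearance would leave the dependency open to takeover.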

The human dependency problem

People often represent the weakest link in the security chain and are chronically responsible for the failure of security systems.
Bruce Schneier

Solutions for most technical problems are just a Google search away. However, these ‘solutions’, especially the ones not posted on communities like StackOverflow, are rarely checked for quality or updated to reflect the latest best practices.

Once a solution is upvoted by users and builds a reputation, it doesn’t take long to appear all over the internet. If a security issue is detected within that solution or it becomes a deprecated practice, it still stays on the internet continuing to put users at risk.

Conclusion & Recommendations

Even though the collaborative nature of software libraries poses a risk even for the most security-conscious organizations, they are irreplaceable in the software development process.

Here are a few recommendations to minimize the risk of supply chain attacks:

Only use reputed and actively maintained libraries.
Always follow the best practices for installing packages.
Verify the package source before installing or importing code from any library.
If you’re a maintainer of an open-source package, register the common aliases under which your library is usually imported, similar to what the Python BeautifulSoup library did by claiming bs4.
If you are using private packages, name them in company.software format and register a dummy public package with the same name to prevent hijacking.
A private repository of libraries can be maintained where the libraries only update after a manual/automatic inspection of changes.
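The recommendation about registering placeholder packages can be backed by a periodic check of which private names are still unclaimed on the public index. The sketch below assumes PyPI's JSON endpoint at `https://pypi.org/pypi/<name>/json`, which responds with 404 for names that are not registered. The function names are hypothetical, and the status lookup is injectable so the logic can be tried without network access.

```python
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def unclaimed_names(names, get_status=http_status):
    """Return the names that have no project on the public index (404)."""
    missing = []
    for name in names:
        status = get_status(f"https://pypi.org/pypi/{name}/json")
        if status == 404:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Offline stand-in for the real lookup, for demonstration only.
    def fake(url):
        return 404 if "mycompany-stuff" in url else 200

    print(unclaimed_names(["requests", "mycompany-stuff"], get_status=fake))
```

Any private name that shows up as unclaimed is one that an outsider could register first.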

What Next?

We realized that this is an extensive area of work, and while this attack might not appear significant, it has the potential to do widespread, catastrophic damage to organizations across the internet. We will be doing a Project Resonance Wave very soon to understand the impact of this issue and, as usual, an analysis will be shared with the community. If you would like to collaborate, please reach out to research@redhuntlabs.com. Stay tuned…

Update: We did the research as promised, and the results are here: https://dreamsdesign.us/redhunt/oldsite/blog/top-organizations-on-github-vulnerable-to-dependency-confusion-attack.html
