tl;dr unpkg.com is a pretty popular CDN for serving up assets from npm packages. I found a vulnerability in a tar implementation that allowed me to write arbitrary files onto the unpkg server, including into other packages. If exploited, this bug would have allowed an attacker to execute malicious JavaScript on thousands of websites, including the homepages of PNC Bank, React.js, and the state of Nebraska. Don’t trust a third-party CDN – use subresource integrity and pin hashes!

Vulnerability

When you request a URL like https://unpkg.com/react@16.3.2/, unpkg checks if it already has the package downloaded and extracted at /tmp/unpkg-react-16.3.2/. If it doesn’t, it pulls the corresponding tar file from npm.

Unpkg lets you read any file out of a package once it’s extracted. To serve react’s package.json file, for example, you can just visit https://unpkg.com/react@16.3.2/package.json. Or, to get a directory listing of the whole package, you can visit https://unpkg.com/react@16.3.2/.
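To make the URL-to-disk mapping concrete, here is a minimal sketch of how a request path could be split into a package name, version, and file, and then mapped to a cache directory. The helper names parsePackageUrl and cacheDirFor are hypothetical, the sketch only handles unscoped package names, and it is not unpkg's actual code; only the /tmp/unpkg-name-version layout comes from the description above.

```javascript
const path = require("path");

// Hypothetical helper: split "/react@16.3.2/package.json" into its parts.
// Only unscoped package names are handled in this sketch.
function parsePackageUrl(pathname) {
  const match = /^\/([^/@]+)@([^/]+)(\/.*)?$/.exec(pathname);
  if (!match) return null;
  return { name: match[1], version: match[2], file: match[3] || "/" };
}

// Hypothetical helper: the directory a package version is extracted into,
// following the /tmp/unpkg-<name>-<version>/ layout described above.
function cacheDirFor(name, version, tmpRoot = "/tmp") {
  return path.posix.join(tmpRoot, `unpkg-${name}-${version}`);
}

const parsed = parsePackageUrl("/react@16.3.2/package.json");
console.log(parsed, cacheDirFor(parsed.name, parsed.version));
```

The important property for the rest of this post is that every package version lives in a predictable, sibling directory under the same parent.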

Here’s a snippet of what unpkg used to extract the tar file when it pulled a package:

function ignoreSymlinks(file, headers) {
  return headers.type === "link";
}

function extractResponse(response, outputDir) {
  return new Promise((resolve, reject) => {
    const extract = tar.extract(outputDir, {
      readable: true, // All dirs/files should be readable.
      map: stripNamePrefix,
      ignore: ignoreSymlinks
    });

    response.body
      .pipe(gunzip())
      .pipe(extract)
      .on("finish", resolve)
      .on("error", reject);
  });
}

The first problem with this code is that it doesn’t actually ignore symlinks like it says it does. With this tar library, headers.type for a symlink entry is symlink, not link (link is the type used for hardlinks). So right away this gives us arbitrary file reads on the server by including a symlink to / in a package and browsing through the directory with the web interface.
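For illustration, here is a sketch of what the filter presumably meant to do. It assumes the header type strings are "symlink" for symbolic links and "link" for hardlinks; the name ignoreLinks is mine, not unpkg's. This is not the fix that was shipped, and a correct filter alone would not have been sufficient.

```javascript
// Entry types that should never be extracted from an untrusted archive.
// Assumption: the tar library reports "symlink" for symbolic links
// and "link" for hardlinks.
const LINK_TYPES = new Set(["symlink", "link"]);

// Same (file, headers) shape as the ignoreSymlinks function above:
// returning true tells the extractor to skip the entry.
function ignoreLinks(file, headers) {
  return LINK_TYPES.has(headers.type);
}

console.log(ignoreLinks("exploit/link", { type: "symlink" }));
console.log(ignoreLinks("exploit/index.js", { type: "file" }));
```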

The second problem with this code is that even if headers.type did correctly check for symlink, the main attack below still worked due to issues with the tar library’s implementation of ignore functions.

First exploit attempt

On my local instance of unpkg, I was able to use this bug to read /proc/self/environ, which spits out the environment variables of the webserver process. Among them is a Cloudflare API key, which I figured an attacker could use to do nefarious DNS-related things through the Cloudflare API (this was an untested assumption – I don’t know if Cloudflare supports restricting the permissions on their API keys).

Unfortunately (fortunately?), something about Heroku’s environment made it so that I couldn’t read /proc/self/environ on the real unpkg server. My guess is that this had to do with the incorrect HTTP Content-Length returned by the server. When reading /proc/self/environ, my local instance reported Content-Length: 0 but still returned the file in the response body. Presumably some reasonably clever reverse proxy at Heroku sees the Content-Length: 0 and cuts out the body of the reply.

The server returns Content-Length: 0 because stat /proc/self/environ reports a size of 0, and that’s what unpkg uses to set that header.
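The mismatch is easy to reproduce in Node. The sketch below compares the size reported by stat with the number of bytes actually read; compareSizes is a hypothetical helper for illustration, not something from the unpkg codebase.

```javascript
const fs = require("fs");

// Compare the size stat() reports with the number of bytes actually read.
// A server that derives Content-Length from the first number will be wrong
// whenever the two differ.
function compareSizes(filePath) {
  const statSize = fs.statSync(filePath).size;
  const actualSize = fs.readFileSync(filePath).length;
  return { statSize, actualSize };
}

// Usage on Linux: compareSizes("/proc/self/environ")
// procfs files typically report a stat size of 0 even though reading
// them returns data, which matches the behavior described above.
```

For regular files the two numbers agree, which is why deriving the header from stat normally goes unnoticed.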

Second exploit attempt

At this point I was kind of bummed that I couldn’t figure out a way to take over this server. I went ahead and reported the symlink issue to the unpkg maintainer and went to sleep.

But then I started thinking more about tar files. We can extract files into a folder, we can create symlinks… can we extract files into a directory pointed to by a symlink that’s already been extracted? I pulled out my hex editor and made a tar file that tries this. It creates a symlink to /tmp called link, and then tries to extract a file to link/oops.txt.

I figured there was no way this would work with any mature tar implementation, and sure enough it failed to extract on my laptop:

$ tar -xvf symlink-oops.tar 
exploit/
exploit/link
exploit/link/oops.txt
tar: exploit/link/oops.txt: Cannot open: Not a directory
tar: Exiting with failure status due to previous errors

But unpkg doesn’t use GNU tar, it uses a package called tar-fs. And tar-fs happily extracts this archive.

And then we win! Since we can write (and overwrite) files anywhere that the webserver user is able to do so, we can overwrite files in the directories set aside for other packages, like /tmp/unpkg-react-16.3.2/. To test this out, I made two versions of a package, and had the second version overwrite files in the first (it worked).
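One way an extractor can defend against this class of bug is to resolve symlinks in the destination path before writing each entry and to refuse anything that lands outside the extraction root. The following is a minimal sketch under that assumption. staysInsideRoot is a hypothetical helper, not the actual tar-fs patch. It only checks the parent directory of the entry, and it does not address races between the check and the write.

```javascript
const fs = require("fs");
const path = require("path");

// True if something exists at `p`, without following a symlink at `p`.
function lexists(p) {
  try {
    fs.lstatSync(p);
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: returns true only if writing `entryName` under `root`
// would stay inside `root` once symlinks in the parent path are resolved.
function staysInsideRoot(root, entryName) {
  const realRoot = fs.realpathSync(root);
  const target = path.resolve(realRoot, entryName);

  // Walk up to the deepest ancestor that already exists on disk.
  let existing = path.dirname(target);
  while (!lexists(existing)) {
    const parent = path.dirname(existing);
    if (parent === existing) break;
    existing = parent;
  }

  let realParent;
  try {
    // realpathSync follows symlinks that earlier archive entries may have created.
    realParent = fs.realpathSync(existing);
  } catch (err) {
    return false; // e.g. a dangling symlink: refuse rather than guess
  }
  return realParent === realRoot || realParent.startsWith(realRoot + path.sep);
}
```

With the archive described above, the check for link/oops.txt would resolve link to /tmp and reject the entry, while an ordinary nested path such as sub/file.txt would pass.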

A worse bug than I thought

Many tar implementations also support unpacking hardlinks. Since creating a hardlink to a directory is more often than not an invalid operation, I made a variant of my original exploit that would:

  1. Make a hardlink foo to a file I knew should exist and
  2. Unpack a regular file named foo with arbitrary contents

Sure enough, tar-fs was vulnerable to this attack as well and would allow me to overwrite files as long as I had the proper permissions and knew where they lived on the filesystem.
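The hardlink variant works because a hardlink and its target are the same file on disk, so opening the link for writing truncates and rewrites the shared contents. A common mitigation is to remove whatever is at the destination path before creating the new file. The sketch below illustrates that idea. writeFileReplacing is a hypothetical helper and is not claimed to be how tar-fs or node-tar fixed the issue.

```javascript
const fs = require("fs");

// Hypothetical helper: write an extracted file without writing *through*
// an existing hardlink or symlink at the destination path.
function writeFileReplacing(dest, contents) {
  try {
    // Removing the directory entry detaches `dest` from any file it was
    // linked to; the other file keeps its contents.
    fs.unlinkSync(dest);
  } catch (err) {
    if (err.code !== "ENOENT") throw err;
  }
  // "wx" creates a new file and fails if something already exists at `dest`.
  fs.writeFileSync(dest, contents, { flag: "wx" });
}
```

A plain fs.writeFileSync(dest, contents) on a path that is a hardlink would instead modify the file it is linked to, which is the behavior the exploit relies on.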

After reporting this variant of the original bug to the tar-fs maintainer, he got back to me the next morning sounding a little worried. Surprisingly, node-tar, a much more popular tar library, was vulnerable to the hardlink variant. The tar-fs maintainer and I submitted a bug report and node-tar was quickly patched as well.

Oh, and if you ever need a textbook example of defense-in-depth doing its job, just remember that the only reason the npm client (which uses pacote and thus node-tar) wasn’t vulnerable to this attack was because a pacote developer made the prescient decision to never extract hardlinks or softlinks.

Conclusion

How screwed are you and your users if your JavaScript CDN starts serving malware? Completely? Then either host files yourself or use subresource integrity. It lets you pin the cryptographic hash of whatever file you’re trying to load, protecting you from attacks like this in modern browsers.
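As a rough sketch of how to produce a value to pin, the integrity string is the hash algorithm name, a dash, and the base64-encoded digest of the exact file contents. The helper name sriHash and the choice of SHA-384 are my assumptions for this example.

```javascript
const crypto = require("crypto");

// Hypothetical helper: compute a subresource integrity value for the exact
// bytes you expect the CDN to serve.
function sriHash(contents, algorithm = "sha384") {
  const digest = crypto.createHash(algorithm).update(contents).digest("base64");
  return `${algorithm}-${digest}`;
}

// Usage: hash a copy of the file that you have reviewed and trust.
// const fs = require("fs");
// console.log(sriHash(fs.readFileSync("react.production.min.js")));
```

The resulting string goes in the integrity attribute of the script tag that loads the file, together with crossorigin="anonymous" for cross-origin URLs. If the CDN ever serves different bytes, a supporting browser refuses to run them.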

Thank you to the maintainers of unpkg.com, tar-fs, and node-tar for resolving these vulnerabilities quickly.

Shameless plug

If you’re interested in ditching #birdsite and want to use a social network that actually respects your freedoms, you should consider joining Mastodon! It’s a federated social network, meaning that it works in a distributed way sort of like email. Join us over in the fediverse and help us build a friendly security community!