Saturday, January 24, 2015 - 01:15

unserialize() vs. json_decode() in PHP. The exploit devil is in the detail.

For quite some time there has been a dogmatic battle over which of the two to use, and some folks will still proclaim that json_decode is really just a slow unserialize. Speed, however, is not the only difference.

The main difference between the two is that unserialize is orders of magnitude more dangerous when used on data coming from the network.

Both functions essentially do the same thing with serialized data representing an object. json_decode, however, is limited to creating a stdClass no matter what you originally encoded. unserialize, on the other hand, will instantiate any class available to it.
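A minimal sketch of that difference, using a hypothetical Point class that has nothing to do with the Cache example: the same object is round-tripped through both pairs of functions, and only unserialize() hands the original class back.

```php
<?php
// Hypothetical class, for illustration only.
class Point {
    public $x = 0;
    public $y = 0;
}

$p = new Point();
$p->x = 3;

// Round trip through JSON: the class name is lost, we get a plain stdClass.
$fromJson = json_decode(json_encode($p));

// Round trip through serialize(): the original class is instantiated again.
$fromSerialize = unserialize(serialize($p));

echo get_class($fromJson), "\n";       // stdClass
echo get_class($fromSerialize), "\n";  // Point
```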

So let's look at the difference between the two in an example. Our test subject will be a sloppy implementation of a 'Cache'. It's not really a Cache but I wanted to delete something ;-)

class Cache {
    public $variable = NULL;
    private $dir = NULL;
 
    function __construct() {
        // we are probably also doing magic shit(tm) here!
        // the cache always lives in tmp below the web root
        $this->dir = $_SERVER["DOCUMENT_ROOT"]."/tmp";
    }
 
    function __destruct() {
        //clear our caches!
        if ($this->dir !== NULL && is_dir($this->dir)) {
            $files = glob($this->dir."/*");
            foreach ($files AS $file) {
                // a real cache would unlink($file) here; the demo only prints
                print "deleting $file\r\n";
            }
        }
    }
 
    public function setVariable($var) {
        $this->variable = $var;
    }
}

Our Cache class sets up the cache on its own in the constructor and has a public function to set a meaningless variable. The cache directory is private to Cache; the variable is public. The constructor sets up the cache directory so that it is always a tmp directory relative to our web root. Rather static, but also rather safe in that our destructor cannot hit the wrong directory.

How do the two compare when an instance of Cache is encoded with serialize and json_encode?

Serialize produces this beast. Private members are stored with the class name enclosed in NUL bytes; those bytes are shown here as spaces!

O:5:"Cache":2:{s:8:"variable";s:3:"Foo";s:10:" Cache dir";s:16:"/htdocs/test/tmp";}

whereas json_encode produces this rather more readable representation

{"variable":"Foo"}

Quite obviously there's a difference. The important one is that the serialized version not only retains the private and protected members of a class, it also retains the class name itself.
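To reproduce the two encodings without the web server environment, here is a small sketch with a hypothetical Demo class (one public and one private member; the class and its values are made up for illustration). The NUL bytes are swapped for spaces only to make the string printable.

```php
<?php
// Hypothetical class for illustration: one public and one private member.
class Demo {
    public $variable = 'Foo';
    private $dir = '/tmp';
}

$demo = new Demo();

// serialize() keeps the class name and every member. Private members are
// stored as "\0ClassName\0member"; the NUL bytes are replaced with spaces
// here so the result can be printed.
$serialized = str_replace("\0", ' ', serialize($demo));
echo $serialized, "\n";
// O:4:"Demo":2:{s:8:"variable";s:3:"Foo";s:9:" Demo dir";s:4:"/tmp";}

// json_encode() only sees the public members and drops the class name.
$json = json_encode($demo);
echo $json, "\n";
// {"variable":"Foo"}
```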

Hence if we decode both versions we will not get the same result. Since the JSON does not retain the class, json_decode can only create a PHP stdClass with a public member variable having a value of Foo. It's basically a data object; a rather harmless construct.

The unserialized version, however, is of class Cache and may have the cache directory set to whatever an attacker wants it to point at. For example:

O:5:"Cache":2:{s:8:"variable";s:6:"Busted";s:10:" Cache dir";s:12:"/htdocs/test";}

And there's not much you can do about it. The moment the serialized object above hits unserialize(), an object of class Cache is created without the constructor ever running, and like any object it will be destroyed at some point. In our case this leads to the execution of the code in __destruct(). Unfortunately for us, this means that our web root is going to be deleted.
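The attack can be replayed harmlessly. The sketch below uses a hypothetical DemoCache class whose destructor only records what it would clear; the class name, the log and the payload builder are assumptions made for the demonstration and not part of the Cache class above.

```php
<?php
// Hypothetical stand-in for Cache: the destructor records instead of deleting.
class DemoCache {
    public static $log = [];
    private $dir = '/htdocs/test/tmp';

    function __construct() {
        self::$log[] = 'constructed';
    }

    function __destruct() {
        self::$log[] = 'would clear ' . $this->dir;
    }
}

// Build the attacker's string by hand. Private members are named
// "\0ClassName\0member", so the lengths are computed rather than counted.
$class  = 'DemoCache';
$member = "\0" . $class . "\0dir";
$target = '/htdocs/test';
$payload = sprintf(
    'O:%d:"%s":1:{s:%d:"%s";s:%d:"%s";}',
    strlen($class), $class,
    strlen($member), $member,
    strlen($target), $target
);

// No constructor runs here, but the destructor fires as soon as the
// object goes away, with the directory the attacker picked.
$obj = unserialize($payload);
unset($obj);

print_r(DemoCache::$log);
```

The log ends up with a single entry for /htdocs/test and no 'constructed' entry: the private directory was never set by our own code.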

If we actually serialized Cache in our framework we would probably also use __wakeup(). This could make the problem even worse, as we would probably be re-initializing the object from variables that we previously stored on disk. __wakeup() is called whenever an object is unserialized.
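A small sketch of the __wakeup() variant, assuming a hypothetical DemoSession class that would reload its state from a file on wake-up. The class, the property and both paths are invented for illustration; the method only records the path instead of reading it.

```php
<?php
// Hypothetical class: __wakeup() would normally re-read state from disk.
class DemoSession {
    public static $log = [];
    public $configFile = '/htdocs/test/tmp/session.cfg';

    function __wakeup() {
        // We only record the path the object claims to be configured with.
        self::$log[] = 'would reload from ' . $this->configFile;
    }
}

// Attacker-supplied string: same class, different file.
$payload = 'O:11:"DemoSession":1:{s:10:"configFile";s:11:"/etc/passwd";}';

// __wakeup() runs during unserialize(), before our code sees the object.
$session = unserialize($payload);

print_r(DemoSession::$log);
```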

If we are working in an environment where classes are used, unserializing network data is a major fuckup in the making. It is best not to use it at all for such data. serialize/unserialize are for encoding and decoding to achieve persistence, whereas json_encode and json_decode are for encoding and decoding data objects to be shared. There are situations where serialize looks just sweeter. But it's bittersweet and really just a fuckup waiting to happen. It does unwanted shit when it works its magic on encoded data coming in from the network.
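One way to follow that advice, sketched under assumptions: network data arrives as JSON, gets validated, and only then is copied into an object that our own code constructs. The Settings class and the settingsFromJson() helper are hypothetical names, not something from a framework.

```php
<?php
// Hypothetical data holder; our code decides which class gets built.
class Settings {
    public $variable = null;

    public function setVariable($var) {
        $this->variable = $var;
    }
}

// Hypothetical helper: decode, validate, then construct explicitly.
function settingsFromJson(string $json): Settings {
    $data = json_decode($json);

    // json_decode() returns null for anything that is not valid JSON,
    // including a serialize() string, so that is rejected here as well.
    if (!($data instanceof stdClass)
        || !isset($data->variable)
        || !is_string($data->variable)) {
        throw new InvalidArgumentException('unexpected payload');
    }

    $settings = new Settings();
    $settings->setVariable($data->variable);
    return $settings;
}

$settings = settingsFromJson('{"variable":"Foo"}');
echo $settings->variable, "\n"; // Foo
```

The sender never gets to name a class or touch a private member; whatever it sends ends up in fields we picked.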

You cannot really use unserialize() in a safe way with such data. If you also use __wakeup() you'd have to check the object before you decode it, and quite obviously we don't want to do that. Even if the actual object could not do any harm through functions fired at some point after it was created, its mere existence could create a problem if it is not expected. And having no problem because of a lack of features, or because we are just lucky, isn't exactly good code.
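For completeness: PHP 7.0 and later accept an options array as a second argument to unserialize(), and its allowed_classes key can stop objects from being instantiated. The sketch below uses a hypothetical GuardedCache class to show the effect. Treat it as damage limitation rather than a licence to feed network data to unserialize(); the PHP manual still advises against passing untrusted input even with this option.

```php
<?php
// Hypothetical class whose destructor records what it would clear.
class GuardedCache {
    public static $log = [];
    private $dir = '/htdocs/test/tmp';

    function __destruct() {
        self::$log[] = 'would clear ' . $this->dir;
    }
}

// Same hand-built attacker string as before, lengths computed.
$class  = 'GuardedCache';
$member = "\0" . $class . "\0dir";
$target = '/htdocs/test';
$payload = sprintf(
    'O:%d:"%s":1:{s:%d:"%s";s:%d:"%s";}',
    strlen($class), $class,
    strlen($member), $member,
    strlen($target), $target
);

// With allowed_classes set to false no user class is instantiated;
// the result is a __PHP_Incomplete_Class placeholder instead.
$obj = unserialize($payload, ['allowed_classes' => false]);
$isPlaceholder = $obj instanceof __PHP_Incomplete_Class;
unset($obj);

var_dump($isPlaceholder);     // bool(true)
print_r(GuardedCache::$log);  // stays empty, no destructor ran
```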
