PHP serialization/unserialization has several drawbacks [1].
On the serialization side, the Serializable interface:
- breaks hard and soft references inside serialized data structures;
- delegates the responsibility of the serialization format to userland
implementations, to the detriment of optimized formats such as the one that
igbinary provides.
On the unserialization side:
- security exploits have been demonstrated when using unserialize() on
user-submitted data;
- serialized strings referencing missing classes create placeholder objects of
type PHP_Incomplete_Class, which behave in an unusual manner and, most
importantly, break the semantics of the original structure.
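The placeholder behavior can be observed with a minimal sketch. The class name MissingClass is hypothetical and deliberately left undefined, and the snippet assumes the unserialize callback ini setting is not configured. Note that the engine spells the placeholder class name with two leading underscores.

```php
<?php
// "MissingClass" is a hypothetical name; no such class is defined here.
$payload = 'O:12:"MissingClass":1:{s:5:"value";i:42;}';

// No error is raised: the engine silently builds a placeholder object.
$object = unserialize($payload);

// The placeholder is not an instance of the class the payload named.
$className = get_class($object);

// Casting to array exposes the original class name as a regular entry,
// next to the properties carried by the payload.
$asArray = (array) $object;
```

Code that later type-checks or calls methods on $object would not find the class it expects, which is how the semantics of the original structure get broken.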
The root of these security issues is that creating objects out of serialized
strings can lead to code execution, namely of the callable defined by the
unserialize_callback_func ini setting and/or of the __wakeup(), unserialize()
and/or __destruct() methods. The first three are part of the typical
unserialization lifecycle: a security issue caused by them would be the
responsibility of their authors. But __destruct() is much nastier: authors
usually don't think of it as an attack vector and thus fail to implement the
needed safety measures (which could, for example, consist of throwing an
exception in a __wakeup() method).
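The destructor problem can be sketched as follows. TempFile is a hypothetical class invented for this illustration; a real-world counterpart might delete the file in its destructor, whereas this sketch only records the call.

```php
<?php
// Hypothetical class whose destructor performs a side effect on $path.
class TempFile
{
    public static $log = [];
    public $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function __destruct()
    {
        // A real class might call unlink($this->path); here we only record it.
        self::$log[] = 'cleanup: ' . $this->path;
    }
}

// Attacker-controlled string: the constructor never runs, yet $path is set.
$payload = 'O:8:"TempFile":1:{s:4:"path";s:11:"/etc/passwd";}';
$object = unserialize($payload);

// Dropping the last reference triggers __destruct() with the injected path.
unset($object);
```

The author of TempFile never wrote any unserialization code, yet the class can be driven by a crafted payload as soon as it is loadable.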
To mitigate these security issues, the unserialize() function has supported an
allowed_classes option since PHP 7.0. Implementing Serializable has the
security-mitigation advantage of allowing authors to filter the allowed classes
in the subgraph managed by their objects. This feature is only a mitigation
because not all use cases know all the possible classes beforehand.
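As a short illustration of the existing option, the sketch below restricts unserialization to one class. The class names Allowed and Forbidden are hypothetical.

```php
<?php
// Hypothetical classes used only for this illustration.
class Allowed
{
    public $id = 1;
}

class Forbidden
{
    public $id = 2;
}

$payload = serialize([new Allowed(), new Forbidden()]);

// Only classes listed in allowed_classes are instantiated; any other class
// found in the payload becomes a placeholder object instead.
$result = unserialize($payload, ['allowed_classes' => [Allowed::class]]);

$first = get_class($result[0]);  // the listed class is restored normally
$second = get_class($result[1]); // the unlisted class is replaced
```

The caller has to enumerate every acceptable class up front, which is the limitation described above.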
- handle a new __serialize(): array method, replacing __sleep() and
Serializable::serialize() when implemented;
- serialize the returned array using a new S: type (e.g. for an object of class
Foo whose __serialize() method returns [123]: S:3:"Foo":a:1:{i:0;i:123;});
- forbid using C: or O: for classes implementing __serialize();
- handle a new __unserialize(array $data, array $nested_objects): void method,
replacing __wakeup() and Serializable::unserialize() when implemented;
- have $data set to the unserialized value;
- for validation purposes, have $nested_objects contain the list of all objects
in $data, excluding those already inspected by nested implementations of
__unserialize();
- have the unserialize() function handle a new validation_callback option that
would accept a $nested_objects argument with the same semantics as above;
- have the PHP engine disable any destructors found in the unserialized value
whenever the unserialize() function throws any Throwable or terminates the
script execution (alternatively, if disabling destructors is not technically
possible, the engine should empty all properties of unserialized objects).
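The $nested_objects list and the validation step can be approximated in userland. This is only a sketch: collect_nested_objects() and $validate are hypothetical names, the walk only sees public properties, it does not exclude objects already inspected by nested implementations, and it assumes PHP 7.2 or later for spl_object_id(). It does not call unserialize() with the proposed option, which does not exist.

```php
<?php
// Hypothetical helper: lists every distinct object reachable from $data.
// Simplification: only public properties are visited, and arrays that
// contain themselves by reference are not guarded against.
function collect_nested_objects(array $data, array &$seen = []): array
{
    $found = [];
    foreach ($data as $value) {
        if (is_object($value)) {
            $id = spl_object_id($value);
            if (isset($seen[$id])) {
                continue; // each object is listed once
            }
            $seen[$id] = true;
            $found[] = $value;
            $found = array_merge(
                $found,
                collect_nested_objects(get_object_vars($value), $seen)
            );
        } elseif (is_array($value)) {
            $found = array_merge($found, collect_nested_objects($value, $seen));
        }
    }
    return $found;
}

// Hypothetical validation callback: rejects anything that is not a stdClass.
$validate = static function (array $nested_objects): void {
    foreach ($nested_objects as $object) {
        if (!$object instanceof stdClass) {
            throw new UnexpectedValueException(
                'Unexpected class: ' . get_class($object)
            );
        }
    }
};

$inner = new stdClass();
$outer = new stdClass();
$outer->child = $inner;
$data = ['a' => $outer, 'b' => [$inner, 1]];

$nested = collect_nested_objects($data);
$validate($nested); // passes: both objects are stdClass instances
```

Under the proposal the engine would build this list itself and hand it to __unserialize() or to the validation callback, so implementations would not need to walk the structure on their own.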
- fixing compatibility with soft and hard references;
- moving the responsibility of the serialization format to the outside of the
userland serialization steps;
- same or higher validation capabilities of the unserialized objects/classes;
- ability to reject PHP_Incomplete_Class instances independently from the
unserialize_callback_func ini setting;
- higher security by not calling destructors on any early termination of
unserialize().
The global unserialize_callback_func ini setting and the related
PHP_Incomplete_Class objects could be left unchanged. But we could also take
this RFC as an opportunity to make enabling the validation_callback option
also disable them and always throw a specific type of Throwable instead.
As described before [2], having __serialize() and __unserialize() be magic
methods has a distinct backward compatibility advantage. For this reason, this
RFC doesn't mention any new interface that implementations should use.
Instead, the PHP engine should have a rule that checks that both methods are
defined at the same time (implementing only one of them would make no sense)
and that they have the expected signature.