Someone asked a question a few weeks back about why size_of::<Option<T>>() is often double size_of::<T>(). The short answer is alignment.
C cannot directly represent a complex Rust enum, hence the need for a workaround. To understand that, let's take a look at how Option<i32> would be represented in C.
For example, given:
#[repr(C)]
enum Option {
    Some(i32),
    None,
}
It will get expanded to the following Rust code which is compatible with C:
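/// This is our focus
#[repr(C)]
struct OptionRepr {
    tag: OptionDiscriminant,
    payload: OptionUnion,
}
/// One-byte tag recording which variant is active. (A tag declared with
/// plain `#[repr(C)]` would instead be as wide as a C `int`.)
#[repr(u8)]
enum OptionDiscriminant {
    Some,
    None,
}
/// Before now, I didn't know there was a `union` type in Rust. Apparently
/// reading its fields requires `unsafe`. You can read more here:
/// https://doc.rust-lang.org/reference/items/unions.html
#[repr(C)]
union OptionUnion {
    Some: OptionSomeVariant,
    None: OptionNoneVariant,
}
/// Union fields must be `Copy` (or wrapped in `ManuallyDrop`).
#[repr(C)]
#[derive(Clone, Copy)]
struct OptionSomeVariant(i32);
#[repr(C)]
#[derive(Clone, Copy)]
struct OptionNoneVariant;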
With the above, you'll notice that to properly represent an enum, you'll require:
- OptionUnion, which holds the variants of the enum (OptionSomeVariant and OptionNoneVariant), and
- OptionDiscriminant, which holds the keys of the variants (just like dictionary keys).
These are then combined into OptionRepr. That means that whenever we want to access the field in an Option, we conceptually do something like option_repr.payload[option_repr.tag] (just like a dictionary lookup).
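The dictionary-style lookup is only a mental model; in actual Rust it would look more like a match on the tag followed by an unsafe read of the matching union field. Here is a minimal sketch of that idea. The names Tag, Payload and get are hypothetical and chosen for illustration, and a one-byte tag is assumed:

```rust
// Hypothetical C-compatible mirror of Option<i32>; names are illustrative only.
#[repr(u8)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum Tag {
    Some,
    None,
}

#[repr(C)]
#[derive(Clone, Copy)]
union Payload {
    some: i32,
    none: (),
}

#[repr(C)]
struct OptionRepr {
    tag: Tag,
    payload: Payload,
}

/// Reads the payload only when the tag says the `some` field is active.
fn get(repr: &OptionRepr) -> Option<i32> {
    match repr.tag {
        // SAFETY: the tag is `Some`, so `some` is the field that was written.
        Tag::Some => Some(unsafe { repr.payload.some }),
        Tag::None => None,
    }
}

fn main() {
    let present = OptionRepr { tag: Tag::Some, payload: Payload { some: 7 } };
    let absent = OptionRepr { tag: Tag::None, payload: Payload { none: () } };
    println!("{:?} {:?}", get(&present), get(&absent)); // Some(7) None
}
```

The tag is what makes the unsafe read sound: the union itself has no idea which field was last written.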
For reference, there's a part of the Rust book that explains how enums are expanded to unions and structs.
For context:
println!("{}", std::mem::size_of::<i32>()); // 4
println!("{}", std::mem::size_of::<Option<i32>>()); // 8
Still using our Rust-to-C representation of Option<i32>, and ignoring padding for now: size_of::<OptionRepr>() == size_of::<OptionDiscriminant>() + size_of::<OptionUnion>().
Let's do some maths:
size_of::<i32>() == 4
size_of::<OptionDiscriminant>() == 1 (assuming a one-byte tag; a tag declared with plain #[repr(C)] would be as wide as a C int)
size_of::<OptionSomeVariant>() == size_of::<i32>() == 4
size_of::<OptionNoneVariant>() == 0 (a unit struct is zero-sized)
size_of::<OptionUnion>() == max(size_of::<OptionSomeVariant>(), size_of::<OptionNoneVariant>()) == max(4, 0) == 4
size_of::<OptionRepr>() == size_of::<OptionDiscriminant>() + size_of::<OptionUnion>() == 1 + 4 == 5
By our calculation above, the size of our enum should be 5, which is very different from the 8 we were expecting.
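The mismatch can be observed directly by comparing the sum of the field sizes with what the compiler reports. This is a sketch under assumptions: Tag, Payload and OptionRepr are hypothetical stand-ins for the types above, with a one-byte tag and a 4-byte i32.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the article's types, assuming a one-byte tag.
#[repr(u8)]
#[allow(dead_code)]
enum Tag {
    Some,
    None,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union Payload {
    some: i32,
    none: (),
}

#[repr(C)]
#[allow(dead_code)]
struct OptionRepr {
    tag: Tag,
    payload: Payload,
}

/// Sum of the field sizes, ignoring any padding the compiler inserts.
fn raw_size() -> usize {
    size_of::<Tag>() + size_of::<Payload>()
}

fn main() {
    println!("sum of fields: {}", raw_size()); // 5
    println!("size_of::<OptionRepr>(): {}", size_of::<OptionRepr>()); // 8
}
```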
That's because of the concept of alignment. Alignment tells us which addresses a value can be stored at: the address must be a multiple of the type's alignment, which is always a power of 2. Because the address is a multiple of a known value, along with the size of the value being stored, we can:
- Easily allocate memory beforehand,
- Know the upper and lower bounds of the value's position in memory,
- Rely on its location in memory being more deterministic.
To learn more about alignment, you can read this article.
Let's find out what the alignment is for our example:
println!("{}", std::mem::align_of::<i32>()); // 4
println!("{}", std::mem::align_of::<Option<i32>>()); // 4
What the above tells us is that the alignment of Option<i32> is the same as the alignment of i32.
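To see where the alignment of the whole type comes from, and where the extra bytes end up, here is a small sketch. Tag, Payload, OptionRepr and payload_offset are hypothetical names, and a one-byte tag is assumed. A #[repr(C)] struct takes the largest alignment among its fields, and each field is placed at an offset that is a multiple of its own alignment:

```rust
use std::mem::align_of;

// Hypothetical stand-ins for the article's types, assuming a one-byte tag.
#[repr(u8)]
#[allow(dead_code)]
enum Tag {
    Some,
    None,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union Payload {
    some: i32,
    none: (),
}

#[repr(C)]
#[allow(dead_code)]
struct OptionRepr {
    tag: Tag,
    payload: Payload,
}

/// Byte offset of `payload` from the start of `OptionRepr`.
fn payload_offset() -> usize {
    let repr = OptionRepr { tag: Tag::None, payload: Payload { none: () } };
    let base = std::ptr::addr_of!(repr) as usize;
    let field = std::ptr::addr_of!(repr.payload) as usize;
    field - base
}

fn main() {
    println!("align_of::<Tag>(): {}", align_of::<Tag>()); // 1
    println!("align_of::<Payload>(): {}", align_of::<Payload>()); // 4
    println!("align_of::<OptionRepr>(): {}", align_of::<OptionRepr>()); // 4
    println!("offset of payload: {}", payload_offset()); // 4
}
```

The payload starts at offset 4 rather than 1, so the three padding bytes sit between the tag and the payload.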
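Now that we know the alignment, we can work out the size: the size of a type is always a multiple of its alignment, so we round the raw size up to the next multiple of the alignment. (In OptionRepr the padding bytes sit between the tag and the payload, so that the i32 starts at an offset that is a multiple of 4.)
To explain the above, let's look at this function: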
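fn size_by_alignment(raw_size_of_t: usize, alignment_of_t: usize) -> usize {
    // The outer `%` keeps the padding at 0 when the size is already a
    // multiple of the alignment, e.g. size_by_alignment(4, 4) == 4.
    let padding = (alignment_of_t - (raw_size_of_t % alignment_of_t)) % alignment_of_t;
    raw_size_of_t + padding
}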
You'll notice that the function rounds the original size of T up to the next multiple of its alignment.
That means, for our example OptionRepr, our function will be called as size_by_alignment(5, 4), which returns 8 because that's the nearest multiple of the alignment at or above our original size.
That is the reason why size_of::<Option<i32>>() is 8: the raw size of 5 is rounded up to the next multiple of the alignment, which is 8.
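The standard library exposes the same rounding through std::alloc::Layout, which makes for a handy cross-check. In the sketch below, round_up and round_up_std are hypothetical helper names; round_up uses the common integer-division form of the rounding, which leaves an already aligned size unchanged:

```rust
use std::alloc::Layout;

/// Rounds `size` up to the nearest multiple of `align` (hypothetical helper;
/// `align` must be non-zero). An already aligned size is returned unchanged.
fn round_up(size: usize, align: usize) -> usize {
    (size + align - 1) / align * align
}

/// The same rounding, delegated to the standard library's `Layout`.
fn round_up_std(size: usize, align: usize) -> usize {
    Layout::from_size_align(size, align)
        .expect("align must be a non-zero power of two")
        .pad_to_align()
        .size()
}

fn main() {
    for (size, align) in [(5, 4), (4, 4), (1, 1), (9, 8)] {
        println!(
            "size {} with align {} -> {} (std: {})",
            size,
            align,
            round_up(size, align),
            round_up_std(size, align)
        );
    }
}
```

For (5, 4) both helpers return 8, matching size_of::<Option<i32>>().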
There's an exception, and that's for simple (fieldless) enum types. In such cases, the size remains 1 even when wrapped in an Option.
Let's look at an example:
enum Animal {
    Chicken,
    Goat,
}
The above will be represented in C as:
#[repr(C)]
enum Animal {
    Chicken = 0,
    Goat = 1,
}
Let's get the size of the Animal enum:
println!("{}", std::mem::size_of::<Animal>()); // 1
println!("{}", std::mem::size_of::<Option<Animal>>()); // 1
Since a fieldless enum maps directly onto a C-style enum, there's no need to represent it as we did for OptionRepr, because that would be overkill. That means the size of Animal is size_of::<u8>(), which is 1, because rustc stores a fieldless enum with this few variants as a single byte. (An enum explicitly marked #[repr(C)] would instead take the size of a C int.)
Option<Animal> stays at 1 because Animal only uses two of the 256 possible byte values, so None can be stored as one of the unused values without a separate tag.
Remember, size_by_alignment(1, 1) rounds 1 up to the nearest multiple of 1, which is still 1, consistent with the explanation given earlier.
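A short sketch comparing a few types shows when the Option wrapper is free and when it is not. Apart from Option<&T>, whose size is guaranteed to equal that of &T, the numbers depend on what the compiler chooses for the default representation, so treat them as observations rather than guarantees:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Animal {
    Chicken,
    Goat,
}

fn main() {
    // Animal has spare byte values, so None fits without a separate tag.
    println!("Animal: {}", size_of::<Animal>());
    println!("Option<Animal>: {}", size_of::<Option<Animal>>());

    // Every u8 bit pattern is a valid value, so Option<u8> needs extra room.
    println!("u8: {}", size_of::<u8>());
    println!("Option<u8>: {}", size_of::<Option<u8>>());

    // A reference is never null, so None is stored as the null pointer.
    println!("&i32: {}", size_of::<&i32>());
    println!("Option<&i32>: {}", size_of::<Option<&i32>>());
}
```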
Explaining the reasoning behind alignment is outside the scope of this article, but this article does a very good job explaining it.
Nice. Also worth remembering the superpowers of Option<NonZeroI32>:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=738bb6ba20d11063f9eb45ec959ada5b
(If you need zero there's also the nonmax crate https://crates.io/crates/nonmax )
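A minimal sketch of what that looks like in practice, independent of the playground link above. Because zero is never a valid NonZeroI32, None can take its place, and the standard library documents that Option<NonZeroI32> has the same size as i32:

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

fn main() {
    // Documented guarantee: no extra space is needed for the None case.
    assert_eq!(size_of::<Option<NonZeroI32>>(), size_of::<i32>());

    // `new` returns None for zero and Some for every other value.
    let zero = NonZeroI32::new(0);
    let five = NonZeroI32::new(5);
    println!("{:?} {:?}", zero, five.map(NonZeroI32::get)); // None Some(5)
}
```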