This is meant to be a summary of the current status and the questions around integer fallback and type names. It's meant to be an objective overview so we can discuss at the work week and have the same starting point. (Obviously I have an opinion here, so it might not be totally objective but I'll try).
We have two 'current' places: the current implementation, and the current 'spec', i.e., the current implementation plus the accepted RFC.
The current implementation is that we have fixed-size (`u8`, `u16`, `u32`, `u64`, `i8`, `i16`, `i32`, `i64`) and pointer-sized integer types (`int`, `uint`). Integer literals must have an inferrable type or be annotated with a suffix (`u`, `u8`, etc.). If a type cannot be inferred, there is a type error.
We have previously accepted RFC 212.
The gist of that RFC is that if the compiler cannot infer a precise type for an integer, it will assume `int`. We previously had similar behaviour; it was removed because we thought it was not very important in real-world code and because it introduces some scope for unexpected overflow errors. However, not having the fallback is extremely irritating in toy examples and tutorials. (The fallback in RFC 212 is not exactly the same as the old fallback - we will allow it to interact with inference, whereas previously it did not.)
We have not implemented RFC 212 because it is backwards compatible and not super-high priority for 1.0.
There are three interrelated questions:
1. Should we fall back to `i32` instead of `int`?
2. Should we rename `int` and `uint`?
3. Should we use `i32` rather than `int` in our examples and where it doesn't matter what type we use (because there will never be overflow)?
We could consider these separately, but I think they are interrelated because it would be strange (though not impossible) for our 'hard' default (i.e., the compiler's fallback) and our 'soft' default (what we use in examples, etc.) to be different. If we do decide to switch our defaults to `i32`, then it becomes odd for a pointer-sized int to remain called `int` (likewise, `uint`), because `int` is the most intuitive name for an integer, yet we would otherwise discourage its use, and because it does not indicate the size of the integer like the other type names do.
1 is proposed in [RFC 452](rust-lang/rfcs#452), 2 is proposed in [RFC 464](rust-lang/rfcs#464). There seems to be broad (but not unanimous) support from the community for both RFCs (although there are worries that this is an issue with a 'silent majority' who support the status quo or don't care). The consensus in the discussion for RFC 464 is for `iptr` and `uptr`, though there are several alternatives (`imem`, `isize`, `index`, `offset`/`addr`, and so forth).
An open question is: if we do rename `int`, should we then keep `int` as an alias for `i32` to make toy examples more friendly to newcomers?
There are a few reasons to prefer `i32` as a default:
- overflow safety - it's easier to reason about overflow when the width of the integer is known. Using a pointer-sized integer is only appropriate (from the point of view of overflow) when you are dealing with pointers or array indices or similar situations (or if you know the integer will never hold a value larger than the smallest possible size, but then using a fixed-size integer of the smallest size is more efficient).
- for 16-bit platforms, a pointer-sized integer is too small for many cases
- `i32` is generally faster than `i64` (particularly an issue for benchmarks)
I think the key question is not in fact what is right, but whether it matters at all (our conclusion from the last work week was pretty much 'no'). Since use of the fallback is rare in real software, the effect of changing the fallback is very small. I see two counter-arguments: pedagogy and benchmarks. We should encourage new programmers to do the 'right thing' around integers and overflow, and having a fixed-size default integer encourages this (note that I am not making the direct argument that changing the integer fallback will lead to safer Rust programs, only that it will, to some minor extent, encourage safer behaviour). The second worry is that we could lose benchmark points: if someone naively implements a benchmark in Rust without using type annotations and runs it on a 64-bit platform, they will get the slower 64-bit integer instead of the 32-bit one. Given the difficulty of writing even a small benchmark program with no integer type annotations, I'm not sure how realistic this worry is.
This is the kind of question some data could really help with. The best idea I can come up with is to implement the fallback to both `int` and `i32` and instrument the compiler to count the number of annotations we could elide either way. I'm not sure it's easy to do the instrumentation, however. And the implementation work itself is non-trivial. And this is only really useful data if we assume that we are using the 'correct' annotations, which, arguably, we are not.
I believe that the right thing to do is to fall back to `i32` (mainly for overflow safety). I had previously supported this change because it seemed like a very minor change to an accepted but unimplemented RFC. However, if we make this change, it seems that we must also change the naming of integer types and our 'soft' default, and then it becomes a lot of work. Given that the practical effects of the change are small, that the work required to make the changes is backwards incompatible (i.e., must happen before 1.0) and significant, and that collecting really informative data will be difficult, I think the best thing to do is to stick with the status quo (which makes me a little sad, but such is life).
If we agree, then we should write this up properly and explain it well to the community, in particular on the RFCs.