4.6 — Fixed-width integers and size_t

In the previous lessons on integers, we covered that C++ only guarantees that integer variables will have a minimum size -- but they could be larger, depending on the target system.

For example, an int has a minimum size of 16-bits, but it's typically 32-bits on modern architectures.

If you assume an int is 32-bits because that’s most likely, then your program will probably misbehave on architectures where int is actually 16-bits (since you will probably be storing values that require 32-bits of storage in a variable with only 16-bits of storage, which will cause overflow or undefined behavior).

For example:

#include <iostream>

int main()
{
    int x { 32767 };        // x may be 16-bits or 32-bits
    x = x + 1;              // 32768 overflows if int is 16-bits, okay if int is 32-bits
    std::cout << x << '\n'; // what will this print?

    return 0;
}

On a machine where int is 32-bits, the value 32768 fits within the range of an int, and therefore can be stored in x without issue. On such a machine, this program will print 32768. However, on a machine where int is 16-bits, the value 32768 does not fit within the range of a 16-bit integer (which has range -32,768 to 32,767). On such a machine, x = x + 1 will overflow. Signed overflow is undefined behavior, but on most machines the value will wrap around to -32,768, which is then stored in x and printed.

Instead, if you assume an int is only 16-bits to ensure your program will behave on all architectures, then the range of values you can safely store in an int is significantly limited. And on systems where int is actually 32-bits, you’re not making use of half of the memory allocated per int.

Key insight

In most cases, we only instantiate a small number of int variables at a time, and these are typically destroyed at the end of the function in which they are created. In such cases, wasting 2 bytes of memory per variable isn’t a concern (the limited range is a bigger issue). However, in cases where our program allocates millions of int variables, wasting 2 bytes of memory per variable can have a significant impact on the program’s overall memory usage.
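As a rough illustration, here's a sketch comparing the memory footprint of one million 2-byte versus 4-byte integers (assuming short is 2 bytes and int is 4 bytes on your system -- the actual sizes are system-dependent):

#include <iostream>

int main()
{
    // Approximate memory used by 1 million elements of each type (sizes vary per system)
    std::cout << "1 million short: " << 1000000 * sizeof(short) << " bytes\n";
    std::cout << "1 million int:   " << 1000000 * sizeof(int) << " bytes\n";

    return 0;
}

On a typical system where short is 2 bytes and int is 4 bytes, the second figure is double the first.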

Why isn’t the size of the integer types fixed?

The short answer is that this goes back to the early days of C, when computers were slow and performance was of the utmost concern. C opted to intentionally leave the size of an integer open so that the compiler implementers could pick a size for int that performs best on the target computer architecture. That way, programmers could just use int without having to worry about whether they could be using something more performant.

By modern standards, the lack of consistent ranges for the various integral types sucks (especially in a language designed to be portable).

Fixed-width integers

To address the above issues, C++11 provides an alternate set of integer types that are guaranteed to be the same size on any architecture. Because the size of these integers is fixed, they are called fixed-width integers.

The fixed-width integers are defined (in the <cstdint> header) as follows:

Name           | Fixed Size      | Fixed Range                                               | Notes
std::int8_t    | 1 byte signed   | -128 to 127                                               | Treated like a signed char on many systems. See note below.
std::uint8_t   | 1 byte unsigned | 0 to 255                                                  | Treated like an unsigned char on many systems. See note below.
std::int16_t   | 2 byte signed   | -32,768 to 32,767                                         |
std::uint16_t  | 2 byte unsigned | 0 to 65,535                                               |
std::int32_t   | 4 byte signed   | -2,147,483,648 to 2,147,483,647                           |
std::uint32_t  | 4 byte unsigned | 0 to 4,294,967,295                                        |
std::int64_t   | 8 byte signed   | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807  |
std::uint64_t  | 8 byte unsigned | 0 to 18,446,744,073,709,551,615                           |

Here’s an example:

#include <cstdint> // for fixed-width integers
#include <iostream>

int main()
{
    std::int32_t x { 32767 }; // x is always a 32-bit integer
    x = x + 1;                // so 32768 will always fit
    std::cout << x << '\n';

    return 0;
}

Best practice

Use a fixed-width integer type when you need an integral type that has a guaranteed range.

Warning: std::int8_t and std::uint8_t typically behave like chars

Due to an oversight in the C++ specification, modern compilers typically treat std::int8_t and std::uint8_t (and the corresponding fast and least fixed-width types, which we’ll introduce in a moment) the same as signed char and unsigned char respectively. Thus on most modern systems, the 8-bit fixed-width integral types will behave like char types.

As a quick teaser:

#include <cstdint> // for fixed-width integers
#include <iostream>

int main()
{
    std::int8_t x { 65 };   // initialize 8-bit integral type with value 65
    std::cout << x << '\n'; // You're probably expecting this to print 65

    return 0;
}

Although you’re probably expecting the above program to print 65, it most likely won’t.

We discuss what this example actually prints (and how to ensure it always prints 65) in lesson 4.12 -- Introduction to type conversion and static_cast, after we cover chars (and how they print) in lesson 4.11 -- Chars.

Warning

The 8-bit fixed-width integer types are often treated like chars instead of integer values (and this may vary per system). The 16-bit and wider integral types are not subject to this issue.
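As a preview of the fix covered in lesson 4.12, casting the value to an int before printing ensures it is output as a number:

#include <cstdint> // for fixed-width integers
#include <iostream>

int main()
{
    std::int8_t x { 65 };

    // static_cast to int so the value prints as a number rather than as a character
    std::cout << static_cast<int>(x) << '\n'; // prints 65

    return 0;
}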

For advanced readers

The fixed-width integers actually don’t define new types -- they are just aliases for existing integral types with the desired size. For each fixed-width type, the implementation (the compiler and standard library) gets to determine which existing type is aliased. As an example, on a platform where int is 32-bits, std::int32_t will be an alias for int. On a system where int is 16-bits (and long is 32-bits), std::int32_t will be an alias for long instead.
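If you're curious which type your implementation chose, a type trait such as std::is_same_v (C++17) can reveal it. Which of these prints true is implementation-defined:

#include <cstdint>     // for fixed-width integers
#include <iostream>
#include <type_traits> // for std::is_same_v

int main()
{
    std::cout << std::boolalpha;

    // Exactly which fundamental type std::int32_t aliases is implementation-defined
    std::cout << "int32_t is int:  " << std::is_same_v<std::int32_t, int> << '\n';
    std::cout << "int32_t is long: " << std::is_same_v<std::int32_t, long> << '\n';

    return 0;
}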

So what about the 8-bit fixed-width types?

In most cases, std::int8_t is an alias for signed char because it is the only available 8-bit signed integral type (bool and char are not considered to be signed integral types). And when this is the case, std::int8_t will behave just like a char on that platform.

However, in rare cases, if a platform has an implementation-specific 8-bit signed integral type, the implementation may decide to make std::int8_t an alias for that type instead. In that case, std::int8_t will behave like that type, which may be more like an int than a char.

std::uint8_t behaves similarly.

Other fixed-width downsides

The fixed-width integers have some potential downsides:

First, the fixed-width integers are not guaranteed to be defined on all architectures. They only exist on systems where there are fundamental integral types that match their widths and follow a certain binary representation (two's complement, with no padding bits). Your program will fail to compile on any architecture that does not support a fixed-width integer that your program is using. However, given that modern architectures have standardized around 8/16/32/64-bit variables, this is unlikely to be a problem unless your program needs to be portable to some exotic mainframe or embedded architectures.

Second, if you use a fixed-width integer, it may be slower than a wider type on some architectures. For example, if you need an integer that is guaranteed to be 32-bits, you might decide to use std::int32_t, but your CPU might actually be faster at processing 64-bit integers. However, just because your CPU can process a given type faster doesn’t mean your program will be faster overall -- modern programs are often constrained by memory usage rather than CPU, and the larger memory footprint may slow your program more than the faster CPU processing accelerates it. It’s hard to know without actually measuring.

These are just minor quibbles though.

Fast and least integral types Optional

To help address the above downsides, C++ also defines two alternative sets of integers that are guaranteed to exist.

The fast types (std::int_fast#_t and std::uint_fast#_t) provide the fastest signed/unsigned integer type with a width of at least # bits (where # = 8, 16, 32, or 64). For example, std::int_fast32_t will give you the fastest signed integer type that’s at least 32-bits. By fastest, we mean the integral type that can be processed most quickly by the CPU.

The least types (std::int_least#_t and std::uint_least#_t) provide the smallest signed/unsigned integer type with a width of at least # bits (where # = 8, 16, 32, or 64). For example, std::uint_least32_t will give you the smallest unsigned integer type that’s at least 32-bits.

Here’s an example from the author’s Visual Studio (32-bit console application):

#include <cstdint> // for fast and least types
#include <iostream>

int main()
{
	std::cout << "least 8:  " << sizeof(std::int_least8_t)  * 8 << " bits\n";
	std::cout << "least 16: " << sizeof(std::int_least16_t) * 8 << " bits\n";
	std::cout << "least 32: " << sizeof(std::int_least32_t) * 8 << " bits\n";
	std::cout << '\n';
	std::cout << "fast 8:  "  << sizeof(std::int_fast8_t)   * 8 << " bits\n";
	std::cout << "fast 16: "  << sizeof(std::int_fast16_t)  * 8 << " bits\n";
	std::cout << "fast 32: "  << sizeof(std::int_fast32_t)  * 8 << " bits\n";

	return 0;
}

This produced the result:

least 8:  8 bits
least 16: 16 bits
least 32: 32 bits

fast 8:  8 bits
fast 16: 32 bits
fast 32: 32 bits

You can see that std::int_least16_t is 16-bits, whereas std::int_fast16_t is actually 32-bits. This is because on the author’s machine, 32-bit integers are faster to process than 16-bit integers.

As another example, let's assume we're on an architecture that has only 16-bit and 64-bit integral types. std::int32_t would not exist, whereas std::int_least32_t (and std::int_fast32_t) would be 64 bits.

However, these fast and least integers have their own downsides. First, not many programmers actually use them, and a lack of familiarity can lead to errors. Second, the fast types can lead to memory wastage, as their actual size may be significantly larger than indicated by their name.

Most seriously, because the size of the fast/least integers is implementation-defined, your program may exhibit different behaviors on architectures where they resolve to different sizes. For example:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint_fast16_t sometype { 0 };
    sometype = sometype - 1; // intentionally overflow to invoke wraparound behavior

    std::cout << sometype << '\n';

    return 0;
}

This code will produce different results depending on whether std::uint_fast16_t is 16, 32, or 64 bits: 65,535, 4,294,967,295, or 18,446,744,073,709,551,615 respectively!

Best practice

Avoid the fast and least integral types.

Best practices for integral types

Given the various pros and cons of the fundamental integral types, the fixed-width integral types, the fast/least integral types, and signed/unsigned challenges, there is little consensus on integral best practices.

Our stance is that it’s better to be correct than fast, and better to fail at compile time than runtime. Therefore, if you need an integral type with a guaranteed range, we recommend avoiding the fast/least types in favor of the fixed-width types. If you later discover the need to support an esoteric platform for which a specific fixed-width integral type won’t compile, then you can decide how to migrate your program (and thoroughly retest) at that point.

Best practice

  • Prefer int when the size of the integer doesn’t matter (e.g. the number will always fit within the range of a 2-byte signed integer). For example, if you’re asking the user to enter their age, or counting from 1 to 10, it doesn’t matter whether int is 16-bits or 32-bits (the numbers will fit either way). This will cover the vast majority of the cases you’re likely to run across.
  • Prefer std::int#_t when storing a quantity that needs a guaranteed range.
  • Prefer std::uint#_t when doing bit manipulation or when well-defined wrap-around behavior is required (e.g. for cryptography or random number generation). A short sketch follows the list below.

Avoid the following when possible:

  • short and long integers (prefer a fixed-width integer type instead).
  • The fast and least integral types (prefer a fixed-width integer type instead).
  • Unsigned types for holding quantities (prefer a signed integer type instead).
  • The 8-bit fixed-width integer types (prefer a 16-bit fixed-width integer type instead).
  • Any compiler-specific fixed-width integers (for example, Visual Studio defines __int8, __int16, etc…)
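To illustrate the bit-manipulation bullet above, here's a minimal sketch using std::uint32_t. The specific flag value is hypothetical -- the point is that bitwise operations and wrap-around are well-defined on unsigned fixed-width types:

#include <cstdint> // for fixed-width integers
#include <iostream>

int main()
{
    std::uint32_t flags { 0 };
    flags |= 0x04u;               // set bit 2
    flags ^= 0x04u;               // toggle bit 2 back off
    std::cout << flags << '\n';   // prints 0

    // Unsigned wrap-around is well-defined: subtracting 1 from 0 yields the maximum value
    std::uint32_t wrapped { 0 };
    wrapped = wrapped - 1;
    std::cout << wrapped << '\n'; // prints 4294967295 on typical systems

    return 0;
}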

What is std::size_t?

Consider the following code:

#include <iostream>

int main()
{
    std::cout << sizeof(int) << '\n';

    return 0;
}

On the author’s machine, this prints:

4

Pretty simple, right? We can infer that operator sizeof returns an integer value -- but what integral type is that return value? An int? A short? The answer is that sizeof returns a value of type std::size_t. std::size_t is an alias for an implementation-defined unsigned integral type. In other words, the compiler decides if std::size_t is an unsigned int, an unsigned long, an unsigned long long, etc…

Key insight

std::size_t is an alias for an implementation-defined unsigned integral type. It is used within the standard library to represent the byte-size or length of objects.

For advanced readers

std::size_t is actually a typedef. We cover typedefs in lesson 10.7 -- Typedefs and type aliases.

std::size_t is defined in a number of different headers. If you need to use std::size_t, <cstddef> is the best header to include, as it contains the least number of other defined identifiers.

For example:

#include <cstddef>  // for std::size_t
#include <iostream>

int main()
{
    int x { 5 };
    std::size_t s { sizeof(x) }; // sizeof returns a value of type std::size_t, so that should be the type of s
    std::cout << s << '\n';

    return 0;
}

Best practice

If you use std::size_t explicitly in your code, #include one of the headers that defines std::size_t (we recommend <cstddef>).

Using sizeof does not require a header (even though it returns a value whose type is std::size_t).

Much like an integer can vary in size depending on the system, std::size_t also varies in size. std::size_t is guaranteed to be unsigned and at least 16 bits, but on most systems will be equivalent to the address-width of the application. That is, for 32-bit applications, std::size_t will typically be a 32-bit unsigned integer, and for a 64-bit application, std::size_t will typically be a 64-bit unsigned integer.

The sizeof operator returns a value of type std::size_t Optional

Author’s note

The following sections are optional reading. It is not critical that you understand what follows.

Amusingly, we can use the sizeof operator (which returns a value of type std::size_t) to ask for the size of std::size_t itself:

#include <cstddef> // for std::size_t
#include <iostream>

int main()
{
	std::cout << sizeof(std::size_t) << '\n';

	return 0;
}

Compiled as a 32-bit (4 byte) console app on the author’s system, this prints:

4

std::size_t imposes an upper limit on the size of an object created on the system Optional

sizeof must be able to return the byte-size of an object as a value of type std::size_t. Therefore, the byte-size of an object can be no larger than the largest value std::size_t can hold.

The C++20 standard ([basic.compound] 1.8.2) says: “Constructing a type such that the number of bytes in its object representation exceeds the maximum value representable in the type std::size_t (17.2) is ill-formed.”

If it were possible to create a larger object, sizeof would not be able to return its byte-size, as it would be outside the range that a std::size_t could hold. Thus, creating an object with a size (in bytes) larger than the largest value an object of type std::size_t can hold is invalid (and will cause a compile error).

For example, let’s assume that std::size_t is 4 bytes on our system. An unsigned 4-byte integral type has range 0 to 4,294,967,295, so a std::size_t of that size can hold any value in that range. Any object with a byte-size of up to 4,294,967,295 could have its size returned in a value of type std::size_t, so this is fine. However, if the byte-size of an object were larger than that, sizeof would not be able to return its size accurately, as the value would be outside the range of a std::size_t. Therefore, no object larger than 4,294,967,295 bytes could be created on this system.
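If you want to see this limit on your own system, std::numeric_limits can report the largest value a std::size_t can hold:

#include <cstddef> // for std::size_t
#include <iostream>
#include <limits>  // for std::numeric_limits

int main()
{
    // The maximum value of std::size_t bounds the byte-size of any object
    std::cout << std::numeric_limits<std::size_t>::max() << '\n';

    return 0;
}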

As an aside…

The size of std::size_t imposes a strict mathematical upper limit to an object’s size. In practice, the largest creatable object may be smaller than this amount (perhaps significantly so).

Some compilers limit the largest creatable object to half the maximum value of std::size_t (this ensures that the difference of two pointers within an object can be represented in the signed type std::ptrdiff_t).

Other factors may also play a role, such as how much contiguous memory your computer has available for allocation.

When 8-bit and 16-bit applications were the norm, this limit imposed a significant constraint on the size of objects. In the 32-bit and 64-bit era, this is rarely an issue, and therefore not something you generally need to worry about.
