Flexible Array Members: Typical C Shenanigans

Speed. I am Speed.

The one thing that C cares about is speed. Seriously, you might not get the right results, but at least, you will get them fast.

Let’s look at two common ways to represent a vector in C.

struct vector {
    unsigned long size;
    int data[42];
};

The first one is to just use the built-in array type. This approach is simple, but implies choosing the size of the vector when compiling. The program cannot decide the size depending on user-input or some other runtime parameter.

Note: Variable-Length Arrays (VLA) do allow declaring arrays whose length is decided at run-time. However, this is limited to local variables and is an optional feature since C11.

struct vector {
    unsigned long size;
    int* data;
};

The alternative is to not use an array at all. Instead, we use a pointer to a memory buffer that will contain the data. This memory location can be created with malloc, for instance.

Pointers can be used like arrays almost transparently, which brings (the “almost” means that most C programmer are paranoid about using sizeof on expressions). The example below shows how accessing an element at some index uses the same syntax in both cases.

But although the syntax is the same, the meaning is different. Look at the assembly code generated to implement vector_index. Indexing the built-in array uses a single MOV instruction; indexing through a pointer uses two MOV instructions.

It makes sense. rdi contains the address to the struct and rsi the value of index. In the first case, we shift by 8 bytes to skip the field size, then we move by index * 4 bytes to select the right array element, then we read the value at this memory location (indicated by the bracket in the Intel assembly syntax). In the second case, we need to first recover the value of the field data, at an offset of 8 bytes from the beginning of the struct, then read at the right offset from the address it contains.

In other words, using a pointer instead of a built-in array is twice as slow! Okay, not really. Any modern CPU is going to recompile, pipeline, prefetch and out-of-order execute this such that it really won’t matter. But it used to! And it might still matter on microcontrollers.

In fact! I did some benchmarks, and, although there was no difference on my laptop (CPU model i5-4210H), I did see a tremendously large difference on my Arduino Leonardo (microcontroller model ATmega32u4). Indeed, the code with the additional indirection through a pointer was 10 % slower.

…

Yeah, I did expect it to be worse than that. But still, 10 % was not something to scoff at in the previous millennium. Also, CPUs at the time might have seen a larger difference than my ATmega32U4 released not that long ago.

In summary, embedding a built-in C array is fast, but inconvenient. Using dynamic allocation with malloc is more flexible, but not as fast. How can we make our compiler generate faster code while keeping the convenience?

Well, we can lie to it.

Lies. So Many Lies.

Let’s look at how we will typically create a vector with the first approach:

struct vector {
    unsigned long size;
    int data[42];
};
struct vector *vector_new(unsigned long size) {
    assert(size == 42);
    struct vector *ret = malloc(sizeof(struct vector));
    ret->size = size;
    return ret;
}

Notice how malloc is just another C function, that can take any value as an argument at runtime? Notice how we are forcing ourselves to give it a compile-time value (the size of our struct)? What if we actually allocated the size we wanted?

For instance, if we allocated sizeof(struct vector) + 100 bytes, that would mean we would still have the 42 elements from the definition of struct vector, but we would have 100 additional bytes, enough to store 25 more 4-byte integers!

And so, a new idiom was born:

struct vector {
    unsigned long size;
    int data[1];
};
struct vector *vector_new(unsigned long size) {
    struct vector *ret = malloc(
        sizeof(struct vector)
        + sizeof(int) * (size - 1)
    );
    ret->size = size;
    return ret;
}

Since we are going to lie to the compiler about the size of the actual array anyway, we make the base size as small as possible. Since the C standard does not allow 0-length arrays (“An array type describes a contiguously allocated nonempty set of objects”), we just use 1. Then, when we create an instance of our vector, we are going to make sure to actually have enough memory for size elements. Since there is already 1 built-in, we need to allocate for a size - 1 elements in addition to the base struct.

C developers embraced this hack, wrote many fast programs, and lived happily ever after.

Well, kind of. I mean, if you write this, your program is probably going to be fast. Just as fast as a fixed-size array. But maybe it won’t. And maybe it will allow random people to take control of your computer. Or maybe it will blow up the spaceship. Who knows?

You might know where I am going with this: this is undefined behavior.

Yes, we did allocate enough memory to do what we want. But this is not the point. We lied to the compiler. And compilers are very delicate, gullible beings. It will believe you. It will believe that your array is actually of size 1. And it will do unmentionable things to your code, because this assumption is false.

Save one Character, Save the Entire World

When they observe developers writing C, compilers authors and language lawyers alike get the urge to start again from scratch. This is actually the reason why there are so many flood myths through the ages.

This time, however, they recognized that the end justified the means. Compiler authors, too, want the programs they write to be fast (except C++ compiler authors).

And so, GCC introduced a way to do this:

struct vector {
    unsigned long size;
    int data[0];
};
struct vector *vectornew(unsigned long size) {
    struct vector *ret = malloc(
        sizeof(struct vector)
        + sizeof(int) * size
    );
    ret->size = size;
    return ret;
}

Notice the difference? We just replaced 1 by 0 for the size of the array.

But you said the C standard did not allow this!
Someone who pays attention to details

Indeed! But GNU C does! GNU C is the dialect that GCC understands, and it allows for 0-length arrays as an extension.

Of course, if you were using another compiler, you were out of luck… until!

Finally, in the C99 revision of the C standard, a new syntax was introduced!

struct vector {
    unsigned long size;
    int data[];
};
struct vector *vector_new(unsigned long size) {
    struct vector *ret = malloc(
        sizeof(struct vector)
        + sizeof(int) * size
    );
    ret->size = size;
    return ret;
}

Can you see the difference? We just removed the 0.

That’s all?
Someone who wasted too much time reading this; but worry not, it took even longer to write!

Yup. This syntax tells the compiler to not assume anything about the size of the array. You can still use it as usual. But ensuring that you allocate enough memory, and that you do not go out-of-bounds, is up-to-you.

Less is More

We went from 1, to 0, to nothing. Since there is nothing left to take away, we attained perfection. It’s a general truism with C: the less C, the better.

Quentin Santos

Make it work. Make it right. Make it fast.

Flexible Array Members: Typical C Shenanigans

Speed. I am Speed.

Lies. So Many Lies.

Save one Character, Save the Entire World

Less is More

Leave a Reply Cancel reply