Quentin Santos

Obsessed with computers since 2002

Author: Quentin Santos

  • Rust Strings for C Programmers

    This article will quickly explain the Rust types [T; N], &[T; N], &[T], Vec<T>, &Vec<T> with C code, and what the str, &str, String, OsString and CString add.

    Arrays and Slices

    Rust: [T; N] (array)
      Example: [i32; 100]
      Allocated on the stack.
    C: T[N]
      Example: int[100]
      Allocated on the stack.

    Rust: &[T; N] (array reference)
      Example: &[i32; 100]
      N is tracked at compile time. Bounds checks are done at runtime (opt out using get_unchecked¹).
    C: const T[N] in function parameters, or const T*
      Example: const int[100] or const int*
      Partially tracked at compile time²; no bounds checks on access at runtime.

    Rust: &mut [T; N] (exclusive array reference)
      Example: &mut [i32; 100]
      Same as above, and allows writing to the array.
    C: T[N] in function parameters, or T*
      Example: int[100] or int*
      Same as above, and allows writing to the array.

    Rust: Box<[T; N]> (boxed array)
      Example: Box<[i32; 100]>
      Same as &mut [T; N]³, but the underlying array is allocated on the heap. The memory is relinquished when the object is Dropped.
    C: T* from malloc()
      Example: int*
      Same as T*, but the underlying array is allocated on the heap. The memory must be relinquished manually by calling free().

    Rust: &[T] (slice reference)
      Example: &[i32]
      The size is not fixed at compile time. It is tracked in an additional variable alongside the base pointer; together, the two make a “fat pointer”. As with &[T; N], you can opt out of runtime bounds checks using get_unchecked¹.
    C: struct { const T *base; size_t size; }
      Example: struct { const int *base; size_t size; }
      In practice, many C functions take the base pointer and the size as two separate parameters. For instance, you will see: memset(base, 0, size). The compiler won’t perform any bounds checks automatically.

    Rust: &mut [T] (exclusive slice reference)
      Example: &mut [i32]
      Same as &[T], and allows writing to the array.
    C: struct { T *base; size_t size; }
      Example: struct { int *base; size_t size; }
      Same as above, and allows writing to the array.

    Rust: Vec<T> (vector)
      Example: Vec<i32>
      Lets you push (append) an arbitrary number of elements. &Vec<T> can automatically be coerced to &[T] because Vec<T> implements Deref, and &mut Vec<T> can automatically be coerced to &mut [T] because Vec<T> implements DerefMut⁴.
    C: struct { T *data; size_t size; size_t avail; }
      Example: struct { int *data; size_t size; size_t avail; }
      Implementation of a dynamic array using realloc(). You can pass the data and size fields to functions that expect T* and size_t parameters.
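To make the rows above concrete, here is a minimal Rust sketch that touches each of the types. The function names (sum_fixed, sum_slice, zero_first) are hypothetical and exist only for illustration.

```rust
use std::mem::size_of;

// &[T; N]: the length is part of the type, so only 100-element arrays fit.
fn sum_fixed(a: &[i32; 100]) -> i32 {
    a.iter().sum()
}

// &[T]: a fat pointer (base + length), so any length is accepted.
fn sum_slice(s: &[i32]) -> i32 {
    s.iter().sum()
}

// &mut [T]: the same fat pointer, with write access.
fn zero_first(s: &mut [i32]) {
    if let Some(first) = s.first_mut() {
        *first = 0;
    }
}

fn main() {
    let mut stack = [1i32; 100]; // [T; N], lives on the stack
    let mut heap: Box<[i32; 100]> = Box::new([2; 100]); // heap, freed on drop
    let mut v: Vec<i32> = Vec::new(); // growable, like the realloc() struct
    v.push(3);
    v.push(4);

    assert_eq!(sum_fixed(&stack), 100);
    assert_eq!(sum_fixed(&heap), 200); // &Box<[i32; 100]> coerces to &[i32; 100]
    assert_eq!(sum_slice(&stack), 100); // &[i32; 100] coerces to &[i32]
    assert_eq!(sum_slice(&v), 7); // &Vec<i32> coerces to &[i32]

    zero_first(&mut stack);
    zero_first(&mut heap[..]);
    zero_first(&mut v);
    assert_eq!(stack[0], 0);
    assert_eq!(heap[0], 0);
    assert_eq!(v, [0, 4]);

    // A slice reference carries base + size; an array reference is one pointer.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&[i32; 100]>(), size_of::<usize>());
}
```

The two size_of checks at the end show the "fat pointer" directly: the slice reference is twice the size of the array reference because it stores the length next to the base pointer.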

    ¹ Note that this is slice::get_unchecked, which is an unsafe method. Rust lets you coerce a &[T; N] array reference into a &[T] slice (a bit like a const int[N] can decay into a const int*). When you write v.get_unchecked(0), it implicitly means (&v).get_unchecked(0). The compiler then figures out that it can use slice::get_unchecked even though &v is a reference to an array.
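As a small illustration of this footnote, the sketch below calls get_unchecked on an array and on an explicit slice. The helper first_unchecked is a hypothetical name; the call has to sit in an unsafe block because an out-of-range index would be undefined behavior.

```rust
// Hypothetical helper: reads the first element without a bounds check.
fn first_unchecked(v: &[i32; 4]) -> i32 {
    // SAFETY: the array has exactly 4 elements, so index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}

fn main() {
    let v = [10, 20, 30, 40];

    // Checked access: an out-of-range index yields None.
    assert_eq!(v.get(7), None);

    // The explicit form of what method resolution does for us:
    // borrow the array, coerce to a slice, then call slice::get_unchecked.
    let s: &[i32] = &v;
    // SAFETY: index 3 is within the 4 elements of the slice.
    assert_eq!(unsafe { *s.get_unchecked(3) }, 40);

    assert_eq!(first_unchecked(&v), 10);
}
```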

    ² From the point of view of the standard, T[N] in a function parameter is just syntactic sugar for T*, but compilers tend to emit warnings when they see an incorrect function call, such as in:

    void f(int p[100]);
    void g(void) {
        int v[10];
        f(v);
    }
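For comparison, here is a rough Rust counterpart of the C snippet above, as a sketch. In Rust, N is part of the type, so the mismatched call does not compile at all. The helper as_100 is a hypothetical name; it shows how to go from a slice back to a fixed-size array reference with a length check at runtime.

```rust
use std::convert::TryFrom;

// Hypothetical counterpart of the C function f above.
fn f(p: &[i32; 100]) -> usize {
    p.len()
}

// Converts a slice into a fixed-size array reference.
// The length is checked at runtime; a mismatch gives None.
fn as_100(s: &[i32]) -> Option<&[i32; 100]> {
    <&[i32; 100]>::try_from(s).ok()
}

fn main() {
    let v = [0i32; 10];
    // f(&v); // rejected at compile time: the lengths 10 and 100 differ
    assert!(as_100(&v).is_none());

    let w = [0i32; 100];
    assert_eq!(f(&w), 100);
    assert_eq!(as_100(&w[..]).map(f), Some(100));
}
```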

    ³ Technically, you will need the Box<[T; N]> object itself to be mut to modify the underlying array:

    fn f() {
        let mut v = Box::new([0; 100]);
        v[2] = 1;
    }

    ⁴ This means you can write:

    fn f(v: &[i32]) { }
    fn g(v: &mut [i32]) { }
    fn main() {
        let mut v = vec![1, 2, 3];
        f(&v);
        g(&mut v);
    }
    

    With this, we can map the following patterns:

    Rust      | C
    &p[..]    | base, size
    &p[2..]   | base + 2, size - 2
    &p[..40]  | base, 40
    &p[2..40] | base + 2, 38

    The main difference is that the Rust version performs bounds checks, while the C version doesn’t. You can again use get_unchecked() to opt out of these checks, as it works with ranges just as well as with indices.
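The mapping can be checked with a short sketch. The helper middle is a hypothetical name; it uses get with a range, which returns None instead of panicking when the range does not fit.

```rust
// Hypothetical helper mirroring `base + 2, 38`: returns None
// instead of panicking when the range does not fit in the slice.
fn middle(p: &[u8]) -> Option<&[u8]> {
    p.get(2..40)
}

fn main() {
    let p: Vec<u8> = (0..50).collect();

    assert_eq!(p[..].len(), 50); // base, size
    assert_eq!(p[2..].len(), 48); // base + 2, size - 2
    assert_eq!(p[..40].len(), 40); // base, 40
    assert_eq!(p[2..40].len(), 38); // base + 2, 38
    assert_eq!(p[2..40][0], 2); // the sub-slice starts at element 2

    // Checked variant: a range that does not fit is reported as None.
    assert!(middle(&p).is_some());
    assert!(middle(&p[..10]).is_none());

    // Unchecked variant with a range.
    // SAFETY: 2..40 lies within the 50 elements of p.
    let s = unsafe { p.get_unchecked(2..40) };
    assert_eq!(s.len(), 38);
}
```

Writing &p[2..40] on a slice that is too short would panic at runtime, which is the bounds check the C version lacks.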

    Strings

    Once you understand arrays and slices well, strings become easy:

    • str is a [u8] which is guaranteed to contain valid UTF-8 data
    • String is a Vec<u8> which is guaranteed to contain valid UTF-8 data
    • CStr is a [u8] which is guaranteed to be null-terminated
    • CString is a Vec<u8> which is guaranteed to be null-terminated
    • OsStr is a [u8] which is guaranteed to contain data valid for the system’s API⁵
    • OsString is a Vec<u8> which is guaranteed to contain data valid for the system’s API⁵
    • Path is an OsStr that is used to represent a path
    • PathBuf is an OsString that is used to represent a path

    ⁵ To understand why OsStr/OsString is different from CStr/CString, take a look at WTF-8.
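The list above can be illustrated with a minimal sketch that moves between the string types and their bytes. The helper c_len_with_nul is a hypothetical name; everything else comes from the standard library.

```rust
use std::ffi::{CStr, CString, OsStr};
use std::path::Path;

// Hypothetical helper: number of bytes a C function would receive,
// including the terminating NUL.
fn c_len_with_nul(s: &str) -> Option<usize> {
    // CString::new fails if the input contains an interior NUL byte.
    CString::new(s).ok().map(|c| c.as_bytes_with_nul().len())
}

fn main() {
    // str and String expose their UTF-8 bytes directly.
    let s: &str = "héllo";
    assert_eq!(s.as_bytes().len(), 6); // 'é' takes two bytes in UTF-8
    let owned: String = String::from_utf8(vec![104, 105]).unwrap();
    assert_eq!(owned, "hi");
    // Bytes that are not valid UTF-8 are rejected.
    assert!(String::from_utf8(vec![0xff]).is_err());

    // CString owns a NUL-terminated buffer; CStr borrows one.
    let c = CString::new("hi").unwrap();
    let borrowed: &CStr = c.as_c_str();
    assert_eq!(borrowed.to_bytes_with_nul(), b"hi\0");
    assert_eq!(c_len_with_nul("hi"), Some(3));
    assert_eq!(c_len_with_nul("h\0i"), None);

    // A Path is a thin wrapper around an OsStr.
    let os: &OsStr = OsStr::new("notes.txt");
    let path: &Path = Path::new(os);
    assert_eq!(path.extension(), Some(OsStr::new("txt")));
}
```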

    Since str is just a [u8], the patterns below work as well. The difference is that all the resulting &str must still contain valid UTF-8. In other words, you cannot slice in the middle of the UTF-8 encoding of a codepoint. As usual, you can opt out of the automatic checks by using get_unchecked, but you will have undefined behavior if the range you pass cuts in the middle of a codepoint.

    Rust      | C
    &s[..]    | base, size
    &s[2..]   | base + 2, size - 2
    &s[..40]  | base, 40
    &s[2..40] | base + 2, 38
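Here is a small sketch of the code point constraint. The helper tail is a hypothetical name; it relies on str::get, which returns None when the range is out of bounds or cuts through a multi-byte code point.

```rust
// Hypothetical helper: like &s[start..], but returns None instead of
// panicking when `start` is out of range or inside a code point.
fn tail(s: &str, start: usize) -> Option<&str> {
    s.get(start..)
}

fn main() {
    let s = "aéb"; // bytes: 'a' (1), 'é' (2), 'b' (1)
    assert_eq!(s.len(), 4); // len() counts bytes, not characters
    assert_eq!(&s[1..3], "é");
    assert_eq!(&s[3..], "b");

    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2)); // byte 2 is in the middle of 'é'
    assert_eq!(tail(s, 2), None);
    assert_eq!(tail(s, 3), Some("b"));

    // &s[2..] would panic at runtime for the same reason.

    // Unchecked variant.
    // SAFETY: 1 and 3 are both code point boundaries within s.
    let t = unsafe { s.get_unchecked(1..3) };
    assert_eq!(t, "é");
}
```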