This article quickly explains the Rust types [T; N], &[T; N], &[T], Vec<T> and &Vec<T> by comparing them with C code, and then what str, &str, String, OsString and CString add on top.
Arrays and Slices
| Rust | C |
|---|---|
| [T; N] (array)<br>Example: [i32; 100]<br>Allocated on the stack | T[N]<br>Example: int[100]<br>Allocated on the stack |
| &[T; N] (array reference)<br>Example: &[i32; 100]<br>N is tracked at compilation. Bounds checks are done at runtime (opt out using get_unchecked¹). | const T[N] in function parameters or const T*<br>Example: const int[100] or const int*<br>Partially tracked at compilation², no bounds checks on access at runtime. |
| &mut [T; N] (exclusive array reference)<br>Example: &mut [i32; 100]<br>Same as above, and allows writing to the array. | T[N] in function parameters or T*<br>Example: int[100] or int*<br>Same as above, and allows writing to the array. |
| Box<[T; N]> (boxed array)<br>Example: Box<[i32; 100]><br>Same as &mut [T; N]³, but the underlying array is allocated on the heap. The memory is released when the object is dropped. | T* from malloc()<br>Example: int*<br>Same as T*, but the underlying array is allocated on the heap. The memory must be released manually by calling free(). |
| &[T] (slice reference)<br>Example: &[i32]<br>The size is not fixed at compile time. It is tracked in an additional variable stored alongside the base pointer. The two make a “fat pointer”. As with &[T; N], you can opt out of runtime bounds checks using get_unchecked¹. | struct { const T *base; size_t size }<br>Example: struct { const int *base; size_t size }<br>In practice, many C functions take the base pointer and the size as two separate parameters. For instance, you will see: memset(base, 0, size). The compiler won’t perform any bounds checks automatically. |
| &mut [T] (exclusive slice reference)<br>Example: &mut [i32]<br>Same as &[T], and allows writing to the array. | struct { T *base; size_t size }<br>Example: struct { int *base; size_t size }<br>Same as above, and allows writing to the array. |
| Vec<T> (vector)<br>Example: Vec<i32><br>Lets you push (append) an arbitrary number of elements. &Vec<T> can automatically be coerced to &[T] because Vec<T> implements Deref, and &mut Vec<T> can automatically be coerced to &mut [T] because Vec<T> implements DerefMut⁴. | struct { T *data; size_t size; size_t avail }<br>Example: struct { int *data; size_t size; size_t avail }<br>Implementation of a dynamic array using realloc(). You can pass the data and size fields to functions that expect T* and size_t parameters. |
¹ Note that this is slice::get_unchecked. Rust lets you coerce a &[T; N] array reference into a &[T] slice (a bit like a const int[N] can decay into a const int*). When you write v.get_unchecked(0), it implicitly means (&v).get_unchecked(0). The compiler then figures out that it can use slice::get_unchecked even though &v is a reference to an array.
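As a minimal sketch of the coercion described in this footnote, the snippet below calls get_unchecked directly on an array and then on an explicitly coerced slice. The helper name first_unchecked and the array length 4 are assumptions made for illustration only.

```rust
// Hypothetical helper: reads element 0 of a fixed-size array without a
// runtime bounds check.
fn first_unchecked(v: &[i32; 4]) -> i32 {
    // SAFETY: the array type guarantees 4 elements, so index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}

fn main() {
    let v = [10, 20, 30, 40];

    // Called on the array itself: the compiler auto-references `v` and
    // treats it as a slice to find slice::get_unchecked.
    // SAFETY: index 0 is in bounds for a 4-element array.
    let a = unsafe { *v.get_unchecked(0) };

    // The same call with the coercion to a slice spelled out.
    let s: &[i32] = &v;
    // SAFETY: `s` has 4 elements.
    let b = unsafe { *s.get_unchecked(0) };

    assert_eq!(a, b);
    assert_eq!(first_unchecked(&v), 10);
    println!("{a} {b}");
}
```

Both calls resolve to the same slice method; only the amount of coercion written out by hand differs.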
² From the point of view of the standard, a T[N] function parameter is just syntactic sugar for T*, but compilers may still emit warnings when they see an incorrect function call, such as in:
void f(int p[100]);
void g(void) {
    int v[10];
    f(v);
}
³ Technically, you will need the Box<[T; N]> object itself to be mut to modify the underlying array:
fn f() {
    let mut v = Box::new([0; 100]);
    v[2] = 1;
}
⁴ This means you can write:
fn f(v: &[i32]) { }
fn g(v: &mut [i32]) { }
fn main() {
    let mut v = vec![1, 2, 3];
    f(&v);
    g(&mut v);
}
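The table's distinction between thin pointers, fat pointers and the three-field vector can be checked with std::mem::size_of. This is only a sketch: the helper c_parts is a hypothetical name, and the sizes are compared against the size of usize rather than hard-coded.

```rust
use std::mem::size_of;

// Hypothetical helper: splits a slice reference into the (base, size) pair
// that a C function would typically take as two separate parameters.
fn c_parts(s: &[i32]) -> (*const i32, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let word = size_of::<usize>();

    // N is part of the type, so an array reference is a single pointer.
    assert_eq!(size_of::<&[i32; 100]>(), word);
    assert_eq!(size_of::<Box<[i32; 100]>>(), word);

    // A slice reference carries the length next to the base pointer.
    assert_eq!(size_of::<&[i32]>(), 2 * word);

    // A vector stores a pointer, a length and a capacity.
    assert_eq!(size_of::<Vec<i32>>(), 3 * word);

    let mut v: Vec<i32> = Vec::with_capacity(8);
    v.extend_from_slice(&[1, 2, 3]);
    assert_eq!(v.len(), 3); // "size" in the C struct
    assert!(v.capacity() >= 8); // "avail" in the C struct

    // &Vec<i32> coerces to &[i32], which yields the C-style pair.
    let (base, size) = c_parts(&v);
    assert_eq!(base, v.as_ptr());
    assert_eq!(size, 3);
}
```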
With this, we can map the following patterns:
| Rust | C |
|---|---|
| &p[..] | base, size |
| &p[2..] | base + 2, size - 2 |
| &p[..40] | base, 40 |
| &p[2..40] | base + 2, 38 |
The main difference is that the Rust version performs bounds checks, while the C version does not. You can again use get_unchecked() to opt out of these checks, since it works with ranges just as well as with indices.
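To make the mapping concrete, here is a small sketch that recovers the C-style (offset, size) pair from each Rust sub-slice. The helper offset_and_size and the 100-element vector are assumptions chosen for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: expresses `part` as the (offset from base, size)
// pair used in the C column. Assumes `part` was sliced out of `whole`.
fn offset_and_size(whole: &[i32], part: &[i32]) -> (usize, usize) {
    let bytes = part.as_ptr() as usize - whole.as_ptr() as usize;
    (bytes / size_of::<i32>(), part.len())
}

fn main() {
    let p: Vec<i32> = (0..100).collect();

    assert_eq!(offset_and_size(&p, &p[..]), (0, 100));
    assert_eq!(offset_and_size(&p, &p[2..]), (2, 98));
    assert_eq!(offset_and_size(&p, &p[..40]), (0, 40));
    assert_eq!(offset_and_size(&p, &p[2..40]), (2, 38));

    // The checked accessor reports an out-of-bounds range instead of
    // reading past the end.
    assert!(p.get(2..400).is_none());

    // SAFETY: 2..40 is within the 100 elements of `p`.
    let s = unsafe { p.get_unchecked(2..40) };
    assert_eq!(s.len(), 38);
}
```

Writing &p[2..400] instead of p.get(2..400) would panic at runtime, whereas the equivalent C pointer arithmetic would silently produce an out-of-bounds view.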
Strings
Once you understand arrays and slices well, strings become easy:
- str is a [u8] which is guaranteed to contain valid UTF-8 data
- String is a Vec<u8> which is guaranteed to contain valid UTF-8 data
- CStr is a [u8] which is guaranteed to be null-terminated
- CString is a Vec<u8> which is guaranteed to be null-terminated
- OsStr is a [u8] which is guaranteed to contain data valid for the system’s API⁵
- OsString is a Vec<u8> which is guaranteed to contain data valid for the system’s API⁵
- Path is an OsStr that is used to represent a path
- PathBuf is an OsString that is used to represent a path
⁵ To understand why OsStr/OsString is different from CStr/CString, take a look at WTF-8.
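The guarantees listed above are enforced when a value is constructed. The following sketch shows a few of these checks using the standard library; the helper c_bytes and the sample strings are assumptions for illustration.

```rust
use std::ffi::{CString, OsStr};
use std::path::Path;

// Hypothetical helper: returns the bytes a C function would see, including
// the trailing NUL, or None if the input cannot be turned into a C string.
fn c_bytes(s: &str) -> Option<Vec<u8>> {
    CString::new(s).ok().map(|c| c.into_bytes_with_nul())
}

fn main() {
    // A String exposes its underlying bytes; 'é' takes two bytes in UTF-8.
    let s = String::from("héllo");
    assert_eq!(s.as_bytes().len(), 6);

    // Bytes that are not valid UTF-8 are rejected.
    assert!(String::from_utf8(vec![0xff, 0xfe]).is_err());
    assert!(std::str::from_utf8(b"abc").is_ok());

    // A CString appends the terminator and refuses interior NUL bytes.
    assert_eq!(c_bytes("abc"), Some(b"abc\0".to_vec()));
    assert_eq!(c_bytes("a\0b"), None);

    // A Path is a view over an OsStr.
    let p = Path::new("/tmp/file.txt");
    assert_eq!(p.as_os_str(), OsStr::new("/tmp/file.txt"));
}
```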
Since str is just a [u8], the patterns below work as well. The difference is that every resulting &str must still contain valid UTF-8. In other words, you cannot slice in the middle of the UTF-8 encoding of a codepoint; doing so with the indexing syntax panics at runtime. As usual, you can opt out of the automatic checks by using get_unchecked, but you will get undefined behavior if the range you pass cuts in the middle of a codepoint.
| Rust | C |
|---|---|
| &s[..] | base, size |
| &s[2..] | base + 2, size - 2 |
| &s[..40] | base, 40 |
| &s[2..40] | base + 2, 38 |
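A short sketch of the codepoint-boundary rule, using a sample string with one two-byte character. The helper safe_prefix is a hypothetical name; it relies on str::get, which returns None instead of panicking when the range is invalid.

```rust
// Hypothetical helper: returns the first `end` bytes of `s` as a &str, or
// None if `end` is out of bounds or falls inside a codepoint.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // Byte layout: 'h' = 0, 'é' = 1..3, 'l' = 3, 'l' = 4, 'o' = 5.
    let s = "héllo";

    // Ranges that fall on codepoint boundaries behave like byte slices.
    assert_eq!(&s[..1], "h");
    assert_eq!(&s[3..], "llo");

    // Offset 2 is in the middle of 'é'.
    assert!(!s.is_char_boundary(2));
    assert_eq!(safe_prefix(s, 2), None);
    assert_eq!(safe_prefix(s, 3), Some("hé"));
}
```

Writing &s[..2] here would panic, and the unchecked variant would produce a &str that violates the UTF-8 guarantee.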