Quentin Santos

Obsessed with computers since 2002

Asynchronous I²C in C

Context

I have an RP2350 and an SSD1306 module. I want to use the former to display stuff on the latter. The most common modules are connected over I²C:

RP2350-USB-A running the Pico demo on an SSD1306 board with a 128×64 OLED display. The yellow grid is a small breadboard, which makes it easy to connect things together without soldering.

The simple way to use such a display looks like this:

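#include "pico/stdlib.h"
#include "hardware/i2c.h"

#define SSD1306_I2C_ADDR 0x3C // 7-bit I²C address of the SSD1306

int main(void) {
    // init stuff
    // …

    // main loop
    uint8_t buffer[1025]; // 1 control byte + 1 bit per pixel (128 × 64 / 8 = 1024 bytes)
    while (1) {
        do_work();
        put_pixel_data_in(buffer);
        update_display();
        buffer[0] = 0x40; // control byte: everything that follows is display data
        i2c_write_blocking(i2c_default, SSD1306_I2C_ADDR, buffer, sizeof buffer, false);
    }
}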

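Basically, to refresh the SSD1306, we need to send an I²C message made of a control byte (the first byte) followed by data (the rest of the buffer). The control byte 0x40 has its “Co” bit cleared and its “D/C#” bit set, which tells the SSD1306 that all the bytes that follow are display data rather than commands. Confusingly, 0x40 is also the opcode of the “Set Display Start Line” command, where 0x40 means “start from line 0”, 0x41 “start from line 1”, and so on until 0x7F for line 63, but that meaning only applies to bytes sent as commands. Each data byte represents 8 pixels of the OLED display.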

From page 31 of the SSD1306 datasheet

Now, this works, but it means your program stops completely until the frame is fully sent to the device. How long does it take to send 1025 bytes over I²C? Let’s see how fast the RP2350 and SSD1306 can talk I²C.

Details on the I²C transmission

tl;dr: 26 ms at “400 kHz”, and 11 ms at “1000 kHz”

From page 985 of the RP2350 datasheet. To be more precise, these values should be understood as I²C clock frequencies. Thus, the highest I²C speed supported is 1000 kHz.
From page 54 of the SSD1306 datasheet. Clock cycle of 2.5 µs means 400 kHz.

From this, the RP2350 supports I²C at up to 1000 kHz, but the SSD1306 is only specified up to 400 kHz. In practice, though, 1000 kHz can work in good conditions.

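Naively, 1,025 bytes of 8 bits each would take 8,200 clock cycles, or 20.5 ms at 400 kHz. However, transmitting 8 bits actually takes 9 cycles, because the receiver acknowledges each byte with an extra bit. To see why, this is what the electrical levels actually look like: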

SDA is the pin where the DAta is actually transmitted. SCK (usually called SCL in I²C) is a regular ClocK that tells the other device when to read bits. There are small spikes whenever the clock is high, which probably correspond to the SSD1306 sampling the electrical level. This makes it convenient to see which values are being transmitted.

The SSD1306 datasheet actually contains a detailed diagram of the exchange:

From page 20 of the SSD1306 datasheet.

If you are interested in these details, below is an annotated version of the oscilloscope screenshot from above:

The message starts when SDA goes low while SCK remains high (the START condition). There is an acknowledgment bit after every 8 bits. “W” indicates that the RP2350 is going to send (write) data to the SSD1306, not the other way around; this is indicated by the R/W# bit being 0. “3C” is the hexadecimal value 0x3C, the I²C address of the SSD1306 on this module (the chip can use 0x3C or 0x3D, depending on how its SA0 pin is wired). Note that the address is only 7 bits. The next byte is the 0x40 control byte. Then, we start sending a continuous stream of data bytes, as indicated by the “Co” bit of the control byte being low and the “D/C#” bit being high.

In any case, the point is that it takes about 9 cycles to transmit a byte, so 9,225 cycles for the whole buffer, plus about 9 more cycles for the address byte that starts the message. At 400 kHz, this represents about 23 ms.

Zooming out a bit:

Here, we can see a full frame (preceded by an almost full frame). With each horizontal graduation representing 5 ms, we can tell that the whole thing lasts about 26 ms, which is significantly more than the 23 ms we just computed.

Let’s check the speed of the actual clock signal.

Here, we can see the 7 data bytes that follow the control byte. They are all set to zero, indicating that the first 56 pixels should be turned off.

On the screen, we can see 77 cycles of the clock over 11 horizontal graduations. This corresponds to a cycle of 2.86 µs, or 350 kHz. In fact, the frequency counter feature of my oscilloscope gives a reading of 357 kHz. This is consistent with a duration of 26 ms for the full transmission of a frame (9,234 cycles at 357 kHz is about 25.9 ms).

Interestingly, this is not the limit of the hardware. When I initialize the I²C controller for 1000 kHz, I get an actual clock rate of about 800 kHz, or about 11 ms to send a frame.

Busy wait

Now, how can we avoid blocking everything for, at best, 11 ms?

One approach could be to reduce the amount of data transmitted. For instance, instead of updating the whole screen at every refresh, we could just send the parts that changed. However, this requires significant additional logic to keep track of this efficiently, and can add quite a bit of overhead.

But the right question to ask is: why is the CPU waiting for the I²C transmission to complete? Transmitting data, be it UART, I²C, SPI or USB, is normally performed by a dedicated circuit that runs independently of the CPU. In normal circumstances, the CPU is not setting the level of a GPIO pin, sleeping for a bit, then changing the level again (also called “bit-banging”). The role of the CPU is just to feed a buffer that the dedicated circuit will read from.

From a similar explanation for UART on Stack Overflow. Here, the CPU would write to a special register, which would feed the data directly into a hardware FIFO buffer. That FIFO buffer is gradually sent over the line at the right tempo. The same happens in reverse for reception.

This is how the datasheet for the RP2350 describes its dedicated circuit for I²C:

Page 984 of the RP2350 datasheet. “Each”, because the RP2350 actually has two separate dedicated circuits for I²C, which allows it to drive two I²C buses in parallel. Note that this does not count other protocols or PIO.

From that description, we can see that the I²C controller has a FIFO buffer of 16 bytes. However, the blocking write barely uses it: as the code below shows, it pushes a single byte, then waits until that byte has been completely sent before pushing the next one. So, this is what actually happens when we transmit the 1,025 bytes of our buffer:

  • push a byte to the I²C FIFO
  • check if the FIFO is empty and the byte fully sent; no! → try again immediately
  • check if the FIFO is empty and the byte fully sent; no! → try again immediately
  • repeat many times
  • check if the FIFO is empty and the byte fully sent; yes! → push the next byte to the I²C FIFO
  • check if the FIFO is empty and the byte fully sent; no! → try again immediately
  • check if the FIFO is empty and the byte fully sent; no! → try again immediately
  • repeat many times, for each of the 1,025 bytes

And here is the corresponding code in i2c_write_blocking_internal():

for (byte_ctr = 0; byte_ctr < ilen; ++byte_ctr) {
    bool first = byte_ctr == 0;
    bool last = byte_ctr == ilen - 1;

    if (timeout_check) {
        timeout_check(ts, true); // for per iteration checks, this will reset the timeout
    }

    i2c->hw->data_cmd =
            bool_to_bit(first && i2c->restart_on_next) << I2C_IC_DATA_CMD_RESTART_LSB |
            bool_to_bit(last && !nostop) << I2C_IC_DATA_CMD_STOP_LSB |
            *src++;

    // Wait until the transmission of the address/data from the internal
    // shift register has completed. For this to function correctly, the
    // TX_EMPTY_CTRL flag in IC_CON must be set. The TX_EMPTY_CTRL flag
    // was set in i2c_init.
    do {
        if (timeout_check) {
            timeout = timeout_check(ts, false);
            abort |= timeout;
        }
        tight_loop_contents();
    } while (!timeout && !(i2c->hw->raw_intr_stat & I2C_IC_RAW_INTR_STAT_TX_EMPTY_BITS));

    // If there was a timeout, don't attempt to do anything else.
    if (!timeout) {
        abort_reason = i2c->hw->tx_abrt_source;
        if (abort_reason) {
            // Note clearing the abort flag also clears the reason, and
            // this instance of flag is clear-on-read! Note also the
            // IC_CLR_TX_ABRT register always reads as 0.
            i2c->hw->clr_tx_abrt;
            abort = true;
        }

        if (abort || (last && !nostop)) {
            // If the transaction was aborted or if it completed
            // successfully wait until the STOP condition has occurred.

            // TODO Could there be an abort while waiting for the STOP
            // condition here? If so, additional code would be needed here
            // to take care of the abort.
            do {
                if (timeout_check) {
                    timeout = timeout_check(ts, false);
                    abort |= timeout;
                }
                tight_loop_contents();
            } while (!timeout && !(i2c->hw->raw_intr_stat & I2C_IC_RAW_INTR_STAT_STOP_DET_BITS));

            // If there was a timeout, don't attempt to do anything else.
            if (!timeout) {
                i2c->hw->clr_stop_det;
            }
        }
    }

    // Note the hardware issues a STOP automatically on an abort condition.
    // Note also the hardware clears RX FIFO as well as TX on abort,
    // because we set hwparam IC_AVOID_RX_FIFO_FLUSH_ON_TX_ABRT to 0.
    if (abort)
        break;
}

The important part is the assignment to i2c->hw->data_cmd, which is what populates the sending FIFO buffer. If you want to see how this assignment maps to the hardware, keep going; otherwise, skip to the next section.

From src/rp2_common/hardware_i2c/include/hardware/i2c.h, we can see how the possible values of i2c are defined:

typedef struct i2c_inst i2c_inst_t;

// …
extern i2c_inst_t i2c0_inst;
extern i2c_inst_t i2c1_inst;

#define i2c0 (&i2c0_inst) ///< Identifier for I2C HW Block 0
#define i2c1 (&i2c1_inst) ///< Identifier for I2C HW Block 1

// …

struct i2c_inst {
    i2c_hw_t *hw;
    bool restart_on_next;
};

What we want to see is where the hw field is defined and set. In src/rp2_common/hardware_i2c/i2c.c:

i2c_inst_t i2c0_inst = {i2c0_hw, false};
i2c_inst_t i2c1_inst = {i2c1_hw, false};

Now, we look for i2c_hw_t, i2c0_hw and i2c1_hw, in src/rp2040/hardware_structs/include/hardware/structs/i2c.h:

typedef struct {
    // …
    _REG_(I2C_IC_DATA_CMD_OFFSET) // I2C_IC_DATA_CMD
    // I2C Rx/Tx Data Buffer and Command Register
    // 0x00000800 [11]    FIRST_DATA_BYTE (0) Indicates the first data byte received after the address...
    // 0x00000400 [10]    RESTART      (0) This bit controls whether a RESTART is issued before the...
    // 0x00000200 [9]     STOP         (0) This bit controls whether a STOP is issued after the...
    // 0x00000100 [8]     CMD          (0) This bit controls whether a read or a write is performed
    // 0x000000ff [7:0]   DAT          (0x00) This register contains the data to be transmitted or...
    io_rw_32 data_cmd;
    // …
} i2c_hw_t;

// …

#define i2c0_hw ((i2c_hw_t *)I2C0_BASE)
#define i2c1_hw ((i2c_hw_t *)I2C1_BASE)

We find I2C0_BASE and I2C1_BASE in src/rp2_common/cmsis/stub/CMSIS/Device/RP2350/Include/RP2350.h:

#define I2C0_BASE                   0x40090000UL
#define I2C1_BASE                   0x40098000UL

The _REG_ macro is actually just an IDE-friendly comment but, nonetheless, the value of I2C_IC_DATA_CMD_OFFSET is found in src/rp2350/hardware_regs/include/hardware/regs/i2c.h:

#define I2C_IC_DATA_CMD_OFFSET _u(0x00000010)

Note that I2C_IC_DATA_CMD_OFFSET is not actually used to set the position of the field in the struct. This is simply achieved by the order and size of the previous fields.

In other words, i2c->hw is just a pointer to one of the fixed memory addresses 0x40090000 or 0x40098000. Writing to i2c->hw->data_cmd writes to the address 0x10 bytes further (0x40090010 or 0x40098010).

And, indeed:

Bottom of page 1008 of the RP2350 datasheet
Top of page 1009 of the RP2350 datasheet

Going async

This was meant to be a short article but, as usual, I went down a few rabbit holes. The point is that the CPU spends its time waiting on a FIFO buffer that the I²C controller drains slowly. At 350 kHz, sending the 16 bytes that fit in the FIFO takes about 411 µs (16 bytes × 9 cycles at 350 kHz). This is more than enough time for the CPU to go and do something else.

Here is a summary of my first draft:

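#include "pico/stdlib.h"
#include "hardware/i2c.h"

static uint8_t buffer[1025];   // control byte + 1024 bytes of pixel data
static const uint8_t *sending; // next byte to push to the FIFO
static size_t left_to_send;    // bytes of the current frame not pushed yet

void reset_ssd1306_sending(void) {
    buffer[0] = 0x40; // control byte: display data follows
    sending = buffer;
    left_to_send = sizeof buffer;
}

void do_ssd1306_sending(void) {
    if (left_to_send == 0) {
        return;
    }
    // fill the FIFO buffer, but only with as much as it can take right now
    size_t available = i2c_get_write_available(i2c_default);
    if (left_to_send < available) {
        available = left_to_send;
    }
    left_to_send -= available;
    while (available--) {
        i2c_default->hw->data_cmd = *sending++;
    }
}

int main(void) {
    // init stuff
    // …
    while (1) {
        do_work();
        do_ssd1306_sending();
        // only start a new frame once the previous one has been fully pushed
        if (left_to_send == 0 && should_do_new_frame()) {
            put_pixel_data_in(buffer);
            reset_ssd1306_sending();
        }
    }
}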

This almost works!

This results in a surprising behavior: the image gradually shifts to the right, and a single pixel turns on very briefly at each frame refresh.

The issue is that we never actually end the message. That code still refreshes the display because, after 1024 bytes of data, the SSD1306 driver chip cycles back to the first pixel. However, the next byte that it receives is not supposed to be data, but the 0x40 control byte for the next frame. Since 0x40 contains a single 1 bit, it turns on a single pixel towards the upper left corner of the screen, and proceeds to draw the next 1023 bytes with a 1-pixel shift. When it reaches the last data byte of the second frame, it cycles back to the first pixels, thus immediately erasing the pixel set by 0x40. And so on for every frame refresh.

The missing piece is that we need to tell the I²C controller where each message starts. For this, each FIFO entry has a few extra bits besides the data byte:

Page 993 of the RP2350 datasheet

Thus, we actually need to send this on the first byte:

i2c_default->hw->data_cmd = (1 << I2C_IC_DATA_CMD_RESTART_LSB) | 0x40;

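Where I2C_IC_DATA_CMD_RESTART_LSB is just defined as 10 in src/rp2350/hardware_regs/include/hardware/regs/i2c.h. And i2c_write_blocking_internal() does the same when the previous transfer ended without a STOP (restart_on_next); otherwise, it ends every transfer by setting the STOP bit on its last byte.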

Summary

We have changed the main loop slightly to call do_ssd1306_sending() regularly, which gradually feeds the I²C controller as needed. This does not make transmission happen faster, but it allows the CPU to do other things while it happens. Note that TinyUSB does something very similar with its tud_task() and tuh_task().

In some cases, feeding the bytes one by one can take a significant amount of CPU time. In such scenarios, a more advanced approach is used, where a DMA (Direct Memory Access) controller copies the data from memory to the I²C controller without the CPU having to feed it. However, this is overkill for many usages, and I will probably not need anything more than properly using the FIFO.


About me

I have always been fascinated with computers. Nowadays, I mostly use Rust, but I started out with a QuickBASIC book from the local library when I was in elementary school. I also got a Master in computer science from ENSL and a PhD in cryptography from ENS.

qsantos@qsantos.fr
