Tiny Docker Containers with Rust

The Magic

Thanks to Python, I have gotten used to Docker images that takes minutes (or dozens of minutes) to build and hundreds of megabytes to store and upload.

FROM python3:3.14

# install everything
RUN apt-get update \
    && apt-get install -y --no-install-recommends
    libsome-random-library0 libanother-library0 \
    libmaybe-its-for-pandas-i-dunno0 libit-goes-on0 liband-on0 \
    liband-on-and-on0 \
    && rm -rf /var/lib/apt/lists/*

# install more of everything
COPY requirements.txt .
RUN pip3 install -r requirements.txt

ENTRYPOINT ["gunicorn", "--daemon", "--workers=4", "--bind=", "app:create_app()"]

But there is another way. The Rust way. Where you build your image in a second, and it only takes 5 MB1.

FROM scratch
COPY target/release/my-binary .
ENTRYPOINT ["./my-binary"]

The Trick

This is possible thanks to the points below.

  1. Rust is compiled to binary
  2. Rust is statically2 compiled to binary
  3. You can easily use musl3 with Rust

The first point means that there is no need for a runtime to interpret the script or the bytecode.

The second point means that the binary contains the code for all the libraries4. And, more specifically, only the required code. So no need to install them externally, you remove some overhead, and the total size is reduced.

With these two, you get down to just a few runtime dependencies. Namely, glibc and the ELF loader5.

$ ldd target/release/my-binary (0x00007fffbf7f9000) => /lib/x86_64-linux-gnu/ (0x00007f7a759b2000) => /lib/x86_64-linux-gnu/ (0x00007f7a758d3000) => /lib/x86_64-linux-gnu/ (0x00007f7a756f2000)
/lib64/ (0x00007f7a762a6000)

But you can go further! We do not have to use glibc. It turns out you can just statically compile musl. And with Rust, using musl is only a matter of installing musl-tools and adding --target=x86_64-unknown-linux-musl to your cargo build command.

And, since we just got rid of our last runtime dependency, we do not even need the ELF loader! So now, we get:

$ ldd target/x86_64-unknown-linux-musl/release/my-binary
	statically linked

All the user code is now in a single file and will talk to the kernel directly using int 0x80.


This is a nice trick, and can help you if you really care about small Docker images, but it does have a drawback: targeting musl typically gives you lower performance. So keep this in mind!

  1. Sure, I am simplifying things here. In practice, you would also build the binary in a Dockerfile, and then use staging. But that does not change the point. ↩︎
  2. Static linking means including external libraries within the binary, as in opposition to dynamic libraries, where they are not included. Binaries that were dynamically linked will need to be provided with the external libraries at runtime (.so files on Linux, .dll files on Windows). ↩︎
  3. musl is an implementation of libc, the standard library for C. Virtually all binaries depend on the libc. glibc is the most common implementation of libc on Linux. ↩︎
  4. Of course, it won’t work if a library is using FFI to talk to OpenSSL for instance ↩︎
  5. The ELF loader is another program that helps your program to start up. In particular, it tells your program where to find the dynamically linked libraries. ↩︎
Morse Code

Koch’s Dissertation on Learning Morse Code

This post is a summary of the dissertation that Dr.-Ing. Ludwig Koch submitted in 1935 to become doctor engineer. This is what is referred to by “Koch method” in the context of learning Morse code. I will publish my commentary in another post.

You can find the scan, the transcription, and the full translation in the dedicated repository. The scan comes from the Internet Archive.

Edit: For another full translation, look here. For other summaries of the dissertation, look here, or here. I did my own summary independently of these, and before reading them.

Koch uses char/min as the unit for speed, where a character is 10.2 dots. In modern convention, we use the length of “PARIS”, that is, 50 dot, including the word separation. So, divide the char/min value by 5 to convert into word-per-minute (wpm).

Work-Psychological Study of The Process of Receiving Morse Code

Or, a new training procedure for radio operators

A. Introduction

Work psychological aims at improving the effectiveness of workers by studying how the human mind adapts to the work procedures. The profession of radio operator is chosen since their main activity is that of receiving Morse signals, which can be easily studied and quantified.

B. General Considerations

Professional radio operators are required to perfectly understand Morse code at 100 chars/min. Even with automation, man remains indispensable in certain cases. Morse code encodes characters as a series of short elements, or “dots” (·), and long elements, or “dashes” (–) tones. In standard Morse code, the proportions are:

  • a: length of a dot
  • b: length of a dash, equal to 3 a
  • c: separation between two elements, equal to a
  • d: separation between two characters, equal to 3 a

In the human mind, a concept is not the sum of its parts, but an indivisible whole, or “gestalt” in German. The reader recognizes whole words instead of deciphering each letter. The same applies to Morse code.

C. Rhythm when Sending

We first study the effect of transmission speed on the psychological process. For this, experienced operators send the sequence “B C V Q F L H Y Z X” and the exact timings are recorded. Characters have different durations — E (·) takes 1 dot, but Ch (––––) 15 — so the speed is an average.

Subject 1Subject 2
Subject 3Subject 4

Average values of a, b, c, d for the 4 operators, at different speeds. Abscissa is the speed in chars/min and ordinate a, b, c and d, in a unit proportional to the duration.

At low speed, a and c are close (except in subject 4), as they should be. However, b is more than 3 a. More importantly, d is much longer than b, even though it should be equal to it. The longer pauses between the characters actually correspond to the fact that, between the pauses, characters are sent faster than the standard proportions would allow.

Subject 1Subject 2
Subject 3Subject 4

Lengths of a, b, c, d stacked. From the abscissa upwards, this would read as the letter “a”, including a letter pause at the end. The recorded timings are marked with “s”, and the standard timings as “o”.

Subject 4, who sent at standard proportions at low speeds, belongs to the optical learning type, and learned Morse from the visual symbols first. She can send reliably at a speed of 110-120 chars/min but cannot receive at more than 60-70 chars/min.

Thus, the operators adapt the proportions to keep the character “whole”.

Success rate of experienced operator receiving low-speed standard Morse code. The abscissa is the speed in chars/min and the ordinate is success rate in percent.

To verify this, an automatic device was set to send 30 characters at the standard proportions. Experienced operators had trouble receiving the characters at low speeds. So the “whole” effect appears at a speed of 50 chars/min.

Characters themselves are not sent regularly. The table below shows the difference between standard proportions, and the effective rhythm at slow speed.

L·–···  –··
F··–···–  ·
C–·–·–·  –·
Q––·–––  ·–
X–··––  ··–
Y–·––– ·––
K–·––  ·–
D–··–  ··

Similarly, the operator breaks the character down in subunits. In standard Morse, C is –·–·. But it is often broken into –· –· (dah dit … dah dit) or, more rarely, into –·– · (dah dit dah … dit). Other characters, such as S (···) cannot be broken down this way.

In summary, standard Morse code is only usable at a speed of 50 char/min or above.

D. Learning to Receive

I. Classical Training Procedures

a) The Analytical Method

First, the student memorizes the visual representation of the Morse code, sometimes with mnemonics. Then, they start training to receive auditory Morse code at a low speed, and gradually increase the cadence. The disadvantages of this method are:

  1. Having to learn the visual representation first
  2. Waiting for a character to finish
  3. No holistic effect at the beginning
  4. Having to switching from non-holistic to holistic sound patterns
  5. Having to switching from counting the dots and dashes to recognizing the sound pattern
  6. Adapting as the rhythm changes
b) The Sound Image method
Learning curve for the sound image method. The abscissa is the number of weeks of practice; the ordinate is the speed attained by the student.

Students learn the visual representation. When practicing reception, a longer pause is used between the individual characters, which are sent at a high speed from the beginning. Students use the pause between characters to think about what they just heard. As the speed increases, this becomes impossible, and they have to switch to immediate recognition. This shows as a plateau in the learning curve around a speed of 50 char/min. The disadvantage of this method are:

  1. Having to learn the visual representation first
  2. Thinking about the elements of the sound pattern during the long breaks.
  3. Having to switching from counting the dots and dashes to recognizing the sound pattern
  4. Adapting as the rhythm changes

II. A New Learning Procedure

To help the student directly recognize the sound pattern, the training should not teach the visual representation and should use a high speed from the start. Since the plateau is around 50 char/min, it should start with at least 60 char/min. Higher speeds make it hard to learn new characters, so just use 60 char/min.

Because of this higher starting speed, the training should start with just two characters. Other characters should be added one by one gradually as the previous ones are mastered. The speed is not increased until all characters are mastered.

The training starts with two very different characters. At first, the student is not told what the two characters are and just puts marks in the answer sheet:

This trains the student to recognize the sound pattern as a whole. Afterward, the characters corresponding to the sounds are named.

When writing the characters, a common mistake is to pause when not grasping a character immediately to try to think about it. However, this causes a cascading effects where the next characters are not grasped immediately either, or even entirely missed. To avoid this, the student should just write a dot when they do not grasp a character immediately.

The student should master these first two characters after about 10 minutes. Then, a letter is added, again without revealing which it is at first. Once the student accurately puts marks when this new character is sent, the character is revealed. This process is then repeated for each character. Once all characters are mastered, the speed can be increased.

Some characters are harder to learn. In general, X, Y, P, Q. They should be added in the last third of the training. Characters were added as L, F, C, K, R, P, X, Y, Q, D, B, G, A, N, Z, U, V, W, J, H, S, I, E, O, M, T:

Some Morse code characters are hard to distinguish, such as S (···) and H (····), U (··–) and V (···–), D (–··) and B (–···). At a higher speed, distinguishing characters composed of dash is easier: O (–––) and Ch (––––), G (––·) and 6 (–····), W (·––) and J (·–––). The characters V (···–) and B (–···) are too different to be confused. Starting the training with similar characters is not efficient: E (·), I (··), S (···), H (····).

Characters should be added gradually one by one. An experience was conducted where the characters were added several at a time: T, M, O, Ch, then E, I, S, H, then D, B, G, then U, V, W. Each addition reset the progress of the student.

Training sessions should be spread out in time instead of packed together, and should not exceed half an hour.

The experience was conducted with students, not with soldiers in training, so the schedule was not regular. In the first half-hour, trainers could learn up to 5 characters. Other characters are added more slowly. Before adding a new character, the error rate should be below 10 %. The duration of the training was limited to 24 to 28 half-hour sessions.

In addition, early practice can be helped by using a different pitch for the dots and for the dashes. The pitches are gradually brought together during the training. Merging the two pitches after the 15th or after the 20th letter makes no difference.

Learning curve for the single-pitch (dashed line) and two-pitch (solid line) methods. The abscissa shows the number of half-hour sessions, and the ordinate the number of characters mastered.

The characters were added as H, F, A, B, G, C, D, E, CH, L, K, I, O, T, M, P, Q, R, S, N, U, W, V, Y, X, Z (no J):

Plateaus corresponds to more difficult characters: P, X, V, Y, Q, Z. Starting the training with these characters in the first third of the training, but not at the very beginning, helps in ensuring they are mastered by the end of the training, without discouraging the student who just started.

Learning curve, with the hard characters added between the 10th and 13th half-hour of practice. The abscissa shows the number of half-hour sessions, and the ordinate the number of characters mastered.

13 to 14 year-old students have the same progression rate as adults.

Mastery of the alphabet at 60 char/min is reached in 14 hours with this method, against 7 weeks with the classical ones.

E. Aptitude Test

Those who fail to master 4 characters at a speed of 60 char/min in the first 2 half-hours never mastered all characters. So the beginning of the training also serves as an aptitude test before the candidate spent dozens of hours in training. Students should be grouped by aptitude to avoid slowing down the most apt. Some students could be impeded by slow writing, but this can be remediated with practice. Those accustomed to concentrating are best, in particular those of the acoustic imagination type. Among about 100 subjects, none of the diagnoses made on the basis of the first two half-hours led to a wrong judgment. Biegel does a similar aptitude test, but she starts at 40 char/min.


Where to Start with Rust

As a Rust evangelist, I have been asked a few times for resources to start learning Rust. I have compiled my thoughts in this article. I have ordered the resources in a rough order: you should first take a look at the first resources and later at the other ones. Do not take it as gospel, and do peek at the rest of the list before completing even the official tutorial.

The Rust Programming Language

This is the official tutorial for Rust. It is also known as “the Rust book”. I do not recommend you read it entirely, but you should read at least the first 12 chapters. Of course, the rest of the chapters are worth a read, but the more advanced topics are better covered by dedicated resources.

Rust for functional programmers

If you are used to programming in Haskell or OCaml, this tutorial might help you get you to speed quicker.

Tour of Rust

For a more hands-on approach, you can use this tutorial, which drops you in a Rust playground with minimal exercises that each train you on a new concept.

Rust by Example

If the Tour of Rust is not enough, you can also try this one to complement your knowledge.

Learning Rust With Entirely Too Many Linked Lists

This tutorial is about getting over one of the main pain points of Rust: cyclic references. It goes over various solutions, how they work, and their pros and cons. A must-read if you want to become more comfortable with Rust.

Learn Rust the Dangerous Way

You can see this tutorial as an extension of the previous one. When you are used to developing in C, switching to idiomatic Rust can seem like a daunting task. This website will take you from a C program (the n-body problem from the Benchmarks Game) to a fully idiomatic and optimized Rust one.


At this point, you should be getting comfortable with the syntax and main idiosyncrasies of Rust. This collection of exercises will let you hone your skills.

Zero To Production In Rust (book)

To go further than writing small snippets of Rust, you will want to start taking into account higher-level considerations. This book narrates every step in setting up a simple mailing-list server in Rust.

Macros in Rust

Writing macros is a fairly advanced topics of Rust. This tutorial will get you more familiar with writing declarative macros (or “macros by example”).

The Rustonomicon

Most of the time, you should not need to touch unsafe. But, when you do, you enter a parallel world where every step you take could lead you to the dreaded undefined behaviors. Using unsafe safely requires thinking about many things. This tutorial will help you get started with this.

Rust for Rustaceans (book)

Like all non-trivial software projects, Rust has its quirks and unintuitive mechanisms. This book is a very dense explication of some of this crust. You might also enjoy the author’s YouTube channel.

This Week in Rust

This is the one-stop-shop to keep learning more about Rust, its ecosystem and its community. You do not need to follow all the links, or even read it from start to finish, but skimming the latest edition every week will let you stay on top of things.

Rust Koans

I conclude this list of heavy technical content with some light reading, likely inspired from the Rootless Root.


Client-Side Password Hashing

tl;dr: When you think about it, hashing password only client-side (in the browser, in JavaScript) is not as terrible as you might think. And if you do most of the work client-side, you get something that is actually pretty secure.

Storing Passwords

So, I have been working on modernizing a Web application and fixing some of its issues. As someone who has spent way too much time implementing cryptographic primitives and cracking password hashes for fun, one of the main problems was that passwords were only hashed client-side.

For the context, we have learned quite a long time ago that storing raw passwords in a database was a bad idea. If the database leaked, all passwords were immediately compromised. Since people tend to reuse the same passwords for various services, it meant that compromising a non-sensitive service would allow an attacker to access very sensitive accounts of many individuals.

So we encrypt them. Except we don’t. If the database leaks, the encryption key will probably leak as well. Instead, we cryptographically hash the password. That is, we transform them in something completely different, irreversibly. So, “correct horse battery staple” would become “c4bbcb1fbec99d65bf59d85c8cb62ee2db963f0fe106f483d9afa73bd4e39a8a”.

How your password is saved when you register, and how it is checked when you log in

Because the same password always get hashed the same way, we can authenticate people by comparing the hashes instead of the passwords themselves. No need to remember the password in the database! And if the database leaks, no attacker cannot find “correct horse battery staple” from “c4bbcb1fbec99d65bf59d85c8cb62ee2db963f0fe106f483d9afa73bd4e39a8a”.

Well, except they can.

Slowing down attackers

Although cryptographic hashes are designed to be hard to directly reversed, an attacker who got access to the hashes can just take a guess, hash it, and check whether it matches the hash of a user. If it does, it has found a password.

Since cryptographic hashes are also designed to be very fast, an attacker can try many guesses in a short amount of time.

The average password entropy is 40 bits. The median is likely to be lower. A modern CPU can compute more than 1 million SHA-256 hash computation per seconds. So trying every password with 40 bits of entropy would take less than 2 weeks.

The number of characters is not everything. What matters is how it is generated. Don’t pick a password: since humans are bad at randomness, the entropy will be even lower!

Thankfully, we can make things more difficult for the attacker. What if, instead of storing SHA-256(password), we stored SHA-256(SHA-256(password)). It would make things twice as hard for the attacker, and would not change much when we validate a user’s password for authentication. We can go further! Instead of requiring to compute SHA-256 twice, we can make it so that it takes ten, a hundred, a thousand iterations! We can choose a number that makes it so that hashing the password takes 10 ms on the server. In that case, it would take the attacker 350 years of CPU time! This is basically PBKDF2.

The more iterations, the longer it will take to crack a password. Of course, verifying the password when authenticating will also take longer, but it should only be a few milliseconds.

Except that the attacker can do something that we cannot: it can try many guesses in parallel. Today’s CPUs can have dozens of cores. GPUs scale even more. And, with Bitcoin/blockchain/web3/NFTs, people have started building dedicated hardware to hash SHA-256 en masse. For a few thousand dollars, you can get hardware to crack SHA-256 at tens of billions of guesses per second. Even with our clever scheme, the attacker is back to 2 weeks to try every password1.

That’s cheating!

Yeah, it is! But we can do something about that. GPUs and dedicated hardware are designed for very specific tasks. We can take advantage of that to make it very hard to crack passwords with them. The trick is to replace SHA-256 with a hash function that requires a lot of memory to compute. It turns out, people have designed a very good one! Argon2id2

That’s basically the state of the art in storing passwords3.

Client-Side Only Hashing

The client (your web browser) just sends the password over the network. Then, the servers hash to hash it before storing or checking it.

Now, that’s quite a bit of work for our server. We need to do a significant amount of computation every time a user tries to connect. What if we had the user’s client do it instead?

Instead of sending the password, and having the server hash it, let’s imagine we use some JavaScript to hash the user’s password with Argon2id, and then send it over the network. Now, the server only needs to compare the hash to the one in the database!

Now, the client does most of the work.

There is one issue though: from the point of view of our Web application, the actual credential is not the password anymore, but the hash. So if an attacker gets the hash, they can authenticate as the user. In other words, a database leak means the credentials leak!

It’s not as bad as in the initial scenario, however. These credentials are only valid for our service. And it is still very hard to find the original passwords from the hashes.

Mostly Client-Side Hashing

We can improve this with a simple observation: since it is very hard to get the password back from the hash, we can treat the hash like a high-entropy password.

In this case, brute-force attacks become irrelevant, and we do not need to iterate or use memory-hard functions. Instead, we could just use a fast hash. It looks like that:

  1. The browser computes an intermediate hash with a slow memory-hard function
  2. The browser sends the intermediate hash to the server
  3. The server hashes the intermediate hash with a fast function, getting the final hash
  4. The server compares the final hash with the one in the database

From the point of view of the server, the effective credential is the intermediate hash. However, if the database leak, the attacker only gets the final hash. It would be very hard to find the intermediate hash directly from the final hash, since the input space is very large. And it would be very hard to find the user’s password from the final hash, since a memory-hard function is involved.

The nice thing is that all the heavy work is offloaded to the browser, while still keeping the password secure!

The one drawback that I can think of is that this is not enforced by the back-end. If the front-end (especially a third party one) does not do proper memory-hard hashing, passwords might become exposed.

A positive side effect is that the back-end never sees raw passwords. It can be a good mitigation for inadvertently leaking information in the logs. Leaking the intermediate hash still allows an attacker to authenticate as a user, but they are not able to get their raw password and compromise the user’s accounts on other websites.


A priori, client-side hashing is a terrible idea from the point of view of security. However, looking more closely at it, it looks like it can actually make sense. But we should still use hashing on the server to avoid any risk of leaking credentials.

  1. Note that, without the iterated hashing, it would take just 2 minutes with dedicated hardware. So that’s still better than not doing it. ↩︎
  2. scrypt came first, and is pretty good as well ↩︎
  3. There is also salting, but that’s orthogonal to this discussion ↩︎