Introduction to Rust

We'll start this guide with a short introduction to Rust, along with a few code snippets that show how you can start using it.

Note

It's not important to understand everything immediately. The important part is to get you running Rust code as fast as possible, ideally within the next 5 minutes and without having to install anything just yet. It's always helpful to get a general feeling for the language before diving in. This guide is structured so that you can run almost every code block and see the output it produces.

That way, we keep the examples practical and down to earth. When you hover over a code block, you'll see a play button; clicking it runs the code. More on that in a later chapter.

What is Rust?

Rust is a programming language which, according to the official website (January 2023), is "a language empowering everyone to build reliable and efficient software". It was designed by Graydon Hoare, who started it around 2006; the first stable release, Rust 1.0, followed in 2015. More on Rust's history in the next chapter. The key points are that it's a multi-purpose programming language influenced by a wide range of other languages, notably C++.

It is frequently advertised as a memory safe and performant alternative to C/C++.

The following is Rust code that outputs "Hello, World!". Try running it :)

fn main() {
    println!("Hello, World!");
}

Rust works by first compiling the code you write to machine code, which is then executed. This relies on the Rust compiler (rustc) and LLVM. It's not important to know what these are yet, but you can look them up if you want. The gist is that the same Rust source code can be compiled for many different platforms, even though each compiled binary targets one specific platform. As a matter of fact, you just ran it straight from your browser ◕ ◡ ◕
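To make the compile step a bit more tangible, here is a minimal sketch that prints which platform the running program was compiled for. It only uses constants from the standard library; the helper name `target_description` is made up for illustration.

```rust
use std::env::consts::{ARCH, OS};

/// Describes the platform this binary was compiled for.
/// `target_description` is an illustrative helper name, not a standard function.
fn target_description() -> String {
    format!("{}-{}", ARCH, OS)
}

fn main() {
    // ARCH and OS are fixed when the compiler builds the program.
    println!("This program was compiled for {}", target_description());
}
```

The output depends on the machine that built the program, so expect something different on a laptop than in an online playground.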

The following is also Rust code. Can you guess what it does before running it? You can edit this one as well, for example to rename the functions if you like, but keep it short: we have work to do.

fn mystery_function(input: &str) -> i32 {
    let mysteries = "aeiouAEIOU";
    input.chars().filter(|c| mysteries.contains(*c)).count() as i32
}

fn main() {
    let input_string = "Hello, world!";
    let mystery_count = mystery_function(input_string);
    println!("The number of mysteries in '{}' is {}", input_string, mystery_count);
}

That looks pretty much like Python, right? The only weird-looking parts are the &, * and -> symbols, and perhaps str and i32 too. We'll get to those in time, not to worry.

The mystery function counts the number of vowels in the input string. For "Hello, world!" that is 3 (e, o and o).
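With the mystery solved, the same logic could be written with more descriptive names. This is only a sketch: `count_vowels` and `VOWELS` are names chosen here for illustration, and the return type is switched to `usize` (what `count()` returns) so no cast is needed.

```rust
/// Counts the vowels (a, e, i, o, u in either case) in `input`.
/// `count_vowels` and `VOWELS` are illustrative names, not part of the original snippet.
fn count_vowels(input: &str) -> usize {
    const VOWELS: &str = "aeiouAEIOU";
    // Keep only the characters found in VOWELS, then count them.
    input.chars().filter(|c| VOWELS.contains(*c)).count()
}

fn main() {
    let input_string = "Hello, world!";
    println!(
        "The number of vowels in '{}' is {}",
        input_string,
        count_vowels(input_string)
    );
}
```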

Why is Rust a good choice for data engineering?

The argument that Rust is a memory safe and performant alternative to C/C++ doesn't matter that much to Data Engineers, since we don't usually deal with C or C++.

What is important is that Rust is a systems programming language that enforces a certain approach to programming, one that is both efficient and good at eliminating a whole category of errors that commonly occur in data workloads.

Here's a deliberately simplified example that doesn't require dwelling on performance, low-level control or even memory usage.

Store the following in a file called test.py:

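import random

def count_characters(s: str, c: str) -> int:
    return s.count(c)

def main():
    # Roughly half of the runs take the broken branch below.
    if random.random() < 0.5:
        return count_characters("Hello world", "o")
    else:
        # 23 is an int, not a str: str.count raises a TypeError at runtime.
        return count_characters("Hello world", 23)

main()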

If you run python3 test.py multiple times, you'll see that it sometimes works and sometimes fails with a TypeError. Of course, a linter, a battery of tests or even a type checker, if properly set up, might flag this as a warning or even an error, but nothing prevents that code from ending up running somewhere if it's not caught in time. When using complex objects, or as the scope of the project grows, these types of errors sneak in and are only discovered too late. You can't test everything.

This happens more often than you'd think. This simplistic example is meant to convey that preventing these issues from hitting production is an afterthought, and there will never be any guarantees.

Consider the equivalent code in Rust and try to run it:

extern crate rand;
use rand::Rng;

fn count_characters(s: &str, c: char) -> usize {
    s.chars().filter(|x| *x == c).count()
}

fn main() {
    let mut rng = rand::thread_rng();
    if rng.gen_bool(0.5) {
        count_characters("Hello world", 'o');
    } else {
        count_characters("Hello world", 23);
    }
}

Notice what the compiler says. Does it make a bit more sense now? The compiler rejects the call that passes the integer 23 where a char is expected, so it doesn't allow you to build this application. Needless to say, this won't end up on your production server.
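For contrast, here is a sketch of a version that does compile. It keeps the same `count_characters` signature but drops the random branch and the rand dependency, so it only needs the standard library; the offending call is left as a comment.

```rust
/// Same signature as before: the second argument must be a `char`.
fn count_characters(s: &str, c: char) -> usize {
    s.chars().filter(|x| *x == c).count()
}

fn main() {
    // Compiles: 'o' is a char literal.
    let o_count = count_characters("Hello world", 'o');
    println!("'o' appears {} times", o_count);

    // Uncommenting the next line brings back the mismatched-types error,
    // because 23 is an integer and not a char:
    // count_characters("Hello world", 23);
}
```

The point is that the broken call cannot hide behind a branch that only runs half of the time: the program builds only when every call site matches the signature.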

In essence, interface definitions and specifications are enforced at the lowest level, not as an afterthought in a CI/CD pipeline or in some external YAML file serving as documentation. This always requires a bit more work and planning upfront, but you get a few guarantees in exchange, especially the guarantees that matter when working with data. This way of approaching things is not the reason for Rust's performance and scalability, but it certainly enables it.
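As a rough illustration of what "enforced at the lowest level" can look like for data, the sketch below parses a text line into a typed record. Everything in it is hypothetical: the `Reading` struct, its fields, the `parse_reading` function and the "id,temperature" line format are assumptions made up for this example.

```rust
/// Hypothetical record type: the field names and types act as the interface.
#[derive(Debug, PartialEq)]
struct Reading {
    sensor_id: u32,
    temperature: f64,
}

/// Parses a line such as "42,21.5" into a `Reading`.
/// A malformed line yields an error message instead of a half-filled record.
fn parse_reading(line: &str) -> Result<Reading, String> {
    let (id, temp) = line
        .split_once(',')
        .ok_or_else(|| format!("expected 'id,temperature', got '{}'", line))?;
    let sensor_id = id
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("bad sensor id '{}': {}", id, e))?;
    let temperature = temp
        .trim()
        .parse::<f64>()
        .map_err(|e| format!("bad temperature '{}': {}", temp, e))?;
    Ok(Reading { sensor_id, temperature })
}

fn main() {
    for line in ["42,21.5", "42,warm"] {
        // The caller has to deal with both outcomes to get at the data.
        match parse_reading(line) {
            Ok(reading) => println!(
                "sensor {} reads {}",
                reading.sensor_id, reading.temperature
            ),
            Err(message) => println!("rejected: {}", message),
        }
    }
}
```

Because `parse_reading` returns a `Result`, code that wants the `Reading` has to decide what happens on bad input, which is the kind of upfront planning described above.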

Through one deliberately simple example, we saw what makes Rust a good choice for data engineering. Over the next chapters we'll discover more examples and get them to run.
