JSON / JSONL

JSON stands for "JavaScript Object Notation", and it's an extremely popular data format. JSONL stands for "JSON Lines". It's a twist on JSON whereby a file contains multiple JSON objects, each stored on a single line.

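Unlike CSV, which has no single standard that everyone follows, JSON and JSONL have specifications: JSON is standardized (RFC 8259 and ECMA-404), and JSON Lines has a short published specification of its own. This helps a lot with standardization, and it means we can lean on libraries for parsing and validating JSON files.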

JSON

Let us start with a simple JSON file.

Reading a JSON file

The following is valid JSON data:

{
    "name": "HAL",
    "age": 42,
    "hobbies": ["floating", "chilling", "writing Rust"]
}

Typically, data like this is stored in a JSON file: a file that has the above content and a name ending in .json. Copy the content of the block above and paste it into a file next to our previous CSV code; let's call it sample.json. Alternatively, initialize a new project.

Your directory should look like this:

.
├── Cargo.lock
├── Cargo.toml
├── sample.json
├── src
│   └── main.rs

Let us now read this file in Rust. There are many ways of doing this. One would be to parse it by hand, similar to what we did in the CSV chapter, but that would be a lot of work. The cool thing about being a developer is that people provide libraries and tools for free, for others to use and build on. I know very few occupations that have this embedded at their root.

For this case we'll leverage two crates, serde and serde_json. They work together and are extremely powerful. There's much you can do with these two libraries alone, but for now, let's focus on reading this JSON file.

Run these commands to get started.

cargo add serde --features derive
cargo add serde_json

The code for this will go as follows:

use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::Read;

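// Mirrors the fields of sample.json. The derive macros generate the
// code that converts between this struct and JSON.
#[derive(Debug, Serialize, Deserialize)]
struct Person {
    name: String,
    age: u32,
    hobbies: Vec<String>,
}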

// Reads `filename` and parses its content into a `Person`.
// Errors are returned with `?` instead of panicking, so the `Err` arm
// in `main` can actually report them.
fn read_json_file(filename: &str) -> Result<Person, Box<dyn std::error::Error>> {
    let mut file = File::open(filename)?;

    let mut content = String::new();
    file.read_to_string(&mut content)?;

    let p: Person = serde_json::from_str(&content)?;

    Ok(p)
}

fn main() {
    let filename = "sample.json";

    match read_json_file(filename) {
        Ok(person) => {
            println!("{:?}", person);
        }
        Err(e) => {
            eprintln!("Error reading JSON file: {}", e);
        }
    }
}

JSON to struct

🚧 in progress 🚧

Writing a JSON file

🚧 in progress 🚧

Struct to JSON

🚧 in progress 🚧

Now with all this in mind, here's a small task: Take the data from the CSV chapter, and try to write that to a JSONL file using the techniques explained above.

JSONL

The L at the end of JSONL stands for "Lines". It's a popular format for big data workloads. Essentially, a JSONL file is a file where each line contains exactly one JSON object.
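To make the layout concrete, here is a minimal sketch that uses only the standard library. The constant SAMPLE_JSONL and the helper jsonl_records are names made up for this illustration, and the records for Dave and Frank are invented sample data in the same shape as the HAL example. The helper only splits the text into records; turning each record into a struct is a separate step.

```rust
/// Hypothetical sample data: three records, one JSON object per line.
const SAMPLE_JSONL: &str = r#"{"name": "HAL", "age": 42, "hobbies": ["floating", "chilling", "writing Rust"]}
{"name": "Dave", "age": 35, "hobbies": ["chess"]}
{"name": "Frank", "age": 36, "hobbies": ["jogging", "sketching"]}
"#;

/// Splits JSONL text into its records, one per non-empty line.
/// The records are returned as raw text and are not validated here.
fn jsonl_records(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

Calling jsonl_records(SAMPLE_JSONL) yields three string slices. Each of them could then be handed to serde_json::from_str, the same call used for the single JSON file earlier.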

The main benefit of this format is that you can add data to an existing file by appending JSON records one line at a time, without rewriting what is already there. It also helps a lot with reading data in batches, since each line is a complete record, and with keeping the schema predictable.
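Both ideas can be sketched with the standard library alone. The function names append_record and for_each_batch are hypothetical, and the sketch assumes each record has already been serialized to a single line of JSON. It is one possible way to do this, not the only one.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Appends one record to a JSONL file, creating the file if it is missing.
/// Assumption: `json_line` is already a single-line JSON value.
fn append_record(path: &Path, json_line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", json_line)
}

/// Streams the file and hands `on_batch` at most `batch_size` lines at a
/// time, so the whole file never has to be held in memory.
fn for_each_batch<F>(path: &Path, batch_size: usize, mut on_batch: F) -> io::Result<()>
where
    F: FnMut(&[String]),
{
    let reader = BufReader::new(File::open(path)?);
    let mut batch = Vec::with_capacity(batch_size);
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        batch.push(line);
        if batch.len() == batch_size {
            on_batch(&batch);
            batch.clear();
        }
    }
    if !batch.is_empty() {
        on_batch(&batch); // final batch, possibly smaller than batch_size
    }
    Ok(())
}
```

In a real program, the line passed to append_record could come from serde_json::to_string, which produces compact JSON without line breaks. The closure given to for_each_batch is where each line would be parsed back into a struct.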

🚧 in progress 🚧