
1 month ago by @sihilel.h

Building a WAV Synthesizer in Rust

I built my first Rust project—a WAV audio synthesizer that generates playable audio from JSON notation. Implemented the complete pipeline from frequency calculation to PCM encoding to RIFF file format, learning how digital audio actually works.


Backstory

I was listening to music one evening after work when a question interrupted my relaxation: How does this analog sound wave become the ones and zeros stored on my phone? I'd learned about analog-to-digital conversion in my A/L ICT syllabus, but it remained frustratingly theoretical—no practical implementation, no hands-on understanding of how digital audio actually works. That evening, I decided to find out by building Orchestrator, a WAV audio synthesizer in Rust that generates audio files from JSON notation.

After years of working with TypeScript, JavaScript, and PHP, I needed a new challenge. I'd been learning Rust's theoretical concepts for over a week, and this became my first real project in the language. Audio synthesis requires precise control over bytes, sample rates, and binary formats—exactly where Rust excels.

```mermaid
flowchart LR
    A[JSON Input] --> B[Parse Notes]
    B --> C[Calculate Frequencies]
    C --> D[Generate Sine Waves]
    D --> E[PCM Encoding]
    E --> F[WAV File]
```

How Musical Tuning Actually Works

Implementing frequency calculation revealed something I hadn't fully appreciated: musical tuning isn't arbitrary; it's mathematically precise. Western music uses the A440 standard, where A4 (the A above middle C) vibrates at exactly 440 Hz. Every other note follows from the equal temperament formula:

$$f = 440 \times 2^{(n - 9)/12}$$

Here, $n$ represents the semitone offset from C4, and the formula divides an octave into twelve logarithmically equal steps. This explains why instruments around the world can be tuned to play together—they're all following the same mathematical standard. The guitar in Tokyo and the piano in Colombo calculate the same frequency for middle C: approximately 261.63 Hz.

For different octaves, I add $12 \times (\text{octave} - 4)$ to the semitone count. This shifts the frequency up or down by powers of two (each octave doubles or halves the frequency). The implementation captured this elegantly:

```rust
let n = (note_id - 9) + 12 * (octave - 4); // semitone distance from A4
let frequency = 440.0 * 2.0_f64.powf(n as f64 / 12.0);
```
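Wrapped into a complete function (the function name and note numbering are my own for illustration, not necessarily the project's), the calculation can be sanity-checked against known pitches:

```rust
/// Frequency in Hz for a note, where `note_id` is the semitone offset
/// from C within the octave (C = 0, C# = 1, ..., B = 11) and `octave`
/// follows scientific pitch notation (so A4 = 440 Hz).
fn note_frequency(note_id: i32, octave: i32) -> f64 {
    // Semitone distance from A4: A sits 9 semitones above C.
    let n = (note_id - 9) + 12 * (octave - 4);
    440.0 * 2.0_f64.powf(n as f64 / 12.0)
}

fn main() {
    // Middle C (C4) should come out near 261.63 Hz; A4 is 440 Hz by definition.
    println!("C4 = {:.2} Hz", note_frequency(0, 4));
    println!("A4 = {:.2} Hz", note_frequency(9, 4));
}
```

Moving up one octave doubles the result: `note_frequency(9, 5)` gives 880 Hz, exactly as the powers-of-two shift predicts.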

Rethinking the Sine Wave Generator

The oscillator proved trickier than expected. A sine wave is mathematically straightforward, $\text{amplitude} \times \sin(2\pi \times \text{frequency} \times t)$, but implementing the sample generator required rethinking my approach.

Initially, I tried a time-based index, thinking I'd pass time values directly. This failed because at a 44.1 kHz sample rate, a single second contains 44,100 discrete samples. Managing time values became unwieldy. The solution was to pass the sample index directly and calculate time inside the function:

```rust
let t = sample_index as f64 / sample_rate as f64;
let sample = amplitude * (2.0 * PI * frequency * t).sin();
```
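Putting the oscillator into a self-contained sketch (again, the names are mine) makes the index-based approach concrete: the caller iterates over sample indices, and time is derived inside the function.

```rust
use std::f64::consts::PI;

/// One sample of a sine oscillator, addressed by discrete sample index
/// rather than continuous time.
fn sine_sample(amplitude: f64, frequency: f64, sample_index: u32, sample_rate: u32) -> f64 {
    let t = sample_index as f64 / sample_rate as f64; // reconstruct time from the index
    amplitude * (2.0 * PI * frequency * t).sin()
}

fn main() {
    // One second of a 440 Hz tone at 44.1 kHz: exactly 44,100 samples.
    let samples: Vec<f64> = (0..44_100)
        .map(|i| sine_sample(0.8, 440.0, i, 44_100))
        .collect();
    assert_eq!(samples.len(), 44_100);
    assert_eq!(samples[0], 0.0); // a sine wave starts at a zero crossing
}
```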

[Figure: How audio sampling works, explained using a chart]

This shift (from thinking in continuous time to discrete sample indices) mirrors the fundamental nature of digital audio. We're not capturing continuous sound—we're taking 44,100 snapshots per second and reconstructing the wave from those snapshots. Starting with the oscillator and verifying it generated correct samples before moving forward made the problem tractable.

Why Sample Rate and Bit Depth Matter

I knew lower sample rates and bit depths produced underwater-sounding audio, but I didn't understand why until I implemented PCM encoding. Sample rate determines how many snapshots we take per second. The Nyquist theorem states that the sample rate must be at least twice the highest frequency you want to reproduce. Human hearing tops out around 20 kHz, so 44.1 kHz captures everything we can hear with room to spare.

Bit depth determines the precision of each sample. With 16-bit audio, each sample is quantized into one of 65,536 possible values (from -32,768 to +32,767). Lower bit depths mean fewer possible values, creating audible quantization noise: that underwater effect. The conversion from floating-point sine waves to 16-bit integers required clamping to prevent overflow:

```rust
let float_sample = sine_sample.clamp(-1.0, 1.0);
let pcm_value = (float_sample * 32767.0) as i16;
```

I discovered the importance of clamping the hard way. In an early test, I accidentally set amplitude to 100 instead of 1.0. The resulting PCM values overflowed, corrupting the WAV file completely—it wouldn't play, just silence. Digital audio has hard boundaries; exceeding them doesn't just reduce quality, it breaks everything.
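Wrapping the clamp-then-scale step in a helper (my own naming, not the project's actual code) shows the saturation behavior that would have saved that early test: out-of-range input pins to the maximum value instead of corrupting the output.

```rust
/// Convert a floating-point sample to signed 16-bit PCM, clamping to
/// [-1.0, 1.0] first so out-of-range input saturates instead of overflowing.
fn to_pcm16(sample: f64) -> i16 {
    (sample.clamp(-1.0, 1.0) * 32767.0) as i16
}

fn main() {
    assert_eq!(to_pcm16(0.0), 0);
    assert_eq!(to_pcm16(1.0), 32767);
    assert_eq!(to_pcm16(-1.0), -32767);
    assert_eq!(to_pcm16(100.0), 32767); // the runaway amplitude, safely saturated
}
```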

The WAV Format and Little-Endian Design

The WAV file format follows the RIFF (Resource Interchange File Format) structure—essentially a container with labeled chunks of data. Writing the format correctly meant understanding byte ordering, specifically why WAV uses little-endian.

[Figure: How the WAV file format is structured]

Little-endian stores the least significant byte first, which feels backwards. A 16-bit value like 1000 (0x03E8 in hex) gets stored as E8 03 rather than 03 E8. This initially tripped me up, but it's a design choice from WAV's origins: Microsoft and IBM developed the format for Intel x86 processors, which use little-endian architecture.
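Rust's integer types make the ordering easy to see: `to_le_bytes()` and `to_be_bytes()` return the same value in both byte orders, so the example above can be checked directly.

```rust
fn main() {
    let value: u16 = 1000; // 0x03E8 in hex
    assert_eq!(value.to_le_bytes(), [0xE8, 0x03]); // little-endian: LSB first, as WAV expects
    assert_eq!(value.to_be_bytes(), [0x03, 0xE8]); // big-endian, for comparison
}
```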

The file structure itself is logical once you see it:

  • A RIFF header identifies the file type (12 bytes)
  • A fmt chunk describes the audio format—sample rate, channels, bit depth (24 bytes)
  • A data chunk contains the raw PCM samples (8 bytes header + all audio data)

Every multi-byte integer had to be written in little-endian order. Rust makes this explicit: the integer types provide `to_le_bytes()`, and writing the resulting arrays with `write_all()` forced me to think about the actual bytes being written to disk.
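As a sketch of how those three chunks fit together, here is one way to build the 44-byte header for 16-bit mono PCM (field layout per the canonical WAV format; the helper name, the mono assumption, and the in-memory buffer are mine for illustration, not the project's actual code):

```rust
/// Build the canonical 44-byte WAV header for 16-bit mono PCM.
/// `data_len` is the byte length of the PCM payload that follows.
fn wav_header(sample_rate: u32, data_len: u32) -> Vec<u8> {
    let byte_rate = sample_rate * 2; // sample_rate * channels (1) * bytes per sample (2)
    let mut h = Vec::with_capacity(44);
    // RIFF header (12 bytes)
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus first 8 bytes
    h.extend_from_slice(b"WAVE");
    // fmt chunk (8-byte header + 16-byte body = 24 bytes)
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt body size
    h.extend_from_slice(&1u16.to_le_bytes());  // audio format: 1 = PCM
    h.extend_from_slice(&1u16.to_le_bytes());  // channels: mono
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&byte_rate.to_le_bytes());
    h.extend_from_slice(&2u16.to_le_bytes());  // block align: channels * bytes per sample
    h.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    // data chunk header (8 bytes), followed by the samples themselves
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}

fn main() {
    let header = wav_header(44_100, 88_200); // one second of 16-bit mono audio
    assert_eq!(header.len(), 44);
    assert_eq!(&header[0..4], b"RIFF");
    assert_eq!(&header[12..16], b"fmt ");
    assert_eq!(&header[36..40], b"data");
}
```

Writing this buffer with `write_all()`, then streaming the little-endian PCM samples after it, yields a playable file.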

Test Drive: My First Flight with Rust

I included two example outputs with the project. The first, octave.wav, is a simple C major scale—functional but unremarkable. The second, test_drive.wav, is a simplified interpretation of the opening melody from "Test Drive," the theme from How to Train Your Dragon that plays during Hiccup's first flight with Toothless.

This is a small Easter egg. I was seven or eight when I first saw that film, and that scene (and that music) stuck with me for life. The parallel felt perfect: Hiccup's first uncertain flight with Toothless, and my first real project taking flight in Rust.

[Figure: Waveform of test_drive.wav]

From Theory to Implementation

The entire project took a single evening—about six to seven hours after my shift ended. Those hours were productive because I'd spent over a week learning Rust's theoretical concepts first. But at some point, you have to build something real.

I followed a principle that kept me from being overwhelmed: I wasn't building "an audio synthesizer"—I was building a function that returns a sine wave sample, then a function that converts it to PCM, then a function that writes RIFF headers. Each piece was simple; their composition created something complex.

What This Actually Means

We take everyday technology for granted. Listening to music on a phone seems trivial—tap an app, press play, sound emerges. But underneath lies decades of work by engineers who solved problems like "How do we represent continuous sound waves in discrete digital form?" and "What byte ordering should we use for cross-platform compatibility?"

The gap between wondering "how does this work" and actually knowing is smaller than it seems. The specifications for WAV files are publicly available. The mathematics of equal temperament tuning is well-documented. The tools (Rust, Cargo, free online resources) are accessible to anyone. What separates someone who wonders from someone who knows is simply the decision to find out.

This project was my first step on that path. Rust was the vehicle, but the real journey was from theoretical knowledge to practical understanding—from knowing about analog-to-digital conversion to actually implementing it, one sample at a time.


The complete source code and example audio files are available on GitHub. Try modifying the JSON inputs to create your own melodies, or extend the synthesizer with new waveforms and effects.
