I'm currently in the middle of refactoring Part 5 of the Inside an Audio-to-MIDI Engine series. It's not quite ready yet, but I wanted to keep the momentum going. So I'm starting a new series that's a bit more personal, but still grounded in engineering.
For most of my career, I've worked in large-scale C++ systems in the defense space. Real-time constraints, distributed systems, and high-reliability environments have shaped how I think about software.
But my interest in audio has always been there.
When the pandemic hit and things slowed down, I started exploring the VST3 SDK. That experience was equal parts frustrating and fascinating. I built a small polyrhythmic beat generator (nothing production ready), but it was enough to pull me deeper into audio development.
Fast-forward to late 2024: I started building what eventually became NyxFX. Initially it was just a sandbox for experimenting with real-time visuals and audio-driven effects. Over time, it evolved into something much more serious. I found myself thinking about audio systems constantly, designing, prototyping, and iterating outside of work.
At some point, this stopped being a side project and became a deliberate shift.
Over the past year, I've been rebuilding my understanding of software engineering through the lens of audio: signal processing, timing models, real-time constraints, and perceptual systems. Not just learning concepts, but implementing them in systems like a spectral MIDI engine and VST3 plugins.
This series documents that process.
More specifically:
- What carries over from large-scale C++ systems into audio
- What breaks down completely
- What had to be learned from scratch
- And how those lessons show up in real systems
This isn't a guide on how to "get into audio." It's a breakdown of what it looks like to rebuild your engineering stack in a new domain (at least from my own perspective).
What Carries Over and What Doesn't
Coming from large-scale C++ systems, I thought I understood real-time constraints and message passing. I was wrong on a few key points.
In most systems I've worked on, real-time meant:
- bounded latency
- predictable scheduling (often backed by RTOS concepts)
- careful resource management
There are strategies for dropped messages. Systems degrade gracefully. Buffers exist. Retries exist.
In audio, that safety net mostly disappears.
If your code misses its deadline, the result isn't a delayed response but rather an audible glitch.
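To put numbers on it: at 48 kHz with a 512-sample buffer, the processor gets roughly 10.7 ms per block, every block, with no chance to catch up later. Here's a rough sketch of the shape that forces on processing code. It's illustrative, not NyxFX code; the gain smoothing is just one common way to avoid clicks, and the buffer size and sample rate are assumptions.

```cpp
#include <atomic>
#include <cmath>

// Illustrative sketch of a real-time-safe processor: anything that can
// allocate or block happens in prepare(); process() runs on the audio thread
// and must finish inside the block deadline (512 / 48000 ~= 10.7 ms here).
class GainProcessor {
public:
    void prepare(double sampleRate) {
        sampleRate_  = sampleRate;            // any allocation/setup belongs here
        currentGain_ = targetGain_.load();
    }

    // Called from the UI/message thread; the audio thread only reads the atomic.
    void setGainDb(float gainDb) {
        targetGain_.store(std::pow(10.0f, gainDb / 20.0f));
    }

    // Audio-thread callback: no locks, no allocations, no I/O.
    void process(float* buffer, int numSamples) {
        const float target = targetGain_.load(std::memory_order_relaxed);
        const float step   = (target - currentGain_) / static_cast<float>(numSamples);
        for (int i = 0; i < numSamples; ++i) {
            currentGain_ += step;             // ramp over the block to avoid clicks
            buffer[i] *= currentGain_;
        }
    }

private:
    double sampleRate_  = 0.0;
    float  currentGain_ = 1.0f;
    std::atomic<float> targetGain_{1.0f};
};
```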
What Broke
A lot of my initial instincts didn't translate, especially as I tried to work around the Controller-Processor paradigm, which felt confining and frustrating at first.
Some examples:
- Message passing (ZeroMQ): I tried bringing in ZeroMQ for cross-thread and cross-component communication. It worked… until it didn't. Connections dropped intermittently, cross-process communication became unreliable, and switching protocols tanked performance. What worked well in my day-to-day distributed systems didn't map cleanly onto a DAW-hosted, real-time environment.
- Separating the UI into a standalone app: In other domains, offloading UI to a separate process can simplify architecture. I tried connecting a standalone desktop UI to the plugin via named pipes. It made sense conceptually, but it didn't align with how DAWs expect plugins to behave, and portability and host integration became problems almost immediately.
- Assuming the host would manage backpressure: At one point, I pushed large FFT updates from the processor to the UI, assuming the DAW would buffer the traffic and absorb the load. In other systems, you can monitor queue depth or tune the middleware. In a DAW, you don't control that layer. Both the UI and the processor became hilariously unresponsive (see the queue sketch after this list).
- UI-owned state: I initially built my UI as a self-contained system where the data model lived inside the view. That works in many embedded and immediate-mode UI systems. In VST3, the controller lifecycle invalidates that assumption: state was getting wiped on UI teardown, forcing a rethink of ownership and persistence. Now I think in terms of independent data models and state serialization/deserialization (there's a rough sketch of that below).
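Here's roughly where the backpressure example ended up: a fixed-capacity, single-producer/single-consumer queue between the processor and the UI, where the audio thread drops frames instead of blocking when the UI falls behind. This is a simplified sketch rather than the actual NyxFX code; FftFrame and the capacity are placeholders.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical FFT frame pushed from the audio thread to the UI.
struct FftFrame {
    std::array<float, 512> bins{};
};

// Single-producer / single-consumer ring buffer with a fixed capacity.
// The plugin, not the host, owns the backpressure policy.
template <std::size_t Capacity>
class SpscFrameQueue {
public:
    // Called on the audio thread. Never blocks, never allocates.
    bool tryPush(const FftFrame& frame) {
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % (Capacity + 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                 // full: drop the frame, audio keeps running
        slots_[head] = frame;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Called on the UI thread (e.g. from a timer); drains one frame at a time.
    std::optional<FftFrame> tryPop() {
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;          // empty
        FftFrame frame = slots_[tail];
        tail_.store((tail + 1) % (Capacity + 1), std::memory_order_release);
        return frame;
    }

private:
    std::array<FftFrame, Capacity + 1> slots_{};   // one slot kept empty
    std::atomic<std::size_t> head_{0};             // written only by the producer
    std::atomic<std::size_t> tail_{0};             // written only by the consumer
};
```

If the UI can't keep up, the only cost is stale frames on screen, never a glitch in the audio.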
None of these were bad ideas in isolation. They were just misaligned with the constraints of audio systems.
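For the UI-owned-state problem specifically, the fix boiled down to a plain data model that lives with the plugin rather than inside the view, and that can be flattened to bytes when the host saves or restores a session. The sketch below is deliberately generic: no VST3 types, no versioning or endianness handling, and the fields are made up.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical plugin state: a plain data model owned independently of any
// view. The UI reads from and writes to this; tearing the UI down loses nothing.
struct PluginState {
    float        gainDb  = 0.0f;
    float        mix     = 1.0f;
    std::int32_t fftSize = 1024;

    // Serialize into a flat byte buffer (stand-in for a host-provided stream).
    // A real implementation would write fields individually and version them.
    std::vector<std::uint8_t> save() const {
        std::vector<std::uint8_t> bytes(sizeof(PluginState));
        std::memcpy(bytes.data(), this, sizeof(PluginState));
        return bytes;
    }

    // Restore from a previously saved buffer; fails if the size doesn't match.
    bool load(const std::vector<std::uint8_t>& bytes) {
        if (bytes.size() != sizeof(PluginState))
            return false;
        std::memcpy(this, bytes.data(), sizeof(PluginState));
        return true;
    }
};
```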
What Carried Over
At the same time, a background in low-level and real-time systems does translate in meaningful ways.
- Message passing as a mental model: Even though the implementation details change, thinking in terms of data flow and decoupled systems still applies.
- Lock-free and concurrency awareness: Understanding when locks are dangerous and how to avoid them is critical in audio. That intuition carries over directly.
- Memory discipline: Avoiding allocations in hot paths isn't new; in many embedded systems you aren't allowed to allocate anywhere at all.
- Testing and simulation mindset: Prior experience building controlled test environments, generating synthetic signals, and validating behavior under constraints pays off immediately. Plugins are hard to trace or inspect once they're running inside a host, so you end up building components against controlled data for fast iteration before moving on to integration and host testing (a toy example follows this list).
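A toy version of that workflow: synthesize a known signal, push it through the component in host-sized blocks, and assert on the result, all without a DAW in the loop. The applyGain function here is just a stand-in for whatever is actually under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Stand-in for the component under test.
void applyGain(float* buffer, int numSamples, float gain) {
    for (int i = 0; i < numSamples; ++i)
        buffer[i] *= gain;
}

int main() {
    constexpr double sampleRate = 48000.0;
    constexpr int    blockSize  = 512;
    constexpr int    numBlocks  = 100;
    constexpr double freqHz     = 440.0;
    constexpr double twoPi      = 6.283185307179586;

    std::vector<float> block(blockSize);
    double phase = 0.0;
    float  peak  = 0.0f;

    for (int b = 0; b < numBlocks; ++b) {
        // Synthesize one block of a 440 Hz sine.
        for (int i = 0; i < blockSize; ++i) {
            block[i] = static_cast<float>(std::sin(phase));
            phase += twoPi * freqHz / sampleRate;
        }
        applyGain(block.data(), blockSize, 0.5f);      // run the component under test
        for (float s : block)
            peak = std::max(peak, std::abs(s));
    }

    // A halved unit sine should peak just under 0.5.
    assert(peak <= 0.5f && peak > 0.45f);
    return 0;
}
```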
The biggest takeaway for me has been this:
Some instincts transfer cleanly. Others fail in subtle but important ways. And figuring out which is which has been one of the most interesting parts of this transition.
If you made it this far, thanks for reading. I have a few ideas on how I might expand this series.