Why we built multi-track recording into Podcastr, and how it changes everything

Three months before launch, a creator emailed us asking a single question: "Can I record my guest's voice separately from mine?" It was the kind of question that sounds simple until you realise how many podcast apps ignore it completely.

The problem we kept hearing

Most podcasters we spoke to were trapped in the same workflow. Record a Zoom or Riverside call, get a compressed mess of audio, spend hours trying to isolate voices, and give up halfway through. Or they'd use two apps simultaneously, jumping between tools, syncing files by hand, and praying nothing crashed.

One creator told us she'd spent 45 minutes simply aligning three separate audio files before she could even start editing. Another said she'd ditched remote guests entirely because the audio quality made editing feel impossible.

We realised the issue wasn't that good multi-track recording didn't exist. It was that podcasters had to leave the app to get it. Local recording on your device gives you clean, uncompressed audio. Remote recording from a guest over the internet gives you their voice separately. But stitching those together, keeping them in sync, and shipping them out for editing? That was someone else's problem.

How Podcastr captures both sides at once

When you hit record in Podcastr, here's what happens. Your voice records locally on your device, captured at the highest quality your microphone and system can deliver. Completely uncompressed. No internet bottleneck. Your guest's voice records locally on their device at the same time. Meanwhile, a backup stream travels over the internet connection, so you can hear each other in real time, just like a normal call.

The magic is that everything stays in sync without you thinking about it. You're not managing three different timelines or worrying about drift. When you finish the episode, Podcastr knows exactly which audio file belongs to which person because it was captured separately from the moment you pressed record.

Most podcast creators we've worked with expected this to be complicated. Instead, it just works. You record. You stop. The files are there, organised, ready to go.

What this means for editing

Once you have separate audio tracks, editing becomes a different game entirely. You're not trying to surgically isolate one voice from a muddy mix. You can adjust the volume of your guest independently of your own voice. If someone coughs, you can mute just that track. If there's background noise on one end, you tackle it without touching the other person's audio.
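To make the idea concrete, here is a minimal sketch of per-track editing in Python. It is an illustration only, not Podcastr's code: the function names (apply_gain, mute_region, mix) and the tiny sample lists are made up for the example, and real audio would be thousands of samples per second rather than four.

```python
def apply_gain(samples, gain):
    """Scale every sample in one track by a linear gain factor."""
    return [s * gain for s in samples]


def mute_region(samples, start, end):
    """Silence samples[start:end] on one track, leaving the rest untouched."""
    return [0.0 if start <= i < end else s for i, s in enumerate(samples)]


def mix(tracks):
    """Sum equal-length tracks sample by sample into one mono mix."""
    return [sum(frame) for frame in zip(*tracks)]


# Hypothetical data: pretend the middle two guest samples are a cough.
host = [0.2, 0.2, 0.2, 0.2]
guest = [0.1, 0.9, 0.9, 0.1]

guest_clean = mute_region(guest, 1, 3)       # mute the cough on the guest track only
guest_louder = apply_gain(guest_clean, 2.0)  # raise the guest without touching the host
final_mix = mix([host, guest_louder])
```

The host track is never modified, which is the whole point: with a single mixed file, muting the cough would also remove whatever the host was saying at that moment.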

We built this straight into Podcastr because we knew creators would want to stay in one place. You record, and then you can export individual tracks for deeper work in other tools if you need to. Or you can use our transcription powered by OpenAI Whisper, sync everything automatically, and move straight to generating clips for social media.

The freedom is in the choice. You're not forced into a fragmented workflow. Local and remote multi-track recording gives you the foundation; what you do next is yours to decide.

The technical side, without the headache

I should be honest about what goes on behind the scenes. Syncing two separate recordings that started at slightly different times, over different networks, from two different devices, is a real problem. It's the kind of thing that sounds impossible until you've sat with the engineering for a few weeks.

We solved it by anchoring everything to timestamps and the audio frames themselves. The moment you hit record, each device starts its own local clock. While your guest's audio comes through the internet connection, we listen for reference points. After the call ends, the app reconstructs the exact offset between the two files and brings them into alignment automatically. You don't see any of this. You just see two tracks that line up perfectly.
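For readers who want a feel for how an offset between two recordings can be found, here is a minimal sketch using brute-force cross-correlation. This is a common textbook technique and an assumption on my part for illustration; it is not a description of Podcastr's actual algorithm. The names estimate_offset and align, and the sample data, are hypothetical.

```python
def estimate_offset(reference, recording, max_lag):
    """Return the lag (in samples) at which `recording` best matches `reference`.

    A positive result means the content of `recording` appears that many
    samples later than it does in `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recording):
                score += ref_sample * recording[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(recording, offset):
    """Shift `recording` onto the reference timeline by trimming or padding its start."""
    if offset >= 0:
        return recording[offset:]
    return [0.0] * (-offset) + recording


# Hypothetical data: the same short reference sound, captured 3 samples later on the guest side.
pulse = [1.0, -1.0, 0.5]
host_track = [0.0] * 2 + pulse + [0.0] * 5
guest_track = [0.0] * 5 + pulse + [0.0] * 2

offset = estimate_offset(host_track, guest_track, max_lag=4)
guest_aligned = align(guest_track, offset)
```

Dividing the offset by the sample rate converts it to seconds. A production system would use a faster correlation method and longer reference windows, but the principle of finding the shift where the two signals agree best is the same.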

Is there ever drift? Rarely. And if there is, it's a matter of milliseconds. Modern devices and networks are stable enough that this works reliably. We've tested it hundreds of times, across different network conditions, different devices, different countries. The system holds.
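If you're curious what "milliseconds of drift" means in numbers, the arithmetic is simple: measure the offset between the two tracks near the start and again near the end, then compare. The sketch below shows that calculation. The function names and every figure in it are invented for illustration and are not measurements from Podcastr.

```python
def drift_ms(offset_start, offset_end, sample_rate):
    """Drift between two offset measurements, in milliseconds."""
    return (offset_end - offset_start) / sample_rate * 1000.0


def drift_ppm(offset_start, offset_end, span_samples):
    """Drift as parts per million of the audio between the two measurements."""
    return (offset_end - offset_start) / span_samples * 1_000_000


# Hypothetical one-hour recording at 48 kHz.
SAMPLE_RATE = 48_000
span = SAMPLE_RATE * 60 * 60   # samples between the two measurements

offset_at_start = 1_200        # guest track lags by 1,200 samples at the start
offset_at_end = 1_296          # and by 1,296 samples an hour later

ms = drift_ms(offset_at_start, offset_at_end, SAMPLE_RATE)
ppm = drift_ppm(offset_at_start, offset_at_end, span)
print(f"drift: {ms:.1f} ms over one hour ({ppm:.2f} ppm)")
```

With these made-up numbers the two clocks slip by 96 samples across the hour, which is about 2 ms. Once the drift is known, one track can be stretched or trimmed by that amount so the tracks stay lined up from start to finish.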

Where this fits in your actual workflow

A lot of podcast tools promise integration. What that usually means is they bolt on someone else's API and call it a day. We built multi-track recording as the foundation of Podcastr because we knew you weren't going to stop recording just because the transcription was good or the clips were sharp. You're going to keep improving, keep experimenting, and you'll want your recording to support that.

That's why we included OpenAI Whisper transcription alongside it. Each audio track is transcribed on its own, which means you get a clean transcript where it's obvious who said what. Your auto-generated show notes pull from that transcript, so they're accurate. If you want to make a short clip for social, you're grabbing from high-quality source material instead of compressing something that's already been compressed.
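Here is a small sketch of why separate tracks make speaker labels easy. It assumes each track has already been transcribed into timestamped segments; the Segment class, the merge_transcripts function, and the sample lines are hypothetical, and no transcription library is called here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the aligned recording
    end: float
    text: str


def merge_transcripts(tracks):
    """Interleave per-speaker segments into one speaker-labelled transcript."""
    labelled = [
        (seg.start, speaker, seg.text)
        for speaker, segments in tracks.items()
        for seg in segments
    ]
    labelled.sort(key=lambda item: item[0])  # order by when each segment starts
    return [f"[{start:06.2f}] {speaker}: {text}" for start, speaker, text in labelled]


# Hypothetical per-track transcription results.
tracks = {
    "Host": [
        Segment(0.0, 2.5, "Welcome to the show."),
        Segment(6.0, 8.0, "Tell us more."),
    ],
    "Guest": [
        Segment(2.8, 5.7, "Thanks for having me."),
    ],
}

transcript = merge_transcripts(tracks)
for line in transcript:
    print(line)
```

Because every segment comes from a track that belongs to exactly one person, the speaker label is known up front. There is no need to guess who is talking from the sound of a mixed recording.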

Everything points back to that moment when you hit record. Get that right, and the rest of the workflow becomes easier, faster, and more professional.

A small thing that changes a lot

When I say multi-track recording was a requirement from day one, I don't mean we couldn't have launched without it. Plenty of apps do. But we kept thinking about that creator who'd asked the original question, and everyone else who was doing the workaround dance. They deserved better.

Building it right took time. But the result is that you're not choosing between convenience and quality anymore. You get both. You hit record on your device and share a link with your guest. They hit record on theirs. You talk. When you're done, your audio is there, separate, clean, and ready for whatever comes next. No sync issues. No lost files. No jumping between tools.

It's the kind of feature that feels invisible when it's working properly. Which is exactly how it should be.

If you've been stitching together podcast audio from multiple sources, what's been the biggest pain point in that process? I'd genuinely like to know what we got right, and what we're still missing.