The conversation that changed how we built Scribr

A customer emailed us three weeks after launch: 'I almost deleted your app because I thought my audio was being sent somewhere.' It wasn't. It never would be. But that fear told us something important about the market we'd entered, and why we'd made the right bet.

The moment we realised privacy wasn't a feature, it was table stakes

When we started building Scribr in late 2023, cloud transcription seemed like the obvious route. Every competitor we looked at was doing it. Faster, more accurate, easier to scale. We sketched out the architecture, ran the cost models, and felt the familiar pull: build what's easiest to ship first, iterate later.

Then we started talking to actual users. A therapist told us she wouldn't record sessions at all if the audio had to leave her device. A legal consultant said her clients would never agree to cloud-based transcription, regardless of encryption claims. A researcher working with sensitive interview data was blunt: 'If I can't guarantee the audio never leaves my phone, I'm using a pen and paper.'

That's when it clicked. For knowledge workers whose conversations contain confidential information, on-device transcription isn't a nice-to-have. It's a dealbreaker. Either Scribr kept audio on your phone, or it wasn't a product worth using.

Why Whisper and Apple Speech won the bet

We could have built our own transcription model. We could have licensed one from a third party. Instead, we chose to rely on two existing technologies: Whisper as the primary engine, and Apple's native Speech framework as a fallback. We run both entirely on your phone. Your audio never touches our servers, full stop.
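For the technically curious, the primary-plus-fallback idea can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not Scribr's actual code: the Engine type, the transcribe_locally function, and the stand-in engines are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Engine:
    """A hypothetical transcription engine (illustrative only)."""
    name: str
    on_device: bool
    # Returns the transcript, or None if this engine could not handle the audio.
    transcribe: Callable[[bytes], Optional[str]]


def transcribe_locally(audio: bytes, engines: Sequence[Engine]) -> Tuple[str, str]:
    """Try each engine in order and return (engine_name, transcript).

    Engines that are not on-device are skipped, so the audio is never
    handed to anything that could send it off the phone.
    """
    for engine in engines:
        if not engine.on_device:
            continue  # refuse anything that is not local
        text = engine.transcribe(audio)
        if text is not None:
            return engine.name, text
    raise RuntimeError("no on-device engine could transcribe the audio")


# Stand-ins for the real engines: the primary fails, the fallback succeeds.
whisper = Engine("whisper", True, lambda audio: None)
apple_speech = Engine("apple-speech", True, lambda audio: "hello world")

result = transcribe_locally(b"\x00\x01", [whisper, apple_speech])
print(result)  # ('apple-speech', 'hello world')
```

The point of the design is that the fallback is also local: if the first engine fails, the audio goes to another on-device engine rather than to a server.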

This choice shaped everything else. It meant we had to build mobile-first from day one. It meant we couldn't offer infinite transcription for free. But it also meant we could offer something competitors can't: a guarantee. On the free tier, your audio stays yours. Period.

The trade-off is real. On-device transcription is slower than cloud. It's constrained by device hardware. On a 2-hour recording, you're waiting minutes, not seconds. But that wait is the price of privacy, and for our users, it's a price worth paying.
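To put rough numbers on "minutes, not seconds", here is a back-of-the-envelope sketch. The speed multiples below (20x real time on-device, 200x in the cloud) are assumptions chosen purely for illustration, not measurements from Scribr or any particular device, and the function name is hypothetical.

```python
def estimated_wait_seconds(audio_seconds: float, speed_multiple: float) -> float:
    """Estimate processing time for a recording.

    speed_multiple is how many seconds of audio the engine processes per
    second of wall-clock time (20 means "20x faster than real time").
    """
    if audio_seconds < 0 or speed_multiple <= 0:
        raise ValueError("audio_seconds must be >= 0 and speed_multiple > 0")
    return audio_seconds / speed_multiple


two_hours = 2 * 60 * 60  # 7200 seconds of audio

# ASSUMED speeds, for illustration only.
on_device_wait = estimated_wait_seconds(two_hours, 20)   # 360.0 seconds
cloud_wait = estimated_wait_seconds(two_hours, 200)      # 36.0 seconds

print(f"on-device: {on_device_wait / 60:.0f} min, cloud: {cloud_wait:.0f} s")
# on-device: 6 min, cloud: 36 s
```

Real figures depend heavily on the model size and the phone, so treat this as the shape of the trade-off rather than a benchmark.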

The architecture decision that meant we had to stay simple

Once we committed to on-device transcription for the free tier, the whole product structure followed. We couldn't offer unlimited uploads, because transcription would choke the device. We couldn't rely on cloud sync out of the box, because syncing encrypted audio files across devices takes money and engineering capacity we didn't have. We had to be thoughtful about what we built, not just what we could build.

That constraint turned into clarity. The free tier became truly minimal: on-device transcription, the Quick Record widget, Siri shortcuts, biometric lock. No cloud, no summaries, no action-item extraction. All of that lives in Pro, where cloud transcription via Deepgram is explicit and optional.
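The split between the tiers can be written down as a small feature table. This is a minimal sketch, assuming Pro includes everything in the free tier; the feature keys and function names are hypothetical and only mirror the list above.

```python
from enum import Enum
from typing import FrozenSet


class Tier(Enum):
    FREE = "free"
    PRO = "pro"


# Feature keys are illustrative names for the features listed in the post.
FREE_FEATURES: FrozenSet[str] = frozenset({
    "on_device_transcription",
    "quick_record_widget",
    "siri_shortcuts",
    "biometric_lock",
})
PRO_ONLY_FEATURES: FrozenSet[str] = frozenset({
    "cloud_transcription",
    "summaries",
    "action_item_extraction",
})


def features_for(tier: Tier) -> FrozenSet[str]:
    """Return the features available on a tier (ASSUMES Pro is a superset of free)."""
    if tier is Tier.PRO:
        return FREE_FEATURES | PRO_ONLY_FEATURES
    return FREE_FEATURES


def may_upload_audio(tier: Tier, user_opted_in: bool) -> bool:
    """Audio may leave the device only on Pro, and only if the user chose cloud."""
    return tier is Tier.PRO and user_opted_in


print(sorted(features_for(Tier.FREE)))
print(may_upload_audio(Tier.FREE, True))   # False
print(may_upload_audio(Tier.PRO, False))   # False
print(may_upload_audio(Tier.PRO, True))    # True
```

Keeping the upload check separate from the feature list makes the "explicit and optional" part hard to get wrong: being on Pro is not enough, the user also has to opt in.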

Some founders hate constraints like this. We loved them. They meant every feature we added had to solve a real problem, not just sit there looking shiny.

What we learned when people started using it

Launch week was instructive. We got feedback from sales teams, therapists, university researchers, and freelance consultants. The common thread: on-device transcription attracted people who'd never trust a cloud-only tool. The therapist who almost deleted the app came back a week later and upgraded to Pro specifically because she trusted the foundation. Trust compounds.

We also learned that people don't always need AI summaries or action-item extraction right away. They need the transcript. They need to search it. They need it behind a biometric lock. That's the job. The AI comes next, once you're confident in the tool.

That shaped our roadmap. We've spent as much time improving the reliability of on-device transcription as we have adding AI features. Better Whisper integration. More reliable sync. Faster biometric unlock. Unglamorous work, but it's what keeps users.

The real question isn't whether on-device transcription is technically superior

It isn't always. Cloud transcription is faster, more accurate on noisy audio, and far easier to scale. If privacy doesn't matter to you, cloud is usually the better choice. But that's not the bet we made with Scribr.

The real question is: who should own your conversations? If you're a knowledge worker whose value lives in what you know, and what you know comes from private conversations, then you should own the transcripts. Not a vendor. Not a cloud service. You.

That philosophy doesn't sell itself. It's not a feature you can demo in sixty seconds. It's a commitment. It means we'll never sell your data, never train models on your audio, never surprise you with a privacy policy update. It means the free tier will always be private, even though private transcription costs us money and limits our growth.

Some of our early users get that immediately. Some take a while. The ones who get it tend to stay, because they know they're not the product.

If you're the kind of person who thinks about who owns your data before you hand it over, Scribr might be worth ten minutes of your time. If you've never thought about it, maybe now is the moment to start.