Long audio needs a different approach

Three weeks before launch, a beta user sent me a 47-minute client call recording. Our on-device transcription finished in seconds. The output was unusable. That's when I knew we'd made the right call building cloud transcription into Scribr.

The wall you hit with on-device transcription

When we started Scribr, we made a deliberate choice. Free users get on-device transcription via Whisper and Apple Speech. No servers. No audio leaving the phone. No privacy compromise. That's real, and we're proud of it.

But here's what we learned in testing: on-device works beautifully for voice notes, quick recordings, even most phone calls. Short, intimate, immediate. The transcription is fast and private. For a lot of people, that's enough.

Long meetings are different. Your average sales call or client consultation runs 30, 40, sometimes 90 minutes. On-device transcription struggles with length. The accuracy drifts. The processing heats up your phone. Battery drains. And if you're working with multiple long recordings in a week, the friction compounds.

We could have pretended this wasn't a problem. Some apps do. We decided instead to build properly for the use case. If you're a consultant logging a two-hour discovery call, or a therapist recording a session, or a researcher capturing an interview, you need transcription that stays accurate from the first minute to the last.

Why Deepgram, and not just any cloud service

We evaluated three approaches before launch: building our own transcription pipeline (years of work, vast infrastructure spend), using a general-purpose API (good accuracy, slower processing, higher costs per minute), or partnering with Deepgram.

Deepgram won because of three things. First, speed. Their API processes long audio quickly enough that a 45-minute call transcribes while you're still wrapping up notes. That matters psychologically. You're not waiting hours for text. Second, accuracy. We tested their model against competitors on sales calls, client meetings, even recordings with accents and technical jargon. Deepgram held up consistently. Third, the pricing model made sense for us. We could build Pro and Team tiers with meaningful call allowances - 500 calls per month for Pro, 1,500 for Team - without the unit economics blowing the entire product apart.

And there was something else. Deepgram's API is straightforward to integrate. We didn't need a six-month engineering project. We could ship it in weeks and iterate based on what users actually needed.
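For readers who think in code, the call allowances above can be sketched as a small lookup. This is only an illustration: the tier keys, the function name, and the zero allowance for Free are assumptions, and the post does not say whether the Team figure is per seat or per workspace.

```python
# Monthly cloud-transcription allowances. Pro and Team figures are from this post;
# the Free value of 0 is an assumption (Free is described as on-device only).
MONTHLY_CLOUD_CALLS = {"free": 0, "pro": 500, "team": 1500}


def remaining_cloud_calls(tier: str, used_this_month: int) -> int:
    """Return how many cloud-transcribed calls are left this month (never negative)."""
    allowance = MONTHLY_CLOUD_CALLS[tier]
    return max(allowance - used_this_month, 0)


if __name__ == "__main__":
    # A Pro user who has already transcribed 120 calls this month.
    print(remaining_cloud_calls("pro", 120))
```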

When privacy and capability have to coexist

This is the part that matters most, so I want to be direct about it. Adding cloud transcription means audio leaves your phone. That's a boundary we don't cross lightly.

On the Free tier, you get full privacy. On-device only. Your audio never touches our servers. That's not a compromise. That's a feature.

When you move to Pro and enable cloud transcription, you're making an active choice. You're trading some privacy for capability. We make that choice obvious. You have to opt in. You can see exactly what's being sent. And if you need it, our Vault Mode encrypts your notes with AES-GCM before they sync to Scribr Cloud. The transcription itself is processed, returned, and not stored on our side longer than necessary.
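To make "encrypts your notes with AES-GCM before they sync" concrete, here is a minimal sketch of encrypt-before-upload. It is not Scribr's actual implementation: it assumes the third-party Python `cryptography` package, and the function names, the note-id binding, and the nonce-then-ciphertext layout are all hypothetical choices made for illustration.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_BYTES = 12  # 96-bit nonce, the size commonly recommended for GCM


def encrypt_note(key: bytes, plaintext: str, note_id: str) -> bytes:
    """Encrypt a note locally; returns nonce followed by ciphertext and auth tag."""
    nonce = os.urandom(NONCE_BYTES)  # must never repeat under the same key
    # The note id is passed as associated data: authenticated, but not encrypted.
    sealed = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), note_id.encode("utf-8"))
    return nonce + sealed


def decrypt_note(key: bytes, blob: bytes, note_id: str) -> str:
    """Reverse encrypt_note; raises InvalidTag if the blob or note id was altered."""
    nonce, sealed = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
    return AESGCM(key).decrypt(nonce, sealed, note_id.encode("utf-8")).decode("utf-8")


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)  # in this sketch the key stays on the device
    blob = encrypt_note(key, "Follow up with client on Friday", "note-1")
    print(decrypt_note(key, blob, "note-1"))
```

Only `blob` would be uploaded in this sketch. Because GCM is authenticated encryption, a tampered blob fails to decrypt rather than returning corrupted text.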

Some people will choose to stay on-device only, even on Pro. That's fine. We support it. Others will choose cloud transcription because they need it for their work. Both paths exist.

The hard part was being honest about the trade-off, not dressing it up in language that pretends it doesn't exist.
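The opt-in rules in this section reduce to a small decision, sketched below. The names (`Tier`, `Engine`, `Account`, `cloud_opt_in`, `choose_engine`) are hypothetical and only mirror the behaviour described here: Free stays on-device, and paid tiers use the cloud only after an explicit opt-in.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FREE = "free"
    PRO = "pro"
    TEAM = "team"


class Engine(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Account:
    tier: Tier
    cloud_opt_in: bool = False  # hypothetical flag: user explicitly enabled cloud transcription


def choose_engine(account: Account) -> Engine:
    """Return where a recording is transcribed for this account."""
    # Free tier: audio never leaves the phone, whatever the flag says.
    if account.tier is Tier.FREE:
        return Engine.ON_DEVICE
    # Paid tiers: cloud only after an explicit opt-in; otherwise stay local.
    return Engine.CLOUD if account.cloud_opt_in else Engine.ON_DEVICE


if __name__ == "__main__":
    print(choose_engine(Account(Tier.PRO, cloud_opt_in=True)))
```

Defaulting `cloud_opt_in` to `False` keeps the private path as the default, so audio is only sent when the user has actively chosen that.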

What we've learned from real users

The 47-minute call I mentioned at the start came from a freelance consultant. She'd been using voice memos and transcribing by hand. Thirty minutes of transcription work per call. She tried Scribr with cloud transcription enabled. One minute later, she had text. Imperfect, but searchable. Usable. She went Pro that week.

We've heard similar stories from sales teams logging calls, researchers capturing interviews, legal professionals documenting consultations. The pattern is always the same: long audio needs a different tool. On-device transcription is great for one-offs and quick notes. For serious volume and length, you need cloud processing.

We've also learned that people care about where their data goes. More than we expected, honestly. Pro users regularly ask if Deepgram stores their audio (it doesn't, beyond the immediate processing window). They ask if we can see what they're saying (we can't). They ask if there's an encrypted option (Vault Mode handles that for the notes themselves). These aren't paranoid questions. They're normal questions from people who value privacy but also need their work done properly.

The practical limits of choice

We could have launched Scribr with only on-device transcription. Simpler product. Lower costs. Easier to market - 'totally private meeting app' is a clean pitch. But it would have been a lie by omission. Half our users would have bought Pro expecting to transcribe their long calls, hit a wall, and felt cheated.

Instead, we built both. On-device for those who want it. Cloud transcription for those who need it. Action-item extraction and summaries require cloud processing, so they are available only on Pro and above. The Free tier is genuinely useful, genuinely private, and genuinely doesn't pretend to be something it's not.

The complexity is real. It means more engineering. More decisions about encryption and data retention and user consent. More conversations with people who care about privacy. But it also means Scribr works for a therapist doing 50-minute sessions the same way it works for a student recording a lecture, or a researcher capturing interviews, or a salesperson logging deals.

That range isn't a side effect. It's the whole point.

When you're building a tool for conversations, you're building for the full spectrum of how people actually talk. Short and long, private and shared, structured and rambling. The question isn't whether to support long audio. It's whether you'll be honest about what it takes to support it well.

Want to try Scribr?

Visit Scribr →