Why we built Scribr's long-audio transcription around Deepgram

A consultant messaged us in week three of Scribr's public launch. 'Your app is brilliant for my 20-minute client calls,' she wrote, 'but I have a two-hour annual review recorded on my phone. It won't transcribe.' That message sat with me for two days. We had built Scribr around on-device transcription first (Whisper and Apple Speech), and it works beautifully for quick voice notes and short meetings. But the moment we shipped, we realised we'd left out a whole category of work: the long-form audio that matters most.

The gap between what people record and what we could process

On-device transcription has real advantages. Your audio stays on your phone. No upload. No privacy concerns. It's why we made it free, and why we still push it as the default for anyone recording a quick thought or a short standup. But there's a ceiling. Whisper and Apple Speech run on your phone's hardware, which means they're fast for a 15-minute recording, slow for a 90-minute one, and frankly risky for anything longer. Battery drain. Device heat. The transcription starts to stall.

We began asking ourselves: what if someone needs to transcribe a recorded podcast episode they appeared on? A therapy session archive? A conference panel they attended? A deposition? These aren't edge cases among our users. They're part of the job. A legal professional, a researcher, a journalist, a therapist: they all carry audio files that matter, sometimes hours long, that they need searchable notes from. We couldn't ignore it.

Why cloud transcription needed to be an explicit choice, not a default

The moment we decided to offer cloud transcription, we made a decision that shaped everything after it. We would not force it. Free users would keep on-device transcription, their audio would stay their audio, and they'd get exactly what they expected: privacy, no surprises, no unexpected uploads. If someone wanted to send audio to the cloud for faster, longer transcription, they would opt in. They would pay for it. They would know what they were consenting to.

That's why cloud transcription lives in the Pro tier and above. It's deliberate. It's not gatekeeping a 'better' feature. It's recognising that sending your voice to the cloud is a different contract than keeping it local. Pro users get 500 AI calls per month, which translates to roughly 80 hours of transcription, depending on audio quality and length. That's real capacity for real work. Team tier gets 1,500 calls. Enterprise unlocks more. The point is: you're buying into it. You know what you're getting.
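
Taken at face value, those two figures imply an average of a little under ten minutes of audio per call. The short sketch below works that through. It assumes the 80-hour estimate scales linearly with the number of calls, which is an assumption rather than something the plan description states, and the function names are illustrative only.

```python
# Back-of-envelope check on the quota figures quoted above.
# Assumption: the "roughly 80 hours" estimate scales linearly with call count.

PRO_CALLS_PER_MONTH = 500     # Pro plan figure from the text
PRO_HOURS_ESTIMATE = 80       # "roughly 80 hours" from the text
TEAM_CALLS_PER_MONTH = 1500   # Team plan figure from the text


def minutes_per_call(calls: int, hours: float) -> float:
    """Average minutes of audio that each call would need to cover."""
    return hours * 60 / calls


def hours_for_calls(calls: int, avg_minutes: float) -> float:
    """Hours of audio covered by `calls` at a given average length."""
    return calls * avg_minutes / 60


avg = minutes_per_call(PRO_CALLS_PER_MONTH, PRO_HOURS_ESTIMATE)
team_hours = hours_for_calls(TEAM_CALLS_PER_MONTH, avg)
print(f"Implied average: {avg:.1f} minutes per call")
print(f"Team tier at the same rate: {team_hours:.0f} hours")
```

How a single long recording counts against the quota isn't covered here, so treat the average as a rough guide only.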

How Deepgram changed what long audio could be

We tested three cloud transcription services before launch. Most were built for web products, designed around upload buttons and dashboard playback. Scribr is mobile-first. We needed something that could integrate smoothly with iOS, return results fast, and cope with variable audio quality without choking. Deepgram fitted. Their API is clean. The transcription engine scales. And they do something crucial: they ship support for very long audio without asking you to break it into chunks yourself.

When a Pro user taps the cloud transcription option on an audio file, Scribr uploads it to Deepgram's service, not to our servers. The transcription happens remotely. Once Deepgram returns the full transcript, Scribr processes it for summaries and action item extraction if the user has enabled those features. The whole flow takes minutes, not hours. A two-hour call that would have frozen a phone for an afternoon comes back clean and searchable within 15 minutes.
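
For readers who want to picture the round trip, here is a minimal sketch in Python rather than the app's own code. It posts raw audio bytes to Deepgram's pre-recorded endpoint and reads the transcript out of the JSON response. The function names, the default MIME type, and the choice of query options are assumptions made for illustration; a production mobile client would also avoid embedding a long-lived API key on the device.

```python
import json
import urllib.parse
import urllib.request

# Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, api_key: str,
                  mime_type: str = "audio/mp4") -> urllib.request.Request:
    """Build (but do not send) a POST carrying the raw audio bytes."""
    # Query options are illustrative; which ones Scribr sets is an assumption.
    query = urllib.parse.urlencode({"smart_format": "true", "punctuate": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": mime_type,
        },
    )


def extract_transcript(response_body: str) -> str:
    """Return the top alternative for the first channel in the response."""
    payload = json.loads(response_body)
    return payload["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(audio: bytes, api_key: str) -> str:
    """Send the audio and return the transcript text (needs network access)."""
    with urllib.request.urlopen(build_request(audio, api_key)) as resp:
        return extract_transcript(resp.read().decode("utf-8"))
```

The whole file goes up in one request, with no client-side chunking. Summaries and action items would then run on the string returned by `transcribe`.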

What matters here is that it's abstracted. Users don't care that it's Deepgram. They care that they can record or upload a long audio file, tap a button, and get a transcript they can search, highlight, and turn into notes. The infrastructure should be invisible.

The contact intelligence layer makes long audio actually useful

Cloud transcription alone would be half the story. We could have stopped at 'you upload, we transcribe, you get words back.' But Team users get something more: Contact Intelligence. When you transcribe a call with a contact, Scribr learns from your notes over time. It starts to recognise who's in your calls, what topics matter to you, what commitments you've made to specific people. Action items attached to contacts. Notes grouped by the people involved. Your archive of calls becomes queryable not just by keyword but by relationship.

This is where long-form audio transcription moves from convenience into strategy. A sales team member reviewing six months of client calls. A therapist tracking conversation patterns. A consultant building a knowledge base of client feedback. These workflows require the transcript to be more than text; it needs to be connected to the people and the work.

Encryption and compliance: cloud without compromise

The moment we started uploading audio to the cloud, we inherited a responsibility. Pro users get Vault Mode: AES-GCM encrypted notes, so even we can't read them. Team users get GDPR Compliance Modes, audit logs, and the ability to control where data sits and how long it lives. This matters for therapists, legal professionals, anyone handling sensitive conversations.

We don't market encryption as a feature. It's table stakes. The same applies to compliance. If you're on Team or Enterprise, you're likely working in a regulated space. We built the tooling so you can transcribe with confidence, knowing that your cloud transcription doesn't mean your data ends up in some generic SaaS archive.

A workflow that starts on your phone and doesn't make you choose

The beauty of building this into Scribr is that the choice is local to the moment. You're in a meeting. You hit record via the Quick Record Widget. The call happens. You stop. The audio lands on your phone. If it's 20 minutes, you might leave it for on-device transcription; it's done in seconds and it stays private. If it's 90 minutes, or if it's an upload from a file, you can tap 'Transcribe' and send it to Deepgram. Pro users see that option. Free users don't. You're not forced into a cloud-first paradigm. You're given options based on what you've paid for, and what you're comfortable with.
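
The decision described above can be sketched as a small function. This is an illustration under stated assumptions, not Scribr's actual logic: the 30-minute cut-off, the tier strings, and the function name are all hypothetical. The one rule taken from the text is that on-device is always offered and the cloud option appears only on paid tiers.

```python
from typing import List, Tuple

# Hypothetical cut-off: at or above this length the sketch preselects cloud.
LONG_AUDIO_MINUTES = 30

# Tiers that include cloud transcription (Pro and above, per the text).
CLOUD_TIERS = {"pro", "team", "enterprise"}


def transcription_options(tier: str, duration_minutes: float) -> Tuple[List[str], str]:
    """Return (engines the user can see, engine to preselect)."""
    available = ["on_device"]            # offered on every tier
    if tier.lower() in CLOUD_TIERS:
        available.append("cloud")        # only paid tiers see the cloud option

    # Cloud is only ever a suggestion; the user still has to tap to send audio.
    if "cloud" in available and duration_minutes >= LONG_AUDIO_MINUTES:
        suggested = "cloud"
    else:
        suggested = "on_device"
    return available, suggested


print(transcription_options("free", 120))
print(transcription_options("pro", 20))
print(transcription_options("pro", 120))
```

Keeping the preselection separate from the list of available engines means a free user never sees a cloud prompt, however long the recording is.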

That consultant who messaged us in week three? We followed up after she upgraded to Pro. She transcribed her two-hour annual review, got a searchable transcript back in 12 minutes, extracted her action items, and said it saved her three hours of manual note-taking. That's the gap we closed.

Long audio is messy. It lives in phones, in email inboxes, in cloud storage, in half-forgotten voice memo apps. When someone asks you 'what was decided on that call two weeks ago?', the answer often means digging. What if your phone could just tell you?