Why we picked Apple's on-device speech for free users
Three weeks before launch, a pastor in Manchester sent us a message. He'd been testing Clipr and asked a single question: 'Will my sermon recordings leave my phone?' That question kept me awake for two nights.
The obvious choice would have been easier
When we built the first version of Clipr, we used a server-side transcription service like everyone else does. It was the path of least resistance. Send audio to the cloud, get back text, run our engagement scoring, ship clips. Simple infrastructure. Predictable costs. We'd seen dozens of startups do it that way.
The problem arrived when we started thinking about pricing. Our users are church social media managers and pastors, not production studios with fat budgets. Many work part-time. Some volunteer. A two-clip-per-month free tier sounded generous until we did the maths on what server transcription would cost us if we scaled it to thousands of users. The unit economics were brutal. We'd either have to charge from day one, or burn cash until we couldn't.
But there was something else bothering me. Every transcription that left the device meant storing it somewhere. That sermon about a congregant's private grief. The podcast episode that touches on mental health. These are sensitive recordings. The legal surface area alone made my stomach turn. GDPR compliance, data residency, retention policies, audit trails. We're a small studio. We don't want to be in the business of holding onto people's spiritual content on servers.
Apple's speech recognition was already in the phone
Then we realized something obvious but overlooked: iOS has built-in speech-to-text that can run entirely on-device. When an app explicitly requires on-device recognition, Apple's Speech framework transcribes without the audio ever touching a server. It's been there for years, tucked into the OS, used by apps like Notes and Voice Memos. Fast. Private. No cost to us per transcription.
The first time we tested it, we were skeptical. On-device models are usually slower and less accurate than cloud-based alternatives. But we fed it a 45-minute sermon and waited. Six minutes later we had the transcript. Accuracy was genuinely solid for spoken English, especially the kind of clear, articulate speech you get from a pulpit or microphone.
The real win wasn't speed or accuracy though. It was the moment the pastor's question made sense. His sermon never left his iPhone. Not in transit. Not in storage. Not in some third-party's data centre. The transcription happened locally, got used to power our moment-scoring engine, and the audio stayed on his device unless he explicitly chose to export it.
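For the curious, here is roughly what asking for on-device transcription looks like in Swift. Treat it as a simplified sketch rather than our production code: `transcribeOnDevice` and `OnDeviceTranscriptionError` are names made up for this post, the default locale is only an example, and it assumes the user has already granted speech-recognition permission. The line that matters is `requiresOnDeviceRecognition = true`, which tells the framework to fail rather than fall back to a server.

```swift
#if canImport(Speech)
import Speech

/// Hypothetical error type for this sketch.
enum OnDeviceTranscriptionError: Error {
    case recognizerUnavailable   // no recognizer for this locale, or not ready right now
    case onDeviceNotSupported    // recognizer exists but cannot run locally
}

/// Transcribes an audio file without sending the audio to a server.
/// Assumption: speech-recognition authorization was requested and granted earlier.
@available(iOS 13.0, macOS 10.15, *)
func transcribeOnDevice(
    fileURL: URL,
    locale: Locale = Locale(identifier: "en-GB"),
    completion: @escaping (Result<String, Error>) -> Void
) {
    guard let recognizer = SFSpeechRecognizer(locale: locale), recognizer.isAvailable else {
        completion(.failure(OnDeviceTranscriptionError.recognizerUnavailable))
        return
    }
    guard recognizer.supportsOnDeviceRecognition else {
        completion(.failure(OnDeviceTranscriptionError.onDeviceNotSupported))
        return
    }

    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    request.requiresOnDeviceRecognition = true   // never fall back to server recognition
    request.shouldReportPartialResults = false   // we only want the finished transcript

    // A real app would keep the recognizer and the returned task around,
    // for example to cancel a long transcription.
    _ = recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
        } else if let result = result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}
#endif
```

The two `guard` checks are deliberate: if the device cannot transcribe locally, the sketch reports an error instead of quietly taking the cloud route.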
We had to be honest about what on-device means
We didn't pretend on-device transcription was perfect. It isn't. Background noise, overlapping voices, heavy accents and regional dialect features can all trip up Apple's model. A podcast recorded in a room with echo will sometimes produce garbled passages. A loud church with reverb requires a decent microphone to transcribe cleanly.
But here's what we discovered: for the content Clipr is built for, it works. Pastors who preach into a lapel mic. Podcasters in quiet rooms. Content creators who've already done the work to produce good audio. These users get fast, private transcription with no monthly bill hanging over their heads.
For users who need more precision, our Creator tier adds access to our AI moment-scoring service running server-side. That's where we can re-rank and refine clips based on engagement signals. But the base layer, the transcription itself, stays on-device. Free users get genuinely useful features without asking us to hold their data.
The cost of staying small and honest
There's a reason most apps chase server infrastructure. It scales differently. It's easier to add features when your model lives in the cloud. We've traded some of that flexibility for something we think matters more: a free tier that doesn't require us to monetize our users' content or privacy.
That decision has real constraints. We can't offer transcription for iOS versions before a certain point. We can't support every language. We can't magically improve accuracy by throwing more compute at the problem. But we also don't have to send a sermon about grief to a third party. We don't have to manage data retention. We don't have to build a privacy policy that reads like a hostage negotiation.
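Those constraints show up in code as a gate in front of the transcription feature. The sketch below is illustrative only: `TranscriptionAvailability` and both `availability` functions are hypothetical names, not our shipping code. It relies on two documented behaviours of Apple's Speech framework: `SFSpeechRecognizer(locale:)` returns `nil` for a locale it does not support, and `supportsOnDeviceRecognition` (introduced in iOS 13) reports whether that recognizer can run locally.

```swift
import Foundation

/// Why on-device transcription can or cannot be offered (hypothetical names).
enum TranscriptionAvailability: Equatable {
    case available
    case languageNotSupported   // no recognizer exists for the requested locale
    case onDeviceModelMissing   // recognizer exists, but only server recognition is possible
}

/// Pure decision logic, kept separate from the framework so it is easy to test.
func availability(recognizerExists: Bool, supportsOnDevice: Bool) -> TranscriptionAvailability {
    guard recognizerExists else { return .languageNotSupported }
    return supportsOnDevice ? .available : .onDeviceModelMissing
}

#if canImport(Speech)
import Speech

/// Adapter that feeds the real framework state into the decision above.
@available(iOS 13.0, macOS 10.15, *)
func availability(for locale: Locale) -> TranscriptionAvailability {
    let recognizer = SFSpeechRecognizer(locale: locale)
    return availability(
        recognizerExists: recognizer != nil,
        supportsOnDevice: recognizer?.supportsOnDeviceRecognition ?? false
    )
}
#endif
```

An app built this way can tell a user plainly that their language is not supported on-device, instead of silently routing their audio somewhere else.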
The pastor in Manchester took the free tier. His sermon stayed on his phone. He's been using Clipr ever since, and he's never asked us where his recordings are stored. I think he already knew the answer.
Privacy and business model design are tangled together in ways people don't often discuss openly. When you choose where your code runs, you're not just making a technical decision. You're choosing who you answer to when users ask where their voice went. What would it mean for the tools you use every day if that question became the first one engineers asked instead of the last?