The quiet case for on-device transcription

Three weeks before we shipped Clipr, our lead developer found a message in Slack from a pastor in rural Cornwall. He'd tried our beta on a Sunday afternoon, uploaded a 90-minute sermon, and then panicked. His internet connection had dropped halfway through. The video was stuck in the cloud. He had a service on the following Sunday and no way to salvage clips from his recording. That one moment changed how we thought about transcription.

The privacy conversation we didn't expect to have

When we started building Clipr, we knew pastors and podcasters needed their long-form content turned into clips for social media. That was clear. What surprised us was how often creators mentioned privacy in early conversations. Not because they had anything to hide, but because a sermon, a podcast episode, or a teaching recording is often sensitive in ways that other video isn't. A pastor might discuss a congregant's prayer request. A podcast host might reference a personal story they haven't published elsewhere yet. A faith teacher might work through theological ground they're still processing.

Cloud transcription meant sending that audio to a server somewhere. It meant a transcript living in a vendor's database. It meant hoping the vendor's terms of service were friendly to religious content, to UK data law, to the idea that this wasn't just content but potentially pastoral material.

On-device Apple Speech transcription solved that without making creators feel like they were choosing a worse tool. Transcription runs locally, and the audio never leaves the phone. The only thing that goes to our servers is the text result, which we use for AI moment scoring. The recording itself stays yours.
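For the technically curious, that boundary can be sketched in a few lines. This is an illustration under our own assumptions, not Clipr's production code: `ScoringRequest`, `build_scoring_request`, and the field names are hypothetical. The point it shows is that the payload type simply has nowhere to put audio.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScoringRequest:
    """Hypothetical payload sent for moment scoring: text only, no audio field."""
    transcript_text: str
    language: str


def build_scoring_request(transcript_text: str, language: str = "en-GB") -> str:
    """Serialise the transcript text to JSON for upload.

    The audio file is never passed to this function, so it cannot end up
    in the request body by accident.
    """
    if not transcript_text.strip():
        raise ValueError("transcript is empty; nothing to score")
    return json.dumps(asdict(ScoringRequest(transcript_text, language)))


payload = build_scoring_request("Grace is not earned. It is given.")
print(payload)
```

Keeping the request this narrow is a design choice: anything not named in the payload stays on the device by default.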

The speed problem nobody talks about

You don't notice latency until you're standing in a church office waiting for a server to finish work. Picture a pastor who has just finished recording a 60-minute service and wants to pull a single highlight clip before the church's 6pm social media post. On-device transcription means the text is ready by the time the export is done. Cloud transcription means waiting. It means queuing. It means checking your phone every 30 seconds.

We timed it. On a typical sermon recording, Apple Speech finishes within 5 to 10 minutes on device, depending on device speed and audio quality. The alternative would have been 15 to 30 minutes waiting for cloud processing, plus whatever queue time our servers were under. For a pastor working between services or a social media manager on a tight deadline, that difference mattered more than we expected.

Running on the device also means reliability. If our servers go down, your transcription still works. If Apple's Speech processing has an issue, it's something you can troubleshoot on your own device. We're not a single point of failure in someone's workflow.

What we gave up and why it was worth it

On-device transcription isn't magic. It's accurate for English, particularly with clear audio and structured speech like sermons and podcasts. It's less accurate than some cloud services for heavily accented speech, very noisy recordings, or multiple speakers talking over each other. That's the real trade-off. We spent weeks deciding whether to support cloud transcription as a fallback for edge cases.

In the end, we chose depth over breadth. We optimised Clipr for the thing it does: turning long-form sermons and teaching content into short clips. That content is usually well-recorded, usually single-speaker or speaker-plus-congregation. The accuracy is good enough. The privacy and speed wins were too important to sacrifice for marginal gains in handling poor-quality audio.

That decision cascaded through the whole product. It meant we could ship a free tier with meaningful functionality. It meant we didn't need to charge for transcription separately. It meant creators in areas with poor internet could still use Clipr. It meant a pastor in a rural church wasn't dependent on cloud capacity, on server uptime, on us having the infrastructure budget to handle a spike in usage during a denominational conference.

The moment it proved itself

Launch week. A creator uploaded five videos in a batch (that's a Pro feature we're particularly proud of, though we'll get to that another time). One was a podcast episode recorded in a noisy kitchen, one was a crystal-clear sermon from a well-miked church, one was an outdoor teaching event with wind noise, one was a very soft-spoken Bible study, and one was a recording from a laptop speaker. Three of them transcribed cleanly. One was rough but usable. One came back with significant errors.

The creator messaged us asking if we had a better option. We offered no excuses, just honesty: if you're dealing with heavily degraded audio, cloud transcription would probably be better, but you'd lose the speed and privacy. We gave the same answer to every creator who asked. Most of them chose to stick with on-device, re-record the problematic section at higher quality, and move on. One bought a better microphone. None of them asked us to switch to cloud transcription.

That told us we'd made the right call. People would rather have control and privacy and speed than offload their work to someone else's black box, even if that box was theoretically more powerful.

What on-device really means for your workflow

Here's what actually happens when you use Clipr. You record your sermon or podcast on your phone, or you import a file you've already recorded. You hit the transcribe button. Your phone does the work locally. When transcription finishes, our AI moment-scoring service (that's the Creator+ and Pro feature) analyses the text and identifies the moments most likely to work as standalone clips. The highest-scoring moments get surfaced first. You review them, pick the ones you want, and export. Captions are baked in. The format is 9:16 vertical, ready for TikTok or Reels. You download the clip and share it yourself.
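The "highest-scoring moments first" step is simple enough to sketch. What follows is a minimal illustration, not our actual scoring service: the `Moment` type, its fields, the score scale, and `surface_moments` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    """Hypothetical scored segment of a transcript."""
    start_s: float
    end_s: float
    text: str
    score: float  # assumed scale: higher means more clip-worthy


def surface_moments(moments: list[Moment], limit: int = 3) -> list[Moment]:
    """Return the top `limit` moments, highest score first."""
    return sorted(moments, key=lambda m: m.score, reverse=True)[:limit]


moments = [
    Moment(120.0, 150.0, "Opening illustration", 0.62),
    Moment(900.0, 940.0, "Main point, stated plainly", 0.91),
    Moment(2400.0, 2425.0, "Closing challenge", 0.78),
]
for m in surface_moments(moments, limit=2):
    print(f"{m.start_s:>7.1f}s  {m.score:.2f}  {m.text}")
```

The ranking only orders the candidates. The choice of what to export stays with you.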

That workflow works because everything that can stay on-device does stay on-device. The moment transcription becomes cloud-dependent, the whole thing breaks down in poor connectivity, depends on our server capacity, and raises privacy questions. On-device Apple Speech isn't just a technical decision. It's a philosophical one. It means we're building a tool for creators, not turning creators into users of our infrastructure.

When you're evaluating tools for your church or podcast, ask where your words actually go. That question will tell you more about a tool's design than any feature list ever could.