We chose on-device transcription. Here's why that matters.

Three weeks before Clipr's first TestFlight build, our lead developer messaged the team with a single question: 'Do we really need to send anyone's sermon to our servers?' The answer stopped us cold. We didn't. And that decision shaped everything that came after.

The moment we realised privacy wasn't a selling point, it was the point

When we first talked to pastors and church social media managers about what they needed, the conversation never started with privacy. It started with time. 'I record a 40-minute sermon every Sunday,' one pastor told us. 'I want clips. I don't have four hours to edit them.'

Fair enough. So we built a tool to find the best moments automatically and turn them vertical. But then, almost casually, someone asked: 'Where does the video go?' We said the transcription would run on the server. They went quiet. Not because they were worried about us, but because they were thinking about their church. Some of their footage held personal moments: prayers that mentioned people by name, confessional conversations they'd never intended for a cloud database somewhere.

That conversation changed our roadmap. We weren't going to force churches to upload their footage to transcribe it. We'd let their phone do the work.

On-device transcription is slower. We did it anyway.

Let's be honest: transcribing on the device with Apple's Speech framework is slower than sending audio to a server and getting results back in seconds. The first time someone imports a 45-minute podcast into Clipr, the transcription happens locally on their device. It takes time. There's no way around that.

But here's what we realised: slower is a feature when the alternative is uploading every bit of your content to the internet. A creator waits five minutes for their phone to transcribe. Their data never leaves their device. The moment scoring still happens on our servers (and it's quick), but the raw transcript, the full audio, the unedited dialogue? All of it stays where it was recorded.

When you're working with faith creators, with pastors who are often speaking candidly about real struggles in their congregation, that matters. Not as a marketing angle. As the right thing to do.

The real win: you control when to export

There's a subtle shift that happens when transcription runs on-device. You're not handing your content over to a system and waiting to see what it does with it. You decide when something gets processed.

You can import a video, let your phone transcribe it, review what Clipr's found, and then decide: do I actually want to turn this into clips? Or is this too raw, too personal, not meant for social? That's a human decision, and it happens before anything goes anywhere.

We know creators who've imported footage, read the transcript, and realised a particular moment shouldn't be clipped because the context would be lost or the person speaking wouldn't want it public. That's their choice to make first. Not something a server decides for them.

It shaped how we built everything else

Starting with on-device transcription meant we couldn't build Clipr as a generic video tool. We had to be specific about what we were solving: taking long-form spoken content (sermons, podcasts, teachings) and finding the moments worth sharing.

The moment scoring that runs on our servers works because it only has to rank the best 30 seconds out of 40 minutes. It's not processing video. It's working from a transcript and metadata your phone already created. That's why it's fast. Why it's accurate enough to actually be useful. Why the free tier can score clips and suggest them without charging you a penny.
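For the technically curious, here's a rough illustration of why ranking from a transcript is cheap. This is not Clipr's actual scoring code. The `Segment` shape, the `HOOK_WORDS` list and both function names are made up for the example, and a real scorer would use far richer signals than a word count. The point is only that finding the strongest 30 seconds means looping over short pieces of text, not video frames.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


# Hypothetical signal words; stands in for whatever a real scorer looks at.
HOOK_WORDS = {"why", "imagine", "never", "truth", "hope"}


def score_text(text: str) -> int:
    """Count how many words in the text appear in HOOK_WORDS."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return sum(1 for w in words if w in HOOK_WORDS)


def best_window(segments, window_seconds=30.0):
    """Return (start, end, score) for the highest-scoring run of whole
    segments that fits inside window_seconds, or None if nothing fits."""
    best = None
    for i, first in enumerate(segments):
        limit = first.start + window_seconds
        score = 0
        end = first.start
        for seg in segments[i:]:
            if seg.end > limit:
                break  # this segment would overrun the window
            score += score_text(seg.text)
            end = seg.end
        if end > first.start and (best is None or score > best[2]):
            best = (first.start, end, score)
    return best
```

A 40-minute sermon is at most a few hundred segments like these, so even this naive double loop finishes almost instantly.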

The captions get baked into the output. The reformat to 9:16 happens automatically. All of it stems from one core decision: respect the data, do the heavy lifting on the device, and only handle what you actually need on the server.
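As a small worked example of the 9:16 step, the arithmetic for a centred vertical crop looks something like the sketch below. It assumes a plain centre crop, which may not be what Clipr does (a real reframe might follow the speaker), and `center_crop_9x16` is a hypothetical name used only here.

```python
def center_crop_9x16(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (x, y, crop_width, crop_height) for the largest centred
    9:16 rectangle that fits inside a width x height frame."""
    target_w, target_h = 9, 16
    if width * target_h > height * target_w:
        # Source is wider than 9:16: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is 9:16 or narrower: keep the full width, trim top and bottom.
        crop_w = width
        crop_h = width * target_h // target_w
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h
```

For a standard 1920x1080 landscape recording this keeps a strip roughly 607 pixels wide from the middle of the frame, which is why framing the speaker centrally matters for vertical clips.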

This is what we're shipping from June

When Clipr's TestFlight build goes live on the 10th, that's what you're testing. A tool built around the assumption that your footage is yours first, your phone is capable, and a server should only help, not own.

Two clips a month free, with a watermark. Creator tier gives you 30 clips and removes the watermark. Pro adds batch processing and deeper scoring explanations. But regardless of which tier you're on, the transcription runs the same way: on your device, before anything else.

We spent weeks on that choice. Not because it's clever technically (it isn't, really). But because when you're building for faith creators, for people who are speaking truth in community, that respect has to come first.

If you've been looking for a way to turn your long-form content into clips without losing control of it, that's what Clipr is built for. The question worth asking yourself isn't whether the tool works. It's whether you trust where your content goes. What would change about how you approach social media if you knew it never had to leave your phone?

Ready to try Clipr by MRVL?

One tap to download. No sign-up wall.

Get it on the App Store
