The caption problem we solved for sermon clips
A pastor messaged us in week two of Clipr's beta: "I love the clips, but I have to add captions in another app before posting. That's three apps now." She was right. We'd automated the hard part - finding the best 60 seconds of a two-hour sermon - but left creators fumbling with captions.
The moment we realised captions couldn't wait
When Clipr first launched, the workflow was incomplete. You'd export a clip, vertical and ready for TikTok, but it arrived caption-free. Most creators then opened CapCut or another tool, burned in text by hand or let an app add automated subtitles, and only then posted. That friction bothered us. It wasn't just a minor inconvenience; it sat at the exact moment when momentum dies. Someone gets five perfect clips from a sermon. They're excited. Then they remember: "Right, I need to subtitle these." Two of them never get posted.
We decided captions should be baked into the export itself, not a separate step. The question was how to do it well without slowing down the app.
Why we put it on-device, not in the cloud
Building Clipr, we'd already committed to on-device speech transcription via Apple Speech. The sermon audio is transcribed locally on your iPhone or iPad, not sent to a server. That decision gave us speed and privacy in one move. When we designed auto-captions, we kept the same philosophy. The transcript that powers the caption timing lives on your device. The caption rendering happens on-device. Nothing leaves your phone unless you export and upload it yourself.
This matters more than it sounds. Cloud captions mean network latency and potential privacy questions. On-device means you hit export and thirty seconds later you have a captioned clip sitting in your Photos app. No network call. No transcription service queue. Just a local process that's been tested on thousands of sermons.
What baked captions actually look like
The captions sit at the bottom of your vertical clip, white sans-serif text on a semi-transparent dark bar. They sync to the speech timing pulled from the on-device transcript. When someone speaks, the text appears. When they pause, the text clears. The styling is intentionally plain. Not trendy. Not competing for attention. The words matter; the sermon matters. The caption is there to make sure someone scrolling on mute still understands what's being said.
We sized the text so it reads on a phone screen without squinting, and the timing respects natural speech rhythm. Captions don't jump ahead of the speaker or lag behind. They flow.
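For the technically curious, here is one way that kind of timing logic can be sketched. This is a minimal illustration in Python, not Clipr's actual code: the `Word` and `Cue` types, the `build_cues` function, and the `max_pause` and `max_chars` values are all assumptions made up for the example. It takes words with start and end times (the kind of data a transcript can provide) and groups them into caption cues, closing a cue whenever the speaker pauses or the line gets too long.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float


@dataclass
class Cue:
    text: str
    start: float
    end: float


def _close(words):
    # A cue spans from its first word's start to its last word's end,
    # so nothing is on screen while the speaker is silent.
    return Cue(" ".join(w.text for w in words), words[0].start, words[-1].end)


def build_cues(words, max_pause=0.6, max_chars=32):
    """Group timed words into caption cues.

    max_pause and max_chars are illustrative defaults, not tuned values.
    """
    cues = []
    current = []
    for word in words:
        if current:
            pause = word.start - current[-1].end
            candidate = " ".join(w.text for w in current) + " " + word.text
            # Start a new cue after a pause, or when the line would overflow.
            if pause > max_pause or len(candidate) > max_chars:
                cues.append(_close(current))
                current = []
        current.append(word)
    if current:
        cues.append(_close(current))
    return cues


if __name__ == "__main__":
    words = [
        Word("Grace", 0.0, 0.4), Word("is", 0.45, 0.6),
        Word("a", 0.62, 0.7), Word("gift", 0.72, 1.1),
        Word("not", 2.0, 2.3), Word("a", 2.32, 2.4), Word("wage", 2.42, 2.9),
    ]
    for cue in build_cues(words):
        print(f"{cue.start:.2f}-{cue.end:.2f}  {cue.text}")
```

In this example the gap between "gift" and "not" is longer than `max_pause`, so the words split into two cues, "Grace is a gift" and "not a wage", with nothing on screen in between.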
This is a Creator tier feature, which means unlimited captions on as many clips as you need. There's no per-clip charge and no cap. We decided early that if captions mattered enough to build, they mattered enough to offer without limits to everyone on Creator. Free users still get two clips a month, but those clips ship without captions. The moment someone moves to Creator, captions are part of every export.
Why this saves you more than time
The practical payoff is obvious: you save the step of opening another app. But there's a secondary benefit that became clear after the first month of shipping this. Captions mean your clips reach people who watch with sound off. That's a huge chunk of social media consumption. Someone scrolling through Reels at lunch with the phone on silent. A commuter on a train. Someone in a waiting room. Without captions, your sermon clip doesn't land for them. With captions baked in, it lands every time.
We also noticed something in the feedback: creators felt more confident posting directly. The clip looked finished. Professional. It wasn't a rough cut waiting to be polished in post-production. It was done. That confidence matters because it leads to posting more clips, which leads to more people actually encountering your church's teaching.
The technical trade-off we made
Nothing comes free. On-device caption rendering means the export file is slightly larger and the export itself takes a few extra seconds. For most clips under two minutes, that's five to eight seconds of additional processing. We could have moved that work off the device and into the cloud, but that would have meant upload delays, a network dependency, and the complexity of managing transcription state across servers. We chose the local option and accepted the extra seconds.
This is a choice we think about every time we ship a feature. Do we optimise for the shortest possible processing time or for keeping everything local and private? Most of the time, the on-device version wins even when it's harder to build.
What's next
We're listening to feedback on caption styling. Some creators have asked about font options or the ability to change the background colour. We're exploring that carefully, because we want captions to stay readable and uncluttered. A serif font or a neon background might look trendy for a week, but it also makes the words harder to read at phone size and typical viewing distance. The default we've chosen works across nearly every church, teaching style, and sermon length we've tested. But we'll keep that door open if a real need emerges.
For now, captions are baked in. Automatic. Part of what makes Clipr finish the job for you.
If you're posting sermon clips to TikTok or Reels without captions, how many viewers do you think you're missing on a Tuesday morning when someone has their phone on mute?