The caption decision we didn't expect to matter
Three months into Clipr's beta, a pastor from Manchester sent a message that stopped me in my tracks. He'd exported a clip, uploaded it to Instagram Reels, and watched the caption layer shift slightly when the platform compressed the video. His quote marks became artifacts. His emphasis became noise. He asked a simple question: 'Why can't the captions just live in the video itself?'
The problem nobody talks about
Here's what most creators don't realise until it's too late. When you add captions as an overlay, they're vulnerable. Every platform handles video compression differently. Instagram squashes it one way. TikTok another. YouTube Shorts has its own rules. Your carefully timed captions start to slip, duplicate, or blur. The text that made the moment land becomes visual noise.
We'd initially built Clipr to handle the heavy lifting: finding the best moments in your sermon, reformatting them to vertical 9:16, scoring them for engagement. But when that Manchester pastor pointed out the caption problem, I realised we'd stopped halfway. We were giving creators the clip, but not protecting the caption.
That's when we decided to bake captions directly into the video file itself. Not as an adjustable layer. Not as something you'd add later in another tool. Baked in, permanent, platform-proof.
Why baking matters more than layering
Think about how people watch short-form video on their phones. They're scrolling fast. Sound's often off. The caption is the lifeline. If it shifts when the video uploads, if it blurs under compression, if it disappears on one platform but shows on another, you've lost the viewer before they've lost interest.
When captions are baked into the file, they're part of the image itself. They don't layer. They don't shift. Platform compression treats them like any other pixels in the frame, because they're not a separate element anymore. They're the video.
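For the technically curious, one common way to draw captions into the frames is ffmpeg's subtitles filter. The sketch below only builds the command. It is an illustration of the general technique, not Clipr's export code, and the function name and file names are placeholders.

```python
from pathlib import Path


def burn_in_command(video: Path, captions_srt: Path, output: Path) -> list[str]:
    """Build an ffmpeg command that renders SRT captions into the frames."""
    return [
        "ffmpeg",
        "-i", str(video),
        # The subtitles filter draws the caption text onto each frame,
        # so the export carries no separate caption track or overlay.
        "-vf", f"subtitles={captions_srt.as_posix()}",
        # The audio is unchanged, so copy it instead of re-encoding.
        "-c:a", "copy",
        str(output),
    ]


# Hypothetical file names, for illustration only.
cmd = burn_in_command(Path("clip.mp4"), Path("clip.srt"), Path("clip_captioned.mp4"))
print(" ".join(cmd))
```

Paths containing spaces, colons, or quotes need extra escaping inside the filter argument, which this sketch leaves out.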
For pastors and church social media managers, this matters in ways that go beyond technical neatness. A phrase of scripture, a moment of teaching, a question posed to your congregation. If the caption carries that idea, and the caption survives every platform you post to, then the message survives intact. That's not a feature. That's protection.
The workflow shift this creates
We tested this with a few creators before shipping it as standard on the Creator and Pro plans. The feedback was immediate: they stopped using external caption tools for their clips. They stopped adjusting text after export. They stopped guessing whether a caption would hold up on TikTok versus Reels versus YouTube Shorts.
Instead, they took a clip from Clipr, checked that the captions matched what they'd said, and uploaded it directly. That's it. No intermediate steps. No worrying about whether their message would degrade in transit.
One podcaster told us she was spending 40 minutes per clip in post because she was re-layering captions after finding issues on different platforms. With baked captions, that disappeared. Forty minutes became five minutes. And the clips looked the same everywhere.
How the transcription feeds the caption
The captions themselves come from on-device transcription using Apple Speech. It's not perfect, but it's immediate and it lives on your phone, not on someone's server. We chose this approach because pastors in particular often record privately before they share publicly. On-device transcription means your sermon transcript never leaves your device unless you choose to export it.
From there, the moment scoring service identifies which parts of your sermon are worth clipping. And once a clip is selected, the corresponding transcription snippet is formatted and baked directly into the video. The whole thing happens in sequence, in your app, on your terms.
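To make the middle step concrete, here is a rough sketch of turning a timed transcript into captions for one clip. It assumes the transcription provides a start and end time for each word, and it writes SRT-style cues. The `Word` type, the four-words-per-caption limit, and the sample sentence are all made up for illustration. Clipr's real formatting rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the full recording
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def clip_captions(words, clip_start, clip_end, max_words=4):
    """Return SRT text for the words that fall entirely inside one clip."""
    inside = [w for w in words if w.start >= clip_start and w.end <= clip_end]
    cues = []
    for i in range(0, len(inside), max_words):
        group = inside[i:i + max_words]
        # Shift times so the clip, not the full recording, starts at zero.
        start = group[0].start - clip_start
        end = group[-1].end - clip_start
        text = " ".join(w.text for w in group)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical transcript fragment, one minute into a recording.
words = [
    Word("Grace", 61.0, 61.4), Word("is", 61.4, 61.6),
    Word("not", 61.6, 61.9), Word("earned", 61.9, 62.5),
    Word("it", 62.7, 62.9), Word("is", 62.9, 63.1),
    Word("given", 63.1, 63.8),
]
srt = clip_captions(words, clip_start=60.0, clip_end=65.0)
print(srt)
```

Short cues of a few words each suit viewers who are scrolling fast with the sound off, which is why the sketch caps each caption at a handful of words.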
This is why captions aren't an afterthought in Clipr. They're part of the core promise: you record your sermon or podcast, Clipr finds the moments worth sharing, and it outputs clips that are ready to post everywhere without decay.
What this means for the creator who's busy
The original problem we were solving was time. Pastors and podcasters don't have four hours a week to edit short-form content. So we automated the heavy work: finding moments, reformatting to vertical, scoring for engagement.
But we realised that automation only works if the output is genuinely ready to post. And genuinely ready to post means captions that don't degrade. It means you can export on Monday, schedule it for Friday, and know it'll look the same everywhere it lands. Not almost the same. Not close enough. The same.
That's the thing about serving creators who are already running on empty. You can't ask them to fix things after export. You have to ship them right the first time.
When you're uploading a clip that carries a message, does it matter more to you that the video is perfectly edited, or that the words actually survive the journey from your phone to someone else's?