The voice capture decision we almost got wrong

Three weeks before launch, our support inbox filled with a single request from beta users. 'Can I just speak my ideas?' they asked, across dozens of messages. We had a voice memo feature sketched in, but we hadn't built it yet.

The problem we heard from creators

Most creators I spoke to during research were doing the same thing. They'd get an idea mid-walk, mid-commute, mid-conversation. They'd open their phone, hunt for a notes app, type a messy fragment, and hope they'd still remember what they meant by the time they could flesh it out.

The friction was tiny, but it happened dozens of times a month. And every time, some ideas got lost. A pastor told me he'd stopped carrying a notepad because his phone was 'better', but then half his sermon notes ended up scattered across Voice Memos, Notes, and whatever else he'd tapped first in a hurry. A podcaster said she'd started voice-recording everything into her memo app, and was then spending an hour a week manually transcribing and organising the recordings.

These weren't edge cases. This was the actual workflow of the people we built Ideas! for.

The temptation to outsource it

When we finally prioritised voice capture, the easiest path was obvious. Send audio to a cloud service, get back text, store both, move on. Fast to build. Reliable. Lots of companies do exactly that.

But there was a problem. These are often private ideas. A writer capturing a scene from her marriage. A coach thinking through advice he hasn't shared publicly yet. A pastor workshopping language about grief or doubt. These aren't polished thoughts. They're raw material.

Sending that to a remote server felt wrong, even if the service promised encryption. It felt like we'd be asking creators to trust someone else with their half-formed thinking. So we chose to look for another way.

Building on-device, keeping control local

iOS ships with its own speech recognition engine in the Speech framework, exposed through a class called SFSpeechRecognizer. On supported devices and languages, it can do the transcription right on the device, which meant we could offer voice capture that never left the user's phone unless they explicitly synced to the cloud later.

The engineering trade-offs were real. On-device transcription can be less accurate than cloud services. It requires a device and language that support it (though it doesn't need internet). And it's less flexible if you want fancy language models. But the benefit outweighed everything: a creator's raw ideas stay private by default. The transcription happens quickly and locally. We see nothing. Their device sees everything.
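For readers who want to see what 'stays on the device' looks like in practice, here is a minimal Swift sketch. It is an illustration under stated assumptions, not the app's production code: the LocalTranscriber class, the TranscriptionError cases, and the en-US default locale are all hypothetical names and choices. The parts that come from Apple's Speech framework are SFSpeechRecognizer, its supportsOnDeviceRecognition property, and the request's requiresOnDeviceRecognition flag, which are available from iOS 13. The app's Info.plist also needs an NSSpeechRecognitionUsageDescription entry before the permission prompt can appear.

```swift
import Foundation
import Speech

// Hypothetical error type, for illustration only.
enum TranscriptionError: Error {
    case notAuthorized
    case onDeviceUnavailable
}

// Hypothetical wrapper that transcribes a recorded audio file locally.
final class LocalTranscriber {
    // Held as properties so the recognizer and task outlive the call.
    private let recognizer: SFSpeechRecognizer?
    private var task: SFSpeechRecognitionTask?

    init(locale: Locale = Locale(identifier: "en-US")) {
        recognizer = SFSpeechRecognizer(locale: locale)
    }

    func transcribe(fileURL: URL,
                    completion: @escaping (Result<String, Error>) -> Void) {
        // The authorization callback may arrive on a background queue.
        SFSpeechRecognizer.requestAuthorization { [weak self] status in
            guard let self = self else { return }
            guard status == .authorized else {
                completion(.failure(TranscriptionError.notAuthorized))
                return
            }
            // Refuse to continue if this device or locale cannot
            // recognize speech locally, rather than fall back to a server.
            guard let recognizer = self.recognizer,
                  recognizer.isAvailable,
                  recognizer.supportsOnDeviceRecognition else {
                completion(.failure(TranscriptionError.onDeviceUnavailable))
                return
            }

            let request = SFSpeechURLRecognitionRequest(url: fileURL)
            // The key line: audio must not be sent over the network.
            request.requiresOnDeviceRecognition = true
            // Only report the finished transcript, not interim guesses.
            request.shouldReportPartialResults = false

            self.task = recognizer.recognitionTask(with: request) { result, error in
                if let error = error {
                    completion(.failure(error))
                } else if let result = result, result.isFinal {
                    completion(.success(result.bestTranscription.formattedString))
                }
            }
        }
    }
}
```

The design choice worth noticing is the second guard. When on-device recognition isn't available, the sketch returns an error instead of quietly using a server, which is what 'private by default' has to mean in code.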

Launch week, we watched the feature get used immediately. A lot. Within the first fortnight, thousands of voice ideas were captured. The average length was short: forty-five seconds, maybe a minute. Which makes sense. When the friction drops to 'tap record, speak, done', people capture more.

What staying on-device actually means

Privacy gets thrown around a lot in tech. We're not claiming Ideas! is anonymous, or that we're breaking new ground. What we're saying is simpler: if you want to keep an idea private, you can. The voice data and transcription don't have to leave your device. They don't route through anyone else's server. iCloud sync is optional and end-to-end encrypted, and whether to turn it on is your choice.

For a lot of creators, that's the difference between capturing an idea and not capturing it. A writer friend of mine said she'd never record anything into a generic app because she doesn't know who's listening. With Ideas!, she records. The idea gets transcribed on her phone. If she wants it backed up across her devices, she syncs it. If she wants to delete it, it's gone from her device and, unless she chose to sync it, no copy exists anywhere else.

That simplicity shaped how we think about the whole app. On-device transcription meant we could keep Ideas! fast, offline-capable, and low-friction. No waiting for a cloud server. No Wi-Fi requirement for the core feature. Tap record, speak, get text back in seconds.

The cost of choosing right over easy

Building voice capture with on-device transcription was slower than the cloud alternative. We had to learn the framework, handle edge cases with accents and background noise, and test on different devices. It took longer to ship.

We also can't offer some features that cloud-based competitors can. We can't do real-time language detection, or process multiple speakers, or build a universal transcription model trained on millions of hours of audio. We made a bet that creators cared more about privacy and speed than those features. So far, that's held.

What surprised us most was how few people ask us to change it. When a user reports a transcription that went slightly wrong, they don't ask us to improve the algorithm. They ask if there's a way they can edit it manually. They understand the trade-off. They chose the app partly because of it.

Building for the person, not the platform

Ideas! is built for people who make something intentional. Pastors planning sermons. Podcasters capturing fragments. Writers thinking out loud. These aren't casual note-takers. They're people who believe what they're working on matters, and they want control over how their thinking is stored and shared.

That shaped everything about voice capture. It had to be private by default. It had to work instantly. It had to turn speech into usable text without friction. On-device transcription was the only way to do all three at once.

The alternative would have been faster to build and simpler to scale. But it would have built the app around a different set of priorities. We chose differently because our creators chose differently. That's the decision that kept coming back. Not 'what's easiest to build', but 'what does the person who uses this actually need'.

If you've ever stopped yourself from recording an idea because you weren't sure where it would end up, you already understand why we built voice capture the way we did. The question isn't really about on-device versus cloud. It's about whose hands your thinking lives in before you're ready to share it.