Why duplicate detection is harder than it looks

Last month, a photographer emailed to say she'd deleted 340 duplicate photos in a single session. Her camera roll had 8,200 images. She'd been shooting weddings for three years without realising her phone was keeping near-identical frames from rapid bursts, cloud syncs, and accidental re-downloads. That's the moment I understood why duplicate detection can't just be a checkbox feature.

The problem with 'exact match'

When we first built Culr, I assumed duplicates would be straightforward. Find two files with identical byte signatures, flag them, done. Reality hit differently.

A photographer takes a photo on her iPhone. It syncs to iCloud. She downloads it again on the same device (because iCloud confused her, or a backup failed). Now she has two files. Same image, same moment, but different digital fingerprints, because the copy that comes back can be re-encoded or carry different metadata. A byte-for-byte comparison would miss it entirely.

Then there are screenshots. You screenshot a WhatsApp message. Two hours later, you open WhatsApp and it auto-downloads the same image from the chat. Your camera roll now holds the image and a screenshot of it. The bytes are completely different, but they show the same moment.

We realised early on that duplicate detection had to work on image content, not file checksums. That meant learning what actually makes two photos "the same" in a way humans understand it.
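To make the checksum problem concrete, here is a minimal sketch in Python. The byte strings are made-up stand-ins for two copies of one photo (not real JPEG data), and the `meta:` prefix is a hypothetical placeholder for whatever metadata differs between copies.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Checksum of the raw file bytes, as an exact-match detector would compute it."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the encoded picture itself (an assumption, not real image data).
pixels = bytes(range(256)) * 4

# Two "files" carrying the same picture but different metadata.
copy_a = b"meta:downloaded=day-1;" + pixels
copy_b = b"meta:downloaded=day-2;" + pixels

# The picture payload is identical once the metadata prefix is stripped...
same_pixels = copy_a.split(b";", 1)[1] == copy_b.split(b";", 1)[1]
# ...but the whole-file checksums disagree, so an exact-match check sees two unrelated files.
same_bytes = file_digest(copy_a) == file_digest(copy_b)

print("same picture:", same_pixels)
print("same checksum:", same_bytes)
```

A single changed byte anywhere in the file is enough to change the checksum, which is why matching has to look at the picture rather than the container.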

Vision, hashing, and the middle path

The tempting solution was to offload every comparison to the cloud. Send your photos to a server, run heavy computer vision models, get back a report. Fast for us, terrifying for our users.

We chose not to do that. Everything in Culr runs locally on your device. That's a constraint, but it's also a promise: your photos don't leave your phone.

What we settled on combines local image hashing with on-device vision processing. The phone calculates a perceptual hash for each photo, a sort of digital fingerprint based on the image content rather than the file bytes. Two photos of the same moment will usually produce very similar hashes even if they've been compressed differently or cropped slightly. Bigger changes, such as a rotation, are harder for a hash alone to catch. So we then use on-device vision clustering to group visually similar photos together and flag near-exact matches.
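For readers who want to see the idea of a perceptual hash, here is a minimal sketch in Python. It uses an average hash, one well-known simple variant; it is an illustration, not Culr's actual algorithm. Images are plain nested lists of grayscale values so the example needs no imaging library, and the 5-bit threshold is an assumed value chosen for the demo.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale values, 0 (black) to 255 (white)


def shrink(img: Gray, size: int = 8) -> Gray:
    """Downsample to size x size by averaging blocks (image must be at least size x size)."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """One bit per cell of the shrunken image: 1 if brighter than the overall mean."""
    flat = [v for row in shrink(img, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Assumed rule: hashes within max_distance bits count as near-duplicates."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# A left-to-right gradient, a lightly perturbed copy (a stand-in for
# recompression noise), and a genuinely different top-to-bottom gradient.
original = [[x * 8.0 for x in range(32)] for y in range(32)]
recompressed = [[x * 8.0 + ((x * 7 + y * 3) % 5 - 2) for x in range(32)] for y in range(32)]
different = [[y * 8.0 for x in range(32)] for y in range(32)]

print(looks_like_duplicate(original, recompressed))
print(looks_like_duplicate(original, different))
```

The useful property is that small pixel-level changes barely move the hash, while a different picture flips many bits. The threshold is a trade-off: set it too loose and distinct frames get grouped, too tight and re-encoded copies slip through.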

It's not perfect. A photo taken at nearly the same moment with a slightly different angle might be flagged as a duplicate when it's actually a keeper. That's why we give you the final say. You swipe to keep or delete. The detection is the suggestion, not the verdict.

Why the free tier has limits

Culr's free version lets you delete 50 duplicates a month. That's not a dark pattern. It's honesty about computational cost and testing.

Running perceptual hashing and vision clustering on thousands of photos takes processing power. On an older iPhone, it can take a few seconds to scan a large library. We cap the free tier because we want the app to feel fast and responsive for everyone. If someone tried to delete 500 duplicates in one go on a five-year-old device, the phone would slow to a crawl and they'd leave us a one-star review.

Users who jump to Plus get unlimited duplicate detection because they've told us they're serious about cleaning their library. They're willing to wait a bit longer for a thorough scan. Most Plus subscribers handle it in a background task, maybe while they're making tea.

The cap also reflects what we learned from testing. A typical user with a bloated camera roll has between 40 and 150 duplicates. At 50 a month, that's one to three monthly cleaning sessions and you're done. If your library is genuinely pathological, Plus is two quid a month.

The iCloud moment

Here's the decision I'm still proud of: before Culr deletes any photo, it checks whether that photo has synced to iCloud.

We built this because we heard too many stories of people using cleaning apps and discovering later that they'd deleted a photo that never made it to the cloud. A syncing failure, a network glitch, a moment of bad luck. The photo was gone locally and never backed up.

Every deletion in Culr is preceded by a quick check of the iCloud sync status. If the photo hasn't synced yet, we warn you. You can choose to wait, force a sync, or accept the risk. It's one extra second per delete, but it's a second that might save someone's wedding photos.
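The shape of that check can be sketched in a few lines of Python. This is a simplified illustration, not Culr's code: `SyncStatus`, `Action`, `Photo` and `pre_delete_check` are all hypothetical names, and how the sync status is actually read from the device is platform-specific and deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class SyncStatus(Enum):
    SYNCED = "synced"
    PENDING = "pending"   # not uploaded yet
    UNKNOWN = "unknown"   # status could not be determined


class Action(Enum):
    DELETE = "delete"
    WARN = "warn"


@dataclass
class Photo:
    photo_id: str
    sync_status: SyncStatus  # assumed to be filled in by a platform-specific lookup


def pre_delete_check(photo: Photo, user_accepts_risk: bool = False) -> Action:
    """Decide what happens when the user asks to delete a photo."""
    if photo.sync_status is SyncStatus.SYNCED:
        return Action.DELETE
    # Anything other than a confirmed sync is treated as unsafe by default.
    if user_accepts_risk:
        return Action.DELETE
    return Action.WARN


print(pre_delete_check(Photo("a", SyncStatus.SYNCED)))
print(pre_delete_check(Photo("b", SyncStatus.PENDING)))
print(pre_delete_check(Photo("c", SyncStatus.PENDING), user_accepts_risk=True))
```

The design choice worth noting is the default: an unknown status is handled like an unsynced one, so a failed lookup produces a warning rather than a silent delete.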

This check runs even on the free tier. It's not a Plus feature. We think it should be table stakes for any app that touches your camera roll.

What duplicate detection isn't

Culr doesn't use neural networks to guess which photo is "better" and auto-delete the other. We're not running some AI model that ranks your images by aesthetic quality and throws away the ones it thinks are boring. That might sound efficient, but it's presumptuous. A slightly blurry photo of your friend's expression during a toast might be the one that matters to you.

What Culr does is find photos that are visually very similar, usually taken seconds apart, and group them so you can decide which one to keep. On Plus, we rank burst photos by sharpness to help you spot the keeper faster. But the delete is always yours.
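As an illustration of ranking a burst by sharpness, here is a minimal Python sketch. It uses the variance of a Laplacian filter, a common focus measure; the post does not say which measure Culr uses, so treat this as an assumption. Images are nested lists of grayscale values and the function names are hypothetical.

```python
from typing import Dict, List

Gray = List[List[float]]  # rows of grayscale values


def laplacian_variance(img: Gray) -> float:
    """Higher values mean stronger edges, which usually means a sharper frame."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian: zero on flat or smoothly ramping areas.
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def rank_by_sharpness(burst: Dict[str, Gray]) -> List[str]:
    """Return frame names, sharpest first. Ranking only; nothing is deleted."""
    return sorted(burst, key=lambda name: laplacian_variance(burst[name]), reverse=True)


# A hard black-to-white edge versus the same transition smeared into a ramp.
crisp = [[255.0 if x >= 8 else 0.0 for x in range(16)] for y in range(16)]
soft = [[x * 17.0 for x in range(16)] for y in range(16)]

print(rank_by_sharpness({"soft_frame": soft, "crisp_frame": crisp}))
```

The output is an ordering, not a decision: the sharpest frame is shown first and the choice to keep or delete stays with the person swiping.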

That distinction matters because trust is the whole game here. People don't use photo-cleaning apps because they love data management. They use them because they're drowning in photos and they need help they can actually trust. The moment an app starts making irreversible decisions for you, you've lost them.

Duplicate detection is one feature, but it's the one that taught us the most about what Culr should be: fast enough to be useful, thorough enough to catch the mess you didn't see, and cautious enough to never delete something you didn't mean to lose. When was the last time you actually looked at your camera roll? You might be surprised what's living there.