Skill

duplicate-report

Runs perceptual hash duplicate analysis on an Immich photo library to find cross-source and internal duplicates, generating a detailed report with removal recommendations.

Python

data-engineering


Invocation

Slash command

/immich-photo-manager:duplicate-report


SKILL.md

228 lines · ~2.3k tokens

Stats

Language: HTML

Stars: 30

Forks: 8

Maintenance: Excellent

Last Commit: Jul 20, 2026


Duplicate Report

⚠️ Connection Required — ALWAYS CHECK FIRST

Before doing ANYTHING else in this skill, call ping on the Immich MCP server.

If ping succeeds → proceed with the skill normally.
If ping fails or the MCP tools are not available → STOP. Do not continue. Tell the user:

❌ Immich is not connected. This plugin needs a running Immich MCP server to work.

Run /setup-immich-photo-manager to configure your Immich connection. You'll need:

Your Immich server URL (e.g., http://192.168.1.100:2283)

An Immich API key (how to create one)

The MCP server configured (see /setup-immich-photo-manager)

Nothing in this plugin will work until the connection is configured.

Do NOT skip this check. Do NOT try to run any other tool first. Always ping, always block if it fails.

Generate a comprehensive duplicate analysis of an Immich photo library. Uses perceptual hashing to find visually identical photos even when they have different checksums (common when photos are exported from Apple Photos and Google Photos).

Why Perceptual Hashing?

When users import the same photo library from multiple sources (Apple Photos export, Google Takeout, manual folder copies), the files are often re-encoded by each platform. This means:

Checksums differ — same photo, different binary → SHA/MD5 won't match
Immich's built-in CLIP duplicate detection uses too strict a threshold for re-encoded content
Filename matching catches only a fraction (filenames often differ across platforms)

Perceptual hashing (pHash) computes a fingerprint based on the visual content of the image, not the binary data. Two re-encoded copies of the same photo typically produce the same perceptual hash, or one that differs in only a few bits.
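Re-encoding can occasionally flip a few bits of a fingerprint, so an exact-match comparison may miss some copies. The following is a minimal sketch of a tolerance-based comparison, assuming hashes are stored as the hex strings that str(imagehash.phash(...)) produces. The helper names and the max_distance value of 8 are illustrative assumptions, not values taken from this skill.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_probable_duplicate(hex_a: str, hex_b: str, max_distance: int = 8) -> bool:
    """Treat two hashes as the same photo when only a few bits differ.

    max_distance is a hypothetical threshold; tune it against spot-checked matches.
    """
    return hamming_distance(hex_a, hex_b) <= max_distance
```

The imagehash library offers the same measure directly: subtracting two ImageHash objects (hash_a - hash_b) returns their Hamming distance.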

Prerequisites

The user's machine needs:

pip3 install Pillow imagehash pillow-heif --break-system-packages

Pillow — image loading
imagehash — perceptual hashing
pillow-heif — HEIC/HEIF support (critical for Apple Photos)
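Before starting a long scan it may help to confirm that the three packages are importable. This is a small sketch using only the standard library; the function name and the mapping constant are hypothetical, and the import names (PIL, imagehash, pillow_heif) are the ones used in the scan code of this skill.

```python
import importlib.util

# Import names differ from the pip package names (Pillow is imported as PIL).
REQUIRED_MODULES = {
    "PIL": "Pillow",
    "imagehash": "imagehash",
    "pillow_heif": "pillow-heif",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```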

Analysis Workflow

Step 0: ML-Based Duplicate Detection (Quick)

Before running the full perceptual hash scan, check Immich's built-in ML duplicate detection:

result = get_duplicates()

This returns groups of visually similar assets detected by Immich's ML engine. Present the count and let the user resolve obvious duplicates immediately using resolve_duplicates.

This is fast (no disk scan needed) but may miss re-encoded copies across import sources. For comprehensive cross-source analysis, proceed to Step 1.

Note: resolve_duplicates handles Immich ML duplicates natively. Perceptual hashing (Steps 1–3 below) catches cross-source re-encoded duplicates that ML may miss.
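To present the count from Step 0, the groups can be summarized before asking the user how to proceed. The sketch below assumes that get_duplicates returns a list of groups, each a dict with an "assets" list. That shape is an assumption and should be checked against the actual tool output. The function name is hypothetical.

```python
def summarize_duplicate_groups(groups):
    """Count groups and the assets that become redundant if one per group is kept."""
    group_count = len(groups)
    asset_count = sum(len(g.get("assets", [])) for g in groups)
    # Keeping one asset per non-empty group leaves the rest as removal candidates.
    redundant = sum(max(len(g.get("assets", [])) - 1, 0) for g in groups)
    return {"groups": group_count, "assets": asset_count, "redundant": redundant}
```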

Step 1: Discover Import Sources

Query Immich to identify distinct import sources from asset paths:

SELECT
  CASE
    WHEN "originalPath" LIKE '%Apple Fotos%' OR "originalPath" LIKE '%Apple Photos%' THEN 'Apple Photos'
    WHEN "originalPath" LIKE '%Google Fotos%' OR "originalPath" LIKE '%Google Photos%' THEN 'Google Photos'
    ELSE split_part("originalPath", '/', 5)  -- or whatever level gives the source folder
  END as source,
  count(*) as total
FROM asset WHERE "deletedAt" IS NULL
GROUP BY source ORDER BY total DESC;

Present the sources to the user and ask which ones to compare.
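If asset paths are already available as a list (for example from an earlier query), the same grouping can be sketched in Python. This mirrors the CASE expression above under the assumption that paths use "/" separators. The function names and the folder_level default are illustrative.

```python
from collections import Counter


def classify_source(original_path: str, folder_level: int = 5) -> str:
    """Mirror the SQL CASE: known platform names first, else a path segment."""
    if "Apple Fotos" in original_path or "Apple Photos" in original_path:
        return "Apple Photos"
    if "Google Fotos" in original_path or "Google Photos" in original_path:
        return "Google Photos"
    # split_part() is 1-indexed, so level 5 corresponds to list index 4 here.
    parts = original_path.split("/")
    return parts[folder_level - 1] if len(parts) >= folder_level else ""


def count_by_source(paths):
    """Return (source, total) pairs ordered from largest to smallest."""
    return Counter(classify_source(p) for p in paths).most_common()
```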

Step 2: Run Perceptual Hash Scan

For each source directory, scan all image files and compute 256-bit perceptual hashes:

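from PIL import Image
import imagehash
from pillow_heif import register_heif_opener

# Register the HEIF/HEIC opener so Image.open() can read Apple Photos exports.
register_heif_opener()

def compute_phash(filepath):
    """Return the 256-bit perceptual hash of an image as a hex string."""
    with Image.open(filepath) as img:
        # Normalize the color mode so every file is hashed the same way.
        if img.mode != 'RGB':
            img = img.convert('RGB')
        return str(imagehash.phash(img, hash_size=16))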

Key parameters:

hash_size=16 → 256-bit hash (high accuracy, very few false positives)
Use ThreadPoolExecutor (NOT ProcessPoolExecutor — native HEIF libs deadlock on fork)
4 workers is a reasonable default for most machines
Report progress every 500 files

Expected performance: ~500 files/30 seconds on Apple Silicon, ~200 files/30 seconds on Intel.
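The parameters above can be combined into a scan loop along the following lines. This is a sketch, not the skill's actual implementation: the function names and the extension set are assumptions, and the hashing function is passed in as hash_fn so that compute_phash (or any stand-in) can be supplied by the caller.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumed set of extensions to scan; adjust to the library's contents.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".heif"}


def list_images(root):
    """Yield image file paths under root, matched by file extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                yield os.path.join(dirpath, name)


def scan_paths(paths, hash_fn, workers=4, progress_every=500):
    """Hash files on a thread pool; return ({hash: [paths]}, {path: error})."""
    hashes = defaultdict(list)
    errors = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(hash_fn, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                hashes[future.result()].append(path)
            except Exception as exc:  # unreadable or corrupt file
                errors[path] = str(exc)
            if done % progress_every == 0:
                print(f"Hashed {done}/{len(futures)} files")
    return dict(hashes), errors
```

A call would then look like scan_paths(list_images(source_dir), compute_phash), where source_dir is the directory of one import source. Collecting errors per path instead of aborting keeps one corrupt file from stopping a long scan.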

Step 3: Compute Overlap

Compare hash sets between sources:

common = set(source_a_hashes.keys()) & set(source_b_hashes.keys())
a_only = set(source_a_hashes.keys()) - set(source_b_hashes.keys())
b_only = set(source_b_hashes.keys()) - set(source_a_hashes.keys())

For internal duplicates within a single source:

internal_dupes = sum(len(v) - 1 for v in hashes.values() if len(v) > 1)
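Put together, the set operations above can be wrapped in one function that returns the counts needed for the report. This sketch assumes each source is a dict mapping a hash string to the list of file paths that share it. The function names are hypothetical, and measuring the overlap percentage against the smaller source is an assumption, since the denominator is not specified here.

```python
def internal_duplicates(hashes):
    """Count surplus files within one source (all but one file per hash)."""
    return sum(len(paths) - 1 for paths in hashes.values() if len(paths) > 1)


def compute_overlap(source_a_hashes, source_b_hashes):
    """Compare two {hash: [paths]} maps and return the counts for the report."""
    keys_a, keys_b = set(source_a_hashes), set(source_b_hashes)
    common = keys_a & keys_b
    smaller = min(len(keys_a), len(keys_b))
    return {
        "common": len(common),
        "a_only": len(keys_a - keys_b),
        "b_only": len(keys_b - keys_a),
        "internal_a": internal_duplicates(source_a_hashes),
        "internal_b": internal_duplicates(source_b_hashes),
        # Assumption: overlap is measured against the smaller source.
        "overlap_pct": round(100 * len(common) / smaller, 1) if smaller else 0.0,
    }
```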

Step 4: Generate Report

Present findings in a structured report:

DUPLICATE ANALYSIS REPORT

Library: [total] assets ([photos] photos + [videos] videos)
Sources analyzed: [Source A] ([count] files), [Source B] ([count] files)

CROSS-SOURCE DUPLICATES
  [Source A] <-> [Source B] visual matches:    [count] ([pct]% overlap)

UNIQUE TO EACH SOURCE
  [Source A]-only photos:               [count]
  [Source B]-only photos:               [count]

INTERNAL DUPLICATES
  Within [Source A]:                    [count]
  Within [Source B]:                    [count]

TOTAL REMOVABLE
  Cross-source duplicates:         [count]
  Internal duplicates:             [count]
  TOTAL:                           [count] files

RECOMMENDATION
  Keep: [Source with better metadata/folder structure]
  Remove: [Other source] copies where match exists
  Review: [count] [other]-only photos are NOT duplicates — keep them
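
The duplicate sections of the template can be filled in mechanically once the counts are known. A minimal sketch follows, with a hypothetical function name and parameters. It covers only the cross-source, internal, and total sections and leaves the recommendation to be written by hand.

```python
def render_totals(source_a, source_b, cross, internal_a, internal_b):
    """Format the duplicate sections of the report as plain text."""
    internal = internal_a + internal_b
    lines = [
        "CROSS-SOURCE DUPLICATES",
        f"  {source_a} <-> {source_b} visual matches: {cross}",
        "",
        "INTERNAL DUPLICATES",
        f"  Within {source_a}: {internal_a}",
        f"  Within {source_b}: {internal_b}",
        "",
        "TOTAL REMOVABLE",
        f"  Cross-source duplicates: {cross}",
        f"  Internal duplicates: {internal}",
        f"  TOTAL: {cross + internal} files",
    ]
    return "\n".join(lines)
```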

Step 5: Removal (User-Approved)

NEVER auto-remove. Always:

Present the report with counts
Ask user which categories to remove
Confirm the exact count
Execute removal in two steps, with permanent deletion as a separate opt-in:
a. Move to Immich trash: delete_assets(asset_ids=[...], force=False). This is the safer path and is recoverable via restore_assets or restore_trash.
b. Remove the physical files from disk (os.remove()) only after the user confirms the trash is correct.
c. For permanent deletion (only when the user explicitly requests it): delete_assets(asset_ids=[...], force=True). This is irreversible.
Log everything to a JSON file for audit

Batch Immich deletions in groups of 100 assets per call. For ML-detected duplicates, prefer resolve_duplicates which handles them natively in Immich.
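
The batching and audit-log requirements could be sketched as below. Here delete_fn is a hypothetical callable that wraps the delete_assets tool and accepts asset_ids and force keyword arguments. The log format is an assumption as well. Only the trash step (force=False) is covered.

```python
import json
import time


def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def trash_in_batches(asset_ids, delete_fn, log_path, batch_size=100):
    """Trash assets batch by batch and write an audit log as JSON.

    delete_fn is a hypothetical wrapper around the delete_assets tool,
    invoked as delete_fn(asset_ids=[...], force=False).
    """
    log = []
    for batch in batches(list(asset_ids), batch_size):
        entry = {"timestamp": time.time(), "asset_ids": batch, "force": False}
        try:
            delete_fn(asset_ids=batch, force=False)
            entry["status"] = "trashed"
        except Exception as exc:
            # Record the failure and continue so the log stays complete.
            entry["status"] = "failed"
            entry["error"] = str(exc)
        log.append(entry)
    with open(log_path, "w", encoding="utf-8") as fh:
        json.dump(log, fh, indent=2)
    return log
```

Recording failed batches instead of raising keeps the audit file complete, so the user can see exactly which assets were trashed before deciding on any physical file removal.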

Step 6: Verify

After removal, query Immich statistics to confirm the new count and present before/after comparison.

Report Variations

Quick Report (no disk scan)

Uses only Immich database — checksums, filenames, timestamps. Fast but misses re-encoded duplicates.

-- Exact checksum duplicates
SELECT checksum, count(*) FROM asset
WHERE "deletedAt" IS NULL
GROUP BY checksum HAVING count(*) > 1;

-- Filename overlap between sources (trashed assets excluded, as in the other queries)
SELECT count(*) FROM (
  SELECT "originalFileName" FROM asset
  WHERE "originalPath" LIKE '%Source A%' AND "deletedAt" IS NULL
  INTERSECT
  SELECT "originalFileName" FROM asset
  WHERE "originalPath" LIKE '%Source B%' AND "deletedAt" IS NULL
) t;

Full Report (perceptual hash)

Scans actual files on disk. Catches re-encoded duplicates. Requires filesystem access and Python dependencies. At the Step 2 rate of ~500 files per 30 seconds, expect roughly 40 minutes for ~40K photos on Apple Silicon.

Year-by-Year Breakdown

Shows which source dominates each year — helps users understand their photo ecosystem history:

SELECT year, source_a_count, source_b_count,
  CASE WHEN source_a_count > source_b_count THEN 'Source A' ELSE 'Source B' END as dominant
FROM (
  SELECT extract(year from "localDateTime") as year,
    count(*) FILTER (WHERE "originalPath" LIKE '%Source A%') as source_a_count,
    count(*) FILTER (WHERE "originalPath" LIKE '%Source B%') as source_b_count
  FROM asset WHERE "deletedAt" IS NULL
  GROUP BY year
) t ORDER BY year;

Important Notes

Perceptual hashing has rare false positives — two visually very similar (but different) photos may share a hash. The 256-bit hash size minimizes this, but users should spot-check a few matches before bulk removal.
Videos are excluded from perceptual hashing — they need a different approach (frame extraction + hashing).
HEIC support is essential — without pillow-heif, Apple Photos libraries will have massive error rates (50%+ of files).
ThreadPoolExecutor, not ProcessPoolExecutor — native HEIF libraries deadlock when forked on macOS. Always use threads.
Background Immich scanning may add new assets during analysis. Note this in the report if the post-cleanup count seems off.
