From fiftyone
Finds duplicate or near-duplicate images in FiftyOne datasets using brain similarity computation. Use when deduplicating datasets, finding similar images, or removing redundant samples.
npx claudepluginhub anthropics/claude-plugins-official --plugin fiftyone
How this skill is triggered: by the user, by Claude, or both
Slash command
/fiftyone:fiftyone-find-duplicates
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
**ALWAYS follow these rules:**
set_context(dataset_name="my-dataset")
Brain operators are delegated and require the app:
launch_app()
Wait 5-10 seconds for initialization.
# List all brain operators
list_operators(builtin_only=False)
# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_similarity")
# Compute the similarity index before searching for duplicates
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={"brain_key": "img_sim", "model": "mobilenet-v2-imagenet-torch"}
)
# Close the app when finished
close_app()
# Set context
set_context(dataset_name="my-dataset")
# Launch app (required for brain operators)
launch_app()
# Check if brain plugin is available
list_plugins(enabled=True)
# If not installed:
download_plugin(
    url_or_repo="voxel51/fiftyone-plugins",
    plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
# List all available operators
list_operators(builtin_only=False)
# Get schema for compute_similarity
get_operator_schema(operator_uri="@voxel51/brain/compute_similarity")
# Get schema for find_near_duplicates
get_operator_schema(operator_uri="@voxel51/brain/find_near_duplicates")
# Execute operator to compute embeddings
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={
        "brain_key": "img_duplicates",
        "model": "mobilenet-v2-imagenet-torch"
    }
)
# Find near duplicates using the similarity index computed above
execute_operator(
    operator_uri="@voxel51/brain/find_near_duplicates",
    params={
        "similarity_index": "img_duplicates",
        "threshold": 0.3
    }
)
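The threshold passed to find_near_duplicates is a distance cutoff. As a minimal sketch of what that means, the following assumes cosine distance between embedding vectors; the metric the brain operator actually uses may differ, and cosine_distance and is_near_duplicate are hypothetical helpers, not FiftyOne APIs.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_near_duplicate(a, b, threshold=0.3):
    """Flag a pair whose distance is at or below the threshold (assumed semantics)."""
    return cosine_distance(a, b) <= threshold


# Toy embeddings for illustration; real ones come from the embedding model
original = [1.0, 0.0, 0.0]
slightly_edited = [0.9, 0.1, 0.0]
unrelated = [0.0, 1.0, 0.0]

print(is_near_duplicate(original, slightly_edited))
print(is_near_duplicate(original, unrelated))
```

Under this assumption, raising the threshold admits pairs that are further apart, which is why larger values return looser matches.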
Threshold guidelines (distance-based, lower = more similar):
- 0.1 = Very similar (near-exact duplicates)
- 0.3 = Near duplicates (recommended default)
- 0.5 = Similar images
- 0.7 = Loosely similar

This operator creates two saved views automatically:
- near duplicates: all samples that are near duplicates
- representatives of near duplicates: one representative from each group

After finding duplicates, use set_view to display them in the FiftyOne App:
Option A: Filter by near_dup_id field
# Show all samples that have a near_dup_id (all duplicates)
set_view(exists=["near_dup_id"])
Option B: Show specific duplicate group
# Show samples with a specific duplicate group ID
set_view(filters={"near_dup_id": 1})
Option C: Load saved view (if available)
# Load the automatically created saved view
set_view(view_name="near duplicates")
Option D: Clear filter to show all samples
clear_view()
The find_near_duplicates operator adds a near_dup_id field to samples. Samples with the same ID are duplicates of each other.
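To make the grouping concrete, here is a small sketch that buckets sample records by near_dup_id and picks one sample to keep per group. The record layout and both helper names are hypothetical, and keeping the first sample of each group is an assumption; how the operator chooses its representative is not stated here.

```python
from collections import defaultdict


def group_by_near_dup_id(samples):
    """Map each near_dup_id to the list of sample IDs that share it."""
    groups = defaultdict(list)
    for sample in samples:
        dup_id = sample.get("near_dup_id")
        if dup_id is not None:  # samples without the field are not duplicates
            groups[dup_id].append(sample["id"])
    return dict(groups)


def split_representatives(groups):
    """Keep the first sample of each group; mark the rest as removable."""
    keep = [ids[0] for ids in groups.values()]
    remove = [sid for ids in groups.values() for sid in ids[1:]]
    return keep, remove


# Hypothetical records; real samples would be read from the dataset
samples = [
    {"id": "a", "near_dup_id": 1},
    {"id": "b", "near_dup_id": 1},
    {"id": "c", "near_dup_id": 2},
    {"id": "d", "near_dup_id": 2},
    {"id": "e", "near_dup_id": None},
]
groups = group_by_near_dup_id(samples)
keep, remove = split_representatives(groups)
print(groups, keep, remove)
```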
Option A: Use deduplicate operator (keeps one representative per group)
execute_operator(
    operator_uri="@voxel51/brain/deduplicate_near_duplicates",
    params={}
)
Option B: Manual deletion from App UI
Use set_view(exists=["near_dup_id"]) to show duplicates, then review and delete the unwanted samples in the App.
close_app()
| Tool | Description |
|---|---|
| `set_view(exists=[...])` | Filter samples where field(s) have non-None values |
| `set_view(filters={...})` | Filter samples by exact field values |
| `set_view(tags=[...])` | Filter samples by tags |
| `set_view(sample_ids=[...])` | Select specific sample IDs |
| `set_view(view_name="...")` | Load a saved view by name |
| `clear_view()` | Clear filters, show all samples |
Use list_operators() to discover and get_operator_schema() to see parameters:
| Operator | Description |
|---|---|
| `@voxel51/brain/compute_similarity` | Compute embeddings and similarity index |
| `@voxel51/brain/find_near_duplicates` | Find near-duplicate samples |
| `@voxel51/brain/deduplicate_near_duplicates` | Delete duplicates, keep representatives |
| `@voxel51/brain/find_exact_duplicates` | Find exact duplicate media files |
| `@voxel51/brain/deduplicate_exact_duplicates` | Delete exact duplicates |
| `@voxel51/brain/compute_uniqueness` | Compute uniqueness scores |
For accidentally duplicated files (identical bytes):
set_context(dataset_name="my-dataset")
launch_app()
# Find exact duplicate media files
execute_operator(
    operator_uri="@voxel51/brain/find_exact_duplicates",
    params={}
)
# Delete the exact duplicates
execute_operator(
    operator_uri="@voxel51/brain/deduplicate_exact_duplicates",
    params={}
)
close_app()
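"Identical bytes" can be illustrated outside FiftyOne by hashing file contents. The sketch below groups files by SHA-256 digest; it shows the idea only and is not a description of how find_exact_duplicates is implemented. file_digest and find_identical_files are hypothetical names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(paths):
    """Group paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(Path(path))
    # Only digests shared by two or more files indicate duplicates
    return [group for group in by_digest.values() if len(group) > 1]
```

Reading in chunks keeps memory use flat for large media files. A re-encoded or resized copy has different bytes and will not match, which is the case the near-duplicate workflow covers.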
For visually similar but not identical images:
set_context(dataset_name="my-dataset")
launch_app()
# Compute embeddings
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={"brain_key": "near_dups", "model": "mobilenet-v2-imagenet-torch"}
)
# Find duplicates
execute_operator(
    operator_uri="@voxel51/brain/find_near_duplicates",
    params={"similarity_index": "near_dups", "threshold": 0.3}
)
# View duplicates in the App
set_view(exists=["near_dup_id"])
# After review, deduplicate
execute_operator(
    operator_uri="@voxel51/brain/deduplicate_near_duplicates",
    params={}
)
# Clear view and close
clear_view()
close_app()
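In the workflow above, the similarity_index given to find_near_duplicates has to match the brain_key given to compute_similarity. One way to keep the two in sync is to assemble the calls as plain data from a single value. build_near_duplicate_plan is a hypothetical helper, and the (tool, kwargs) tuple layout is an assumption chosen for illustration, not part of the skill.

```python
def build_near_duplicate_plan(dataset_name, brain_key, model, threshold=0.3):
    """Return the near-duplicate workflow as an ordered list of (tool, kwargs) pairs."""
    return [
        ("set_context", {"dataset_name": dataset_name}),
        ("launch_app", {}),
        ("execute_operator", {
            "operator_uri": "@voxel51/brain/compute_similarity",
            "params": {"brain_key": brain_key, "model": model},
        }),
        ("execute_operator", {
            "operator_uri": "@voxel51/brain/find_near_duplicates",
            # Must match the brain_key used by compute_similarity
            "params": {"similarity_index": brain_key, "threshold": threshold},
        }),
        ("set_view", {"exists": ["near_dup_id"]}),
    ]


plan = build_near_duplicate_plan(
    "my-dataset", "near_dups", "mobilenet-v2-imagenet-torch"
)
for tool, kwargs in plan:
    print(tool, kwargs)
```

The plan stops at set_view on purpose: deduplication deletes samples, so it stays a separate step taken after review.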
Find images similar to a specific sample:
set_context(dataset_name="my-dataset")
launch_app()
# Build the similarity index
execute_operator(
    operator_uri="@voxel51/brain/compute_similarity",
    params={"brain_key": "search"}
)
# Sort by similarity to the query sample and keep the top 20
execute_operator(
    operator_uri="@voxel51/brain/sort_by_similarity",
    params={
        "brain_key": "search",
        "query_id": "sample_id_here",
        "k": 20
    }
)
close_app()
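Conceptually, a query with k returns the k samples whose embeddings are closest to the query sample. The sketch below shows that ranking on toy vectors, assuming Euclidean distance and excluding the query itself; both choices are assumptions, and top_k_similar is a hypothetical helper rather than a FiftyOne API.

```python
import heapq
import math


def top_k_similar(query_id, embeddings, k=20):
    """Return up to k sample IDs closest to the query, nearest first."""
    query = embeddings[query_id]
    candidates = (
        (math.dist(query, vector), sample_id)
        for sample_id, vector in embeddings.items()
        if sample_id != query_id  # exclude the query itself
    )
    return [sample_id for _, sample_id in heapq.nsmallest(k, candidates)]


# Toy embeddings keyed by sample ID
embeddings = {
    "q": [0.0, 0.0],
    "near": [0.1, 0.0],
    "mid": [1.0, 1.0],
    "far": [5.0, 5.0],
}
nearest = top_k_similar("q", embeddings, k=2)
print(nearest)
```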
Error: "No executor available"
- Affected operators: find_near_duplicates, deduplicate_near_duplicates

Error: "Brain key not found"
- Run compute_similarity first with a brain_key.

Error: "Operator not found"
- Install and enable the brain plugin with download_plugin() and enable_plugin().

Error: "Missing dependency" (e.g., torch, tensorflow)
- The error response includes missing_package and install_command:

{
    "error_type": "missing_dependency",
    "missing_package": "torch",
    "install_command": "pip install torch"
}
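A caller can turn that payload into an actionable message. This is a minimal sketch assuming the response arrives as a dict with exactly the keys shown above; describe_tool_error is a hypothetical helper, and other error types are passed through generically.

```python
def describe_tool_error(response):
    """Turn an error payload into a short, actionable message."""
    if response.get("error_type") == "missing_dependency":
        package = response.get("missing_package", "unknown package")
        command = response.get("install_command")
        if command:
            return f"Missing dependency '{package}'. Install it with: {command}"
        return f"Missing dependency '{package}'."
    return f"Unhandled error: {response.get('error_type', 'unknown')}"


response = {
    "error_type": "missing_dependency",
    "missing_package": "torch",
    "install_command": "pip install torch",
}
message = describe_tool_error(response)
print(message)
```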
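Similarity computation is slow
- Use a lightweight model such as mobilenet-v2-imagenet-torch.

- Use list_operators() and get_operator_schema() to get current operator names and parameters.
- Pass the brain_key from compute_similarity as similarity_index to find_near_duplicates.

Embedding computation time: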
Memory requirements: