file-search

Elasticsearch-based distributed file search across all cluster nodes. Use when searching for files, finding duplicates, or querying storage metadata.

From neely-brain-dump
Install
Run in your terminal:
npx claudepluginhub built-simple/claude-brain-dump-repo --plugin neely-brain-dump
Tool Access

This skill is limited to using the following tools:

Bash
Skill Content

Distributed File Search System

Elasticsearch + FSCrawler deployment for searching files across the entire Proxmox cluster.

Architecture

                    ┌──────────────────────┐
                    │    Elasticsearch     │
                    │  192.168.1.122:9200  │
                    │   (CT501 Giratina)   │
                    └──────────┬───────────┘
           ┌───────────────────┼───────────────────┐
           │                   │                   │
    ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
    │  Giratina   │     │    Talon    │     │   Victini   │
    │  1 Crawler  │     │  3 Crawlers │     │  3 Crawlers │
    │   RAID6     │     │   5.5TB     │     │    29TB     │
    └─────────────┘     └─────────────┘     └─────────────┘
           │                   │                   │
    ┌──────▼──────┐     ┌──────▼──────┐
    │    Hoopa    │     │  Silvally   │
    │  1 Crawler  │     │  1 Crawler  │
    └─────────────┘     └─────────────┘

Quick Reference

  • Elasticsearch: http://192.168.1.122:9200
  • Total Storage Indexed: ~18.5TB
  • Total Documents: 3.4M+ files
  • Active Crawlers: 9


Search API Examples

Basic File Search

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"match": {"file.filename": "document.pdf"}},
  "size": 20
}'
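To reuse this query from scripts without hand-editing JSON, the request can be wrapped in small Bash functions. This is a sketch under a few assumptions: `ES_URL`, `json_escape`, `filename_query` and `search_filename` are hypothetical names, and the escaping covers only backslashes and double quotes. That is enough for typical filenames but not for control characters.

```shell
# Hypothetical helpers; ES_URL defaults to the cluster address used on this page.
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print the match-query body for a filename; the second argument is the hit count (default 20).
filename_query() {
  local name size=${2:-20}
  name=$(json_escape "$1")
  printf '{"query":{"match":{"file.filename":"%s"}},"size":%d}' "$name" "$size"
}

# Send the query to all indices and pretty-print the response.
search_filename() {
  curl -s "$ES_URL/_search?pretty" -H 'Content-Type: application/json' \
    -d "$(filename_query "$@")"
}
```

For example, `search_filename document.pdf 50` sends the same match query and asks for 50 hits.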

Search by Path

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"wildcard": {"path.real": "*Legal*"}}
}'
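Wildcard queries against a keyword field are case-sensitive, so `*Legal*` will not match a folder named `legal`. Elasticsearch 7.10 and later (this cluster runs 7.17.29) accept a `case_insensitive` flag on the wildcard query. The sketch below assumes `path.real` is mapped as a keyword field, as in FSCrawler's default mapping; `path_query` is a hypothetical helper. Leading wildcards make Elasticsearch examine every term in the field, so expect them to be slower than exact matches on large indices.

```shell
# Hypothetical helper: case-insensitive "path contains" query on path.real.
# Assumes path.real is a keyword field (FSCrawler's default mapping).
path_query() {
  local term
  # Escape backslashes and double quotes for the JSON string.
  term=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query":{"wildcard":{"path.real":{"value":"*%s*","case_insensitive":true}}}}' "$term"
}

# Usage (needs network access to the cluster):
#   curl -s "http://192.168.1.122:9200/_search?pretty" \
#     -H 'Content-Type: application/json' -d "$(path_query legal)"
```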

Search by Extension

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"wildcard": {"file.filename": "*.pdf"}}
}'
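A leading `*` forces Elasticsearch to check every filename term, which gets slow on indices with millions of documents. Since the document schema includes a `file.extension` field, a `term` query on it is a cheaper alternative. This sketch assumes `file.extension` is a keyword field holding the extension without its dot (as in the schema's `"pdf"` example). `extension_query` is a hypothetical helper, the extension is inserted without JSON escaping (pass plain alphanumeric extensions), and `case_insensitive` requires Elasticsearch 7.10 or later.

```shell
# Hypothetical helper: exact match on file.extension instead of a wildcard scan.
extension_query() {
  local ext=${1#.}      # accept "pdf" or ".pdf"
  local size=${2:-20}   # number of hits to return
  printf '{"query":{"term":{"file.extension":{"value":"%s","case_insensitive":true}}},"size":%d}' "$ext" "$size"
}

# Usage (needs network access to the cluster):
#   curl -s "http://192.168.1.122:9200/_search?pretty" \
#     -H 'Content-Type: application/json' -d "$(extension_query pdf)"
```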

Find Large Files (>1GB)

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"range": {"file.filesize": {"gte": 1073741824}}},
  "sort": [{"file.filesize": {"order": "desc"}}]
}'
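Writing byte counts by hand is error-prone, so a helper can convert size suffixes first. This is a minimal sketch: `to_bytes` and `large_files_query` are hypothetical, they accept whole numbers only (no `1.5G`), and they treat K/M/G/T as powers of 1024, matching the 1GB = 1073741824 threshold above.

```shell
# Hypothetical helper: convert 1024, 500M, 1G, 2T ... to bytes (binary units).
to_bytes() {
  local n=${1%[KkMmGgTt]} unit=${1##*[0-9]}
  # Reject anything that is not a whole number plus an optional single unit letter.
  [[ $n =~ ^[0-9]+$ ]] || { echo "unsupported size: $1" >&2; return 1; }
  case $unit in
    '')  echo "$n" ;;
    K|k) echo $(( n * 1024 )) ;;
    M|m) echo $(( n * 1024**2 )) ;;
    G|g) echo $(( n * 1024**3 )) ;;
    T|t) echo $(( n * 1024**4 )) ;;
  esac
}

# Range query for files of at least $1, largest first; $2 is the hit count (default 20).
large_files_query() {
  local bytes
  bytes=$(to_bytes "$1") || return 1
  printf '{"query":{"range":{"file.filesize":{"gte":%s}}},"sort":[{"file.filesize":{"order":"desc"}}],"size":%d}' \
    "$bytes" "${2:-20}"
}
```

Usage: `curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d "$(large_files_query 500M)"`.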

Recently Modified (Last 7 Days)

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"range": {"file.last_modified": {"gte": "now-7d"}}},
  "sort": [{"file.last_modified": {"order": "desc"}}]
}'
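Full documents are noisy when all you need is where a file lives and when it changed. This sketch parameterizes the day window and uses `_source` filtering to return only `path.real` and `file.last_modified`; `recent_query` and its default of 50 hits are assumptions.

```shell
# Hypothetical helper: files modified in the last N days (default 7), newest first,
# returning only the path and modification time of each hit.
recent_query() {
  local days=${1:-7} size=${2:-50}
  printf '{"query":{"range":{"file.last_modified":{"gte":"now-%dd"}}},"sort":[{"file.last_modified":{"order":"desc"}}],"_source":["path.real","file.last_modified"],"size":%d}' \
    "$days" "$size"
}

# Usage (needs network access to the cluster):
#   curl -s "http://192.168.1.122:9200/_search?pretty" \
#     -H 'Content-Type: application/json' -d "$(recent_query 3)"
```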

Find Duplicate Files by Size

curl -s "http://192.168.1.122:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "duplicate_sizes": {
      "terms": {"field": "file.filesize", "min_doc_count": 2, "size": 100}
    }
  }
}'
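Equal size is only a hint: two files with the same byte count are candidates, not confirmed duplicates, and zero-byte or tiny files tend to fill the buckets. This minimal sketch skips small files and attaches a `top_hits` sub-aggregation so each size bucket lists a few example paths. `dup_size_query`, the 1 MiB default floor and the 5 paths per bucket are assumptions. Candidates can then be confirmed by hashing them (for example with `sha256sum`) on the node that owns them.

```shell
# Hypothetical helper: size buckets holding >= 2 files, ignoring files smaller
# than $1 bytes (default 1 MiB), with up to 5 example paths per bucket.
dup_size_query() {
  local min=${1:-1048576} buckets=${2:-100}
  printf '{"size":0,"query":{"range":{"file.filesize":{"gte":%d}}},"aggs":{"duplicate_sizes":{"terms":{"field":"file.filesize","min_doc_count":2,"size":%d},"aggs":{"paths":{"top_hits":{"size":5,"_source":["path.real"]}}}}}}' \
    "$min" "$buckets"
}

# Usage (needs network access to the cluster):
#   curl -s "http://192.168.1.122:9200/_search?pretty" \
#     -H 'Content-Type: application/json' -d "$(dup_size_query)"
```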

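Search All *-storage Indices

The *-storage pattern only matches index names ending in -storage, so indices such as talon-t9 are not searched. To cover every crawler's index, drop the index name and query /_search instead.
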
curl -s "http://192.168.1.122:9200/*-storage/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {"match": {"file.filename": "your-search-term"}}
}'
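Each crawler writes to its own index, so a `terms` aggregation on the built-in `_index` field reports how many matches live on each node's storage. This is a sketch: `hits_by_index_query` is hypothetical, and the bucket limit of 50 is an arbitrary ceiling well above the nine crawlers listed here.

```shell
# Hypothetical helper: count filename matches per index (one index per crawler).
hits_by_index_query() {
  local term
  # Escape backslashes and double quotes for the JSON string.
  term=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"size":0,"query":{"match":{"file.filename":"%s"}},"aggs":{"by_index":{"terms":{"field":"_index","size":50}}}}' "$term"
}

# Usage (needs network access to the cluster):
#   curl -s "http://192.168.1.122:9200/_search?pretty" \
#     -H 'Content-Type: application/json' -d "$(hits_by_index_query report.pdf)"
```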

Crawler Inventory

Giratina (192.168.1.100) - Central Node

| Crawler       | Path       | Index         | Documents  |
|---------------|------------|---------------|------------|
| raid6-storage | /mnt/raid6 | raid6-storage | ~265 files |

Talon (192.168.1.7) - Multi-Drive

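| Crawler            | Path              | Capacity (used) | Documents |
|--------------------|-------------------|-----------------|-----------|
| talon-nvme-storage | /mnt/nvme-storage | 931GB (88%)     | 218K+     |
| talon-pmc-data     | /mnt/pmc_data     | 1.9TB (86%)     | 576K+     |
| talon-t9           | /mnt/t9           | 3.7TB (100%)    | 2.3M+     |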

Victini (192.168.1.115) - Large Storage

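| Crawler            | Path                    | Capacity (used) | Documents |
|--------------------|-------------------------|-----------------|-----------|
| victini-storage    | /mnt/storage            | 22TB (8.2TB)    | 253K+     |
| victini-ext4-drive | /mnt/storage/ext4_drive | 3.6TB (2.3TB)   | Growing   |
| victini-new-volume | /mnt/storage/new_volume | 3.7TB (2.4TB)   | Growing   |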

Hoopa (192.168.1.79)

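| Crawler       | Path                  | Capacity (used) | Documents |
|---------------|-----------------------|-----------------|-----------|
| hoopa-storage | /mnt/network_transfer | 393GB (90GB)    | 750+      |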

Silvally (192.168.1.52)

| Crawler          | Path       | Capacity | Documents |
|------------------|------------|----------|-----------|
| silvally-storage | /mnt/raid6 | 832GB    | 3 folders |

Index Management

List All Indices

curl "http://192.168.1.122:9200/_cat/indices?v"

Get Storage Indices

curl "http://192.168.1.122:9200/_cat/indices/*-storage*?v&h=index,docs.count,store.size&s=index"
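To turn the per-index counts into a single total (for example to check the 3.4M+ figure in the Quick Reference), the `docs.count` column can be summed with awk. This is a sketch: `sum_docs` and `total_docs` are hypothetical helpers. Leaving out `?v` stops `_cat` from printing a header row, so every line is a number. `_cat/indices` reports Lucene-level document counts, so the `_count` API is the more exact figure if the two disagree.

```shell
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Hypothetical helper: sum a column of numbers read from stdin (0 for empty input).
sum_docs() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Total docs.count across indices matching a pattern (default: all indices).
total_docs() {
  curl -s "$ES_URL/_cat/indices/${1:-*}?h=docs.count" | sum_docs
}

# Usage (needs network access to the cluster):
#   total_docs 'talon-*'
```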

Check Cluster Health

curl "http://192.168.1.122:9200/_cluster/health?pretty"
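For cron jobs or monitoring scripts, an exit code is easier to act on than pretty-printed JSON. `_cat/health?h=status` prints just the status word, and this sketch maps it to an exit code. `health_code`, `check_health` and the 0/1/2 convention are assumptions. Yellow is kept separate because it is expected on this single-node cluster unless replicas are set to 0 (see Troubleshooting).

```shell
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Hypothetical helper: read a status word on stdin and map it to an exit code.
# green -> 0, yellow -> 1, anything else (red, empty, unreachable) -> 2.
health_code() {
  local status
  read -r status   # also trims surrounding whitespace from the _cat output
  case $status in
    green)  return 0 ;;
    yellow) return 1 ;;
    *)      return 2 ;;
  esac
}

# Query the cluster and return the mapped exit code.
check_health() {
  curl -s "$ES_URL/_cat/health?h=status" | health_code
}
```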

Document Count for Index

curl "http://192.168.1.122:9200/talon-t9/_count?pretty"
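`_count` returns a small JSON object, so the number can be pulled out with sed when jq is not installed on a node. This is a sketch: `parse_count` and `doc_count` are hypothetical, and the sed pattern assumes the response contains a single `count` field, which holds for `_count` responses.

```shell
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Hypothetical helper: print the "count" value from a _count response on stdin.
# Works for compact and ?pretty output; prints nothing if the field is missing.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Document count for one index, e.g. doc_count talon-t9
doc_count() {
  curl -s "$ES_URL/$1/_count" | parse_count
}
```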

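Service Management

In the commands below, replace 192.168.1.X with the node's address from the Crawler Inventory and NAME so that fscrawler-NAME is the crawler's systemd unit. The unit pattern is quoted so the remote shell passes it to systemctl instead of expanding it as a filename glob.

Check Crawler Status

ssh root@192.168.1.X "systemctl status 'fscrawler*'"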

Restart a Crawler

ssh root@192.168.1.X "systemctl restart fscrawler-NAME"

View Logs

ssh root@192.168.1.X "journalctl -u fscrawler-NAME -f"
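The inventory spans five hosts, so checking every crawler means five SSH sessions. This sketch loops over the node addresses listed in the Crawler Inventory. `CRAWLER_NODES`, `crawler_status_all` and the `SSH_CMD` override are hypothetical, and it assumes key-based root SSH access, as the commands above already do.

```shell
# Node addresses from the Crawler Inventory (Giratina, Talon, Victini, Hoopa, Silvally).
CRAWLER_NODES=(192.168.1.100 192.168.1.7 192.168.1.115 192.168.1.79 192.168.1.52)

# Hypothetical helper: list every fscrawler unit on every node.
# SSH_CMD may be overridden, e.g. SSH_CMD=echo for a dry run.
crawler_status_all() {
  local ip
  for ip in "${CRAWLER_NODES[@]}"; do
    echo "== $ip"
    # --all includes inactive units; --no-legend and --plain keep the output terse.
    ${SSH_CMD:-ssh} "root@$ip" "systemctl list-units 'fscrawler*' --all --no-legend --plain" \
      || echo "   (ssh to $ip failed)"
  done
}
```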

Document Schema

Each indexed file has this metadata:

{
  "file": {
    "filename": "example.pdf",
    "extension": "pdf",
    "filesize": 1048576,
    "indexing_date": "2025-12-05T08:00:00.000Z",
    "last_modified": "2025-12-01T10:30:00.000Z"
  },
  "path": {
    "real": "/mnt/storage/expansion/Legal/example.pdf",
    "root": "/mnt/storage",
    "virtual": "/expansion/Legal/example.pdf"
  },
  "meta": {
    "title": "Example Document",
    "author": "John Doe"
  }
}
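When only a few of these fields matter, the search API's `_source` parameter and the `filter_path` response filter trim the output to just those values. This is a sketch: `fields_url` is a hypothetical helper that only builds the URL, and `path.real,file.filesize` is an assumed default field list.

```shell
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Hypothetical helper: search URL that returns only the given schema fields.
# $1 = index (default _all), $2 = comma-separated field list.
fields_url() {
  local index=${1:-_all} fields=${2:-path.real,file.filesize}
  printf '%s/%s/_search?_source=%s&filter_path=hits.hits._source&pretty' \
    "$ES_URL" "$index" "$fields"
}
```

Usage: `curl -s "$(fields_url talon-t9)" -H 'Content-Type: application/json' -d '{"query":{"match":{"file.filename":"example.pdf"}}}'` prints only the `_source` of each hit.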

Adding a New Crawler

1. Create Configuration

mkdir -p /root/.fscrawler/new-crawler-name
cat > /root/.fscrawler/new-crawler-name/_settings.yaml << 'EOF'
---
name: "new-crawler-name"
fs:
  url: "/path/to/storage"
  update_rate: "30m"
  indexed_chars: "0"
  add_filesize: true
  continue_on_error: true
  remove_deleted: true
  excludes:
    - "*/node_modules/*"
    - "*/.git/*"
    - "*/.cache/*"
    - "*.tmp"
    - "*.log"
  ocr:
    enabled: false
elasticsearch:
  nodes:
    - url: "http://192.168.1.122:9200"
  index: "node-name-storage"
  bulk_size: 500
EOF
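When the same template is reused on several nodes, a function can fill in the three values that change per crawler (job name, crawled path, target index) and keep the rest exactly as above. This is a sketch: `write_crawler_settings` and the `FSCRAWLER_HOME` override are hypothetical, and the Elasticsearch URL is hard-coded to this cluster.

```shell
# Hypothetical helper: write _settings.yaml for a new crawler.
# Usage: write_crawler_settings NAME PATH INDEX  (FSCRAWLER_HOME defaults to /root/.fscrawler)
write_crawler_settings() {
  local name=$1 path=$2 index=$3
  local dir="${FSCRAWLER_HOME:-/root/.fscrawler}/$name"
  [ -n "$name" ] && [ -n "$path" ] && [ -n "$index" ] || {
    echo "usage: write_crawler_settings NAME PATH INDEX" >&2; return 1; }
  mkdir -p "$dir"
  # Unquoted EOF so $name, $path and $index are substituted.
  cat > "$dir/_settings.yaml" <<EOF
---
name: "$name"
fs:
  url: "$path"
  update_rate: "30m"
  indexed_chars: "0"
  add_filesize: true
  continue_on_error: true
  remove_deleted: true
  excludes:
    - "*/node_modules/*"
    - "*/.git/*"
    - "*/.cache/*"
    - "*.tmp"
    - "*.log"
  ocr:
    enabled: false
elasticsearch:
  nodes:
    - url: "http://192.168.1.122:9200"
  index: "$index"
  bulk_size: 500
EOF
}
```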

2. Create Systemd Service

cat > /etc/systemd/system/fscrawler-new-name.service << 'EOF'
[Unit]
Description=FSCrawler - New Storage Indexer
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/fscrawler
ExecStart=/opt/fscrawler/bin/fscrawler new-crawler-name --loop 999
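# --loop 999 exits cleanly after 999 runs (three weeks or more at a 30m update_rate); Restart=always brings it back
Restart=always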
RestartSec=30
Environment="FS_JAVA_OPTS=-Xms512m -Xmx1g"

[Install]
WantedBy=multi-user.target
EOF

3. Enable and Start

systemctl daemon-reload
systemctl enable --now fscrawler-new-name

Troubleshooting

Yellow Cluster Status

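Yellow is expected on a single-node cluster: Elasticsearch never allocates a replica shard to the same node as its primary, so any index with replicas (the default is one) keeps unassigned shards. Set replicas to 0 on all existing indices: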

curl -X PUT "http://192.168.1.122:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{"index": {"number_of_replicas": 0}}'
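The settings call above only changes indices that already exist. A composable index template can give indices created later zero replicas as well. This is a sketch under assumptions: the template name `storage-no-replicas`, the patterns you pass, and priority 200 (chosen to sit above Elasticsearch's built-in templates, which use priority 100) are all illustrative. If FSCrawler sends its own replica count when it creates an index, that request setting takes precedence over the template.

```shell
ES_URL="${ES_URL:-http://192.168.1.122:9200}"

# Hypothetical helper: template body giving matching new indices 0 replicas.
# Each argument is one index pattern, e.g. '*-storage*' 'talon-*'.
no_replica_template() {
  local patterns="" p
  for p in "$@"; do
    patterns+="${patterns:+,}\"$p\""   # comma-separate after the first pattern
  done
  printf '{"index_patterns":[%s],"priority":200,"template":{"settings":{"number_of_replicas":0}}}' "$patterns"
}

# PUT the template under a hypothetical name (needs network access to the cluster).
install_no_replica_template() {
  curl -s -X PUT "$ES_URL/_index_template/storage-no-replicas" \
    -H 'Content-Type: application/json' -d "$(no_replica_template "$@")"
}
```

For example, `install_no_replica_template '*-storage*' 'talon-*' 'victini-*'` would cover the index names used on this page.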

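Reindex from Scratch

FSCrawler records its last run in _status.json inside the job directory. Removing it forces a full crawl instead of picking up only files changed since that run.

systemctl stop fscrawler-NAME
curl -X DELETE "http://192.168.1.122:9200/index-name"
rm -f /root/.fscrawler/crawler-name/_status.json
systemctl start fscrawler-NAME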

System Info

  • Elasticsearch Version: 7.17.29
  • FSCrawler Version: 2.9
  • Cluster Name: filesearch-cluster
  • JVM Heap: 8GB
  • Security: None (internal network only)

Last Updated: December 5, 2025

Stats
Parent Repo Stars: 0
Parent Repo Forks: 0
Last Commit: Dec 5, 2025