Skill

mm-cli-skill

Uses mm CLI to index multimodal directories, explore contents, find files by type/size, extract PDF text/images/video keyframes, search across files, count tokens, view trees.

Python

Bash

cli-tools

npx claudepluginhub vlm-run/skills --plugin vlmrun-cli-skill

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/vlmrun-skills:mm-cli-skill

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

`mm` is a high-performance multimodal context management CLI. It indexes directories instantly (~60ms for 700 files), then exposes Unix-style commands for exploring, querying, and extracting content from images, videos, PDFs, code, and other files.

SKILL.md

480 lines · ~5k tokens(exceeds 5k compaction limit)

Similar Skills

look-at

Analyzes media files (PDFs, images, diagrams, screenshots) using a vision backend to extract structured data, descriptions, or summaries instead of literal file reading.

9 files

workflows

mmx-cli

11.9k

Generates text, images, video, speech, and music via MiniMax AI CLI. Chats with models, web searches, manages API resources from terminal.

minimax-skills

mmx-cli

40.4k

Generates text, images, video, speech, and music via the MiniMax AI platform's mmx CLI. Use for media creation, chat with MiniMax models, web search, or managing API resources.

antigravity-awesome-skills

Stats

Stars11

MaintenanceExcellent

Last CommitApr 16, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

mm CLI

mm is a high-performance multimodal context management CLI. It indexes directories instantly (~60ms for 700 files), then exposes Unix-style commands for exploring, querying, and extracting content from images, videos, PDFs, code, and other files.

Always use --format json for machine-readable output when parsing results programmatically.

Installation

# First run `mm --help` or `mm --version` to confirm mm isn't already installed
pip install mm-ctx

# Alternative: shell installer
# macOS / Linux
curl -LsSf https://vlm-run.github.io/mm/install/install.sh | sh

# Windows (PowerShell)
irm https://vlm-run.github.io/mm/install/install.ps1 | iex

Commands

Command	Purpose
`find`	Locate/list files by name/kind/ext/size, tabular listing, tree view, schema
`cat`	Content extraction (auto-detected by file type × mode)
`grep`	Content search — text and semantic (via embeddings)
`wc`	Count files, bytes, lines, tokens
`bench`	Benchmark suite with statistical analysis
`config`	Extraction mode settings (show, init, set, reset-db, reset-profiles, reset)
`profile`	Manage LLM provider profiles (list, add, update, use, remove)

Workflow

Start with mm find <dir> --tree --depth 1 to see the directory structure.
Use mm wc <dir> --by-kind to estimate token counts for LLM context budgeting.
Explore with find, grep, cat as needed.
Use mm cat <file> -m accurate for LLM-powered descriptions.

find — locate files, tabular listing, tree view, schema

mm find <dir> --kind image                           # all images
mm find <dir> --kind video                           # all videos
mm find <dir> --kind document                        # all PDFs/docs
mm find <dir> --kind audio                           # audio files
mm find <dir> --name "test_.*\.py"                   # filter by name (regex)
mm find <dir> -n config                              # filter by name (substring)
mm find <dir> --ext .png,.webp                       # by extension
mm find <dir> --min-size 1mb --max-size 10mb         # by size range
mm find <dir> --kind image --limit 5 --format json   # JSON output, capped
mm find <dir> --sort size --reverse --limit 10       # largest files

~63ms via Rust fast path. Piped output is one path per line. --format json returns full metadata.

# Tabular listing (default)
mm find <dir>                                         # all files
mm find <dir> --columns name,kind,size --limit 10     # select columns
mm find <dir> --sort size --reverse --format json     # sorted JSON

# Tree view
mm find <dir> --tree                                  # full tree with sizes
mm find <dir> --tree --depth 1                        # top-level dirs only
mm find <dir> --tree --kind image                     # only image files
mm find <dir> --tree --format json                    # JSON tree structure

# Schema inspection
mm find <dir> --schema                                # Rich table with column docs
mm find <dir> --schema --format json                  # machine-readable

# Include gitignored files
mm find <dir> --no-ignore                             # bypass .gitignore rules
mm find <dir> --no-ignore --kind video                # gitignored videos
mm find <dir> --no-ignore --tree                      # tree including ignored dirs

Columns in the files table:

Column	Type	Description
path	string	Relative path from scan root
name	string	File name with extension
stem	string	File name without extension
ext	string	Extension including dot (`.png`, `.pdf`)
size	uint64	File size in bytes
modified	timestamp	Last modification time
created	timestamp	Creation time
mime	string	MIME type (`image/png`, `application/pdf`)
kind	string	`image`, `video`, `document`, `code`, `audio`, `data`, `config`, `text`, `other`
is_binary	bool	Whether file is binary
depth	uint16	Directory depth (0 = top-level)
parent	string	Parent directory path
width	uint32	Pixel width (images from header, videos via native parsing). Null for non-media.
height	uint32	Pixel height (images from header, videos via native parsing). Null for non-media.

cat — content extraction (pipeline-driven)

Behaviour is auto-detected from file type. Default --mode fast runs local extraction (no LLM). Use -m accurate for LLM-powered descriptions.

# Fast mode (default) — local extraction, no LLM
mm cat <file>                                       # text/metadata extraction
mm cat photo.png                                    # image metadata (dims, MIME, hash, EXIF)
mm cat video.mp4                                    # video metadata (resolution, duration, codecs)
mm cat paper.pdf                                    # text extraction via pypdfium2

# Accurate mode — LLM-powered descriptions
mm cat photo.png -m accurate                        # VLM caption
mm cat video.mp4 -m accurate                        # mosaic → VLM description
mm cat audio.mp3 -m accurate                        # transcript → LLM summary
mm cat paper.pdf -m accurate                        # text → LLM summary

# Head / tail
mm cat <file> -n 20                                 # first 20 lines
mm cat <file> -n -10                                # last 10 lines

# Cache control
mm cat <file> --no-cache                            # bypass cache, force fresh run

# Output formats
mm cat <file> --format json                          # JSON output

Fast mode behavior by file type (<100ms target):

PDF (.pdf): text extraction via pypdfium2. Scanned/image-only PDFs return empty.
Document (.docx, .pptx): text extraction.
Image (.png/.jpg/.webp/.gif/.bmp/.tiff/.svg): dimensions, MIME, xxh3 hash, EXIF data.
Video (.mp4/.mkv/.webm/.avi/.mov): resolution, duration, FPS, codecs (metadata only, no ffmpeg).
Audio (.mp3/.wav/.flac/.aac/.ogg/.m4a): duration, codec, bitrate (metadata only).

cat -p — named encoders and pipeline YAMLs

The -p / --pipeline flag accepts either a registered encoder name or a YAML file path.

# Named encoder (encodes media into VLM-ready JSON messages)
mm cat photo.png -p image-resize              # Fit to 1024px, base64 encode
mm cat photo.png -p image-tile                # Resized overview + all tiles in one Message
mm cat video.mp4 -p video-frame-sample        # Extract frames at 1fps
mm cat video.mp4 -p video-chunk               # Chunk into 60s segments
mm cat doc.pdf  -p document-rasterize         # Render pages as images
mm cat doc.pdf  -p document-rasterize-text    # Rasterize + extract text

# YAML pipeline file
mm cat photo.png -p custom-pipeline.yaml

# Multiple pipelines (dispatched by kind field in YAML)
mm cat *.jpg *.mp4 -p image.yaml -p video.yaml

# List available encoders and pipelines
mm cat --list-pipelines

Built-in encoders

Use either the bare name or the kind-prefixed display name.

Name	Media	Description
`image-resize`	image	Default. Fit to 1024px bounding box
`image-tile`	image	Resized overview + tile crops in one Message
`video-frame-sample`	video	Extract frames at fps (requires ffmpeg)
`video-frames-transcript`	video	Frames + Whisper transcript (accurate mode default)
`video-chunk`	video	Chunk into time-based segments with overlap
`video-mosaic`	video	Build mosaic grids from sampled frames
`video-shot-frames`	video	Scene detection → representative frames per shot
`video-shot-mosaic`	video	Scene detection → mosaic grid per shot
`video-gemini`	video	Pass video file as a Gemini Part
`video-gemini-chunked`	video	Chunk video into Gemini Parts
`audio-transcribe`	audio	Transcribe audio via Whisper (fast/accurate default)
`audio-gemini`	audio	Pass audio file as a Gemini Part
`document-page-text`	document	Extract text per page from PDF/DOCX/PPTX
`document-rasterize`	document	Render PDF pages as images (requires pypdfium2)
`document-rasterize-text`	document	Rasterize + extract text, interleaved
`document-gemini`	document	Pass document file as a Gemini Part

Writing custom encoders

Create a .py file in python/mm/encoders/image/, python/mm/encoders/video/ (auto-discovered) or ~/.config/mm/encoders/. The name is optional — it defaults to the function name with underscores replaced by hyphens:

from pathlib import Path
from mm.encoders import register_encoder

@register_encoder(media_types=("image",))
def my_custom(path: Path, **kw):
    """Registered as 'my-custom' (auto-named from function)."""
    import base64, io
    from PIL import Image
    img = Image.open(path)
    img.thumbnail((1024, 1024))
    buf = io.BytesIO()
    img.save(buf, "JPEG", quality=90)
    b64 = base64.b64encode(buf.getvalue()).decode()
    yield {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
    ]}

Python API

from mm import process_image, process_image_tiled, process_video, process_document
from pathlib import Path

msg = process_image(Path("photo.png"), max_width=1024)       # Single Message dict
tiles = list(process_image_tiled(Path("scan.png"), tile_size=1024))  # Multiple Messages
chunks = list(process_video(Path("video.mp4")))               # Multiple Messages
pages = list(process_document(Path("doc.pdf")))               # Multiple Messages

# Via Context
from mm import Context
ctx = Context("~/data")
messages = ctx.encode("photo.png", strategy="resize")

cat — pipeline overrides

Pipelines are 2-stage YAMLs: encode (convert to LLM-ready parts) → generate (LLM call). Key parameters can be overridden from the CLI.

Encode overrides (--encode.*)

mm cat photo.png -m accurate --encode.strategy image-tile      # override encoder
mm cat photo.png -m accurate --encode.pyfunc ~/my_filter.py    # custom transform

Generate overrides (--generate.*)

mm cat photo.png -m accurate --generate.max-tokens 1024          # increase token limit
mm cat photo.png -m accurate --generate.temperature 0.5           # higher temperature
mm cat photo.png -m accurate --generate.prompt "List 3 main objects in this image."
mm cat photo.png -m accurate --generate.json-mode true            # request JSON response

Combining overrides

mm cat photo.png -m accurate \
  --encode.strategy image-tile \
  --generate.max-tokens 512 \
  --generate.prompt "Analyze this architecture diagram."

cat -p — explicit pipeline YAML

Load custom pipeline configurations from YAML files. The YAML's kind field dispatches the pipeline to the correct media type. The generate field is optional — omit it for encode-only pipelines.

Single pipeline

# custom-image.yaml
kind: image
mode: accurate
encode:
  strategy: resize
  max_width: 512
generate:
  prompt: "What is in this image? One sentence only."
  max_tokens: 64

mm cat photo.png -p custom-image.yaml

Encode-only pipeline (no LLM)

# encode-only.yaml
kind: document
mode: fast
encode:
  strategy: null
# generate omitted = encode-only, no LLM call

Multi-document YAML

kind: image
mode: accurate
encode:
  strategy: image-tile
  max_width: 2048
generate:
  prompt: "Describe this image in detail."
  max_tokens: 512
---
kind: video
mode: accurate
encode:
  mosaic_tile: "8x6"
  mosaic_count: 2
  frame_selection: scene
generate:
  prompt: "Summarize this video."
  max_tokens: 1024

mm cat *.jpg *.mp4 -p multi-pipeline.yaml

CLI overrides layer on top of -p

mm cat photo.png -p my-pipeline.yaml --generate.max-tokens 128

TOML pipeline path overrides

Override default pipeline paths in ~/.config/mm/mm.toml:

[pipelines]
image.fast = "~/.config/mm/pipelines/image/fast.yaml"
video.accurate = "/path/to/my-video-accurate.yaml"

cat — custom Python transforms (pyfunc)

The --encode.pyfunc flag runs a custom Python transform on the encoded content parts before they are sent to the LLM.

File-based pyfunc

# my_transform.py:
# def transform(parts, context):
#     extra = {"type": "text", "text": "Focus on the data flow."}
#     return parts + [extra]

mm cat photo.png -m accurate --encode.pyfunc ~/my_transform.py

Pyfunc in pipeline YAML

kind: image
mode: accurate
encode:
  strategy: resize
  max_width: 512
  pyfunc: ~/my_transforms/filter.py
generate:
  prompt: "Analyze this image."
  max_tokens: 128

Inline def syntax in YAML:

encode:
  pyfunc: |
    def transform(parts, context):
        return [p for p in parts if p.get("type") == "image_url"]

wc — count files, bytes, lines, tokens

mm wc <dir>                      # summary totals
mm wc <dir> --by-kind            # breakdown by file kind
mm wc <dir> --kind document      # only documents
mm wc <dir> --format json        # machine-readable

Estimates LLM tokens (~chars/4 for text, tile-based for images). ~65ms.

grep — content search (text + semantic)

# Text search (regex matching)
mm grep "pattern" <dir>                                # search all files
mm grep "attention" <dir> --kind document              # search only documents
mm grep "TODO" <dir> --kind code                       # search code files
mm grep "invoice" <dir> --kind document --format json  # JSON output
mm grep "error" <dir> -C 2                             # 2 context lines
mm grep "invoice" <dir> --count                        # match counts per file
mm grep "Quantum Phase" <dir> -i                       # case-insensitive search
mm grep "TODO" <dir> --ignore-case --kind code         # case-insensitive in code
mm grep "secret" <dir> --no-ignore                     # search gitignored files too

# Semantic search (vector similarity via embeddings)
mm grep "financial projections" <dir> -s               # semantic search across all files
mm grep "architecture overview" <dir> -s --format json # JSON with distances
mm grep "revenue forecast" <dir> -s --index            # auto-index unindexed files before search

Warning: grep runs extraction on every matching file. On large document directories (500+ PDFs), this can take minutes. Prefer --kind code or --kind text for fast text searches.

bench — benchmark suite

mm bench <dir>                          # full benchmark
mm bench <dir> --rounds 5               # more measurement rounds
mm bench <dir> --mode accurate          # include accurate-mode benchmarks
mm bench <dir> --format json            # JSON output for archival

config — extraction mode settings

mm config show                                  # show current config
mm config init                                  # create config with default profile
mm config init --force                          # overwrite existing config
mm config set mode.fast.whisper_model tiny       # set a config value
mm config set mode.accurate.beam_size 5          # set a config value
mm config reset-db                              # delete all databases and caches
mm config reset-profiles                        # restore profiles to defaults
mm config reset                                 # reset everything (db + profiles)

profile — LLM provider management

Provider settings are managed through profiles stored in ~/.config/mm/mm.toml.

mm profile list                                            # list all profiles (● = active)
mm profile add openrouter --base-url https://openrouter.ai/api/v1 --model vlm-1
mm profile update openrouter --model gemma4:e2b
mm profile use openrouter                                  # switch active profile
mm profile remove openrouter

Per-command profile selection:

mm --profile openrouter cat photo.png -m accurate    # one-off override
MM_PROFILE=openrouter mm cat photo.png -m accurate   # env override

Output formats

TTY: Rich formatted tables/panels (human-friendly).
Piped / non-TTY: plain TSV/text or one-path-per-line (machine-readable, no ANSI codes).
--format json: JSON output. Always use this when parsing results programmatically.
--format tsv: Tab-separated values. Maximum token efficiency.
--format csv: Comma-separated values.
--format dataset-jsonl: JSONL for dataset export.
--format dataset-hf: HuggingFace Datasets format.

Pipe composability

mm find <dir> --kind image | mm cat               # find images, extract metadata
mm find <dir> --kind document --min-size 10mb | wc -l  # count large PDFs
mm find <dir> --kind video --format json | jq '.[].name'  # extract video names

Tips

All metadata commands (find, wc) run in ~60ms via the Rust fast path.
Start with find --tree --depth 1 then wc --by-kind for the fastest directory overview.
Use --format json when you need to parse output programmatically.
find returns paths only when piped, else it returns full metadata rows.
For PDFs, cat extracts text in fast mode; if empty, the PDF contains scanned images only.
For videos, mm cat video.mp4 -m accurate auto-generates keyframe mosaics and sends to LLM.
Use --mode fast for quick metadata/text extraction (default), --mode accurate for LLM descriptions.
Use --no-cache with -m accurate to force a fresh LLM call.
Use -p to load custom pipeline YAMLs or named encoders; CLI overrides layer on top.
Use --encode.pyfunc to inject custom Python transforms.
Use --list-pipelines to see all available encoders and built-in pipelines.

mm-cli-skill

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

mm-cli-skill

Popularity

Invocation

Context Preview

SKILL.md

mm CLI

Installation

Commands

Workflow

find — locate files, tabular listing, tree view, schema

cat — content extraction (pipeline-driven)

cat -p — named encoders and pipeline YAMLs

Built-in encoders

Writing custom encoders

Python API

cat — pipeline overrides

Encode overrides (--encode.*)

Generate overrides (--generate.*)

Combining overrides

cat -p — explicit pipeline YAML

Single pipeline

Encode-only pipeline (no LLM)

Multi-document YAML

CLI overrides layer on top of -p

TOML pipeline path overrides

cat — custom Python transforms (pyfunc)

File-based pyfunc

Pyfunc in pipeline YAML

wc — count files, bytes, lines, tokens

grep — content search (text + semantic)

bench — benchmark suite

config — extraction mode settings

profile — LLM provider management

Output formats

Pipe composability

Tips

Similar Skills

Help us improve

mm CLI

Installation

Commands

Workflow

find — locate files, tabular listing, tree view, schema

cat — content extraction (pipeline-driven)

cat -p — named encoders and pipeline YAMLs

Built-in encoders

Writing custom encoders

Python API

cat — pipeline overrides

Encode overrides (--encode.*)

Generate overrides (--generate.*)

Combining overrides

cat -p — explicit pipeline YAML

Single pipeline

Encode-only pipeline (no LLM)

Multi-document YAML

CLI overrides layer on top of -p

TOML pipeline path overrides

cat — custom Python transforms (pyfunc)

File-based pyfunc

Pyfunc in pipeline YAML

wc — count files, bytes, lines, tokens

grep — content search (text + semantic)

bench — benchmark suite

config — extraction mode settings

profile — LLM provider management

Output formats

Pipe composability

Tips