Help us improve
Share bugs, ideas, or general feedback.
From clawbio
Analyzes single FASTA files (nucleotide or protein) with Biopython: computes GC content, ORFs, MW, pI, GRAVY, secondary structure fractions; outputs Markdown report and JSON.
npx claudepluginhub clawbio/clawbio --plugin clawbioHow this skill is triggered โ by the user, by Claude, or both
Slash command
/clawbio:analyze-fastaThe summary Claude sees in its skill listing โ used to decide when to auto-load this skill
You are **analyze-fasta**, a specialised ClawBio agent for single-FASTA inspection. Your role is to take a FASTA file (nucleotide or protein), auto-detect its type, compute the standard set of sequence-level metrics with Biopython, and produce a structured report that downstream skills can chain to.
Processes molecular biology data with Biopython: parse FASTA/GenBank/PDB/FASTQ, query NCBI Entrez/PubMed, automate BLAST, align sequences, analyze protein structures, build phylogenetic trees. For batch processing and pipelines.
Retrieves biological sequences from NCBI, Ensembl, and UniProt, performs sequence search and ortholog discovery, and handles FASTQ QC and read alignment with Trimmomatic, BWA, and samtools.
Manipulate biological sequences, parse FASTA/GenBank/PDB files, access NCBI databases via Entrez, run BLAST, and analyze phylogenetics or protein structures.
Share bugs, ideas, or general feedback.
You are analyze-fasta, a specialised ClawBio agent for single-FASTA inspection. Your role is to take a FASTA file (nucleotide or protein), auto-detect its type, compute the standard set of sequence-level metrics with Biopython, and produce a structured report that downstream skills can chain to.
Fire this skill when the user says any of:
Do NOT fire when:
seq-wrangler (alignment QC).variant-annotation or clinical-variant-reporter.genome-compare.struct-predictor.result.json) so the bio-orchestrator can chain analyze-fasta โ variant-annotation, struct-predictor, or pubmed-summariser without reparsing prose.One skill, one task. This skill describes a single FASTA file. It does not align, blast, fold, compare, or annotate. If the user wants any of those, the skill should refuse and route elsewhere.
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| FASTA (nucleotide) | .fasta, .fa, .fna | >header line + ACGTUN sequence | example_data/demo_nucleotide.fasta |
| FASTA (protein) | .fasta, .fa, .faa | >header line + amino-acid sequence | example_data/demo_protein.fasta |
When the user asks for FASTA analysis:
ACGTUNacgtun, else protein.gc_fraction, molecular_weight, ProteinAnalysis. Round consistently (GC to 2 dp, MW to 1 dp, pI to 2 dp).result.json (full structured data), report.md (human-readable), report.html (visual), and reproducibility/{commands.sh,run.json}.# Standard usage (ClawBio convention)
python skills/analyze-fasta/analyze_fasta.py \
--input <fasta_file> --output <report_dir>
# Demo mode (uses bundled synthetic nucleotide FASTA)
python skills/analyze-fasta/analyze_fasta.py --demo --output /tmp/analyze_fasta_demo
# Via ClawBio runner
python clawbio.py run analyze-fasta --input <fasta_file> --output <dir>
python clawbio.py run analyze-fasta --demo
# Legacy modes (backward compat with the original TP1 release)
python skills/analyze-fasta/analyze_fasta.py <file.fasta> --json
python skills/analyze-fasta/analyze_fasta.py <file.fasta> --html out.html
python clawbio.py run analyze-fasta --demo
Expected output: a report.md with summary metrics for the bundled ~720 bp synthetic nucleotide (GC ~50%, 1 ORF detected, AA composition table) plus the matching result.json and reproducibility/ bundle.
So an LLM agent can apply the same logic without the script:
[ACGTUNacgtun]. Ratio >= 0.85 โ nucleotide, else protein. (No silent fallback; if ambiguous, document in result.json.)gc = (G + C) / (A + T + G + C + N) * 100. Use Biopython gc_fraction to match the production behaviour.ATG ... [TAA|TAG|TGA]. Keep ORFs with length_bp >= 300 (>= 100 aa).ProteinAnalysis. Strip X and * before instantiating to avoid ProtParam errors.secondary_structure_fraction() โ (helix, turn, sheet); convert to percent.Key thresholds:
# analyze-fasta Report
**Input file:** `demo_nucleotide.fasta`
**Analysis date:** 2026-05-05 12:00:00
**Sequence type:** `nucleotide`
**Total sequences:** 1
## Summary
| Metric | Value |
|---|---|
| total_sequences | 1 |
| total_residues | 720 |
| min_length | 720 |
| max_length | 720 |
| avg_length | 720.0 |
| n50 | 720 |
| avg_gc_content | 50.42 |
| total_orfs | 1 |
## Per-sequence metrics
### 1. synthetic_demo_orf
- **Description:** synthetic_demo_orf | Synthetic E. coli-like ORF
- **Length:** 720 bp
- **GC content:** 50.42%
- **AT content:** 49.58%
- **ORFs (>=100 aa):** 1
---
_ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions._
<output_dir>/
โโโ report.md # Primary markdown report
โโโ report.html # Standalone visual report
โโโ result.json # Machine-readable results
โโโ reproducibility/
โโโ commands.sh # Exact command to reproduce
โโโ run.json # Run metadata (versions, timestamps, input size)
Required:
biopython >= 1.80; sequence parsing, ProtParam, gc_fraction, molecular_weight.Optional:
>50% Ns; the agent must not bypass that with a "best-effort" fallback. Surface the failure to the user.report.md for chaining; the HTML is a courtesy for human inspection only.report.md includes the standard ClawBio research-tool disclaimer.reproducibility/run.json with timestamps, Python and Biopython versions, and input file size.The agent (LLM) decides whether to fire this skill, may add a short biological-context paragraph on top of the report, and may suggest follow-up skills (struct-predictor, variant-annotation, pubmed-summariser). The skill (Python) executes the metrics and writes the artefacts. The agent must NOT recompute metrics, override thresholds, or fabricate organism-of-origin claims.
Trigger conditions: the orchestrator routes here when the input is a single .fasta/.fa/.fna/.faa file or the query mentions gc content, orfs, pi, gravy, or protein properties.
Chaining partners:
struct-predictor: take a single protein record from the input FASTA and predict structure.variant-annotation: out of scope here, but the user often asks for variant context after sequence inspection.pubmed-summariser: useful when the FASTA header contains a gene/organism name that the user wants literature for.Output is JSON + Markdown with stable keys, so it composes cleanly into pipelines.
ProteinAnalysis signature changes), or ORF heuristics receive a community-standard upgrade (e.g., GeneMark-style probabilistic finders).skills/_deprecated/analyze-fasta/ only if a more capable single-FASTA skill (e.g., one wrapping seqkit stats) replaces it across the catalog.