From tooluniverse
Retrieves DNA/RNA/protein sequences from NCBI and ENA with disambiguation. Handles RefSeq, GenBank, and EMBL accessions with quality hierarchy.
Install: `npx claudepluginhub mims-harvard/tooluniverse --plugin tooluniverse`
Slash command: `/tooluniverse:tooluniverse-sequence-retrieval`
Retrieve DNA, RNA, and protein sequences with proper disambiguation and cross-database handling.
IMPORTANT: Always use English terms in tool calls; try original-language terms only as a fallback. Respond in the user's language.
LOOK UP DON'T GUESS: Never assume accession numbers or sequence versions. Always retrieve and verify from NCBI or ENA.
Sequence quality hierarchy: RefSeq (NM_/NP_ = curated) > RefSeq predicted (XM_/XP_) > GenBank (submitted). Prefer the MANE Select transcript for human canonical isoforms. Check version numbers -- annotations improve across versions.
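Comparing versions is mechanical once the `.N` suffix is split off. The sketch below uses hypothetical helpers (`split_version` and `latest_versions` are not ToolUniverse functions) and assumes accessions arrive as plain `BASE.VERSION` strings; it keeps only the newest version seen for each base accession.

```python
from typing import Dict, Iterable, Optional, Tuple


def split_version(accession: str) -> Tuple[str, Optional[int]]:
    """Split 'NC_000913.3' into ('NC_000913', 3); unversioned input gives None."""
    text = accession.strip()
    base, sep, version = text.rpartition(".")
    if sep and base and version.isdigit():
        return base, int(version)
    return text, None


def latest_versions(accessions: Iterable[str]) -> Dict[str, str]:
    """Map each base accession to the highest-versioned identifier seen."""
    best: Dict[str, Optional[int]] = {}
    for acc in accessions:
        base, version = split_version(acc)
        # Numeric comparison, so version 10 correctly beats version 9
        if base not in best or (version or 0) > (best[base] or 0):
            best[base] = version
    return {b: (f"{b}.{v}" if v is not None else b) for b, v in best.items()}


print(latest_versions(["NC_000913.2", "NC_000913.3", "U00096"]))
# {'NC_000913': 'NC_000913.3', 'U00096': 'U00096'}
```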
Phase 0: Clarify (if needed) → Phase 1: Disambiguate Gene/Organism → Phase 2: Search & Retrieve → Phase 3: Report
Phase 0 (Clarify): Ask ONLY if the gene exists in multiple organisms, the sequence type is unclear, or the strain matters. Skip clarification for specific accessions, clear organism + gene combinations, and complete-genome requests that name the organism.
| Prefix | Type | Use With |
|---|---|---|
| NC_/NM_/NR_/NP_/NZ_/XM_/XP_/XR_ | RefSeq | NCBI only |
| U*/M*/K*/X*/CP* | GenBank | NCBI or ENA |
| EMBL format | EMBL | ENA preferred |
CRITICAL: Never try ENA tools with RefSeq accessions -- they return 404.
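A minimal routing sketch, assuming the standard accession syntax in which RefSeq identifiers start with two letters and an underscore (`NM_`, `NC_`, `NZ_`, ...) while INSDC (GenBank/ENA/DDBJ) accessions never contain an underscore. `is_refseq` and `allowed_sources` are hypothetical helpers; the `prefer_ena` flag stands in for the "ENA preferred" case of EMBL records, which cannot be detected from the prefix alone.

```python
import re
from typing import List

# RefSeq accessions: two uppercase letters plus "_" (NM_, NC_, NZ_, XM_, ...)
_REFSEQ_RE = re.compile(r"^[A-Z]{2}_")


def is_refseq(accession: str) -> bool:
    return bool(_REFSEQ_RE.match(accession.strip().upper()))


def allowed_sources(accession: str, prefer_ena: bool = False) -> List[str]:
    """Databases to query for this accession, in the order to try them."""
    if is_refseq(accession):
        return ["NCBI"]  # ENA does not hold RefSeq; a query would return 404
    # GenBank/EMBL (INSDC) records are mirrored in both databases
    return ["ENA", "NCBI"] if prefer_ena else ["NCBI", "ENA"]


for acc in ("NC_000913.3", "U00096.3"):
    print(acc, allowed_sources(acc))
```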
Phase 2 (Search & Retrieve): Retrieve silently. Do NOT narrate the search process.
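```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# organism, gene, strain, keywords, seq_type: values settled in Phase 1

# Search NCBI Nucleotide
result = tu.tools.NCBI_search_nucleotide(
    operation="search", organism=organism, gene=gene,
    strain=strain, keywords=keywords, seq_type=seq_type, limit=10
)

# Convert search UIDs to accession numbers
accessions = tu.tools.NCBI_fetch_accessions(operation="fetch_accession", uids=result["data"]["uids"])

# Pick one `accession` from `accessions` (best tier first: NM_/NC_, then XM_, then GenBank),
# then retrieve the sequence in FASTA or GenBank format
sequence = tu.tools.NCBI_get_sequence(operation="fetch_sequence", accession=accession, format="fasta")

# ENA alternative (GenBank/EMBL accessions only; RefSeq accessions return 404)
entry = tu.tools.ena_get_entry(accession=accession)
fasta = tu.tools.ena_get_sequence_fasta(accession=accession)
```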
| Primary | Fallback | When to fall back |
|---|---|---|
| NCBI_get_sequence | ENA tools (GenBank/EMBL accessions only) | NCBI unavailable |
| ena_get_entry | NCBI_get_sequence | Record missing from ENA (e.g., RefSeq accession) |
| NCBI_search_nucleotide | Retry with broader keywords | No results |
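The fallback rules can be wrapped in one function. This is a sketch: `fetch_with_fallback` is hypothetical, and the two fetchers are injected as plain callables that return FASTA text or `None` (how to extract that text from a ToolUniverse response is an assumption left to the caller), so the control flow can be checked without network access. The key guard is that a RefSeq accession is never sent to ENA.

```python
import re
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], Optional[str]]
_REFSEQ_RE = re.compile(r"^[A-Z]{2}_")  # RefSeq prefixes such as NM_, NC_, NZ_


def fetch_with_fallback(accession: str, ncbi_fetch: Fetcher, ena_fetch: Fetcher) -> str:
    """Try NCBI first; fall back to ENA only for non-RefSeq accessions."""
    sources: List[Tuple[str, Fetcher]] = [("NCBI", ncbi_fetch)]
    if not _REFSEQ_RE.match(accession.strip().upper()):
        sources.append(("ENA", ena_fetch))  # RefSeq is never sent to ENA
    errors: List[str] = []
    for name, fetch in sources:
        try:
            seq = fetch(accession)
        except Exception as exc:  # e.g. service unavailable or HTTP 404
            errors.append(f"{name}: {exc}")
            continue
        if seq:
            return seq
        errors.append(f"{name}: empty result")
    raise LookupError(f"{accession} not retrieved ({'; '.join(errors)})")


if __name__ == "__main__":
    ncbi_down = lambda acc: None               # simulate NCBI returning nothing
    ena_stub = lambda acc: ">stub\nACGT"       # simulate an ENA FASTA response
    print(fetch_with_fallback("U00096.3", ncbi_down, ena_stub))
```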
Phase 3 (Report): Present results as a Sequence Profile Report and hide the search process. Include:
| Tier | Prefix | Description |
|---|---|---|
| RefSeq Reference (best) | NC_, NM_, NP_ | NCBI-curated, gold standard |
| RefSeq Predicted | XM_, XP_, XR_ | Computationally predicted |
| GenBank Validated | Various | Submitted, some curation |
| GenBank Direct | Various | Direct submission |
| Third Party | Various (definition line tagged `TPA:`, `TPA_exp:`, or `TPA_inf:`) | Third-party annotation |
Sequence quality: Prefer RefSeq over GenBank and check version numbers. Sequences with "PREDICTED" in the definition line are not experimentally validated.
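One way to apply the tier table when a search returns several candidates, sketched with a hypothetical `Candidate` record and rank map. It collapses the GenBank Validated, GenBank Direct, and Third Party tiers into a single rank because they cannot be told apart by prefix, and it demotes any record whose definition line contains "PREDICTED".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    accession: str
    definition: str  # definition line of the record


# Hypothetical ranks following the tier table (lower is better)
_TIER_BY_PREFIX = {"NC_": 0, "NM_": 0, "NP_": 0, "NR_": 0, "XM_": 1, "XP_": 1, "XR_": 1}


def tier(c: Candidate) -> int:
    rank = _TIER_BY_PREFIX.get(c.accession[:3].upper(), 2)  # 2 = GenBank/other
    if "PREDICTED" in c.definition.upper():
        rank = max(rank, 1)  # computationally predicted, not validated
    return rank


def rank_candidates(cands: List[Candidate]) -> List[Candidate]:
    # sorted() is stable, so ties keep the original search order
    return sorted(cands, key=tier)
```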
Accession guidance: RefSeq records are NCBI-only; GenBank records are mirrored in ENA/EMBL. Default to RefSeq mRNA (NM_) for human and model organisms, and to the most complete genome assembly for microbial queries.
Cross-database reconciliation: The same sequence may carry different accessions (e.g., GenBank U00096 corresponds to RefSeq NC_000913 for E. coli K-12 MG1655). Always report both when available. Discrepancies between GenBank and RefSeq usually mean RefSeq curation corrected errors in the original submission.
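When both records are in hand, a quick comparison shows whether they agree before you report them together. This is a sketch with a hypothetical `reconcile` helper; it expects bare sequence strings (FASTA headers and line breaks already stripped) and reports the 0-based position of the first mismatch.

```python
from typing import Dict, Optional


def reconcile(refseq_acc: str, refseq_seq: str, insdc_acc: str, insdc_seq: str) -> Dict[str, object]:
    r, g = refseq_seq.upper(), insdc_seq.upper()
    first_diff: Optional[int] = None
    if r != g:
        # First mismatching index; if one sequence is a prefix of the other,
        # the difference starts where the shorter one ends
        first_diff = next((i for i, (a, b) in enumerate(zip(r, g)) if a != b), min(len(r), len(g)))
    return {
        "accessions": f"{refseq_acc} (RefSeq) = {insdc_acc} (GenBank/ENA)",
        "identical": r == g,
        "lengths": (len(r), len(g)),
        "first_difference": first_diff,
    }
```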
| Error | Response |
|---|---|
| "No search criteria provided" | Add organism, gene, or keywords |
| "ENA 404 error" | Likely RefSeq -- use NCBI only |
| "No results found" | Broaden search, check spelling, try synonyms |
| "Sequence too large" | Note size, provide download link instead |
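The error table maps naturally to a lookup. This is a hypothetical sketch: matching is by lowercase substring, the patterns are taken from the table, and anything unrecognized returns `None` so the caller decides what to do.

```python
from typing import Optional

# Patterns and responses taken from the error table
ERROR_RESPONSES = {
    "no search criteria provided": "Add organism, gene, or keywords",
    "404": "Likely RefSeq: use NCBI only",
    "no results found": "Broaden search, check spelling, try synonyms",
    "sequence too large": "Note size, provide download link instead",
}


def suggest_response(error_message: str) -> Optional[str]:
    msg = error_message.lower()
    for pattern, response in ERROR_RESPONSES.items():
        if pattern in msg:
            return response
    return None  # unrecognized error: leave the decision to the caller
```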
NCBI Tools: NCBI_search_nucleotide (search), NCBI_fetch_accessions (UID→accession), NCBI_get_sequence (retrieve)
ENA Tools (GenBank/EMBL only): ena_get_entry (metadata), ena_get_sequence_fasta (FASTA), ena_get_entry_summary (summary)
NCBI_search_nucleotide: operation="search", organism (scientific name), gene (symbol), strain, keywords, seq_type (complete_genome/mrna/refseq), limit
NCBI_get_sequence: operation="fetch_sequence", accession, format (fasta/genbank)
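A small builder can catch the "No search criteria provided" case before calling the tool. This is a sketch only: `build_search_args` is hypothetical, it assumes optional parameters may simply be omitted from the call, and it restricts `seq_type` to the three values listed above.

```python
from typing import Any, Dict, Optional

SEQ_TYPES = {"complete_genome", "mrna", "refseq"}


def build_search_args(
    organism: Optional[str] = None,
    gene: Optional[str] = None,
    strain: Optional[str] = None,
    keywords: Optional[str] = None,
    seq_type: Optional[str] = None,
    limit: int = 10,
) -> Dict[str, Any]:
    if not (organism or gene or keywords):
        raise ValueError("No search criteria provided: add organism, gene, or keywords")
    if seq_type is not None and seq_type not in SEQ_TYPES:
        raise ValueError(f"seq_type must be one of {sorted(SEQ_TYPES)}")
    args = {"operation": "search", "organism": organism, "gene": gene,
            "strain": strain, "keywords": keywords, "seq_type": seq_type, "limit": limit}
    return {k: v for k, v in args.items() if v is not None}  # drop unset fields
```

Usage, under the same assumption: `tu.tools.NCBI_search_nucleotide(**build_search_args(organism="Escherichia coli", seq_type="complete_genome"))`.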