Skill

kegg-database

Queries KEGG REST API (academic use only) for biological pathways, genes, compounds, enzymes, diseases, drugs using info/list/find/get/conv/link/ddi operations. Supports NCBI/UniProt/PubChem ID conversion. Python requests examples.

Python

REST API

database

api-development

Popularity

Stars

200

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sciagent-skills:kegg-database

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is via a direct REST API with no authentication: every operation is a simple HTTP GET request, and most return tab-delimited text (get returns flat-file entries, sequences, or images).

SKILL.md

539 lines · ~4.6k tokens

Stats

Language: Python

Stars: 200

Forks: 21

Maintenance: Excellent

Last Commit: Jun 15, 2026

KEGG Database — Biological Pathway & Molecular Network Queries

Overview

When to Use

Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
Retrieving metabolic pathway details, gene lists, or compound structures
Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
Checking drug-drug interactions from KEGG's pharmacological database
Building pathway enrichment context (all genes per pathway for an organism)
Cross-referencing compounds, reactions, enzymes, and pathways
For Python-native multi-database queries (KEGG + UniProt + Ensembl in one script), prefer bioservices instead
For pathway visualization, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly

Prerequisites

pip install requests

API constraints:

Academic use only — commercial use requires a separate KEGG license
Max 10 entries per get/list/conv/link/ddi call (image/kgml/json: 1 entry only)
No explicit rate limit, but add time.sleep(0.5) between batch requests to avoid server-side throttling
Base URL: https://rest.kegg.jp/
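
The 10-entry limit above means longer ID lists have to be split before they are sent. The following is a minimal sketch of one way to do that; `chunked` and `batch_urls` are hypothetical helper names (not part of KEGG or `requests`), and the gene IDs are placeholders.

```python
from itertools import islice

BASE = "https://rest.kegg.jp"
MAX_ENTRIES = 10  # batch limit stated in the constraints above


def chunked(ids, size=MAX_ENTRIES):
    """Yield successive lists of at most `size` identifiers."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batch_urls(operation, ids, size=MAX_ENTRIES):
    """Build one URL per batch, joining identifiers with '+'."""
    return [f"{BASE}/{operation}/{'+'.join(batch)}" for batch in chunked(ids, size)]


ids = [f"hsa:{n}" for n in range(1, 24)]  # 23 placeholder gene IDs
urls = batch_urls("get", ids)
print(len(urls))  # 3 batches: 10 + 10 + 3
```

Each URL can then be fetched with `requests.get(url)`, with `time.sleep(0.5)` between calls as recommended above.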

Quick Start

import requests
import time

BASE = "https://rest.kegg.jp"

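def kegg_get(operation, *args):
    """Generic KEGG REST API caller."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url, timeout=30)  # avoid hanging on a stalled connection
    resp.raise_for_status()  # raise on 4xx/5xx instead of returning an error body
    return resp.text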

# Find pathways linked to human gene TP53
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157	path:hsa04010
# hsa:7157	path:hsa04110
# ...

# Get pathway details
detail = kegg_get("get", "hsa04110")
print(detail[:300])

Core API

1. Database Information — `kegg_info`

Retrieve metadata and statistics about KEGG databases.

import requests

BASE = "https://rest.kegg.jp"

# Database-level info
info = requests.get(f"{BASE}/info/pathway").text
print(info[:200])
# pathway          Pathway
#                  Release 112.0, Dec 2025
#                  Kanehisa Laboratories
#                  ...

# Organism-level info
hsa_info = requests.get(f"{BASE}/info/hsa").text
print(hsa_info[:200])

Common databases: kegg, pathway, module, brite, genes, genome, compound, glycan, reaction, enzyme, disease, drug

2. Listing Entries — `kegg_list`

List entry identifiers and names from any KEGG database.

import requests

BASE = "https://rest.kegg.jp"

# All human pathways
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa").text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t")
    print(f"{pathway_id}: {name}")
# path:hsa00010: Glycolysis / Gluconeogenesis - Homo sapiens (human)
# ...

# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459").text
print(genes)

Common organism codes: hsa (human), mmu (mouse), dme (fruit fly), sce (yeast), eco (E. coli)

3. Keyword Search — `kegg_find`

Search databases by keywords or molecular properties.

import requests
import time

BASE = "https://rest.kegg.jp"

# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53").text
print(f"Found {len(results.strip().split(chr(10)))} entries")
time.sleep(0.5)

# Chemical formula search (partial match, atom order ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula").text
print(compounds[:200])
time.sleep(0.5)

# Exact mass range search (use /mol_weight for molecular weight)
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass").text
print(drugs[:200])

Search options: append /formula (partial match, regardless of atom order), /exact_mass, or /mol_weight (a single value or a min-max range) to compound/drug queries.
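
Because the search option is just a trailing path segment, a typo in it produces a malformed URL rather than a clear error. The sketch below builds `/find` URLs and rejects unknown options up front; `find_url` is a hypothetical helper, and the option set is taken from the list above.

```python
from urllib.parse import quote

BASE = "https://rest.kegg.jp"
FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}


def find_url(database, query, option=None):
    """Build a /find URL, validating the optional search mode."""
    if option is not None and option not in FIND_OPTIONS:
        raise ValueError(f"unknown find option: {option!r}")
    # Keep '+' unescaped: it joins multiple keywords in a KEGG query.
    url = f"{BASE}/find/{database}/{quote(query, safe='+')}"
    return url if option is None else f"{url}/{option}"


print(find_url("compound", "C7H10N4O2", "formula"))
print(find_url("drug", "300-310", "exact_mass"))
```

The returned string can be passed straight to `requests.get`.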

4. Entry Retrieval — `kegg_get`

Retrieve complete database entries or specific data formats.

import requests
import time

BASE = "https://rest.kegg.jp"

# Full pathway entry (text format)
pathway = requests.get(f"{BASE}/get/hsa00010").text
print(pathway[:500])
time.sleep(0.5)

# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459").text

# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq").text
print(fasta[:200])
time.sleep(0.5)

# Compound structure (MOL format)
mol = requests.get(f"{BASE}/get/cpd:C00002/mol").text  # ATP

# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image")
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")

Output formats: aaseq (protein FASTA), ntseq (nucleotide FASTA), mol (MOL), kcf (KCF), image (PNG), kgml (XML), json (JSON; the KEGG API documentation describes this option for BRITE hierarchy entries). Image/KGML/JSON accept one entry only.
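
The entry limits differ by output format, so it can help to check them before sending a request. This is a sketch under the limits stated in this document (10 entries in general, 1 for image/kgml/json); `get_url` is a hypothetical helper name.

```python
BASE = "https://rest.kegg.jp"
SINGLE_ENTRY_FORMATS = {"image", "kgml", "json"}
ALL_FORMATS = {"aaseq", "ntseq", "mol", "kcf"} | SINGLE_ENTRY_FORMATS
MAX_ENTRIES = 10


def get_url(entries, fmt=None):
    """Build a /get URL, enforcing the per-format entry limits."""
    if isinstance(entries, str):
        entries = [entries]
    entries = list(entries)
    if not entries:
        raise ValueError("at least one entry is required")
    if fmt is not None and fmt not in ALL_FORMATS:
        raise ValueError(f"unknown output format: {fmt!r}")
    limit = 1 if fmt in SINGLE_ENTRY_FORMATS else MAX_ENTRIES
    if len(entries) > limit:
        raise ValueError(f"{fmt or 'text'} output accepts at most {limit} entries")
    url = f"{BASE}/get/{'+'.join(entries)}"
    return url if fmt is None else f"{url}/{fmt}"


print(get_url(["hsa:10458", "hsa:10459"], "aaseq"))
print(get_url("hsa05130", "image"))
```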

5. ID Conversion — `kegg_conv`

Convert identifiers between KEGG and external databases.

import requests
import time

BASE = "https://rest.kegg.jp"

# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458").text
print(ncbi.strip())
# hsa:10458	ncbi-geneid:10458
time.sleep(0.5)

# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458").text
print(uniprot.strip())
time.sleep(0.5)

# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa").text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")

# Reverse: NCBI Gene ID → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157").text
print(reverse.strip())  # TP53

Supported external databases: ncbi-geneid, ncbi-proteinid, uniprot, pubchem, chebi

6. Cross-Referencing — `kegg_link`

Find related entries within and between KEGG databases.

import requests
import time

BASE = "https://rest.kegg.jp"

# Genes in glycolysis pathway
genes = requests.get(f"{BASE}/link/genes/hsa00010").text
gene_list = [line.split("\t")[1] for line in genes.strip().split("\n") if line]
print(f"Glycolysis genes: {len(gene_list)}")
time.sleep(0.5)

# Pathways containing a specific gene
pathways = requests.get(f"{BASE}/link/pathway/hsa:7157").text  # TP53
print(pathways[:300])
time.sleep(0.5)

# Compounds in a pathway
compounds = requests.get(f"{BASE}/link/compound/hsa00010").text
print(f"Compounds in glycolysis: {len(compounds.strip().split(chr(10)))}")

# Map genes to KO (orthology) groups
ko = requests.get(f"{BASE}/link/ko/hsa:10458").text
print(ko.strip())

Common links: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)
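
Link responses are one-to-many: the same source ID appears on several lines. A small sketch of grouping them into a dictionary follows; `parse_links` is a hypothetical helper, and the sample text is hand-written in the two-column format shown in the Quick Start, not captured API output.

```python
from collections import defaultdict


def parse_links(text, strip_prefixes=("path:",)):
    """Group two-column link output into {source_id: [target_id, ...]}."""
    mapping = defaultdict(list)
    for line in text.strip().splitlines():
        if not line:
            continue
        source, target = line.split("\t", 1)
        for prefix in strip_prefixes:
            if target.startswith(prefix):
                target = target[len(prefix):]
        mapping[source].append(target)
    return dict(mapping)


sample = "hsa:7157\tpath:hsa04010\nhsa:7157\tpath:hsa04110\nhsa:672\tpath:hsa03440\n"
print(parse_links(sample))
```

Splitting on the first tab only keeps the parser tolerant of any extra columns a response might carry.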

7. Drug-Drug Interactions — `kegg_ddi`

Check pharmacological interactions between drugs.

import requests

BASE = "https://rest.kegg.jp"

# Single drug — all known interactions
interactions = requests.get(f"{BASE}/ddi/D00001").text
print(f"Interactions: {len(interactions.strip().split(chr(10)))}")

# Pairwise check (max 10 drugs, joined with +)
pair = requests.get(f"{BASE}/ddi/D00001+D00002+D00003").text
print(pair[:300])

Key Concepts

Identifier Formats

Type	Format	Example
Reference pathway	`map#####`	`map00010` (Glycolysis, generic)
Organism pathway	`{org}#####`	`hsa00010` (Glycolysis, human)
Gene	`{org}:{number}`	`hsa:7157` (TP53)
Compound	`cpd:C#####`	`cpd:C00002` (ATP)
Drug	`dr:D#####`	`dr:D00001`
Enzyme	`ec:{EC_number}`	`ec:1.1.1.1`
KO (orthology)	`ko:K#####`	`ko:K00001`
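
The formats in the table can be checked with regular expressions before a request is sent. The sketch below is a heuristic based only on the rows above; `classify` and the pattern list are hypothetical, and the gene pattern is deliberately loose because gene accessions vary by organism.

```python
import re

# Order matters: specific prefixes are tested before the generic gene pattern,
# and "map" pathways before organism pathways.
PATTERNS = [
    ("reference pathway", re.compile(r"^(path:)?map\d{5}$")),
    ("compound", re.compile(r"^cpd:C\d{5}$")),
    ("drug", re.compile(r"^dr:D\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
    ("KO", re.compile(r"^ko:K\d{5}$")),
    ("organism pathway", re.compile(r"^(path:)?[a-z]{3,4}\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify(identifier):
    """Return the identifier type from the table above, or None if unrecognized."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return None


for example in ["map00010", "hsa00010", "hsa:7157", "cpd:C00002", "ec:1.1.1.1"]:
    print(example, "->", classify(example))
```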

Pathway Categories

KEGG organizes pathways into seven major categories. The map numbers below are representative examples; each category spans several number ranges:

Metabolism: e.g. map00010 (Glycolysis / Gluconeogenesis), map00020 (TCA cycle), plus the amino acid metabolism maps
Genetic Information Processing: e.g. map03010 (Ribosome), map03040 (Spliceosome), plus the DNA repair maps
Environmental Information Processing: e.g. map04010 (MAPK signaling), map02010 (ABC transporters)
Cellular Processes: e.g. map04140 (Autophagy), map04210 (Apoptosis), map04110 (Cell cycle)
Organismal Systems: map04xxx maps covering the immune, endocrine, and nervous systems
Human Diseases: mostly map05xxx (Cancer, Neurodegenerative, Infectious)
Drug Development: map07xxx; chronological and target-based classifications

Common Workflows

Workflow: Gene to Pathway Mapping

Find all pathways associated with a gene of interest.

import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Find gene by keyword
results = requests.get(f"{BASE}/find/genes/BRCA1+homo+sapiens").text
print("Gene search results:")
for line in results.strip().split("\n")[:5]:
    print(f"  {line}")
time.sleep(0.5)

# Step 2: Get pathways linked to BRCA1
pathways = requests.get(f"{BASE}/link/pathway/hsa:672").text
pathway_ids = [line.split("\t")[1].replace("path:", "") for line in pathways.strip().split("\n") if line]
print(f"\nBRCA1 is in {len(pathway_ids)} pathways:")
time.sleep(0.5)

# Step 3: Get pathway names
for pid in pathway_ids[:5]:
    info = requests.get(f"{BASE}/get/{pid}").text
    # Extract NAME field
    for line in info.split("\n"):
        if line.startswith("NAME"):
            print(f"  {pid}: {line.replace('NAME', '').strip()}")
            break
    time.sleep(0.5)

Workflow: Pathway Enrichment Context

Build a gene-set collection for all pathways of an organism.

import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: List all human pathways
pathways_text = requests.get(f"{BASE}/list/pathway/hsa").text
pathways = {}
for line in pathways_text.strip().split("\n"):
    pid, name = line.split("\t", 1)
    pathways[pid.replace("path:", "")] = name
print(f"Total human pathways: {len(pathways)}")
time.sleep(0.5)

# Step 2: Get genes for each pathway (sample first 3 for demo)
gene_sets = {}
for pid in list(pathways.keys())[:3]:
    genes_text = requests.get(f"{BASE}/link/genes/{pid}").text
    gene_ids = [line.split("\t")[1] for line in genes_text.strip().split("\n") if line]
    gene_sets[pid] = gene_ids
    print(f"  {pid}: {len(gene_ids)} genes")
    time.sleep(0.5)

# Step 3: Convert to NCBI Gene IDs for enrichment tools
# (use kegg_conv for bulk conversion)
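
Step 3 is left as a comment above. One possible way to fill it in is sketched here, assuming the bulk `/conv/ncbi-geneid/hsa` response has the same two-column shape as the single-gene example in the ID Conversion section. `parse_conv` and `gene_sets_to_ncbi` are hypothetical helpers, and the sample data is hand-written.

```python
NCBI_PREFIX = "ncbi-geneid:"


def parse_conv(conv_text):
    """Map KEGG gene IDs to bare NCBI Gene IDs, whichever column holds which."""
    mapping = {}
    for line in conv_text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue
        kegg_id, ncbi_id = fields
        if kegg_id.startswith(NCBI_PREFIX):  # columns arrived in reverse order
            kegg_id, ncbi_id = ncbi_id, kegg_id
        if ncbi_id.startswith(NCBI_PREFIX):
            mapping[kegg_id] = ncbi_id[len(NCBI_PREFIX):]
    return mapping


def gene_sets_to_ncbi(gene_sets, kegg_to_ncbi):
    """Translate each pathway's gene list, dropping genes with no mapping."""
    return {
        pid: [kegg_to_ncbi[g] for g in genes if g in kegg_to_ncbi]
        for pid, genes in gene_sets.items()
    }


conv_sample = "hsa:3101\tncbi-geneid:3101\nhsa:3098\tncbi-geneid:3098\n"
sets_sample = {"hsa00010": ["hsa:3101", "hsa:3098", "hsa:0000"]}
print(gene_sets_to_ncbi(sets_sample, parse_conv(conv_sample)))
```

In the workflow above, `conv_sample` would be replaced by the text of `requests.get(f"{BASE}/conv/ncbi-geneid/hsa")` and `sets_sample` by `gene_sets`.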

Workflow: Compound-Pathway-Reaction Analysis

Trace a compound through metabolic reactions and pathways.

import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Search for compound
results = requests.get(f"{BASE}/find/compound/glucose").text
print("Compound search:")
for line in results.strip().split("\n")[:3]:
    print(f"  {line}")
time.sleep(0.5)

# Step 2: Find reactions involving glucose (C00031)
reactions = requests.get(f"{BASE}/link/reaction/cpd:C00031").text
rxn_ids = [line.split("\t")[1] for line in reactions.strip().split("\n") if line]
print(f"\nReactions involving glucose: {len(rxn_ids)}")
time.sleep(0.5)

# Step 3: Find pathways for a specific reaction
pathways = requests.get(f"{BASE}/link/pathway/rn:R00299").text
print(f"\nPathways for R00299:")
print(pathways[:300])
time.sleep(0.5)

# Step 4: Get pathway detail
detail = requests.get(f"{BASE}/get/map00010").text
print(f"\nGlycolysis pathway detail (first 500 chars):")
print(detail[:500])

Workflow: Cross-Database ID Integration

Map KEGG identifiers to UniProt, NCBI, and PubChem for multi-database workflows.

import requests
import time

BASE = "https://rest.kegg.jp"

# Step 1: Convert gene to multiple external IDs
gene = "hsa:7157"  # TP53

uniprot = requests.get(f"{BASE}/conv/uniprot/{gene}").text.strip()
print(f"UniProt: {uniprot}")
time.sleep(0.5)

ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/{gene}").text.strip()
print(f"NCBI Gene: {ncbi}")
time.sleep(0.5)

# Step 2: Get protein sequence from KEGG
fasta = requests.get(f"{BASE}/get/{gene}/aaseq").text
print(f"\nProtein sequence (first 200 chars):\n{fasta[:200]}")
time.sleep(0.5)

# Step 3: Convert compounds to PubChem CIDs
cpd_conv = requests.get(f"{BASE}/conv/pubchem/cpd:C00002").text.strip()  # ATP
print(f"\nATP PubChem: {cpd_conv}")

Key Parameters

Parameter	Function/Endpoint	Default	Options	Effect
`organism`	`list`, `link`, `conv`	None	3-4 letter code	Filter by organism (e.g., `hsa`, `mmu`)
`option`	`find`	None	`formula`, `exact_mass`, `mol_weight`	Search mode for compounds/drugs
`format`	`get`	text	`aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json`	Output format
`+` separator	`get`, `list`, `conv`, `link`, `ddi`	—	Max 10 entries	Batch query (join IDs with `+`)
`target_db`	`conv`	—	`ncbi-geneid`, `ncbi-proteinid`, `uniprot`, `pubchem`, `chebi`	External database for ID conversion
`target_db`	`link`	—	`pathway`, `genes`, `compound`, `ko`, `enzyme`	Related KEGG database

Best Practices

Add delays between batch requests: No explicit rate limit, but time.sleep(0.5) between requests prevents throttling and is courteous to the shared academic resource.
Anti-pattern — fetching all entries without filtering: Use kegg_list to enumerate IDs first, then kegg_get for specific entries. Avoid downloading entire databases when you need a subset.
Parse tab-delimited output consistently: list, find, conv, link, and ddi responses use \t as field separator and \n as record separator (get returns flat-file text or binary data instead). Always .strip() before splitting.
Respect the 10-entry batch limit: kegg_get, kegg_list, kegg_conv, kegg_link, kegg_ddi accept max 10 entries (joined with +). Image/KGML/JSON formats accept only 1.
Use organism-specific pathway IDs: hsa00010 (human glycolysis) returns organism-specific gene mappings; map00010 (reference) returns generic entries. Always prefer organism-specific when analyzing a known organism.
Cache frequently-used conversions: Full organism ID conversions (kegg_conv('ncbi-geneid', 'hsa')) return large results. Cache locally rather than repeating.
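
The caching advice above can be sketched as a small disk cache keyed by URL. This is an illustration only: `cached_text` and `fetch_with_requests` are hypothetical names, the cache layout is an assumption, and there is no expiry, so cached files must be deleted by hand after a KEGG release update.

```python
import hashlib
from pathlib import Path


def cached_text(url, cache_dir, fetch):
    """Return the body for `url`, calling `fetch(url)` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so characters like ':' and '+' never reach the file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
    path = cache_dir / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch(url)
    path.write_text(text, encoding="utf-8")
    return text


def fetch_with_requests(url):
    """Default fetcher; requests is imported lazily so the cache works without it."""
    import requests

    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    return resp.text
```

A call such as `cached_text("https://rest.kegg.jp/conv/ncbi-geneid/hsa", "kegg_cache", fetch_with_requests)` would hit the network once and read from disk afterwards. Passing the fetcher as an argument keeps the cache logic testable offline.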

Common Recipes

Recipe: Parse KEGG Flat-File Entry

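def parse_kegg_entry(text):
    """Parse a KEGG flat-file entry into a dictionary.

    Field names occupy the first 12 columns; continuation lines leave them blank.
    Repeated fields (e.g. REFERENCE) are joined with newlines, not overwritten.
    """
    entry = {}
    current_key = None
    for line in text.split("\n"):
        if line.startswith("///"):  # End-of-entry marker
            break
        if not line.strip():  # Skip blank lines
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key:  # New field, or another occurrence of an earlier one
            current_key = key
            entry[key] = f"{entry[key]}\n{value}" if key in entry else value
        elif current_key:  # Continuation of the current field
            entry[current_key] += "\n" + value
    return entry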

import requests
pathway = requests.get("https://rest.kegg.jp/get/hsa00010").text
parsed = parse_kegg_entry(pathway)
print(f"Name: {parsed.get('NAME', 'N/A')}")
print(f"Description: {parsed.get('DESCRIPTION', 'N/A')[:200]}")

Recipe: Organism Comparison

import requests
import time

BASE = "https://rest.kegg.jp"

organisms = {"hsa": "Human", "mmu": "Mouse", "sce": "Yeast"}
pathway = "00010"  # Glycolysis

for org, name in organisms.items():
    genes = requests.get(f"{BASE}/link/genes/{org}{pathway}").text
    count = len([l for l in genes.strip().split("\n") if l])
    print(f"{name} ({org}): {count} genes in Glycolysis")
    time.sleep(0.5)
# Example output (counts vary by KEGG release):
# Human (hsa): 68 genes in Glycolysis
# Mouse (mmu): 67 genes in Glycolysis
# Yeast (sce): 31 genes in Glycolysis

Recipe: Build Gene-to-Pathway Mapping Table

import requests
import time

BASE = "https://rest.kegg.jp"

# Get all human gene-pathway links
links = requests.get(f"{BASE}/link/pathway/hsa").text
gene_pathways = {}
for line in links.strip().split("\n"):
    if not line:
        continue
    gene, pathway = line.split("\t")
    gene_pathways.setdefault(gene, []).append(pathway.replace("path:", ""))

print(f"Genes with pathway annotations: {len(gene_pathways)}")
# Show top genes by pathway count
top = sorted(gene_pathways.items(), key=lambda x: -len(x[1]))[:5]
for gene, paths in top:
    print(f"  {gene}: {len(paths)} pathways")

Troubleshooting

Problem	Cause	Solution
`404 Not Found`	Entry or database doesn't exist	Verify ID format and organism code; use `kegg_list` to check valid IDs
`400 Bad Request`	Malformed API URL	Check URL path: `/{operation}/{arg1}/{arg2}`; no query params
Empty response	Search term too specific or no matches	Broaden keywords; try partial matches; check organism code
Image/KGML returns error	Batch query with image/kgml/json format	These formats accept one entry only — remove `+` joins
`403 Forbidden`	Server-side rate limiting	Add `time.sleep(1)` between requests; reduce batch frequency
Wrong gene IDs returned	Using reference pathway (`map`) instead of organism-specific	Use organism prefix: `hsa00010` not `map00010` for gene links
ID conversion returns empty	External DB doesn't cover that entry	Not all KEGG entries have UniProt/NCBI mappings; check with `kegg_list` first
Response encoding issues	Non-ASCII characters in compound names	Set `resp.encoding = 'utf-8'` before reading `resp.text`; requests falls back to ISO-8859-1 for text responses that declare no charset
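
For the `403 Forbidden` row, a retry with increasing delays is one option. The sketch below assumes, as the table states, that 403 signals throttling; `get_with_retry` is a hypothetical helper, and `fetch` can be `requests.get` or any callable returning an object with a `status_code` attribute.

```python
import time


def get_with_retry(url, fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), backing off and retrying while the status is 403.

    Returns the last response; the caller still decides how to handle errors.
    """
    for attempt in range(retries + 1):
        resp = fetch(url)
        if resp.status_code != 403 or attempt == retries:
            return resp
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
```

With `requests`, this would be called as `get_with_retry(url, requests.get)` followed by `resp.raise_for_status()`. The `sleep` parameter exists so the delays can be replaced in tests.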

Related Skills

gget-genomic-databases — unified Python interface to Ensembl, NCBI, UniProt; use for gene-level queries when KEGG pathway context isn't needed
biopython-molecular-biology — BioPython's Bio.KEGG module provides an alternative Python API for KEGG parsing
pubchem-compound-search — for compound property lookups beyond KEGG's structural data; use kegg_conv('pubchem', ...) to bridge IDs

References

KEGG REST API documentation — official API specification
KEGG website — pathway browser, KEGG Mapper, BlastKOALA
KEGG organism codes — full list of 3-4 letter organism codes
Kanehisa, M. et al. (2023) "KEGG for taxonomy-based analysis of pathways and genomes" Nucleic Acids Research 51:D587-D592
