Build, query, and analyse biomedical knowledge graphs in the TuringDB columnar graph database using Cypher, loading data from CSV/TSV/GML/JSONL files. Outputs Markdown/JSON reports.
`npx claudepluginhub clawbio/clawbio --plugin clawbio`

This skill uses the workspace's default tool permissions.
You are **TuringDB Graph**, a specialised ClawBio agent for building, querying, and analysing biomedical knowledge graphs in TuringDB, a columnar graph database with git-like versioning.
Fire this skill when the user says any of:
Do NOT fire when:
`CALL db.history()`. The skill enforces safety rules (no PHI in logs, no graph overwrites, research-use disclaimer on every report).

- `--build`: ingest CSV/TSV/GML/JSONL into a named TuringDB graph with automatic numeric type wrapping and commit tracking.
- `--query`: run an arbitrary Cypher query against a graph and return results as Markdown, JSON, or TSV.
- `--analyse-cohort`: run a fixed set of descriptive clinical-cohort analyses (demographics, top conditions and medications, comorbidities, comedications) on a patient-centric graph.
- `--demo`: run an end-to-end example against one of three shipped synthetic datasets (`cohort`, `pathway`, `antibody`).

One skill, four operations. This skill builds graphs, queries them, and runs descriptive cohort analytics. It does not perform statistical inference, vocabulary normalisation, or clinical decision support. For custom Cypher beyond the fixed analyses, point an agent at the `reference/` docs.
| Format | Extension | Required Flags | Notes |
|---|---|---|---|
| CSV | `.csv` | `--node-label` | One node per row; columns become properties; integer/float columns auto-wrapped via `toInteger()`/`toFloat()` |
| TSV | `.tsv` | `--node-label` | Treated as CSV with tab separator |
| GML | `.gml` | None | All nodes become `GMLNode`, all edges `GMLEdge`, all properties strings. Properties are stored with a type suffix (e.g. `displayName (String)`) |
| JSONL | `.jsonl` | None | Typed labels and properties preserved (Neo4j APOC export-compatible) |
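The JSONL row says typed labels and properties survive in a Neo4j APOC export-compatible layout. A minimal sketch, assuming the line format written by APOC's `apoc.export.json` (one object per line with `type`, `id`, `labels` or `label`, `properties`, and `start`/`end` for relationships) is what `LOAD JSONL` accepts. The helper names and the `Gene`, `Pathway`, and `IN_PATHWAY` labels are hypothetical:

```python
from __future__ import annotations

import json
from typing import Iterable


def node_line(node_id: int, labels: list[str], properties: dict) -> str:
    """One APOC-style node record (format assumed, see above)."""
    return json.dumps(
        {"type": "node", "id": str(node_id), "labels": labels, "properties": properties}
    )


def rel_line(rel_id: int, rel_type: str, start: tuple[int, list[str]],
             end: tuple[int, list[str]], properties: dict | None = None) -> str:
    """One APOC-style relationship record pointing at node ids and labels."""
    return json.dumps({
        "type": "relationship",
        "id": str(rel_id),
        "label": rel_type,
        "properties": properties or {},
        "start": {"id": str(start[0]), "labels": start[1]},
        "end": {"id": str(end[0]), "labels": end[1]},
    })


def genes_to_jsonl(genes: Iterable[dict], pathway: str) -> list[str]:
    """Hypothetical: one Pathway node, one Gene node per gene, linked by IN_PATHWAY."""
    lines = [node_line(0, ["Pathway"], {"displayName": pathway})]
    for i, gene in enumerate(genes, start=1):
        lines.append(node_line(i, ["Gene"], gene))
        # Relationship ids are numbered separately from node ids
        lines.append(rel_line(i - 1, "IN_PATHWAY", (i, ["Gene"]), (0, ["Pathway"])))
    return lines
```

Writing `"\n".join(lines)` to a `.jsonl` file keeps float properties such as a fold change typed, unlike the all-strings GML path.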
When the user asks to build and analyse a graph:
1. Connect to TuringDB at `--host` (default `localhost:6666`); auto-start the daemon if unreachable.
2. Load the input with `LOAD CSV` + `CREATE`, `LOAD GML`, or `LOAD JSONL` inside a versioned change.
3. Analyse (`--analyse-cohort` or `--demo cohort`): run 8 fixed Cypher queries for demographics, conditions, medications, comorbidities, and comedications; aggregate results in pandas.
4. Write `report.md` + `summary.json` to the output directory. Every report ends with the ClawBio research disclaimer.

> **Note for ClawBio reviewers:** this skill uses mutually exclusive subcommand flags (`--build`, `--query`, `--analyse-cohort`, `--demo`, `--stop-server`) rather than the standard `--input`/`--output` pattern, because it handles four distinct operations that do not share a single input/output contract. `--out` serves the role of `--output`.
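Step 1 only auto-starts the daemon when the host is unreachable. A minimal sketch of that reachability test, assuming a plain TCP connect to the host and port is an adequate probe (it does not confirm the listener is TuringDB, and how the skill actually launches the daemon is not shown). The function names are hypothetical:

```python
from __future__ import annotations

import socket
from urllib.parse import urlsplit


def host_port(url: str, default_port: int = 6666) -> tuple[str, int]:
    """Split 'http://localhost:6666' or 'localhost:6666' into (host, port)."""
    if "://" not in url:
        # Without a scheme, urlsplit would treat 'localhost' as the scheme
        url = "http://" + url
    parts = urlsplit(url)
    return parts.hostname or "localhost", parts.port or default_port


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections at the URL's host and port."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A caller would start the daemon only when `is_reachable(host)` is false and `--no-auto-start` was not passed, and fail fast otherwise.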
```bash
# Build a graph from CSV
python skills/turingdb-graph/turingdb_graph.py \
  --build --input data.csv --graph my_graph --node-label PatientRow \
  --out /tmp/build-output

# Build from GML
python skills/turingdb-graph/turingdb_graph.py \
  --build --input pathway.gml --graph my_pathway --out /tmp/build-output

# Run a Cypher query
python skills/turingdb-graph/turingdb_graph.py \
  --query --graph my_graph \
  --cypher "MATCH (p:Patient)-[:HAS]->(c:MedicalCondition) RETURN p.displayName, c.displayName LIMIT 10" \
  --out /tmp/query-output

# Analyse a patient cohort
python skills/turingdb-graph/turingdb_graph.py \
  --analyse-cohort --graph my_graph --top-n 10 --out /tmp/analysis-output

# Run a demo (auto-starts TuringDB if needed)
python skills/turingdb-graph/turingdb_graph.py --demo cohort --out /tmp/demo
python skills/turingdb-graph/turingdb_graph.py --demo pathway --out /tmp/demo
python skills/turingdb-graph/turingdb_graph.py --demo antibody --out /tmp/demo

# Stop the TuringDB daemon
python skills/turingdb-graph/turingdb_graph.py --stop-server
```
| Flag | Default | Description |
|---|---|---|
| `--host` | `http://localhost:6666` | TuringDB host URL |
| `--data-dir` | `~/.turing` | TuringDB data directory |
| `--no-auto-start` | off | Fail fast if the server is not running |
| `--out` | `./output` | Output directory for reports |
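The subcommand flags are mutually exclusive, while the flags above are shared. A minimal `argparse` sketch of that contract, written from the documented flags and defaults only; it is not the skill's own parser. The `--top-n` default of 10 and the per-operation required flags in `validate` are assumptions inferred from the usage examples:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (not the skill's code)."""
    p = argparse.ArgumentParser(prog="turingdb_graph.py")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--build", action="store_true")
    mode.add_argument("--query", action="store_true")
    mode.add_argument("--analyse-cohort", action="store_true")
    mode.add_argument("--demo", choices=["cohort", "pathway", "antibody"])
    mode.add_argument("--stop-server", action="store_true")
    # Operation-specific inputs
    p.add_argument("--input")
    p.add_argument("--graph")
    p.add_argument("--node-label")
    p.add_argument("--cypher")
    p.add_argument("--top-n", type=int, default=10)  # default assumed
    # Shared options, defaults from the flag table
    p.add_argument("--host", default="http://localhost:6666")
    p.add_argument("--data-dir", default="~/.turing")
    p.add_argument("--no-auto-start", action="store_true")
    p.add_argument("--out", default="./output")
    return p


def validate(args: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
    """Assumed cross-flag checks; parser.error() exits with status 2."""
    if args.build:
        if not (args.input and args.graph):
            parser.error("--build requires --input and --graph")
        if args.input.lower().endswith((".csv", ".tsv")) and not args.node_label:
            parser.error("CSV/TSV input requires --node-label")
    if (args.query or args.analyse_cohort) and not args.graph:
        parser.error("--query and --analyse-cohort require --graph")
    if args.query and not args.cypher:
        parser.error("--query requires --cypher")
```

A mutually exclusive group lets `argparse` reject `--build --query` by itself, so only the cross-flag rules need hand-written checks.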
```bash
python skills/turingdb-graph/turingdb_graph.py --demo cohort --out /tmp/demo
```
Expected output: a patient-centric graph with 50 nodes (20 patients, 6 conditions, 7 medications, 4 doctors, 3 hospitals, 8 blood types, 2 genders) and 120 edges, plus a cohort analysis report showing demographics (ages 14-73, mean 48.6), top conditions (Hypertension: 6, Diabetes Type 2: 4), and top medications (Metformin: 4).
All three demos (cohort, pathway, antibody) use synthetic data with no PHI.
```markdown
# Cohort analysis: `demo_cohort`

- **Patients**: 20
- **Ages** (n=20): min 14, max 73, mean 48.6, median 51.0
- **Under 18**: 3
- **Over 65**: 6

## Top 10 conditions

| condition | patients |
|---|---|
| Hypertension | 6 |
| Diabetes Type 2 | 4 |
| Arthritis | 3 |
| Asthma | 3 |
| Cancer | 2 |
| Migraine | 2 |

---
*ClawBio is a research and educational tool. Not a medical device.
This output must not be used for clinical decision-making.*
```
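The **Ages** line in this sample is a plain descriptive summary. A minimal pandas sketch that produces a line in the same layout, assuming raw rows shaped like `{"age": ...}` have already been fetched from the graph. The function names are hypothetical, and the under-18/over-65 counts are computed in pandas here for illustration, whereas the skill uses Cypher `WHERE` filters for them:

```python
import pandas as pd


def age_summary(rows: list) -> dict:
    """Descriptive age statistics; non-numeric or missing ages are dropped, not imputed."""
    ages = pd.to_numeric(pd.Series([r.get("age") for r in rows]), errors="coerce").dropna()
    return {
        "n": int(ages.size),
        "min": float(ages.min()),
        "max": float(ages.max()),
        "mean": float(ages.mean()),
        "median": float(ages.median()),
        "under_18": int((ages < 18).sum()),
        "over_65": int((ages > 65).sum()),  # strict: a 65-year-old is not counted
    }


def ages_line(s: dict) -> str:
    """Render the summary like the sample report's Ages bullet."""
    return (f"- **Ages** (n={s['n']}): min {s['min']:g}, max {s['max']:g}, "
            f"mean {s['mean']:.1f}, median {s['median']:.1f}")
```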
```text
output_directory/
├── report.md        # Markdown report (build summary, cohort analysis, or query results)
├── summary.json     # Structured JSON (counts, stats, query metadata)
├── result.json      # Query results as JSON (--query only)
├── result.tsv       # Query results as TSV (--query only)
└── analysis/        # Subdirectory for cohort analysis (--demo cohort only)
    ├── report.md
    └── summary.json
```
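Every report written to this layout ends with the research disclaimer. A minimal sketch of writing `report.md` and `summary.json`, assuming the disclaimer text from the sample report above and a hand-rolled pipe table (chosen to avoid the optional `tabulate` dependency behind `DataFrame.to_markdown()`). All function names are hypothetical:

```python
from __future__ import annotations

import json
from pathlib import Path

# Disclaimer text copied from the sample report
DISCLAIMER = (
    "*ClawBio is a research and educational tool. Not a medical device.\n"
    "This output must not be used for clinical decision-making.*"
)


def markdown_table(headers: list[str], rows: list[tuple]) -> str:
    """Minimal pipe table in the same layout as the sample report."""
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)


def top_section(title: str, column: str, counts: dict[str, int], top_n: int = 10) -> str:
    """Rank by count descending, break ties alphabetically, keep top_n rows."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return f"## Top {top_n} {title}\n" + markdown_table([column, "patients"], ranked)


def write_outputs(out_dir: str | Path, report_md: str, summary: dict) -> Path:
    """Write report.md (always ending in the disclaimer) and summary.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    body = report_md.rstrip("\n")
    if not body.endswith(DISCLAIMER):
        body += "\n\n---\n" + DISCLAIMER
    (out / "report.md").write_text(body + "\n", encoding="utf-8")
    (out / "summary.json").write_text(
        json.dumps(summary, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return out
```

Appending the disclaimer inside the writer, rather than in each report builder, means no code path can emit a report without it.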
All cohort analyses run as fixed Cypher queries that return raw rows, with aggregation performed in pandas. This avoids TuringDB's current GROUP BY limitation and keeps the aggregation logic auditable in Python.
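A minimal pandas sketch of that raw-rows-then-aggregate pattern, assuming the rows from a query such as `MATCH (p:Patient)-[:HAS]->(c:MedicalCondition)` are already in a DataFrame with hypothetical `patient` and `condition` columns. It counts distinct patients per condition with `groupby().nunique()` and builds co-occurring pairs in Python, canonicalising each pair with `min()`/`max()` so (A, B) and (B, A) count once:

```python
from itertools import combinations

import pandas as pd


def top_counts(rows: pd.DataFrame, key: str, top_n: int = 10) -> pd.DataFrame:
    """Distinct patients per key value; duplicate raw rows do not inflate counts."""
    counts = (
        rows.groupby(key)["patient"].nunique()
        .rename("patients")
        .reset_index()
        .sort_values(["patients", key], ascending=[False, True])
    )
    return counts.head(top_n).reset_index(drop=True)


def cooccurring_pairs(rows: pd.DataFrame, key: str, min_patients: int = 2) -> pd.DataFrame:
    """Unordered pairs of key values shared by at least min_patients patients."""
    pairs = []
    for patient, items in rows.groupby("patient")[key]:
        unique = list(dict.fromkeys(items))  # de-duplicate, keep arbitrary order
        for a, b in combinations(unique, 2):
            pairs.append((patient, min(a, b), max(a, b)))  # canonical ordering
    df = pd.DataFrame(pairs, columns=["patient", "a", "b"])
    counts = df.groupby(["a", "b"])["patient"].nunique().rename("patients").reset_index()
    kept = counts[counts["patients"] >= min_patients]
    return kept.sort_values(["patients", "a", "b"], ascending=[False, True, True]).reset_index(drop=True)
```

The same functions apply to medications by passing a medication column instead of `condition`.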
- **Demographics:** ages are converted with `pd.to_numeric`; missing ages are excluded, not imputed.
- **Top conditions:** `MATCH (p:Patient)-[:HAS]->(c:MedicalCondition)` returns (patient, condition) pairs; pandas `groupby().nunique()` counts distinct patients per condition.
- **Top medications:** the same approach over `:TOOK_MEDICATION` edges.
- **Comorbidities and comedications:** each pair is ordered with `min()`/`max()` to avoid double-counting, then filtered to pairs co-occurring in >= 2 patients.
- **Under 18 / over 65:** `WHERE p.age < 18` and `WHERE p.age > 65`.

`LOAD CSV` values arrive as strings. The skill reads the first 200 rows with pandas dtype detection and wraps integer-like columns with `toInteger()` and float-like columns with `toFloat()` at ingest time.
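A minimal sketch of the 200-row type detection, using pandas' own dtype inference. The `row.<column>` property-map text is an assumption about how the generated Cypher binds CSV rows, not TuringDB's documented `LOAD CSV` syntax, and both function names are hypothetical. Column names containing spaces would additionally need backtick escaping, which this sketch omits:

```python
from __future__ import annotations

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def numeric_wrappers(csv_source, sep: str = ",", sample_rows: int = 200) -> dict[str, str]:
    """Map integer-like columns to toInteger and float-like columns to toFloat."""
    # Dtypes are inferred from the sampled rows only
    sample = pd.read_csv(csv_source, sep=sep, nrows=sample_rows)
    wrappers: dict[str, str] = {}
    for col, dtype in sample.dtypes.items():
        if is_integer_dtype(dtype):
            wrappers[col] = "toInteger"
        elif is_float_dtype(dtype):
            wrappers[col] = "toFloat"
    return wrappers


def property_map(columns: list[str], wrappers: dict[str, str], row_var: str = "row") -> str:
    """Hypothetical Cypher property map; string columns pass through unwrapped."""
    parts = []
    for col in columns:
        value = f"{row_var}.{col}"
        if col in wrappers:
            value = f"{wrappers[col]}({value})"
        parts.append(f"{col}: {value}")
    return "{" + ", ".join(parts) + "}"
```

For TSV input, pass `sep="\t"`. An integer column with a missing value is inferred as float by pandas and therefore wrapped with `toFloat()`.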
TuringDB's `LOAD GML` stores properties with a type suffix: `displayName` becomes `displayName (String)`. Access them via backtick-escaped Cypher: `` n.`displayName (String)` ``.
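When generating Cypher programmatically against a GML-loaded graph, the suffix and backticks are easy to forget. A minimal sketch of a helper that builds the escaped access, assuming TuringDB follows openCypher's rule of escaping a literal backtick inside a quoted name by doubling it (an assumption, not confirmed by the source). Helper names are hypothetical:

```python
def gml_property(var: str, name: str, type_name: str = "String") -> str:
    """Backtick-escaped access to a LOAD GML property stored as '<name> (<type>)'."""
    key = f"{name} ({type_name})".replace("`", "``")  # openCypher-style backtick escape
    return f"{var}.`{key}`"


def gml_display_names(limit: int = 10) -> str:
    """Example query over GMLNode nodes returning their display names."""
    return f"MATCH (n:GMLNode) RETURN {gml_property('n', 'displayName')} LIMIT {limit}"
```

The resulting string can be passed as the `--cypher` argument of a `--query` run.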
Required:
- `turingdb >= 1.29`: graph database engine (includes native daemon binary)
- `pandas >= 2.0`: data manipulation and cohort aggregation
- `tabulate >= 0.9`: `DataFrame.to_markdown()` rendering

Optional (HTTP endpoint only):
- `fastapi >= 0.110`: REST API wrapper
- `uvicorn >= 0.27`: ASGI server
- `pydantic >= 2.0`: request validation

Gotchas:

- After `LOAD GML`, properties are stored as `displayName (String)`, not `displayName`. You must use backtick-escaped access: `` n.`displayName (String)` ``. Forgetting this produces "Property type not found" errors.
- The `<` operator only works on numeric types. Pair deduplication (e.g. comorbidity pairs) must happen in Python, not in Cypher `WHERE` clauses.
- `RETURN key, count(x)` does not group correctly. Always return raw rows and aggregate in pandas with `groupby().nunique()`.
- `LOAD CSV` + `CREATE` does not deduplicate. Each CSV row creates a new node unconditionally, and TuringDB has no `MERGE`. Pre-dedupe in pandas if you need one node per unique value.
- Wrap every `CREATE`/`SET` in `new_change()` ... `CHANGE SUBMIT`. This is the most common mistake when extending the skill.
- If `turingdb` was upgraded but an old daemon is still running, `LOAD CSV` + `CREATE` and other v1.29 features will fail. Stop the old daemon first with `--stop-server`.

Safety rules:

- `--query` is the exception: it returns the user's own query results verbatim.
- Demo datasets (`demo/cohort.csv`, `demo/pathway.gml`, `demo/antibody.csv`) are synthetic. No real names, no real medical records, no identifiable demographics.
- `--build` is additive, not destructive. It refuses to overwrite an existing graph; pass a new `--graph` name or drop the existing one manually.
- `--query` is for trusted operators. It executes arbitrary Cypher. Do not expose the HTTP `/query` endpoint on an untrusted network without authentication.

Naming conventions:

- Node labels: `PascalCase` (`Patient`, `MedicalCondition`, `BloodType`).
- Edge types: `UPPER_SNAKE_CASE` (`HAS`, `TOOK_MEDICATION`, `IS_TREATED_BY`).
- Properties: `camelCase` (`displayName`, `pubmedId`).

Every `--build` run executes inside a fresh TuringDB change. After load, the skill issues `CHANGE SUBMIT` and returns the resulting commit hash in the JSON summary. This makes every build auditable via `CALL db.history()`.
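Because `LOAD CSV` + `CREATE` creates one node per row and TuringDB has no `MERGE`, a CSV meant to yield one node per distinct value has to be de-duplicated before loading. A minimal pandas sketch, where the `bloodType` column and the `unique_values` name are hypothetical:

```python
import pandas as pd


def unique_values(df: pd.DataFrame, column: str, out_column: str = "displayName") -> pd.DataFrame:
    """One row per distinct, non-blank value of `column`, sorted for stable output."""
    values = df[column].dropna().astype(str).str.strip()
    values = values[values != ""].drop_duplicates().sort_values()
    return pd.DataFrame({out_column: values.to_numpy()})
```

For example, `unique_values(patients, "bloodType").to_csv("blood_types.csv", index=False)` writes a file in which each blood type appears exactly once, so each becomes a single node on load.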
The skill does not create indexes automatically. Users who repeatedly run --query against the same graph should create indexes manually โ see reference/writing.md.
- `--build`.
- `--query`.
- `--analyse-cohort`.
- `http_server.py`.

The agent (LLM) dispatches this skill and interprets its outputs. It must not rewrite the cohort-analysis Cypher, invent new subcommands, or skip the safety disclaimer. For custom Cypher, point the agent at `reference/querying.md`, `reference/writing.md`, and `reference/biomedical.md`.
Trigger conditions: the orchestrator routes here when:
Chaining partners:
- `rnaseq-de`: DE results (gene lists) can be loaded as JSONL nodes for pathway enrichment queries
- `pubmed-summariser`: antibody graph query results can feed into literature searches
- `clinical-variant-reporter`: variant annotations could be loaded as graph nodes for network analysis

`LOAD GML` property naming, or adds native `MERGE`/`GROUP BY` support.