From sciagent-skills
Queries ClinicalTrials.gov API v2 for clinical trial data by condition, intervention, location, sponsor, phase, or NCT ID. Filters by status, paginates results, exports CSV for research, patient matching, and portfolio analysis.
```bash
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skills
```

This skill uses the workspace's default tool permissions.
Query the ClinicalTrials.gov API v2 (public, no authentication) to search and retrieve clinical trial data worldwide. Supports searching by condition, intervention, location, sponsor, and status; retrieving detailed study information by NCT ID; paginating large result sets; and exporting to CSV.
```bash
uv pip install requests pandas
```

API details: base URL `https://clinicaltrials.gov/api/v2` (public, no authentication).

```python
import requests
import time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    """Reusable helper for ClinicalTrials.gov searches."""
    # countTotal=true asks the API to include totalCount in the first page of results
    params = {"countTotal": "true", **params}
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Search for recruiting breast cancer trials
results = ct_search({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "sort": "LastUpdatePostDate:desc",
})
print(f"Found {results['totalCount']} trials")
for study in results['studies'][:3]:
    nct = study['protocolSection']['identificationModule']['nctId']
    title = study['protocolSection']['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")
```
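When a search fails with 400 Bad Request, it can help to look at the exact query string before sending anything. The sketch below builds the GET URL with the standard library's `urllib.parse.urlencode`; `preview_url` is a hypothetical helper name, and the assumption is that `requests` encodes the same `params` dict the same way (both percent-encode values, turning spaces into `+` and `:` into `%3A`).

```python
from urllib.parse import urlencode

CT_API = "https://clinicaltrials.gov/api/v2"

def preview_url(params, endpoint="studies"):
    """Return the full GET URL for a ClinicalTrials.gov v2 request without sending it."""
    # urlencode stringifies values and percent-encodes them (space -> '+', ':' -> '%3A')
    return f"{CT_API}/{endpoint}?{urlencode(params)}"

url = preview_url({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "sort": "LastUpdatePostDate:desc",
})
print(url)
```

Pasting the printed URL into a browser is a quick way to check a parameter value against the tables that follow.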
ClinicalTrials.gov returns deeply nested JSON. Key navigation paths:
| Data | Path |
|---|---|
| NCT ID | study['protocolSection']['identificationModule']['nctId'] |
| Title | study['protocolSection']['identificationModule']['briefTitle'] |
| Status | study['protocolSection']['statusModule']['overallStatus'] |
| Phase | study['protocolSection']['designModule']['phases'] |
| Enrollment | study['protocolSection']['designModule']['enrollmentInfo']['count'] |
| Eligibility | study['protocolSection']['eligibilityModule'] |
| Locations | study['protocolSection']['contactsLocationsModule']['locations'] |
| Interventions | study['protocolSection']['armsInterventionsModule']['interventions'] |
| Results | study.get('resultsSection') (None if no results posted) |
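The paths in the table can also drive extraction directly. A minimal sketch, assuming you want one flat row per study: `FIELD_PATHS`, `dig`, and `flatten_study` are hypothetical names, and the sample record is a trimmed, made-up object shaped like the paths above (placeholder values, not real trial data).

```python
# Paths from the table above, expressed as key tuples
FIELD_PATHS = {
    "nct_id": ("protocolSection", "identificationModule", "nctId"),
    "title": ("protocolSection", "identificationModule", "briefTitle"),
    "status": ("protocolSection", "statusModule", "overallStatus"),
    "phases": ("protocolSection", "designModule", "phases"),
    "enrollment": ("protocolSection", "designModule", "enrollmentInfo", "count"),
}

def dig(obj, path):
    """Follow a tuple of keys; return None as soon as a level is missing."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj

def flatten_study(study):
    """Map one nested study record to a flat dict keyed by FIELD_PATHS names."""
    row = {name: dig(study, path) for name, path in FIELD_PATHS.items()}
    # resultsSection is absent when no results have been posted
    row["has_results"] = study.get("resultsSection") is not None
    return row

# Hypothetical, trimmed study record (placeholder values)
sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example trial"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE2"]},
    }
}
row = flatten_study(sample)
print(row)
```

Keeping the paths in one dict means adding a column is a one-line change rather than another chain of lookups.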
Overall status values (`filter.overallStatus`):

| Status | Description |
|---|---|
| RECRUITING | Currently recruiting participants |
| NOT_YET_RECRUITING | Approved but not yet open |
| ENROLLING_BY_INVITATION | Invitation-only enrollment |
| ACTIVE_NOT_RECRUITING | Active, enrollment closed |
| SUSPENDED | Temporarily halted |
| TERMINATED | Stopped prematurely |
| COMPLETED | Study concluded |
| WITHDRAWN | Withdrawn before enrollment |
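Status values are case-sensitive enumerations, so a typo or a lowercase value yields a 400 error. The sketch below normalizes and checks values client-side before building the comma-separated filter; `status_filter` is a hypothetical helper, and the allowed set is limited to the table above (if the API accepts other values, extend the set).

```python
# Values copied from the status table above
VALID_STATUSES = {
    "RECRUITING", "NOT_YET_RECRUITING", "ENROLLING_BY_INVITATION",
    "ACTIVE_NOT_RECRUITING", "SUSPENDED", "TERMINATED", "COMPLETED", "WITHDRAWN",
}

def status_filter(*statuses):
    """Build a comma-separated filter.overallStatus value, rejecting unknown values."""
    normalized = [s.strip().upper() for s in statuses]
    unknown = [s for s in normalized if s not in VALID_STATUSES]
    if unknown:
        raise ValueError(f"Unknown status value(s): {unknown}")
    return ",".join(normalized)

print(status_filter("recruiting", "NOT_YET_RECRUITING"))
```

One request with the joined value replaces a separate request per status.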
Phase values:

| Phase | Description |
|---|---|
| EARLY_PHASE1 | Early Phase 1 (formerly Phase 0) |
| PHASE1 | Phase 1 — safety and dosing |
| PHASE2 | Phase 2 — efficacy and side effects |
| PHASE3 | Phase 3 — large-scale efficacy |
| PHASE4 | Phase 4 — post-market surveillance |
| NA | Not applicable (non-drug studies) |
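Because `designModule.phases` is a list, a combined-phase study (for example a Phase 1/2 trial) typically carries two entries. A small sketch that renders the list as a readable label and tallies studies by it; `PHASE_LABELS`, `phase_label`, and `count_phase_labels` are hypothetical names, and the demo records are made up.

```python
from collections import Counter

# Labels taken from the phase table above
PHASE_LABELS = {
    "EARLY_PHASE1": "Early Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not applicable",
}

def phase_label(phases):
    """Render a phases list such as ['PHASE1', 'PHASE2'] as 'Phase 1/Phase 2'."""
    if not phases:
        return "Not reported"
    return "/".join(PHASE_LABELS.get(p, p) for p in phases)

def count_phase_labels(studies):
    """Tally studies by their combined phase label."""
    return Counter(
        phase_label(s.get("protocolSection", {}).get("designModule", {}).get("phases", []))
        for s in studies
    )

# Hypothetical records: a Phase 1/2 study, a Phase 3 study, and one without a designModule
demo = [
    {"protocolSection": {"designModule": {"phases": ["PHASE1", "PHASE2"]}}},
    {"protocolSection": {"designModule": {"phases": ["PHASE3"]}}},
    {"protocolSection": {}},
]
print(count_phase_labels(demo))
```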
Search parameters for `GET /studies`:

| Parameter | Type | Description | Example |
|---|---|---|---|
| query.cond | string | Condition/disease | lung cancer |
| query.intr | string | Intervention/drug | Pembrolizumab |
| query.locn | string | Geographic location | New York |
| query.spons | string | Sponsor name | National Cancer Institute |
| query.term | string | General full-text search | immunotherapy |
| filter.overallStatus | string | Status filter (comma-separated) | RECRUITING,COMPLETED |
| filter.phase | string | Phase filter | PHASE2,PHASE3 |
| filter.ids | string | NCT ID filter | NCT04852770 |
| sort | string | Sort order | LastUpdatePostDate:desc |
| pageSize | int | Results per page (max 1000) | 100 |
| pageToken | string | Pagination token | (from previous response) |
| format | string | Response format | `json` or `csv` |
Sort options: LastUpdatePostDate, EnrollmentCount, StartDate, StudyFirstPostDate — each with :asc or :desc.
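A tiny builder keeps sort values in the `field:direction` form and catches misspelled field names before a request is sent. `sort_param` is a hypothetical helper, and the accepted fields are only the four listed above.

```python
# Sort fields listed above
SORT_FIELDS = {"LastUpdatePostDate", "EnrollmentCount", "StartDate", "StudyFirstPostDate"}

def sort_param(field, direction="desc"):
    """Return a sort value such as 'StartDate:asc' for one of the listed fields."""
    if field not in SORT_FIELDS:
        raise ValueError(f"Unsupported sort field: {field}")
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return f"{field}:{direction}"

print(sort_param("EnrollmentCount"))
print(sort_param("StartDate", "asc"))
```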
```python
# Search by condition: recruiting type 2 diabetes trials
results = ct_search({
    "query.cond": "type 2 diabetes",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20,
    "sort": "LastUpdatePostDate:desc",
})
print(f"Found {results['totalCount']} recruiting diabetes trials")
for study in results['studies'][:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")
```
```python
# Find Phase 3 trials testing Pembrolizumab
results = ct_search({
    "query.intr": "Pembrolizumab",
    "filter.overallStatus": "RECRUITING,ACTIVE_NOT_RECRUITING",
    "filter.phase": "PHASE3",
    "pageSize": 50,
})
print(f"Phase 3 Pembrolizumab trials: {results['totalCount']}")
```
```python
# Search by location: recruiting cancer trials in New York
results = ct_search({
    "query.cond": "cancer",
    "query.locn": "New York",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20,
})
# Extract location details, keeping only New York sites
for study in results['studies'][:3]:
    locs = study['protocolSection'].get('contactsLocationsModule', {}).get('locations', [])
    for loc in locs:
        if 'New York' in loc.get('city', ''):
            print(f"  {loc.get('facility')}: {loc['city']}, {loc.get('state', '')}")
```
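The substring test `'New York' in loc.get('city', '')` also matches cities such as "West New York". A stricter sketch compares city and state exactly, ignoring case; `matching_sites` is a hypothetical helper, and the facility names in the test data would be placeholders.

```python
def matching_sites(study, city=None, state=None):
    """Return (facility, city, state) tuples whose city/state equal the given values (case-insensitive).

    A filter left as None is not applied.
    """
    locations = (
        study.get("protocolSection", {})
        .get("contactsLocationsModule", {})
        .get("locations", [])
    )
    matches = []
    for loc in locations:
        loc_city = (loc.get("city") or "").casefold()
        loc_state = (loc.get("state") or "").casefold()
        if city and loc_city != city.casefold():
            continue
        if state and loc_state != state.casefold():
            continue
        matches.append((loc.get("facility"), loc.get("city"), loc.get("state")))
    return matches
```

Calling it with only `state=` gives a state-wide list, which is often what patient-matching use cases need.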
```python
# Search by sponsor and list each study's lead sponsor and collaborators
results = ct_search({
    "query.spons": "National Cancer Institute",
    "pageSize": 20,
})
for study in results['studies'][:5]:
    sponsor_mod = study['protocolSection']['sponsorCollaboratorsModule']
    lead = sponsor_mod['leadSponsor']['name']
    collabs = [c['name'] for c in sponsor_mod.get('collaborators', [])]
    print(f"  Lead: {lead}, Collaborators: {collabs}")
```
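Since the example prints collaborators too, a sponsor search may return studies where the searched organization is not the lead (whether `query.spons` also matches collaborators is an assumption worth checking on your own queries). Tallying lead sponsors makes that visible; `lead_sponsor_counts` is a hypothetical helper, used for instance as `lead_sponsor_counts(results['studies']).most_common(5)`.

```python
from collections import Counter

def lead_sponsor_counts(studies):
    """Count studies per lead sponsor name; studies without one are tallied as 'Unknown'."""
    counts = Counter()
    for study in studies:
        sponsor_mod = study.get("protocolSection", {}).get("sponsorCollaboratorsModule", {})
        name = sponsor_mod.get("leadSponsor", {}).get("name") or "Unknown"
        counts[name] += 1
    return counts
```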
```python
# Retrieve a single study by NCT ID
nct_id = "NCT04852770"
response = requests.get(f"{CT_API}/studies/{nct_id}", timeout=30)
response.raise_for_status()
study = response.json()

# Extract key information
proto = study['protocolSection']
print(f"Title: {proto['identificationModule']['briefTitle']}")
print(f"Status: {proto['statusModule']['overallStatus']}")

# Eligibility criteria
elig = proto.get('eligibilityModule', {})
print(f"Ages: {elig.get('minimumAge')} - {elig.get('maximumAge')}")
print(f"Sex: {elig.get('sex')}")
print(f"Criteria:\n{elig.get('eligibilityCriteria', 'N/A')[:300]}")
```
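`eligibilityCriteria` is free text. For eligibility screening it is often handy to separate inclusion from exclusion criteria. This sketch assumes the common layout where the text contains an "Inclusion Criteria" heading followed later by "Exclusion Criteria"; records that deviate (for example, headings wrapped in markdown emphasis) need extra handling. `split_criteria` is a hypothetical helper.

```python
import re

def split_criteria(text):
    """Split eligibility free text into inclusion and exclusion parts.

    Assumes an 'Inclusion Criteria' heading followed by 'Exclusion Criteria';
    without an exclusion heading, everything is returned as inclusion text.
    """
    if not text:
        return {"inclusion": "", "exclusion": ""}
    # Split once at the first 'Exclusion Criteria' heading (case-insensitive)
    parts = re.split(r"exclusion criteria:?", text, maxsplit=1, flags=re.IGNORECASE)
    # Drop a leading 'Inclusion Criteria' heading from the first part
    inclusion = re.sub(r"^\s*inclusion criteria:?", "", parts[0], flags=re.IGNORECASE)
    exclusion = parts[1] if len(parts) > 1 else ""
    return {"inclusion": inclusion.strip(), "exclusion": exclusion.strip()}

example = "Inclusion Criteria:\n\n* Age 18 or older\n\nExclusion Criteria:\n\n* Prior chemotherapy"
print(split_criteria(example))
```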
```python
# Paginate through recruiting cancer trials, up to max_pages pages of 1000
all_studies = []
page_token = None
max_pages = 10
for page in range(max_pages):
    params = {
        "query.cond": "cancer",
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1000,
    }
    if page_token:
        params["pageToken"] = page_token
    results = ct_search(params)
    all_studies.extend(results['studies'])
    page_token = results.get('nextPageToken')
    if not page_token:
        break
    time.sleep(1.5)  # respect rate limits
print(f"Retrieved {len(all_studies)} studies across {page + 1} pages")
```
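The same loop can be packaged as a generator so callers can stop early and the paging logic can be tested without the network. A sketch, with `iter_studies` as a hypothetical name; `fetch` is any callable that takes a params dict and returns one decoded JSON page, such as the `ct_search` helper defined above.

```python
import time

def iter_studies(fetch, params, max_pages=10, delay=1.5):
    """Yield studies page by page until nextPageToken is absent or max_pages is reached."""
    token = None
    for _ in range(max_pages):
        page_params = dict(params)  # copy so the caller's dict is never mutated
        if token:
            page_params["pageToken"] = token
        data = fetch(page_params)
        yield from data.get("studies", [])
        token = data.get("nextPageToken")
        if not token:
            return
        time.sleep(delay)  # pause between requests to stay under the rate limit
```

For example, `for study in iter_studies(ct_search, {"query.cond": "cancer", "pageSize": 1000}): ...` walks the same pages as the loop above, and `itertools.islice` can cap the number of studies consumed.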
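```python
# Export recruiting heart disease trials as CSV (a single request returns at most pageSize rows)
response = requests.get(f"{CT_API}/studies", params={
    "query.cond": "heart disease",
    "filter.overallStatus": "RECRUITING",
    "format": "csv",
    "pageSize": 1000,
}, timeout=60)
response.raise_for_status()
with open("heart_disease_trials.csv", "w", encoding="utf-8", newline="") as f:
    f.write(response.text)
print("Exported to heart_disease_trials.csv")
```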
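The API's CSV export uses its own flattened columns. When you need a specific column set built from JSON results (for example rows from the `extract_summary` recipe), the standard `csv` module can write them; list values such as `phases` are joined with `|` here, which is an arbitrary choice. `rows_to_csv` is a hypothetical helper and the row below is placeholder data.

```python
import csv
import io

def rows_to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text; list values are joined with '|', extra keys ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            key: "|".join(map(str, value)) if isinstance(value, list) else value
            for key, value in row.items()
        })
    return buffer.getvalue()

csv_text = rows_to_csv(
    [{"nct_id": "NCT00000000", "phases": ["PHASE2", "PHASE3"], "enrollment": 120}],
    fieldnames=["nct_id", "phases", "enrollment"],
)
print(csv_text)
```

To save it, open the file with `newline=""` and `encoding="utf-8"` and write `csv_text`, as in the export example.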
```python
import time

import requests

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    params = {"countTotal": "true", **params}  # include totalCount in the first page
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Step 1: Search with multiple filters
results = ct_search({
    "query.cond": "lung cancer",
    "query.intr": "immunotherapy",
    "query.locn": "California",
    "filter.overallStatus": "RECRUITING,NOT_YET_RECRUITING",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc",
})
print(f"Total matches: {results['totalCount']}")

# Step 2: Filter by phase (client-side, on this page of up to 100 results)
phase23 = [
    s for s in results['studies']
    if any(p in ['PHASE2', 'PHASE3']
           for p in s['protocolSection'].get('designModule', {}).get('phases', []))
]
print(f"Phase 2/3 trials: {len(phase23)}")

# Step 3: Extract summaries
for study in phase23[:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    enrollment = proto.get('designModule', {}).get('enrollmentInfo', {}).get('count', 'N/A')
    print(f"  {nct}: {title} (n={enrollment})")
```
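To shortlist the largest trials from the filtered list, sort by enrollment count and push studies without a count to the end. A sketch with the hypothetical helper `top_by_enrollment`; something like `top_by_enrollment(phase23, n=5)` would replace the plain `phase23[:5]` slice.

```python
def top_by_enrollment(studies, n=5):
    """Return up to n studies sorted by enrollment count, largest first.

    Studies with no integer enrollment count sort last.
    """
    def enrollment(study):
        count = (study.get("protocolSection", {})
                 .get("designModule", {})
                 .get("enrollmentInfo", {})
                 .get("count"))
        return count if isinstance(count, int) else -1
    return sorted(studies, key=enrollment, reverse=True)[:n]
```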
```python
# Step 1: Find completed trials with posted results
results = ct_search({
    "query.cond": "alzheimer disease",
    "filter.overallStatus": "COMPLETED",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc",
})
with_results = [s for s in results['studies'] if s.get('hasResults', False)]
print(f"Completed with results: {len(with_results)} / {len(results['studies'])}")

# Step 2: Get detailed results for the most recently updated trial with results
if with_results:
    nct = with_results[0]['protocolSection']['identificationModule']['nctId']
    detail_response = requests.get(f"{CT_API}/studies/{nct}", timeout=30)
    detail_response.raise_for_status()
    detail = detail_response.json()
    if 'resultsSection' in detail:
        outcomes = detail['resultsSection'].get('outcomeMeasuresModule', {})
        measures = outcomes.get('outcomeMeasures', [])
        for m in measures[:3]:
            print(f"  Outcome: {m.get('title')}")
            print(f"    Type: {m.get('type')}")
```
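Grouping outcome measures by their `type` makes it easy to look at primary endpoints first. A minimal sketch, assuming `type` holds values such as `PRIMARY` and `SECONDARY`; `outcomes_by_type` is a hypothetical helper that takes a `resultsSection` dict (or `None`).

```python
from collections import defaultdict

def outcomes_by_type(results_section):
    """Group outcome measure titles by their 'type' field; missing types go under 'UNSPECIFIED'."""
    measures = (
        (results_section or {})
        .get("outcomeMeasuresModule", {})
        .get("outcomeMeasures", [])
    )
    grouped = defaultdict(list)
    for measure in measures:
        grouped[measure.get("type", "UNSPECIFIED")].append(measure.get("title"))
    return dict(grouped)
```

Applied to the workflow above, `outcomes_by_type(detail.get('resultsSection'))` returns an empty dict when no results are posted.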
```python
# Compare recruiting-trial counts across sponsors (pageSize=1: only totalCount is needed)
sponsors = ["Pfizer", "Novartis", "Roche"]
for sponsor in sponsors:
    results = ct_search({
        "query.spons": sponsor,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1,
    })
    print(f"{sponsor}: {results['totalCount']} recruiting trials")
    time.sleep(1.5)
```
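A fixed `time.sleep(1.5)` also waits after slow responses. An alternative sketch enforces a minimum interval between calls and sleeps only for the remainder; at roughly 50 requests/minute (the figure in the troubleshooting table), 60 / 50 = 1.2 s per request, so a 1.5 s interval leaves a margin. `MinIntervalLimiter` is a hypothetical class; the injectable clock and sleep exist only to make it testable. Call `limiter.wait()` right before each `ct_search` in the sponsor loop.

```python
import time

class MinIntervalLimiter:
    """Block until at least `interval` seconds have passed since the previous wait() returned."""

    def __init__(self, interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # sleep only for the part of the interval not yet elapsed
                now = self._clock()
        self._last = now
```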
```python
def ct_search_with_retry(params, max_retries=3):
    """Search with retries: wait 60s on HTTP 429, back off exponentially on other request errors."""
    for attempt in range(max_retries):
        try:
            response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 60
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise  # other HTTP errors (e.g. 400) are not retried
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts
    raise Exception("Max retries exceeded")
```
```python
def extract_summary(study):
    proto = study.get('protocolSection', {})
    ident = proto.get('identificationModule', {})
    status = proto.get('statusModule', {})
    design = proto.get('designModule', {})
    return {
        'nct_id': ident.get('nctId'),
        'title': ident.get('officialTitle') or ident.get('briefTitle'),
        'status': status.get('overallStatus'),
        'phases': design.get('phases', []),
        'enrollment': design.get('enrollmentInfo', {}).get('count'),
        'last_update': status.get('lastUpdatePostDateStruct', {}).get('date'),
    }

# Usage
for study in results['studies'][:3]:
    s = extract_summary(study)
    print(f"{s['nct_id']}: {s['status']} | Phase: {s['phases']} | n={s['enrollment']}")
```
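The `last_update` value is an ISO 8601 date string. To sort or filter by age, convert it to `datetime.date`. This sketch pads missing month or day components with 1, on the assumption that some registry dates are recorded only at month precision (for example `2024-03`); `parse_ct_date` and `days_since` are hypothetical helpers.

```python
from datetime import date

def parse_ct_date(value):
    """Parse 'YYYY-MM-DD', 'YYYY-MM', or 'YYYY'; missing components default to 1."""
    if not value:
        return None
    parts = [int(p) for p in value.split("-")]
    while len(parts) < 3:
        parts.append(1)
    return date(*parts)

def days_since(value, today=None):
    """Days between a registry date string and `today` (defaults to the current date)."""
    parsed = parse_ct_date(value)
    if parsed is None:
        return None
    return ((today or date.today()) - parsed).days
```

For instance, `days_since(s['last_update'])` inside the usage loop above flags studies that have not been updated recently.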
```python
def safe_get(study, *keys, default='N/A'):
    """Navigate nested study JSON safely."""
    current = study
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current

# Usage: handles missing fields gracefully
nct = safe_get(study, 'protocolSection', 'identificationModule', 'nctId')
phases = safe_get(study, 'protocolSection', 'designModule', 'phases', default=[])
enrollment = safe_get(study, 'protocolSection', 'designModule', 'enrollmentInfo', 'count')
```
Parameter reference:

| Parameter | Endpoint | Default | Description |
|---|---|---|---|
| query.cond | search | — | Condition/disease search term |
| query.intr | search | — | Intervention/drug search term |
| query.locn | search | — | Geographic location filter |
| query.spons | search | — | Sponsor/organization filter |
| query.term | search | — | General full-text search |
| filter.overallStatus | search | all | Comma-separated status values |
| filter.phase | search | all | Comma-separated phase values |
| pageSize | search | 10 | Results per page (max 1000) |
| sort | search | relevance | `{field}:asc` or `{field}:desc` |
| format | both | json | `json` or `csv` |
| timeout | (client) | 30s | Set in the `requests` call |
| Problem | Cause | Solution |
|---|---|---|
| 429 Too Many Requests | Rate limit exceeded (~50/min) | Wait 60s; use max pageSize=1000; implement exponential backoff |
| Empty studies array | No trials match filters | Broaden search (remove status/phase filters); check spelling |
| 400 Bad Request | Invalid parameter value | Verify status/phase values match enumeration exactly (e.g., RECRUITING not recruiting) |
| Missing resultsSection | Trial has no posted results | Check study['hasResults'] before accessing results |
| KeyError on nested field | Not all trials have all modules | Use .get() with defaults or safe_get helper (see Recipes) |
| Pagination stops early | nextPageToken absent | All results retrieved; check totalCount vs collected count |
| CSV format differs from JSON | Different field structure | CSV flattens nested structure; use JSON for programmatic access |
| Timeout on large exports | CSV with many results | Increase timeout; paginate with pageSize=1000 instead |
Best practices:

- Check `hasResults` before accessing `resultsSection`; most trials have no posted results.
- Use `.get()` chains; not all trials populate all modules (especially `contactsLocationsModule` and `armsInterventionsModule`).
- Combine statuses in one comma-separated filter (`RECRUITING,NOT_YET_RECRUITING`) instead of making a separate request per status.
- Use `sort=LastUpdatePostDate:desc` by default; it returns the most recently updated trials first.
- `lastUpdatePostDateStruct.date` is an ISO 8601 string; its `type` field indicates ACTUAL vs ESTIMATED.

Related skills:

- `pubmed-database`: published literature search, complementary to trial registry data
- `chembl-database-bioactivity`: compound bioactivity data for drugs under investigation
- `bioservices-multi-database`: alternative database access via a unified Python interface

Self-contained entry. Original total: 866 lines (SKILL.md 507 + api_reference.md 359). Scripts: 216 lines (query_clinicaltrials.py).
Original file disposition:
- SKILL.md (507 lines) → Core API modules 1-7 (condition, intervention, location, sponsor, details, pagination, CSV export). "Core Capabilities" sections 1-10 consolidated: Search by Condition → Module 1, Search by Intervention → Module 2, Geographic Search → Module 3, Search by Sponsor → Module 4, Retrieve Detailed Study → Module 5, Pagination → Module 6, Data Export → Module 7, Combined Query → Workflow 1, Extract Summary → Recipe. "Resources" section stub → removed, content consolidated inline. Per-use-case disposition: Patient Matching → When to Use bullet + Workflow 1; Research Analysis → When to Use + Workflow 2; Drug Tracking → When to Use + Module 2; Geographic Search → Module 3; Sponsor Tracking → Module 4 + Workflow 3; Data Export → Module 7; Trial Monitoring → When to Use bullet; Eligibility Screening → Module 5.
- references/api_reference.md (359 lines) → Fully consolidated inline: endpoint parameters → Key Concepts "Query Parameters Reference" table; status/phase values → Key Concepts tables; response structure → Key Concepts "Response Data Structure" table; HTTP error codes → Troubleshooting table; rate limit guidance → Prerequisites + Best Practices; use cases → duplicated main SKILL.md examples, absorbed into Core API; data standards (ISO 8601, CommonMark) → Prerequisites note. Error handling patterns → Recipes "Rate-Limited Bulk Search".
- scripts/query_clinicaltrials.py (216 lines) → Helper function pattern: search_studies() → Quick Start ct_search() helper; get_study_details() → Module 5 inline; search_with_all_results() → Module 6 pagination pattern; extract_study_summary() → Recipe "Extract Study Summary". Thin-wrapper shortcut applied: each function was a thin wrapper around requests.get().

Retention: ~465 lines / 866 original (excl. scripts) = ~54%.