Process genome assembly datasets for VEuPathDB resources
Curates genome assembly datasets for VEuPathDB by fetching NCBI metadata, BioProject details, and PubMed publications. Use it when processing new genome assemblies to generate dataset presenter XML files for the ApiCommonPresenters repository.
/plugin marketplace add VEuPathDB/dataset-curator
/plugin install curation-skills@dataset-curator
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Files in this skill:
TODO.md
resources/curator-branching.md
resources/editing-large-xml.md
resources/step-1-fetch-ncbi.md
resources/step-2-fetch-bioproject.md
resources/step-3-fetch-pubmed.md
resources/step-4-curate-contacts.md
resources/step-5-update-presenter.md
resources/valid-projects.json
scripts/check-repos.sh
scripts/fetch-bioproject.js
scripts/fetch-pubmed.js
scripts/generate-presenter-xml.js
This skill guides processing of genome assembly datasets for VEuPathDB resources.
This workflow requires the following repositories in veupathdb-repos/: ApiCommonPresenters and EbrcModelCommon.
First, run the repository status check to verify that the repositories are present (the script is located in the skill directory):
bash scripts/check-repos.sh ApiCommonPresenters EbrcModelCommon
If repositories are missing, the script will provide clone instructions.
Branch Confirmation: After verifying repositories exist, check their current branches and status using git -C <path>, then confirm with the user before proceeding. Users typically create dataset-specific branches (see curator branching guidelines).
Example:
git -C veupathdb-repos/ApiCommonPresenters branch --show-current
git -C veupathdb-repos/ApiCommonPresenters status -sb
IMPORTANT: All commands in this workflow must be run from your curation workspace directory (the directory that contains veupathdb-repos/ as a subdirectory).
For Claude Code:
Do not use cd commands to change into veupathdb-repos/ subdirectories.
Use git -C <path> for git operations in subdirectories.
For example, use git -C veupathdb-repos/ApiCommonPresenters status instead of cd veupathdb-repos/ApiCommonPresenters && git status.
The workflow will create a tmp/ subdirectory in the curation workspace directory for intermediate files.
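As a minimal sketch of the git -C convention, the helper below builds the argument list for a repository under veupathdb-repos/ so that no cd is ever needed. The names gitArgs and runGit are hypothetical and are not part of the skill's scripts.

```javascript
const { execFileSync } = require("child_process");

// Repositories live under this directory, relative to the curation workspace.
const REPO_ROOT = "veupathdb-repos";

// Build the argument list for `git -C <repo path> <args...>`.
// Hypothetical helper, shown for illustration only.
function gitArgs(repo, ...args) {
  return ["-C", `${REPO_ROOT}/${repo}`, ...args];
}

// Run git in a repository subdirectory without changing the working directory.
function runGit(repo, ...args) {
  return execFileSync("git", gitArgs(repo, ...args), { encoding: "utf8" }).trim();
}

// Usage, from the curation workspace directory:
//   runGit("ApiCommonPresenters", "branch", "--show-current");
//   runGit("ApiCommonPresenters", "status", "-sb");
```

Passing arguments as an array to execFileSync avoids shell quoting issues with paths.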
Gather the following before starting:
GenBank assembly accession (e.g. GCA_000988875.2, including version)
Fetch assembly metadata from NCBI using the GenBank accession.
Command:
curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/<ASSEMBLY_ACCESSION>/dataset_report" \
-H "Accept: application/json" > tmp/<ASSEMBLY_ACCESSION>_dataset_report.json
Detailed instructions: Step 1 - Fetch NCBI Metadata
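Before sending the Step 1 request, it can help to confirm that the accession carries its version suffix. The sketch below checks the format and builds the same URL and output path used in the curl command. The helper names are hypothetical, and the pattern assumes GenBank assembly accessions of the form GCA_ plus nine digits plus a version.

```javascript
// Assumed format for a versioned GenBank assembly accession, e.g. GCA_000988875.2.
const ASSEMBLY_ACCESSION_PATTERN = /^GCA_\d{9}\.\d+$/;

// Hypothetical helper: validate the accession and build the dataset_report URL.
function buildDatasetReportUrl(accession) {
  if (!ASSEMBLY_ACCESSION_PATTERN.test(accession)) {
    throw new Error(
      `Expected a versioned GenBank accession such as GCA_000988875.2, got "${accession}"`
    );
  }
  return `https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/${accession}/dataset_report`;
}

// Hypothetical helper: path of the intermediate file written in Step 1.
function datasetReportPath(accession) {
  return `tmp/${accession}_dataset_report.json`;
}
```

A missing version (for example GCA_000988875) is rejected early rather than producing an ambiguous request.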
Extract the BioProject accession from the assembly report and fetch additional details.
Command:
node scripts/fetch-bioproject.js <BIOPROJECT_ACCESSION>
This retrieves the BioProject title and description, saved to tmp/<BIOPROJECT>_bioproject.json.
Detailed instructions: Step 2 - Fetch BioProject
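The BioProject accession needed for Step 2 can be read from the report saved in Step 1. The following is a sketch only: it assumes the report has the shape reports[0].assembly_info.bioproject_accession, which should be checked against the actual JSON, and extractBioProjectAccession is a hypothetical name rather than one of the skill's scripts.

```javascript
// Assumed report shape (verify against the saved file):
//   { "reports": [ { "assembly_info": { "bioproject_accession": "PRJ..." } } ] }
function extractBioProjectAccession(report) {
  const accession = report?.reports?.[0]?.assembly_info?.bioproject_accession;
  // BioProject accessions are assumed to look like PRJ + two letters + digits.
  if (typeof accession !== "string" || !/^PRJ[A-Z]{2}\d+$/.test(accession)) {
    throw new Error("No BioProject accession found in dataset report");
  }
  return accession;
}

// Usage with the file written in Step 1:
//   const fs = require("fs");
//   const report = JSON.parse(
//     fs.readFileSync("tmp/GCA_000988875.2_dataset_report.json", "utf8")
//   );
//   console.log(extractBioProjectAccession(report));
```

If the field is absent, the function throws instead of returning undefined, so a malformed report is noticed before fetch-bioproject.js is run.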
Find and fetch publications for the genome assembly.
Command:
node scripts/fetch-pubmed.js <ASSEMBLY_ACCESSION>
Results saved to tmp/<ASSEMBLY_ACCESSION>_pubmed.json.
Detailed instructions: Step 3 - Fetch PubMed
Identify and curate contact entries for the genome submission.
Contact identification priority:
Actions:
veupathdb-repos/EbrcModelCommon/Model/lib/xml/datasetPresenters/contacts/allContacts.xml
Detailed instructions: Step 4 - Curate Contacts
Generate the datasetPresenter XML and insert it into the appropriate presenter file.
Command:
node scripts/generate-presenter-xml.js <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
Target file: veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml
Detailed instructions: Step 5 - Update Presenter Files
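The relationship between the Step 5 command arguments and the target file can be sketched as follows. Both helper names are hypothetical, the contact IDs in the usage comment are placeholders, and the sketch does not validate the project against resources/valid-projects.json because that file's format is not described here.

```javascript
// Hypothetical helper: presenter file that receives the generated XML.
function presenterFilePath(project) {
  if (typeof project !== "string" || project.length === 0) {
    throw new Error("A project name is required");
  }
  return `veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/${project}.xml`;
}

// Hypothetical helper: argument list for `node scripts/generate-presenter-xml.js ...`,
// in the order <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...].
function presenterCommandArgs(accession, project, primaryContactId, additionalContactIds = []) {
  return [
    "scripts/generate-presenter-xml.js",
    accession,
    project,
    primaryContactId,
    ...additionalContactIds,
  ];
}

// Usage (contact IDs are placeholders):
//   presenterCommandArgs("GCA_000988875.2", "PlasmoDB", "primary.contact", ["second.contact"]);
//   presenterFilePath("PlasmoDB");
```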
After completing this workflow:
scripts/fetch-bioproject.js - Fetches BioProject metadata from NCBI (esearch + esummary)
scripts/fetch-pubmed.js - Fetches PubMed records linked to a BioProject (elink + esummary)
scripts/generate-presenter-xml.js - Generates datasetPresenter XML from fetched metadata
scripts/check-repos.sh - Validates veupathdb-repos/ repository setup (synced from shared/)