Searches Apache DataFusion documentation, user guide, SQL functions, and Rust code examples via GitHub CLI for questions or keywords.
npx claudepluginhub datafusion-contrib/datafusion-skills --plugin datafusion-skills
You are helping the user find relevant Apache DataFusion documentation.
Query: $@
Follow these steps in order.
If the input is a natural language question (e.g. "how do I create an external table"), extract the key technical terms: nouns, function names, SQL keywords. Drop stop words.
If the input is already a function name or technical term (e.g. APPROX_PERCENTILE_CONT, CREATE EXTERNAL TABLE), use it as-is.
Use the extracted terms as SEARCH_QUERY in the next steps.
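The term extraction can be sketched in bash as follows. This is a minimal illustration, not part of the skill: extract_terms is a hypothetical helper and its stop-word list is an assumption you should adapt. It lowercases its input, so apply it only to natural language questions and pass function names or SQL keywords through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a natural language question to search terms.
# The stop-word list below is an illustrative assumption, not a fixed part of the skill.
extract_terms() {
  local stop=" a an the how do does i is are to of in on for with can what my me "
  local out="" word
  # Lowercase the input and turn every character that is not
  # alphanumeric or "_" into a space, then split on whitespace.
  for word in $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c '[:alnum:]_' ' '); do
    case "$stop" in
      *" $word "*) ;;                      # stop word: skip it
      *) out="${out:+$out }$word" ;;       # keep the term, space-separated
    esac
  done
  printf '%s\n' "$out"
}

SEARCH_QUERY=$(extract_terms "how do I create an external table")
echo "$SEARCH_QUERY"   # prints: create external table
```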
The DataFusion user guide is in the GitHub repo under docs/. Search it using gh:
Important: Do NOT quote multi-word search terms as a single string. Pass each word
as a separate token so gh search code matches broadly. For example, use
EXTERNAL TABLE not "EXTERNAL TABLE".
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 10
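To see why the quoting matters, the following sketch counts the arguments the shell hands to a command in each form. count_args is a hypothetical helper used only for illustration; it does not call gh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

SEARCH_QUERY="EXTERNAL TABLE"

count_args $SEARCH_QUERY     # unquoted: the shell splits on whitespace, prints 2
count_args "$SEARCH_QUERY"   # quoted: one argument containing a space, prints 1
```

Leaving the variable unquoted is what makes each word arrive as a separate token.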
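If gh search code is not available (for example, on a gh release that lacks the subcommand), fall back to the GitHub search API through gh api. A URL query string cannot contain raw spaces, so join multi-word terms with + before substituting them into the request:
gh api "search/code?q=$SEARCH_QUERY+repo:apache/datafusion+extension:md&per_page=10" --jq '.items[:10][] | "\(.path)"'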
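The fallback request only works if multi-word terms are joined with + inside the query string. A minimal sketch of that step, assuming bash (the `${var// /+}` substitution is not POSIX sh); build_search_endpoint is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the search/code endpoint used by the fallback.
# Spaces between terms become "+" so the query string stays intact.
build_search_endpoint() {
  local terms="$1" per_page="${2:-10}"
  local joined="${terms// /+}"
  printf 'search/code?q=%s+repo:apache/datafusion+extension:md&per_page=%s\n' "$joined" "$per_page"
}

build_search_endpoint "external table"
# prints: search/code?q=external+table+repo:apache/datafusion+extension:md&per_page=10
```

The result can then be passed to the fallback, for example gh api "$(build_search_endpoint "$SEARCH_QUERY")" --jq '.items[].path'.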
DataFusion's built-in functions are documented in docs/source/user-guide/sql/. Check specifically:
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 5 -- path:docs/source/user-guide/sql/
Also list the available SQL doc files so you can fetch the most relevant one directly:
gh api "repos/apache/datafusion/contents/docs/source/user-guide/sql" --jq '.[].name' 2>/dev/null
If the query is about API usage or implementation patterns, search Rust source code:
gh search code $SEARCH_QUERY --repo apache/datafusion --language rust --limit 5
For the most relevant results (top 2-3), fetch the actual content:
gh api "repos/apache/datafusion/contents/<path>" --jq '.content' | base64 -d
If the file is too large, fetch just the relevant section. Look for the search terms in the content and extract the surrounding context (heading + content under that heading).
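One way to pull out just the relevant section is to split the decoded markdown at its headings and keep only the sections that mention the search term. The sketch below assumes ATX-style headings (lines starting with #). extract_section is a hypothetical helper, and it will mis-split on # comment lines inside fenced code blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read markdown on stdin and print each section
# (heading plus the content under it) that mentions the given term,
# matched case-insensitively.
extract_section() {
  awk -v term="$1" '
    function emit() {
      # Print the buffered section only if it contains the term.
      if (buf != "" && index(tolower(buf), tolower(term)) > 0) printf "%s", buf
      buf = ""
    }
    /^#+ / { emit() }          # a new heading closes the previous section
    { buf = buf $0 "\n" }      # accumulate the current section
    END { emit() }             # flush the final section
  '
}

# Sample input invented for illustration; real input would be the decoded file.
extract_section "approx_percentile_cont" <<'EOF'
# Aggregate Functions
## approx_median
Returns the approximate median.
## approx_percentile_cont
Returns the approximate percentile.
EOF
```

In practice the decoded output of the gh api command above would be piped into extract_section in place of the here-document.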
Organize the results by relevance:
For each result, provide:
If the search didn't find exactly what the user needed:
You can also check the DataFusion user guide at https://datafusion.apache.org/user-guide/ or the API docs at https://docs.rs/datafusion/latest/datafusion/
If the query is about a specific SQL function:
Try running
datafusion-cli -c "SELECT * FROM information_schema.df_settings WHERE name LIKE '%<keyword>%';"
to see related configuration options.
For faster lookups, here are paths to key documentation sections:
| Topic | Path in repo |
|---|---|
| SQL Reference | docs/source/user-guide/sql/ |
| Scalar Functions | docs/source/user-guide/sql/scalar_functions.md |
| Aggregate Functions | docs/source/user-guide/sql/aggregate_functions.md |
| Window Functions | docs/source/user-guide/sql/window_functions.md |
| CREATE EXTERNAL TABLE | docs/source/user-guide/sql/ddl.md |
| Data Types | docs/source/user-guide/sql/data_types.md |
| Configuration | docs/source/user-guide/configs.md |
| Python Bindings | docs/source/user-guide/python/ |
| Library Usage | docs/source/library-user-guide/ |
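The table can double as a shortcut before any search runs. The sketch below maps a topic keyword to one of the paths listed above. doc_path_for is a hypothetical helper, and the keyword patterns are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a topic keyword to a path from the table above.
# Returns non-zero when no shortcut applies, so the caller can fall back to searching.
doc_path_for() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *scalar*)           echo "docs/source/user-guide/sql/scalar_functions.md" ;;
    *aggregate*)        echo "docs/source/user-guide/sql/aggregate_functions.md" ;;
    *window*)           echo "docs/source/user-guide/sql/window_functions.md" ;;
    *"external table"*) echo "docs/source/user-guide/sql/ddl.md" ;;
    *"data type"*)      echo "docs/source/user-guide/sql/data_types.md" ;;
    *config*)           echo "docs/source/user-guide/configs.md" ;;
    *python*)           echo "docs/source/user-guide/python/" ;;
    *)                  return 1 ;;
  esac
}

doc_path_for "CREATE EXTERNAL TABLE"   # prints: docs/source/user-guide/sql/ddl.md
```

A matching file path can be substituted for <path> in the gh api contents request shown earlier.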