From asi
Queries ExoPriors Scry API with Postgres SQL for lexical BM25 search across 229M+ entities in forums, papers, social media, government records, and prediction markets. Enables author identity resolution and OpenAlex academic graph navigation.
npx claudepluginhub plurigrid/asi --plugin asiThis skill uses the workspace's default tool permissions.
Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities)
Queries OpenAlex API for 240M+ scholarly works: search papers by topic/author/institution, track citations, analyze trends, bibliometrics, open access.
Queries OpenAlex API for 240M+ scholarly works: search papers by title/topic, find author/institution outputs, track citations, analyze trends, discover open access. For bibliometric analysis.
Compose semantic vectors in Scry: embed concepts as @handles, search by cosine distance, debias with vector algebra, diagnose signal loss. For semantic search and embedding tasks.
Share bugs, ideas, or general feedback.
Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities)
via a single HTTP endpoint. You write Postgres SQL against a curated scry.* schema
and get JSON rows back. There is no ORM, no GraphQL, no pagination token -- just SQL.
Skill generation: 2026031701
Use this skill when:
Do NOT use this skill when:
Context handshake first. At session start, call
GET /v1/scry/context?skill_generation=2026031701.
This endpoint is public; you do not need a key for the handshake itself.
Use the returned offerings block for the current product summary
budgets, canonical env var, default skill, and specialized skill catalog.
If you need a concise shareable bootstrap prompt for another agent, use
offerings.public_agent_prompt.copy_text instead of paraphrasing your own.
If you need deeper docs, use offerings.canonical_doc_path, each skill's
repo_path, and reference_paths instead of guessing where the maintained
docs live.
If you cache descriptive bootstrap context across turns or sessions, also
track surface_context_generation and refresh when it changes.
Read lexical_search.status as well: if it is not healthy, stop assuming
global scry.search* is reliable and pivot to source-local scry.* /
mv_* surfaces or semantic retrieval while the canonical BM25 index
recovers.
If should_update_skill=true, tell the user to run npx skills update.
If the response reports client_skill_generation: null while you're using
packaged skills, or if local instructions still mention
api.exopriors.com or exopriors.com/console, treat the install as stale
and ask the user to run npx skills update before more debugging.
Schema first. ALWAYS call GET /v1/scry/schema before writing SQL.
Never guess column names or types. The schema endpoint returns live
column metadata and row-count estimates for every view.
Check operational status when search looks wrong. If lexical search,
materialized-view freshness, or corpus behavior seems off, call
GET /v1/scry/index-view-status before assuming the query or schema is wrong.
Clarify ambiguous intent before heavy queries. If the request is vague ("search Reddit for X", "find things about Y"), ask one short clarification question about the goal/output format before running expensive SQL.
Start with a cheap probe. Before any query likely to run >5s, run
/v1/scry/estimate and/or a tight exploratory query (LIMIT 20 plus scoped
source/window filters), then scale only after confirming relevance.
Choose lexical vs semantic explicitly. Use lexical (scry.search*) for
exact terms and named entities. For conceptual intent ("themes", "things like",
"similar to"), route to scry-vectors first, then optionally hybridize.
LIMIT always. Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.
Prefer canonical surfaces with tight filters. scry.entities has 229M+
rows, so do not scan it blindly. Use scry.search* for lexical retrieval,
scry.chunk_embeddings for chunk-level semantic retrieval, scry.entity_embeddings
or scry.entities_with_embeddings only when you want one entity-level vector
row per entity, scry.embedding_coverage to inspect public vs staged vs ready
source/kind coverage, and
source-native tables such as scry.hackernews_items,
scry.wikipedia_articles, scry.pubmed_papers, scry.repec_records, scry.openalex_works, scry.bluesky_posts,
scry.mailing_list_messages, and scry.openlibrary_* when a corpus no
longer lives canonically in scry.entities. Reach for a specific mv_*
convenience view only when /v1/scry/schema confirms it is healthy and
useful for the task.
Cross-table composition is normal. If the best records live in multiple
source-native tables, combine them in one SQL statement with CTEs,
UNION ALL, and joins through scry.source_records. This is the intended
contract, not a workaround.
Filter dangerous content. Always include
WHERE content_risk IS DISTINCT FROM 'dangerous' unless the user explicitly
asks for unfiltered results. Dangerous content contains adversarial
prompt-injection content.
Raw SQL, not JSON. POST /v1/scry/query takes Content-Type: text/plain
with raw SQL in the body. Not JSON-wrapped SQL.
File rough edges promptly. If Scry blocks the task, misses an obvious
result set, or exposes a rough edge, submit a brief note to
POST /v1/feedback?feedback_type=suggestion|bug|other&channel=scry_skill
using Content-Type: text/plain by default (text/markdown also works). Do not silently work
around it. Logged-in users can review their submissions with GET /v1/feedback.
For full tier limits, timeout policies, and degradation strategies, see Shared Guardrails.
Recommended default for less-technical users: in the directory where you launch the agent, store SCRY_API_KEY in .env so skills and copied prompts use the same place.
Canonical key naming for this skill:
SCRY_API_KEYscry_anon_* from POST /v1/scry/anonymous-keyX-Scry-Client-Tag: <short-stable-tag>printf '%s\n' 'SCRY_API_KEY=<your key>' >> .env
set -a && source .env && set +a
Verify:
echo "$SCRY_API_KEY"
Anonymous bootstrap flow when the user wants immediate public access without signup:
CLIENT_TAG="${SCRY_CLIENT_TAG:-dev-laptop}"
ANON_KEY="$(curl -s https://api.scry.io/v1/scry/anonymous-key -X POST -H "X-Scry-Client-Tag: $CLIENT_TAG" | python3 -c 'import json,sys; print(json.load(sys.stdin)[\"api_key\"])')"
curl -s https://api.scry.io/v1/scry/schema \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG"
curl -s https://api.scry.io/v1/scry/query \
-H "Authorization: Bearer $ANON_KEY" \
-H "X-Scry-Client-Tag: $CLIENT_TAG" \
-H "Content-Type: text/plain" \
--data "SELECT 1 LIMIT 1"
Use this for fast trial access only. The anonymous bootstrap lane is intentionally generous for the first few queries and then degrades. For sustained usage, prefer a personal Scry API key.
Keep the same X-Scry-Client-Tag value on the same device when staying anonymous so the backend can distinguish a real first-use session from abuse behind shared IPs.
If using packaged skills, keep them current:
npx skills add exopriors/skills
npx skills update
POST /v1/scry/query still supports standard x402, but it is now an explicit
paid path rather than the default no-auth bootstrap path. Use x402 when the
user already has an x402-capable wallet/client and only needs direct paid query
execution. For public trial use, use POST /v1/scry/anonymous-key. For
schema/context, shares, judgements, feedback, or repeated multi-endpoint usage,
prefer a personal Scry API key.
If the user wants wallet-native durable identity plus a reusable key, use
POST /v1/auth/agent/signup first. That binds the wallet to a user and returns
a session token plus API key in one flow.
Minimal client shape:
import { wrapFetchWithPayment } from 'x402-fetch';
const paidFetch = wrapFetchWithPayment(fetch, walletClient);
const resp = await paidFetch('https://api.scry.io/v1/scry/query', {
method: 'POST',
headers: { 'content-type': 'text/plain' },
body: 'SELECT 1 LIMIT 1',
});
One end-to-end example: find recent high-scoring LessWrong posts about RLHF.
Step 1: Get dynamic context + update advisory
GET https://api.scry.io/v1/scry/context?skill_generation=2026031701
Authorization: Bearer $SCRY_API_KEY
Step 2: Get schema
GET https://api.scry.io/v1/scry/schema
Authorization: Bearer $SCRY_API_KEY
Step 3: Run query
POST https://api.scry.io/v1/scry/query
Authorization: Bearer $SCRY_API_KEY
Content-Type: text/plain
WITH hits AS (
SELECT id FROM scry.search('RLHF reinforcement learning human feedback',
kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp, e.score
FROM hits h
JOIN scry.entities e ON e.id = h.id
WHERE e.source = 'lesswrong'
ORDER BY e.score DESC NULLS LAST, e.original_timestamp DESC
LIMIT 20
Response shape:
{
"columns": ["uri", "title", "original_author", "original_timestamp", "score"],
"rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
"row_count": 20,
"duration_ms": 312,
"truncated": false
}
Source-native cross-table example:
WITH hn AS (
SELECT 'hackernews'::text AS source, hn_id::text AS external_id, score
FROM scry.search_hackernews_items('interpretability', kinds => ARRAY['post'], limit_n => 20)
),
wiki AS (
SELECT 'wikipedia'::text AS source, page_id::text AS external_id, score
FROM scry.search_wikipedia_articles('interpretability', limit_n => 20)
),
hits AS (
SELECT * FROM hn
UNION ALL
SELECT * FROM wiki
)
SELECT h.source, r.uri, r.title, h.score
FROM hits h
JOIN scry.source_records r
ON r.source = h.source
AND r.external_id = h.external_id
ORDER BY h.score DESC
LIMIT 20;
User wants to search the ExoPriors corpus?
|
+-- Ambiguous / conceptual ask? --> Clarify intent first, then use
| scry-vectors for semantic search (optionally hybridize with lexical)
|
+-- By keywords/phrases? --> scry.search() (BM25 lexical over canonical content_text)
| +-- Specific forum? --> join/filter `source` explicitly (or use a healthy source-local view if schema confirms it)
| +-- Reddit? --> START with scry.reddit_subreddit_stats /
| scry.reddit_clusters() / scry.reddit_embeddings
| and trust /v1/scry/schema status before
| using direct retrieval helpers
| +-- Large result? --> scry.search_ids() (up to 2000 lexical IDs; join for fields)
|
+-- By structured filters (source, date, author)? --> Direct SQL on MVs
|
+-- By semantic similarity? --> (scry-vectors skill, not this one)
|
+-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
| lexical CTE + JOIN scry.chunk_embeddings
|
+-- Author/people lookup? --> scry.actors, scry.people, scry.person_accounts
|
+-- Academic graph (OpenAlex)? --> scry.openalex_find_authors(),
| scry.openalex_find_works(), etc. (see schema-guide.md)
|
+-- Need to share results? --> POST /v1/scry/shares
|
+-- Need to emit a structured observation? --> POST /v1/scry/judgements
|
+-- Scry blocked / missing obvious results? --> POST /v1/feedback
curl -s "https://api.scry.io/v1/scry/context?skill_generation=2026031701" \
-H "Authorization: Bearer $SCRY_API_KEY"
If response includes "should_update_skill": true, ask the user to run:
npx skills update.
If the response shows "client_skill_generation": null while the session is
using packaged Scry skills, or if local instructions still point at
api.exopriors.com / exopriors.com/console, stop and ask the user to run
npx skills update before deeper debugging.
If response includes "lexical_search": {"status": "rebuilding"|"degraded"|"stale"|...},
prefer source-local scry.* surfaces or scry.entities_with_embeddings and use
/v1/scry/index-view-status for detailed rebuild timing before blaming the query.
curl -s "https://api.scry.io/v1/feedback?feedback_type=bug&channel=scry_skill" \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: text/plain" \
--data $'## What happened\n- Query: ...\n- Problem: ...\n\n## Why it matters\n- ...\n\n## Suggested fix\n- ...'
Success response includes a receipt id. Logged-in users can review their own
submissions with:
curl -s "https://api.scry.io/v1/feedback?limit=10" \
-H "Authorization: Bearer $SCRY_API_KEY"
WITH c AS (
SELECT id FROM scry.search('your query here',
kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50
Default kinds if omitted: ['post','paper','document','webpage','twitter_thread','grant'].
scry.search() broadens once to kinds=>ARRAY['comment'] if that default returns 0 rows.
Pass explicit kinds for strict scope (for example comment-only or tweet-only).
For source scoping, join back to scry.entities and filter source explicitly.
Healthy source-specific MVs can still be useful for source-native score fields
such as base_score, but they are optional convenience slices rather than the default path.
SELECT subreddit, total_count, latest
FROM scry.reddit_subreddit_stats
WHERE subreddit IN ('MachineLearning', 'LocalLLaMA')
ORDER BY total_count DESC
For semantic Reddit retrieval over the embedding-covered subset, use
scry.reddit_embeddings or scry.search_reddit_posts_semantic(...).
Direct retrieval helpers (scry.reddit_posts, scry.reddit_comments,
scry.mv_reddit_*, scry.search_reddit_posts(...),
scry.search_reddit_comments(...)) are currently degraded on the public
instance. Check /v1/scry/schema status before using them.
SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.mv_arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50
score is NULL for arXiv papers on the public surface. Sort by
original_timestamp, category, or downstream citation proxies instead.
SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20
SELECT kind::text, COUNT(*)
FROM scry.hackernews_items
WHERE original_timestamp >= '2025-01-01'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
Source-native corpora follow the same pattern:
SELECT kind::text, COUNT(*)
FROM scry.wikipedia_articles
WHERE original_timestamp >= '2025-01-01'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
Removing the date bound turns this into a large base-table aggregation. Run
/v1/scry/estimate first or prefer source-specific MVs when they already cover
the question.
WITH c AS (
SELECT id FROM scry.search('deceptive alignment',
kinds=>ARRAY['post'], limit_n=>200)
)
SELECT e.uri, e.title, e.original_author,
emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.id
JOIN scry.chunk_embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50
Requires a stored embedding handle (@p_deadbeef_topic). See scry-vectors
skill for creating handles.
curl -s -X POST https://api.scry.io/v1/scry/estimate \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT id, title FROM scry.mv_arxiv_papers LIMIT 1000"}'
Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.
It does not prove BM25 helper health: if scry.search* fails, check
/v1/scry/index-view-status and /v1/scry/schema status as well.
The /v1/scry/context handshake now also exposes lexical_search.status for
cheap degraded-mode detection before you start issuing lexical helpers.
# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.scry.io/v1/scry/shares \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"kind": "query",
"title": "Top RLHF posts on LessWrong",
"summary": "20 highest-scored LW posts mentioning RLHF.",
"payload": {
"sql": "...",
"result": {"columns": [...], "rows": [...]}
}
}'
Kinds: query, rerank, insight, chat, markdown.
Progressive update: create stub immediately, then PATCH /v1/scry/shares/{slug}.
Rendered at: https://scry.io/scry/share/{slug}.
curl -s -X POST https://api.scry.io/v1/scry/judgements \
-H "Authorization: Bearer $SCRY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"emitter": "my-agent",
"judgement_kind": "topic_classification",
"target_external_ref": "arxiv:2401.12345",
"summary": "Paper primarily about mechanistic interpretability.",
"payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
"confidence": 0.88,
"tags": ["arxiv", "mech_interp"],
"privacy_level": "public"
}'
Exactly one target required: target_entity_id, target_actor_id,
target_judgement_id, or target_external_ref.
Judgement-on-judgement: use target_judgement_id to chain observations.
-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50
-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100
anchor_entity_id is the root subject; parent_entity_id is the direct parent.
Avoid COUNT(*) on large tables. Instead, use schema endpoint row estimates or:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1
Note: pg_class access is blocked on the public Scry SQL surface. Use /v1/scry/schema instead.
See references/error-reference.md for the full catalogue. Key patterns:
| HTTP | Code | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request | SQL parse error, missing LIMIT, bad params | Fix query |
| 401 | unauthorized | Missing or invalid API key | Check key |
| 402 | insufficient_credits | Token budget exhausted | Notify user |
| 429 | rate_limited | Too many requests | Respect Retry-After header |
| 503 | service_unavailable | Scry pool down or overloaded | Wait and retry |
Auth + timeout diagnostics for CLI users:
000, that is client-side timeout/network abort, not a server HTTP status. Check --max-time and retry with /v1/scry/estimate first.401 with "Invalid authorization format", check for whitespace/newlines in the key:
KEY_CLEAN="$(printf '%s' \"$SCRY_API_KEY\" | tr -d '\\r\\n')"
then use Authorization: Bearer $KEY_CLEAN.Quota fallback strategy:
Retry-After seconds, retry once.When this skill completes a query task, return a consistent structure:
## Scry Result
**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms
<formatted results table or summary>
**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">
Produces: JSON with columns, rows, row_count, duration_ms, truncated
Feeds into:
rerank: ensure SQL returns id and content_text columns for candidate setsscry-vectors: save entity IDs for embedding lookup and semantic reranking
Receives from: none (entry point for SQL-based corpus access)For detailed schema documentation, see references/schema-guide.md.
For the full pattern library, see references/query-patterns.md.
For error codes and quota details, see references/error-reference.md.