From workflows
This skill should be used when the user asks to "query WRDS", "access Compustat", "get CRSP data", "pull Form 4 insider data", "query ISS compensation", "download SEC EDGAR filings", "get ExecuComp data", "access Capital IQ", "write SAS code for WRDS", "SAS ETL", "SAS hash merge", "SGE array job", "qsas", "qsub SAS", "TAQ data", "trades and quotes", "NBBO", "intraday data", "millisecond data", "closing auction", "VWAP TAQ", "order imbalance", "FJC database", "federal court cases", "securities litigation data", "Form D data", "Reg D data", "private placements data", "private offering data", "Regulation D filings", "IPO data", "SEO data", "new issues", "equity offerings", "SDC new issues", "SDC Platinum", "SDC M&A", "mergers acquisitions data", "M&A database", "FISD", "bond issuances", "144A offerings", "high yield bonds", "investment grade bonds", "Mergent bond data", "fund formation data", "hedge fund registrations", "private equity fund data", "closed-end fund filings", "Form ADV data", "investment adviser registrations", or needs WRDS PostgreSQL query patterns or SAS ETL performance patterns.
npx claudepluginhub edwinhu/workflows --plugin workflowsThis skill uses the workspace's default tool permissions.
- [Query Enforcement](#query-enforcement)
examples/blockholders_pipeline/README.mdexamples/blockholders_pipeline/pull_blockholders.pyexamples/blockholders_pipeline/redo_bridge.pyexamples/blockholders_pipeline/src/__init__.pyexamples/blockholders_pipeline/src/aggregate.pyexamples/blockholders_pipeline/src/parser.pyexamples/form4_disposals.pyexamples/form4_pipeline/README.mdexamples/form4_pipeline/form4_step1_query_filings.pyexamples/form4_pipeline/form4_step2_download_xmls.shexamples/form4_pipeline/form4_step3_parse_xmls.pyexamples/form4_pipeline/pull_insider_ownership.pyexamples/form4_pipeline/sas/pull_tr_insiders.sasexamples/form4_pipeline/sas/run_insider_array.shexamples/formd_regd.ipynbexamples/fund_formation_eda.ipynbexamples/lpc_dealscan_eda.ipynbexamples/lpc_dealscan_eda.pyexamples/pitchbook_eda.ipynbexamples/sdc_issuances_eda.ipynbImplements Playwright E2E testing patterns: Page Object Model, test organization, configuration, reporters, artifacts, and CI/CD integration for stable suites.
Guides Next.js 16+ Turbopack for faster dev via incremental bundling, FS caching, and HMR; covers webpack comparison, bundle analysis, and production builds.
Discovers and evaluates Laravel packages via LaraPlugins.io MCP. Searches by keyword/feature, filters by health score, Laravel/PHP compatibility; fetches details, metrics, and version history.
WRDS (Wharton Research Data Services) provides academic research data via PostgreSQL at wrds-pgdata.wharton.upenn.edu:9737.
Before executing ANY WRDS query, you MUST:
This is not negotiable. Skipping sample inspection is NOT HELPFUL — the user builds analysis on data with undetected quality problems.
| Excuse | Reality | Do Instead |
|---|---|---|
| "I'll add filters later" | You'll forget and pull bad data | Add filters NOW, before execution |
| "User didn't specify filters" | Standard filters are ALWAYS required | Apply Critical Filters section defaults |
| "Just a quick test query" | Test queries with bad filters teach bad patterns | Use production filters even for tests |
| "I'll let the user filter in pandas" | Pulling millions of unnecessary rows wastes time/memory | Filter at database level FIRST |
| "The query worked, so it's correct" | Query success ≠ data quality | INSPECT sample for invalid records |
| "I can use f-strings for simple queries" | SQL injection risk + wrong type handling | ALWAYS use parameterized queries |
Before EVERY query execution:
For Compustat queries (comp.funda, comp.fundq):
indfmt = 'INDL'datafmt = 'STD'popsrc = 'D'consol = 'C'For CRSP v2 queries (crsp.dsf_v2, crsp.msf_v2):
sharetype == 'NS'securitytype == 'EQTY'securitysubtype == 'COM'usincflg == 'Y'issuertype.isin(['ACOR', 'CORP'])For Form 4 queries (tr_insiders.table1):
For ALL queries:
.head() or .sample() BEFORE claiming successWriting SAS code that forces full table scans when indexes exist is NOT HELPFUL — the user's job runs 100x slower than necessary and may timeout.
Before EVERY SAS program execution:
For merges/joins:
PROC SORT + DATA merge)defineKey/defineData/defineDone pattern correctlyh.output() uses double quotes for macro resolution (not single quotes)call missing() initializes hash data variables for non-matchesFor WHERE clauses (CRITICAL):
year(date), month(date), datepart(dt) wrapping indexed columnsBETWEEN "01jan&year."d AND "31dec&year."d range patternupcase(), substr() on indexed columnsyear() = X AND quarter() = Y)For batch processing:
#$ -t start-end) not sequential loop-sysparm (not -set or %sysget)#$ -l m_mem_free=4G minimum)For PROC SQL:
calculated keyword used for computed column references in HAVINGFor macros:
&year. not &year)options mprint mlogic symbolgen used during development| Excuse | Reality | Do Instead |
|---|---|---|
| "Sort-merge is simpler to write" | Hash is 10x faster for lookup joins and requires no sorting | Write the hash — it's 5 extra lines |
| "year(date) is readable" | Readable but prevents index usage — full table scan on millions of rows | Use BETWEEN with date literals |
| "I'll parallelize later" | Later never comes and the job runs 18x slower sequentially | Write the SGE array job NOW |
| "Single quotes work fine in hash" | Single quotes block macro resolution — your output dataset name is wrong | ALWAYS double quotes in h.output() |
| "PROC SQL is easier than hash" | PROC SQL still sorts for joins — hash avoids all sorting | Hash for lookups, SQL only for complex aggregations |
| "The job only takes a few minutes per year" | 18 years × 3 minutes = 54 minutes sequential vs 3 minutes parallel | SGE array for ANY multi-year job |
| "%sysget works for getting the year" | Unreliable in SGE context — may return blank silently | Use -sysparm + &sysparm. |
where year(date) = anything → STOP. Use BETWEEN with date literals.proc sort; data; merge for a lookup join → STOP. Use hash object.%do year = start %to end loop → STOP. Use SGE array job.h.output(dataset: '...') → STOP. Use double quotes.-set or %sysget for SGE task parameters → STOP. Use -sysparm.See references/sas-etl.md for complete patterns:
| Dataset | Schema | Key Tables |
|---|---|---|
| Compustat | comp | company, funda, fundq, secd |
| ExecuComp | comp_execucomp | anncomp |
| CRSP | crsp | dsf, msf, stocknames, ccmxpf_linkhist |
| CRSP v2 | crsp | dsf_v2, msf_v2, stocknames_v2 |
| Form 4 Insiders | tr_insiders | table1, header, company |
| ISS Incentive Lab | iss_incentive_lab | comppeer, sumcomp, participantfy |
| Capital IQ | ciq | wrds_compensation |
| IBES | tr_ibes | det_epsus, statsum_epsus |
| Form D / Reg D | wrdssec | wrds_vc_formd (parsed, 2000–2020); index: wrdssec_all.forms (all CIKs) or wrds_forms (filer only) — default to forms, see references/wrds-forms-tables.md |
| SEC EDGAR | wrdssec_all | forms (raw index, all CIKs per filing — default), wrds_forms (filer-only view), wciklink_cusip |
| SEC Search | wrds_sec_search | filing_view, registrant |
| EDGAR | edgar | filings, filing_docs |
| Fama-French | ff | factors_monthly, factors_daily |
| LSEG/Datastream | tr_ds | ds2constmth, ds2indexlist |
| FJC (Federal Judicial Center) | fjc | civil, criminal, bankruptcy, appeals |
| FJC Linking | fjc_linking | wrds_civil_link, wrds_criminal_link |
| SDC New Issues (IPO/SEO/Debt) | tr_sdc_ni | wrds_ni_details — equity + debt offerings |
| SDC Mergers & Acquisitions | tr_sdc_ma | wrds_ma_details — M&A transactions |
| TAQ Legacy | taq | mast_YYYY, wrds_iid_YYYY — second-level (1993–2006) |
| TAQ Millisecond | taqmsec | mastm_YYYY, wrds_iid_YYYY, ctm_YYYYMM, complete_nbbo_YYYYMMDD |
| Thomson S12 (Mutual Fund Holdings) | tfn (SAS) / tr_mutualfunds (PG) | s12 — 13F/N-CSR fund holdings |
| Thomson S34 (13-F Institutional) | tfn (SAS) / tr_13f (PG) | s34 — 13-F institutional holdings |
| FISD / Mergent (Bonds) | fisd_fisd | fisd_mergedissue, fisd_mergedissuer |
| PitchBook | pitchbk_companies_deals, pitchbk_investors_funds_lps, pitchbk_fund_returns | deal, company, fund, wrds_fund_returns — dealsize in USD millions |
Initialize PostgreSQL connection to WRDS:
import psycopg2
conn = psycopg2.connect(
host='wrds-pgdata.wharton.upenn.edu',
port=9737,
database='wrds',
sslmode='require'
# Credentials from ~/.pgpass
)
Configure authentication via ~/.pgpass with chmod 600:
wrds-pgdata.wharton.upenn.edu:9737:wrds:USERNAME:PASSWORD
Connect via SSH tunnel:
ssh wrds
This uses ~/.ssh/wrds_rsa for authentication.
Always include for clean fundamental data:
WHERE indfmt = 'INDL'
AND datafmt = 'STD'
AND popsrc = 'D'
AND consol = 'C'
Equivalent to legacy shrcd IN (10, 11):
df = df.loc[
(df.sharetype == 'NS') &
(df.securitytype == 'EQTY') &
(df.securitysubtype == 'COM') &
(df.usincflg == 'Y') &
(df.issuertype.isin(['ACOR', 'CORP']))
]
WHERE acqdisp = 'D' -- Dispositions
AND trancode IN ('S', 'D', 'G', 'F') -- Sales, Dispositions, Gifts, Tax
Always use parameterized queries (never string formatting):
Use scalar parameter binding for single values:
cursor.execute("""
SELECT gvkey, conm FROM comp.company WHERE gvkey = %s
""", (gvkey,))
Use ANY() for list parameters:
cursor.execute("""
SELECT * FROM comp.funda WHERE gvkey = ANY(%s)
""", (gvkey_list,))
Detailed query patterns and table documentation:
references/compustat.md - Compustat tables, ExecuComp, financial variablesreferences/crsp.md - CRSP stock data, CCM linking, v2 formatreferences/insider-form4.md - Thomson Reuters Form 4, rolecodes, insider typesreferences/iss-compensation.md - ISS Incentive Lab, peer companies, compensationreferences/formd.md - Form D / Reg D: denormalization gotcha, dedup pattern, exemption codes, post-2020 gap, SEC TSV downloadreferences/edgar.md - SEC EDGAR filings, URL construction, DCN vs accession numbersreferences/connection.md - Connection pooling, caching, error handlingreferences/taq.md - TAQ: master files, IID, raw tick processing (NBBO, VWAP, closing auctions), CRSP–TAQ merge, era transition (legacy vs millisecond)references/sas-etl.md - SAS hash objects, index-friendly WHERE, SGE array jobs, PROC SQL optimizationreferences/postgres-vs-sas.md - Decision guide: when to use PostgreSQL vs SAS for WRDS ETL (benchmarks, constraints, hybrid pattern)references/fjc.md - FJC Integrated Database: civil/criminal case data, NOS codes, securities litigation queries, firm linkingreferences/sdc-issuances.md - SDC New Issues: IPOs, SEOs, 144A equity, debt offerings — schema discovery, cleaning filters, CRSP/Compustat linkingreferences/fisd-bonds.md - FISD/Mergent: corporate bond issuances, IG vs HY, 144A vs registered, rating classification, TRACE linkingreferences/sdc-ma.md - SDC M&A: deal counts, PE/LBO vs strategic buyer, deal status codes, public vs private targetreferences/fund-formation.md - Fund formation: Form D (pooled investment funds), EDGAR N-2 (closed-end fund IPOs), Form ADV (RIA registrations)references/pitchbook.md - PitchBook: schema architecture, dealsize/fundsize in USD millions, dealdate outliers, CIK crosswalk, fund performance (wrds_fund_returns), PE/VC/fund formation patternsWorking code from real projects:
examples/form4_disposals.py - Insider trading analysis (from SVB project)examples/wrds_connector.py - Connection pooling patternexamples/formd_regd.ipynb - Form D / Reg D: dedup validation, SEC TSV download, exemption trend chartsexamples/sdc_issuances_eda.ipynb - SDC New Issues: annual IPO/SEO/debt counts, 144A share, IG vs HY breakdownexamples/sdc_ma_eda.ipynb - SDC M&A: annual deal counts, PE/LBO vs strategic, public vs private target trendsexamples/fund_formation_eda.ipynb - Fund formation: Form D 3C.1/3C.7 counts, EDGAR N-2 closed-end fund IPOs, Form ADV RIA registrationsexamples/pitchbook_eda.ipynb - PitchBook: PE deal activity, VC rounds by stage, fund formation by vintage, IRR/TVPI by strategyexamples/voting_ownership_pipeline/ - Self-contained hybrid SAS+Python pipeline: ISS votes, 13-F inst. ownership, MF holdings via MFLINKS, merged panel. Canonical example of PostgreSQL vs SAS decision-making on WRDS. See README.md for architecture and usage.scripts/test_connection.py - Validate WRDS connectivityWRDS-provided samples at ~/resources/wrds-code-samples/:
ResearchApps/CCM2025.ipynb - Modern CRSP-Compustat mergeResearchApps/ff3_crspCIZ.ipynb - Fama-French factor constructioncomp/sas/execcomp_ceo_screen.sas - ExecuComp patternsWhen querying historical data, leverage current date context for dynamic range calculations.
Current date is automatically available via datetime.now(). Apply this to:
Implement dynamic date ranges in queries:
from datetime import datetime, timedelta
# Query last 5 years of data
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)
query = """
SELECT * FROM comp.funda
WHERE datadate BETWEEN %s AND %s
"""
df = pd.read_sql(query, conn, params=(start_date, end_date))
Always incorporate current date awareness in date-dependent queries to ensure results remain fresh across time.