From agent-skills
Use when pulling data from WRDS databases, merging financial datasets via linking tables, validating panel structure, or constructing financial variables for finance and real estate research.
npx claudepluginhub agentic-assets/agent-skillsThis skill uses the workspace's default tool permissions.
Automated WRDS data extraction with pre-built query templates, merge logic, validation, and variable documentation for finance and real estate research.
assets/examples/quickstart.pyassets/requirements.txtassets/setup.pyassets/wrds_data_pull/__init__.pyassets/wrds_data_pull/config.pyassets/wrds_data_pull/crsp_utils.pyassets/wrds_data_pull/merge_utils.pyassets/wrds_data_pull/orchestrator.pyassets/wrds_data_pull/panel_diagnostics.pyassets/wrds_data_pull/query_templates.pyassets/wrds_data_pull/variable_builders.pyreferences/INDEX.mdreferences/alternative-data.mdreferences/compustat-crsp.mdreferences/known-data-issues.mdreferences/merge-keys.mdreferences/real-estate-databases.mdreferences/variable-construction.mdreferences/wrds-databases.mdscripts/coverage_report.pySearches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.
Checks Next.js compilation errors using a running Turbopack dev server after code edits. Fixes actionable issues before reporting complete. Replaces `next build`.
Share bugs, ideas, or general feedback.
Automated WRDS data extraction with pre-built query templates, merge logic, validation, and variable documentation for finance and real estate research.
This skill packages 20+ years of WRDS research patterns into reusable components: parameterized SQL queries for major databases, step-by-step merge sequences with proper key handling, automated panel diagnostics, and standard financial variable construction with citations.
Saves 20-40 hours per project by eliminating repetitive query writing, merge debugging, and data validation setup.
Copy the production-tested pipeline from an existing project:
# Copy existing scripts to your project
import shutil
src = "path/to/---Overleaf_Template/Code/Python/1---wrds-direct-connection"
dst = "your_project/data_pipeline"
shutil.copytree(src, dst)
# Configure credentials
# Add to .env file: WRDS_USERNAME=your_username
# Run full pipeline
python data_pipeline/5---main_orchestrator.py
The pipeline includes:
1---config_wrds_data.py: Configuration with 100+ Compustat/CRSP variables2---pull_compustat_data.py: Annual and quarterly pulls with caching3---pull_crsp_data.py: Monthly data aggregated to quarterly4---merge_and_process_data.py: CUSIP-based linking (99% PERMNO coverage)5---main_orchestrator.py: Coordinated executionFor custom workflows, use the modular components:
from wrds_data_pull import query_templates, merge_utils
# 1. Pull data with template
comp_query = query_templates.compustat_quarterly(
variables=["gvkey", "datadate", "atq", "ltq"],
start_date="2020-01-01",
end_date="2023-12-31"
)
# 2. Execute query
import wrds
conn = wrds.Connection(wrds_username="username")
comp_df = conn.raw_sql(comp_query)
# 3. Validate panel
from wrds_data_pull.panel_diagnostics import validate_panel
diag = validate_panel(comp_df, unit_id="gvkey", time_id="datadate")
print(f"Panel type: {diag['panel_type']}, Duplicates: {diag['n_duplicates']}")
Compustat + CRSP (most common):
IBES + Thomson Reuters:
Real estate databases:
Alternative data:
All query templates are parameterized:
from wrds_data_pull.query_templates import compustat_quarterly
# Basic pull
query = compustat_quarterly(
variables=["gvkey", "datadate", "atq", "niq"],
start_date="2020-01-01",
end_date="2023-12-31"
)
# With filters
query = compustat_quarterly(
variables=["gvkey", "datadate", "atq", "niq"],
start_date="2020-01-01",
end_date="2023-12-31",
gvkey_list=["001045", "001078"], # Filter to specific firms
filters={"indfmt": "INDL", "datafmt": "STD"} # Standard filters
)
Compustat-CRSP via CCM linking table:
from wrds_data_pull.merge_utils import merge_compustat_crsp_via_ccm
merged_df = merge_compustat_crsp_via_ccm(
compustat_df=comp_df,
crsp_df=crsp_df,
link_type="LC" # Primary links only
)
CUSIP-based linking (alternative, higher coverage):
from wrds_data_pull.merge_utils import merge_via_cusip
merged_df = merge_via_cusip(
compustat_df=comp_df, # Must have 'cusip' column
crsp_df=crsp_df, # Must have 'cusip', 'year', 'quarter'
merge_on=["cusip", "year", "quarter"]
)
Run diagnostics after every pull:
python scripts/validate_panel.py \
--input merged_data.parquet \
--unit_id gvkey \
--time_id year_quarter
Checks:
Standard formulas with academic citations:
from wrds_data_pull.variable_builders import FinancialRatios
ratios = FinancialRatios(df)
df['tobins_q'] = ratios.tobins_q() # (ME + BV_debt) / BV_assets
df['leverage'] = ratios.leverage() # Total debt / Total assets
df['roa'] = ratios.roa() # Net income / Total assets
See variable-construction.md for complete formulas.
Configure WRDS connection, validate credentials, test database access.
python scripts/setup_wrds.py --username your_username
Panel diagnostics (balance, duplicates, coverage, known issues).
python scripts/validate_panel.py \
--input data.parquet \
--unit_id gvkey \
--time_id year_quarter \
--output diagnostics.txt
Variable-level missing data report with LaTeX output for papers.
python scripts/coverage_report.py \
--input data.parquet \
--output coverage_table.tex
Full pipeline from existing codebase (tested on 35K observations, 280+ variables):
# Use existing pipeline
import importlib.util
# Load orchestrator
spec = importlib.util.spec_from_file_location(
"orchestrator",
"Code/Python/1---wrds-direct-connection/5---main_orchestrator.py"
)
orch = importlib.util.module_from_spec(spec)
spec.loader.exec_module(orch)
# Run full pipeline
final_df = orch.main() # Returns merged Compustat + CRSP data
Pipeline includes:
# Pull specific variables
from wrds_data_pull.query_templates import compustat_quarterly
import wrds
conn = wrds.Connection(wrds_username="username")
query = compustat_quarterly(
variables=["gvkey", "datadate", "atq", "ltq", "seqq", "niq"],
start_date="2015-01-01",
end_date="2023-12-31"
)
df = conn.raw_sql(query)
# Construct ratios
from wrds_data_pull.variable_builders import construct_financial_ratios
df = construct_financial_ratios(df, ratios=["leverage", "roa"])
# Validate
from wrds_data_pull.panel_diagnostics import validate_panel
diag = validate_panel(df, unit_id="gvkey", time_id="datadate")
print(f"Coverage: {diag['coverage_summary']}")
# Pull NCREIF property-level returns
from wrds_data_pull.query_templates import ncreif_property_returns
query = ncreif_property_returns(
start_date="2010-01-01",
end_date="2023-12-31",
property_types=["Office", "Retail", "Apartment"]
)
# Execute and validate
import wrds
conn = wrds.Connection(wrds_username="username")
ncreif_df = conn.raw_sql(query)
# Merge with CoreLogic (requires separate access)
# See real-estate-databases.md for CoreLogic patterns
datafmt='STD' for unrestated (standardized) datadatacqtr for calendar quarters, datafqtr for fiscalseqq < 0 (common for distressed firms)msedelist table for complete return historyshrcd IN (10,11) for common stock onlydatadate BETWEEN LINKDT AND LINKENDDTLINKTYPE='LC')See known-data-issues.md for comprehensive documentation.
# Test WRDS access
import wrds
conn = wrds.Connection(wrds_username="username")
print(conn.list_libraries()) # Should show: comp, crsp, ibes, taqm, etc.
conn.close()
# Check if variable exists in table
conn = wrds.Connection(wrds_username="username")
fields = conn.list_table_fields("comp", "fundq")
print("atq" in [f.lower() for f in fields]) # True if exists
# Diagnose merge issues
from wrds_data_pull.merge_utils import diagnose_merge
diagnosis = diagnose_merge(
left_df=comp_df,
right_df=crsp_df,
merge_keys=["cusip", "year", "quarter"]
)
print(f"Match rate: {diagnosis['match_rate']:.1%}")
print(f"Left-only: {diagnosis['left_only_count']} observations")
print(f"Right-only: {diagnosis['right_only_count']} observations")
The assets/wrds_data_pull/ directory contains a standalone Python package with modular components:
query_templates.py: Parameterized SQL for all major WRDS databasesmerge_utils.py: Merge functions with date validation and diagnosticspanel_diagnostics.py: Balance checks, duplicate detection, coverage statsvariable_builders.py: Financial ratios with formulas and citations__init__.py: Convenient importspip install wrds pandas numpy scipy
Add to Python path:
import sys
sys.path.insert(0, "/path/to/wrds-data-pull/assets")
from wrds_data_pull import query_templates, merge_utils
Or copy assets/wrds_data_pull/ directly into your project.