<!-- STUB: Canonical source is foremanos-intel/skills/document-intelligence/SKILL.md. Run sync-stubs.sh to update. Do NOT edit directly. -->
A comprehensive extraction system for construction project documents that captures deep, field-actionable intelligence for daily reports, project management, quality control, and procurement tracking.
This skill transforms raw construction documents into structured, actionable project intelligence by:
Key difference from basic document reading: This skill extracts SPECIFIC VALUES and COMPLETE SCHEDULES rather than just summaries. For example:
ALWAYS use this skill when:
Triggers include:
This extraction system uses a 5-phase adaptive model. The phases below (Phase 1-2) cover the per-document extraction methodology. Cross-referencing (Phase 3), remediation (Phase 4), and normalization (Phase 5) are orchestrated at the pipeline level — see commands/process-docs.md for the full pipeline and agents/extraction-pipeline-orchestrator.md for the meta-agent.
For each document, extract metadata and classify:
doc-orchestrator.md classification table)
Scan for structural patterns to guide extraction:
Based on document type, extract comprehensive intelligence following specialized references in extraction-rules.md.
Entity resolution during extraction: Assign tentative entity references using directory.json and specs-quality.json as lookup tables. Mark as "entity_status": "tentative". These are confirmed in Phase 3 (Cross-Reference) at the pipeline level.
This pass is REQUIRED for any PDF that contains construction drawings, plan sheets, details, sections, or schedules embedded as graphics. Construction documents communicate ~80% of their information through drawings, not text. Skipping this pass means missing scale data, dimensions, room locations, door swings, equipment positions, and graphical notes.
Method 1 — Claude Vision (Primary, always available in Cowork):
pdftoppm -png -r 200 "document.pdf" /tmp/sheet_images/sheet (or PyMuPDF)
"source": "claude_vision" and "confidence": "medium"
Method 2 — Python Visual Pipeline (Optional Enhancement):
If cv2, skimage, sklearn are installed, optionally run visual_plan_analyzer.py for automated extraction with precise pixel coordinates:
Method 3 — Tesseract OCR (Supplement for small text):
Supplement small text that Claude Vision misses: tesseract sheet.png output -l eng --psm 6
Priority: Claude Vision (primary, always available) > Tesseract (text supplement) > Python pipeline (optional precise coordinates)
DPI Recipes:
| Document Type | DPI | Notes |
|---|---|---|
| D-size drawings (24x36, 30x42) | 150 | Crop to 1800x1200 for Vision |
| Letter-size PDFs (specs, submittals) | 200 | Standard rendering |
| High-detail areas (enlarged plans, details) | 300-350 | Zone crop specific areas |
| Text-extractable PDFs (contracts, reports) | Skip | Extract text directly, no rendering |
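
The DPI recipes above can be turned into a small lookup that builds the pdftoppm call shown under Method 1. This is a minimal sketch, not the pipeline's own code: the dictionary keys and the `render_command` helper are hypothetical names chosen for illustration, and the high-detail entry uses the lower bound of the 300-350 range.

```python
from typing import List, Optional

# DPI per document type, copied from the recipe table.
# The keys are hypothetical labels invented for this sketch.
DPI_BY_DOC_TYPE = {
    "d_size_drawing": 150,
    "letter_size_pdf": 200,
    "high_detail_area": 300,       # table allows 300-350; lower bound used here
    "text_extractable_pdf": None,  # skip rendering, extract text directly
}


def render_command(pdf_path: str, out_prefix: str, doc_type: str) -> Optional[List[str]]:
    """Return a pdftoppm argument list, or None when rendering is skipped."""
    dpi = DPI_BY_DOC_TYPE[doc_type]
    if dpi is None:
        return None
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


print(render_command("document.pdf", "/tmp/sheet_images/sheet", "d_size_drawing"))
```

The returned list can be passed to `subprocess.run` once the output directory exists; cropping D-size renders to 1800x1200 for Vision would be a separate step.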
Atomic Python Script Pattern:
When updating multiple JSON files from extraction, use a single atomic script: read all targets → modify in memory → write all → print summary. This prevents partial writes and ensures consistency. See commands/process-docs.md for the code template.
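The read, modify, write, summarize sequence might look like the sketch below. It is not the template from commands/process-docs.md; `update_json_files` and its callback-per-file interface are assumptions made for illustration. Each file is swapped in with `os.replace`, so an individual file is never left half-written, although the set of files as a whole is not transactional.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def update_json_files(updates: Dict[str, Callable[[dict], None]]) -> Dict[str, int]:
    """Apply in-memory updates to several JSON files, then write them all."""
    # 1. Read every target first; a parse error aborts before anything is written.
    loaded = {}
    for path in updates:
        with open(path, "r", encoding="utf-8") as fh:
            loaded[path] = json.load(fh)

    # 2. Modify in memory only.
    for path, mutate in updates.items():
        mutate(loaded[path])

    # 3. Write each file to a temp file in the same directory, then swap it in.
    for path, data in loaded.items():
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp_path, path)

    # 4. Print a summary (top-level key count per file).
    summary = {path: len(data) for path, data in loaded.items()}
    for path, count in summary.items():
        print(f"{os.path.basename(path)}: {count} top-level keys")
    return summary
```

A call passes one mutation function per target, for example `update_json_files({"plans-spatial.json": add_keynotes, "directory.json": add_subs})`, where both callbacks are hypothetical.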
Scale calibration is RECOMMENDED before extracting measurement data from a sheet. It improves dimensional accuracy but is non-blocking — if calibration fails, record "scale_status": "uncalibrated" and continue extraction. Text-based data (room names, door marks, notes, equipment tags) is extracted regardless. Uncalibrated sheets are flagged for remediation in Phase 4.
Without a calibrated scale, pixel-based measurements are unreliable. This sub-pass runs as the FIRST step of visual analysis for every plan sheet.
Step 1 — Graphic Scale Bar Detection (Highest Priority):
Graphic scale bars are the most reliable scale anchor because they survive PDF resizing, printing at different sizes, and scanning. They are typically located near the title block or below individual views.
What to look for:
Claude Vision detection method:
pixels_per_foot = bar_pixel_length / bar_real_world_feet
Python pipeline detection method (Pass 7 enhanced):
Step 2 — Text Scale Notation (Secondary):
Read text-format scale notations from title blocks and view labels:
1/4" = 1'-0", 1/8" = 1'-0", 3/16" = 1'-0", 1/2" = 1'-0", 1" = 1'-0", 3" = 1'-0"
1:50, 1:100, 1:200
SCALE: 1/4" = 1'-0", PLAN SCALE: 1/8"
IMPORTANT: Text scales assume the sheet was printed or rendered at the nominal size (24×36", 30×42", etc.). If the PDF was resized during export or scanning, text scales will be WRONG. Always prefer graphic scale bar calibration when available.
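Both calibration routes reduce to a pixels-per-foot value. The sketch below shows the arithmetic for architectural notations only (metric ratios such as 1:100 are not handled). The function names and the regular expression are assumptions for illustration, and the text-scale route is only valid when the page was rendered at its nominal sheet size, as the warning above states.

```python
import re
from fractions import Fraction

# Matches notations such as 1/4" = 1'-0" or 3" = 1'-0" (hypothetical pattern).
_ARCH_SCALE = re.compile(r'(\d+(?:/\d+)?)"\s*=\s*1\'-0"')


def paper_inches_per_foot(scale_text: str) -> Fraction:
    """Parse the paper inches that represent one real-world foot."""
    match = _ARCH_SCALE.search(scale_text)
    if match is None:
        raise ValueError(f"unrecognized scale notation: {scale_text!r}")
    return Fraction(match.group(1))


def pixels_per_foot_from_text(scale_text: str, render_dpi: int) -> float:
    # Paper inches per foot times pixels per paper inch.
    return float(paper_inches_per_foot(scale_text) * render_dpi)


def pixels_per_foot_from_bar(bar_pixel_length: float, bar_real_world_feet: float) -> float:
    # Graphic scale bar: measured pixel length over the labeled length.
    return bar_pixel_length / bar_real_world_feet


print(pixels_per_foot_from_text('1/4" = 1\'-0"', 250))  # 62.5
print(pixels_per_foot_from_bar(1000, 16))               # 62.5
```

When both values are available for the same view, a large gap between them is a hint that the PDF was resized, in which case the graphic bar value is the one to keep.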
Step 3 — Multi-Scale Zone Mapping:
Most plan sheets contain multiple views at different scales on the same sheet. Each view's scale must be mapped independently:
For each view on a sheet, record:
{
"view_name": "FLOOR PLAN",
"scale_text": "1/4\" = 1'-0\"",
"pixels_per_foot": 62.5,
"calibration_method": "graphic_bar",
"zone_bbox": [120, 80, 2800, 3200],
"confidence": "high"
}
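Per-view records like the one above make it possible to pick the right scale for any pixel coordinate on a multi-view sheet. The lookup below is a sketch under one assumption that the source does not state: `zone_bbox` is ordered `[x_min, y_min, x_max, y_max]`. The helper name `scale_for_point` and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def scale_for_point(point: Tuple[float, float], zones: List[dict]) -> Optional[float]:
    """Return pixels_per_foot of the first zone whose bbox contains the point."""
    x, y = point
    for zone in zones:
        x_min, y_min, x_max, y_max = zone["zone_bbox"]  # assumed corner order
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return zone["pixels_per_foot"]
    return None  # point lies outside every mapped view


zones = [
    {"view_name": "FLOOR PLAN", "pixels_per_foot": 62.5, "zone_bbox": [120, 80, 2800, 3200]},
    {"view_name": "ENLARGED RESTROOM PLAN", "pixels_per_foot": 125.0, "zone_bbox": [2850, 80, 3600, 1200]},
]

pixel_length = 500
ppf = scale_for_point((3000, 600), zones)
print(pixel_length / ppf)  # 4.0 feet, measured in the enlarged view
```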
Step 4 — Known-Dimension Calibration Fallback:
When neither graphic scale bar nor text scale is found (rare but possible on details, sections, or scanned documents):
pixels_per_foot = pixel_length / dimension_feet
"method": "known_dimension", "confidence": "low"
Step 5 — Double Calibration (Stretch Detection):
Scanned documents and some PDF exports may have different horizontal and vertical scaling (stretched/skewed). After establishing scale:
h_pixels_per_foot
v_pixels_per_foot
"stretch_detected": true
pixels_per_foot = sqrt(h * v)
Scale Data Storage — REQUIRED:
Every plan sheet must have scale data recorded in plans-spatial.json under scale_calibration with per-sheet and per-view scale information:
"scale_calibration": {
"sheets": {
"A-101": {
"sheet_name": "FLOOR PLAN",
"calibration_method": "graphic_bar",
"graphic_bar": {
"pixel_length": 1000,
"real_world_feet": 16,
"tick_count": 5,
"location": [2400, 3600]
},
"text_scale": "1/4\" = 1'-0\"",
"pixels_per_foot": {
"horizontal": 62.5,
"vertical": 62.3,
"mean": 62.4
},
"stretch_detected": false,
"zones": [
{
"view_name": "FLOOR PLAN",
"scale_text": "1/4\" = 1'-0\"",
"pixels_per_foot": 62.4,
"zone_bbox": [120, 80, 2800, 3200],
"confidence": "high"
},
{
"view_name": "ENLARGED RESTROOM PLAN",
"scale_text": "1/2\" = 1'-0\"",
"pixels_per_foot": 124.8,
"zone_bbox": [2850, 80, 3600, 1200],
"confidence": "high"
}
],
"confidence": "high",
"calibration_date": "2026-02-20"
}
}
}
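The horizontal, vertical, and mean values in the record above follow from the double calibration in Step 5, where the combined scale is the geometric mean sqrt(h * v). The sketch below reproduces that arithmetic. The 2% tolerance used to set `stretch_detected` is an assumed threshold, not one given in this document, and `combine_axis_scales` is a hypothetical helper name.

```python
import math


def combine_axis_scales(h_pixels_per_foot: float, v_pixels_per_foot: float,
                        tolerance: float = 0.02) -> dict:
    """Compare per-axis scales and return the combined pixels_per_foot block."""
    ratio = h_pixels_per_foot / v_pixels_per_foot
    return {
        "horizontal": h_pixels_per_foot,
        "vertical": v_pixels_per_foot,
        # Geometric mean of the two axes: sqrt(h * v).
        "mean": round(math.sqrt(h_pixels_per_foot * v_pixels_per_foot), 1),
        # Assumed rule: flag when the axes differ by more than the tolerance.
        "stretch_detected": abs(ratio - 1.0) > tolerance,
    }


print(combine_axis_scales(62.5, 62.3))
```

With the values from the A-101 example this yields a mean of 62.4 and no stretch flag.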
If scale calibration fails for a sheet (no graphic bar, no text scale, no readable dimensions):
"calibration_method": "none", "scale_status": "uncalibrated", "confidence": "failed"
"confidence": "low" and "uncalibrated": true
Uncalibrated sheets reduce the extraction depth score but do not block the pipeline. They are queued for targeted remediation in Phase 4.
After scale calibration, extract and chain dimension strings. Dimension chains are the primary cross-check for measurement accuracy.
What is a dimension chain? On construction drawings, dimensions are arranged in "strings" — a series of sub-dimensions along a common line that sum to an overall dimension. For example, column grid spacing (31'-4" + 29'-0" + 29'-0" + 29'-0" + 14'-4" = 132'-8"). If the sub-dimensions don't sum to the overall, there's either an extraction error or a drawing error worth flagging.
Claude Vision extraction method:
Store dimension chains in plans-spatial.json under dimensions.chains:
{
"chain_id": "DIM-CHAIN-001",
"sheet": "A-100",
"orientation": "horizontal",
"overall_value": "132'-8\"",
"overall_numeric_ft": 132.67,
"segments": [
{"value": "31'-4\"", "numeric_ft": 31.33, "from_element": "Grid 1", "to_element": "Grid 2"},
{"value": "29'-0\"", "numeric_ft": 29.0, "from_element": "Grid 2", "to_element": "Grid 3"}
],
"segments_sum_ft": 132.67,
"sum_matches_overall": true,
"discrepancy_ft": 0
}
Dimensions NOT part of any chain go into dimensions.isolated.
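The chain check described above is plain arithmetic once the dimension strings are parsed. The sketch below works in whole inches to avoid rounding drift and handles only the feet-and-whole-inches form (fractional inches such as 3 1/2" would need a richer parser). `to_inches` and `check_chain` are hypothetical helper names.

```python
import re

# Feet-and-whole-inches form only, e.g. 31'-4" (hypothetical pattern).
_FT_IN = re.compile(r"^(\d+)'-(\d+)\"$")


def to_inches(dimension: str) -> int:
    """Convert a string like 31'-4\" to whole inches."""
    match = _FT_IN.match(dimension.strip())
    if match is None:
        raise ValueError(f"unsupported dimension format: {dimension!r}")
    return int(match.group(1)) * 12 + int(match.group(2))


def check_chain(overall: str, segments: list) -> dict:
    """Sum the segments and compare against the overall dimension."""
    segments_sum_in = sum(to_inches(s) for s in segments)
    overall_in = to_inches(overall)
    discrepancy_in = segments_sum_in - overall_in
    return {
        "overall_numeric_ft": round(overall_in / 12, 2),
        "segments_sum_ft": round(segments_sum_in / 12, 2),
        "sum_matches_overall": discrepancy_in == 0,
        "discrepancy_ft": round(discrepancy_in / 12, 2),
    }


result = check_chain("132'-8\"", ["31'-4\"", "29'-0\"", "29'-0\"", "29'-0\"", "14'-4\""])
print(result)
```

For the column grid example, the five segments total 1592 inches, which equals 132'-8", so the chain passes with zero discrepancy.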
Elevation markers appear on building sections, wall sections, and elevation views. They provide critical vertical measurement data.
What to look for:
Spot elevations appear on civil/site plans as standalone numbers (e.g., "856.50") with an X or + marker:
Store elevation markers in plans-spatial.json under elevation_markers:
{
"sheet": "A-300",
"label": "T.O. WALL",
"elevation": "12'-0\"",
"elevation_ft": 12.0,
"datum": "FFE = 0'-0\"",
"marker_type": "level"
}
Store spot elevations in plans-spatial.json under site_grading.spot_elevations:
{
"sheet": "C-103",
"elevation_ft": 856.50,
"type": "proposed",
"label": "856.50'"
}
Cross-sheet references are the connective tissue of a plan set. Every section cut, detail callout, elevation marker, and schedule reference on one sheet points to a view on another sheet. Building this index is what enables assembly chains and multi-page calculations.
What to extract:
Section Cut Markers — Circle with number/letter + arrow indicating viewing direction. Format: number / sheet_number (e.g., "3/A-301" means "Section 3 is on sheet A-301").
Detail Callouts — Circle or diamond with detail_number / sheet_number. Points to an enlarged detail view.
Elevation Markers — Circle with number/letter, NO arrow (unlike section cuts). References an exterior or interior elevation view.
Enlarged Plan References — Dashed rectangle on the main plan with label pointing to a larger-scale view on another sheet.
Schedule References — Door marks (D101), window marks (W-3), room numbers, equipment tags that cross-reference to schedule sheets.
Spec Section References — Text like "SEE SECTION 03 30 00" or "PER SPEC 09 65 00" on drawings.
Store in plans-spatial.json under sheet_cross_references:
{
"detail_callouts": [
{
"id": "XREF-001",
"source_sheet": "A-100",
"source_location": {"grid": "C-3", "zone": "plan_area"},
"target_sheet": "A-501",
"target_detail": "3",
"callout_type": "section",
"description": "Building section at Grid 3",
"linked_elements": ["grid_3", "wall_type_2A"]
}
]
}
IMPORTANT: Cross-reference extraction should happen on EVERY sheet, not just architectural. Structural sheets reference details, MEP sheets reference riser diagrams, civil sheets reference profile views. Build the index incrementally — each processed sheet adds its references to the master index.
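An incremental index of this kind can be sketched as below, using the number / sheet_number callout format described above. The regular expression, the helper names, and the dangling-target check are assumptions for illustration; the pipeline's actual verification rules may differ.

```python
import re
from collections import defaultdict

# "3/A-301" -> detail "3" on sheet "A-301" (hypothetical pattern).
_CALLOUT = re.compile(r"^\s*([A-Za-z0-9]+)\s*/\s*([A-Z]{1,2}-?\d+(?:\.\d+)?)\s*$")


def parse_callout(text: str) -> dict:
    """Split a callout label into its target detail and target sheet."""
    match = _CALLOUT.match(text)
    if match is None:
        raise ValueError(f"not a detail/sheet callout: {text!r}")
    return {"target_detail": match.group(1), "target_sheet": match.group(2)}


def build_index(callouts_by_sheet: dict) -> dict:
    """Map each target sheet to the (source sheet, detail) pairs that point at it."""
    index = defaultdict(list)
    for source_sheet, texts in callouts_by_sheet.items():
        for text in texts:
            ref = parse_callout(text)
            index[ref["target_sheet"]].append((source_sheet, ref["target_detail"]))
    return dict(index)


def dangling_targets(index: dict, processed_sheets: list) -> list:
    """Target sheets that are referenced but were never processed."""
    return sorted(set(index) - set(processed_sheets))


index = build_index({"A-100": ["3/A-301", "5/A-501"], "S-100": ["2/S-501"]})
print(dangling_targets(index, ["A-100", "S-100", "A-301"]))  # ['A-501', 'S-501']
```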
After processing all sheets in a document, verify the index:
This pass applies ONLY to civil/site plan sheets (C-series sheets, grading plans, site plans with contour lines). Skip for architectural, structural, MEP, and other disciplines.
Contour lines represent terrain elevations. The difference between existing and proposed contours at any point equals the cut or fill depth — critical for earthwork volume estimation.
What to extract:
Contour Elevation Labels — 3-4 digit numbers (e.g., 856, 857, 858) placed along curved lines. These are the elevation values for each contour.
Existing Contours (current terrain):
Proposed Contours (design terrain):
Index Contours — Every 5th or 10th contour is drawn thicker. Labels are always shown on index contours. Use index contour spacing to determine the contour interval (typically 1', 2', or 5').
Drainage Patterns — Water flows perpendicular to contours, from high elevation to low. V-shaped contour patterns pointing uphill indicate valleys/swales (drainage paths). Where contours bow away from buildings, confirm that they show positive drainage (grade falling away from the structure).
Cut/Fill Zones — Where proposed contours are LOWER than existing = CUT (excavation). Where proposed contours are HIGHER than existing = FILL (import/compaction). Document the general cut/fill pattern for the site.
Store in plans-spatial.json under site_grading:
{
"contours": {
"existing": [{"elevation_ft": 856.0, "line_style": "dashed", "sheet": "C-103"}],
"proposed": [{"elevation_ft": 856.0, "line_style": "solid", "sheet": "C-103"}],
"contour_interval_ft": 1.0
},
"cut_fill_volumes": {
"net_operation": "cut",
"avg_cut_depth_ft": 3.5,
"max_cut_depth_ft": 6.0,
"confidence": "low",
"note": "Rough estimate — verify with earthwork quantity takeoff"
},
"drainage_patterns": [
{"direction": "southeast", "high_elevation": 862, "low_elevation": 854, "fall_ft": 8}
]
}
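The cut/fill rule (proposed lower than existing is cut, proposed higher is fill) can be applied to sampled points to produce the depth fields shown above. This is a rough sketch: it works on depths at hypothetical sample points, not volumes, and deriving `net_operation` from summed depths is an assumed proxy that ignores area. The function names are illustrative.

```python
def classify_grade_change(existing_ft: float, proposed_ft: float) -> dict:
    """Classify one point as cut, fill, or none from its two elevations."""
    delta = proposed_ft - existing_ft
    if delta < 0:
        return {"operation": "cut", "depth_ft": round(-delta, 2)}
    if delta > 0:
        return {"operation": "fill", "depth_ft": round(delta, 2)}
    return {"operation": "none", "depth_ft": 0.0}


def summarize_grading(points: list) -> dict:
    """Summarize (existing_ft, proposed_ft) sample points into depth statistics."""
    changes = [classify_grade_change(e, p) for e, p in points]
    cuts = [c["depth_ft"] for c in changes if c["operation"] == "cut"]
    fills = [c["depth_ft"] for c in changes if c["operation"] == "fill"]
    return {
        # Assumed proxy: compare summed depths, since no areas are available here.
        "net_operation": "cut" if sum(cuts) >= sum(fills) else "fill",
        "avg_cut_depth_ft": round(sum(cuts) / len(cuts), 2) if cuts else 0.0,
        "max_cut_depth_ft": max(cuts) if cuts else 0.0,
        "confidence": "low",
    }


summary = summarize_grading([(860.0, 856.5), (858.0, 852.0), (855.0, 856.0)])
print(summary)
```

The result stays at low confidence for the same reason the stored record does: depth samples are no substitute for an earthwork quantity takeoff.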
Cross-check with SWPPP: The contour/grading data should be consistent with SWPPP disturbed area and cut/fill volumes. Flag if the grading plan shows significantly different earthwork than the SWPPP.
This pass applies to architectural section sheets (A-300 series), structural sections (S-300 series), wall section details (A-400/A-500 series), and any sheet containing building or wall section cuts. Skip for plan views, site plans, MEP plans, and reflected ceiling plans.
Building sections and wall sections are the primary source of vertical dimensioning, assembly composition, and structural depth data that cannot be obtained from plan views.
What to extract:
1. Building Sections (full-height cross-sections through the building):
2. Wall Sections (enlarged vertical cuts through wall assemblies):
3. Section Identification:
Store in plans-spatial.json under sections:
{
"sections": {
"building": [
{
"section_id": "SECT-A300-1",
"sheet": "A-300",
"section_mark": "1",
"cut_line_on_sheet": "A-100",
"cut_location": "Grid line 3, looking east",
"scale": "3/4\" = 1'-0\"",
"title": "Building Section - East/West",
"floor_to_floor_ft": 14.5,
"foundation_depth_ft": 4.0,
"footing_thickness_in": 12,
"stem_wall_height_ft": 2.0,
"ridge_height_ft": 22.5,
"eave_height_ft": 14.0,
"parapet_height_ft": 0,
"roof_slope": "4:12",
"roof_slope_degrees": 18.43,
"ceiling_height_ft": 10.0,
"ceiling_type": "ACT",
"structural_depth_in": 24,
"structural_member": "PEMB rafter",
"sog_thickness_in": 5,
"measurements": [
{"label": "FFE to T.O. Steel", "value_ft": 14.5, "confidence": "high"}
]
}
],
"wall": [
{
"section_id": "WSECT-A302-1",
"sheet": "A-302",
"section_mark": "1",
"wall_type": "Type 4",
"fire_rating": "1-HR",
"total_thickness_in": 4.875,
"layers": [
{"material": "5/8\" Type X GWB", "thickness_in": 0.625, "position": "interior_finish"},
{"material": "3-5/8\" 20ga metal stud @ 16\" OC", "thickness_in": 3.625, "position": "stud_cavity"},
{"material": "R-13 batt insulation", "thickness_in": 3.5, "position": "insulation"},
{"material": "5/8\" Type X GWB", "thickness_in": 0.625, "position": "interior_finish"}
],
"base_condition": "Vinyl base on GWB, sealant at floor line",
"top_condition": "Deflection track, fire safing above ceiling",
"head_detail": null,
"sill_detail": null,
"measurements": [
{"label": "Total assembly", "value_in": 4.875, "confidence": "high"}
]
}
]
}
}
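The `roof_slope` and `roof_slope_degrees` fields in the building section record are related by a single arctangent, which is easy to verify during extraction. The sketch below shows that conversion, plus the horizontal run implied by a given rise. The helper names are hypothetical, and the run calculation assumes a straight roof plane between eave and ridge.

```python
import math


def roof_slope_degrees(slope: str) -> float:
    """Convert a rise:run slope such as 4:12 to degrees."""
    rise, run = (float(part) for part in slope.split(":"))
    return round(math.degrees(math.atan2(rise, run)), 2)


def run_for_rise_ft(slope: str, rise_ft: float) -> float:
    """Horizontal run needed to gain rise_ft at the given slope."""
    rise, run = (float(part) for part in slope.split(":"))
    return rise_ft * run / rise


print(roof_slope_degrees("4:12"))            # 18.43
print(run_for_rise_ft("4:12", 22.5 - 14.0))  # 25.5
```

With the example values, the 8.5 ft rise from eave to ridge at 4:12 implies a 25.5 ft horizontal run, which can be compared against plan dimensions as a sanity check.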
Cross-checks after section extraction:
This pass applies to RCP sheets (A-102, A-5xx series, sheets titled "Reflected Ceiling Plan") and MEP plan sheets (M-series mechanical, P-series plumbing, E-series electrical). Skip for site plans, structural framing plans, and architectural floor plans (unless they contain MEP overlay data).
RCPs and MEP plans are the primary source for overhead-plane data (ceiling heights, fixture placement, HVAC devices) and system-level data (pipe sizes, duct sizes, panel schedules, equipment tags) that cannot be obtained from architectural floor plans or sections.
What to extract:
1. Reflected Ceiling Plan Data:
Store in plans-spatial.json under ceiling_data:
{
"ceiling_data": {
"source_sheet": "A-102",
"grid_type": "2x2",
"grid_module": "24\" x 24\"",
"tile_product": "Armstrong Fine Fissured Second Look 1761",
"height_zones": [
{"rooms": ["101", "102", "103"], "ceiling_height_ft": 9.0, "ceiling_type": "ACT"},
{"rooms": ["104"], "ceiling_height_ft": 10.0, "ceiling_type": "GWB"}
],
"fixture_positions": [
{"tag": "A", "type": "2x4 recessed troffer", "rooms": ["101", "102"], "count": 12},
{"tag": "C", "type": "6\" LED downlight", "rooms": ["103", "104"], "count": 8}
],
"soffits_bulkheads": [
{"room": "101", "description": "Soffit at perimeter, 8'-0\" AFF, 18\" wide"}
]
}
}
2. MEP System Data (Mechanical, Plumbing, Electrical):
Mechanical (M-series):
Plumbing (P-series):
Electrical (E-series):
Store in plans-spatial.json under mep_systems:
{
"mep_systems": {
"electrical": {
"source_sheets": ["E-100", "E-101", "E-200"],
"panel_schedules": [
{
"panel": "LP-1",
"location": "Electrical Room 110",
"voltage": "208/120V",
"phase": "3-phase",
"main_breaker": "225A",
"circuits": [
{"circuit": 1, "description": "Lighting - Rooms 101-103", "breaker": "20A", "load_va": 1800}
]
}
],
"single_line_data": {
"service_size": "400A",
"voltage": "208/120V 3-phase",
"transformer": "75 KVA"
},
"circuit_assignments": [
{"room": "101", "circuits": [1, 3, 5], "description": "Lighting + receptacles"}
]
},
"pipe_sizes": [
{"system": "domestic_cold", "size": "2\"", "material": "copper", "sheet": "P-100", "run_segments": []}
],
"duct_sizes": [
{"size": "12x8", "type": "supply", "sheet": "M-100", "run_segments": []}
],
"equipment": [
{
"tag": "RTU-1",
"type": "rooftop_unit",
"sheet": "M-100",
"position": {"grid": "C-3", "room": "Roof"},
"schedule_data": {"cooling_tons": 10, "heating_mbh": 250, "cfm": 4000, "voltage": "208V/3ph"}
}
]
}
}
Cross-checks after RCP + MEP extraction:
See references/plans-deep-extraction.md (MEP Drawings section and Reflected Ceiling Plan section) for detailed extraction rules, symbol libraries, and cross-reference tables.
MEP Extraction Depth Requirements:
For EVERY MEP sheet processed, the extraction must achieve the following minimums:
| Discipline | Minimum Extraction |
|---|---|
| Mechanical | Every equipment tag + cooling tons + heating MBH + CFM + electrical (V/ph/MCA/MOCP) + served rooms |
| Electrical | Every panel schedule with ALL circuits, single-line diagram hierarchy, lighting fixture schedule with types/quantities/wattages |
| Plumbing | Every fixture from schedule with type/mounting/connections/ADA/flow rate, water heater specs, pipe sizes by system |
| Fire Protection | System type (wet/dry/pre-action), riser location, FDC, head schedule (type/temp/K-factor), fire pump if present |
Do NOT stop at equipment tags only. The extraction must capture schedule data (capacity, ratings, electrical requirements) from mechanical/electrical/plumbing schedule sheets (M-300, E-300, P-400 series).
Reference: See references/mep-deep-extraction.md for complete field-by-field extraction templates.
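A depth check for the "do not stop at equipment tags" rule could look like the sketch below. It covers only mechanical equipment and only the four `schedule_data` keys that appear in the mep_systems example. The constant and function names are hypothetical, and the full field lists live in references/mep-deep-extraction.md.

```python
# Keys taken from the schedule_data example; the complete list is longer.
REQUIRED_MECHANICAL_FIELDS = ("cooling_tons", "heating_mbh", "cfm", "voltage")


def missing_schedule_fields(equipment: list) -> dict:
    """Return {tag: [missing fields]} for equipment lacking schedule data."""
    gaps = {}
    for item in equipment:
        schedule = item.get("schedule_data") or {}
        missing = [f for f in REQUIRED_MECHANICAL_FIELDS if schedule.get(f) is None]
        if missing:
            gaps[item["tag"]] = missing
    return gaps


equipment = [
    {"tag": "RTU-1", "schedule_data": {"cooling_tons": 10, "heating_mbh": 250,
                                       "cfm": 4000, "voltage": "208V/3ph"}},
    {"tag": "RTU-2"},  # tag captured from the plan sheet, schedule not yet read
]
print(missing_schedule_fields(equipment))
```

Any tag that appears in the result was extracted at tag level only and still needs its schedule sheet processed.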
This pass is MANDATORY for any project with MEP drawings. Schedule sheets contain the engineering data that equipment tags on plan sheets reference. Without schedule extraction, you only get tag names — not capacities, ratings, or specifications.
Processing order:
Output depth per discipline — see references/mep-deep-extraction.md for complete field lists.
Validation: After processing, verify:
This pass applies to exterior elevation sheets (A-200 series) and architectural floor plans with accessibility annotations. Skip for sections, MEP plans, and civil/site sheets (except site accessibility routes).
Exterior elevations provide the vertical face of the building — material zones, window/door positions, grade line, and overall proportions. Accessibility data is typically annotated on floor plans and site plans as required routes, clearances, and signage.
What to extract:
1. Exterior Elevation Data (A-200 series):
Store in plans-spatial.json under elevations.exterior:
{
"elevations": {
"exterior": [
{
"face": "South",
"sheet": "A-200",
"materials": [
{"zone": "primary_cladding", "material": "Vertical metal panels", "color": "Champagne tan", "mfg_code": "Nucor 12-45"},
{"zone": "base", "material": "Split-face CMU", "color": "Warm gray", "height_ft": 4.0},
{"zone": "trim", "material": "Aluminum", "color": "Dark bronze anodized"}
],
"grade_line_elevation_ft": 856.0,
"window_positions": [
{"mark": "W-101", "sill_height_ft": 3.0, "head_height_ft": 7.0}
],
"door_positions": [
{"mark": "101", "type": "storefront entry", "width_ft": 6.0, "height_ft": 7.0}
],
"measurements": [
{"label": "Grade to eave", "value_ft": 14.0},
{"label": "Grade to ridge", "value_ft": 22.5}
]
}
]
}
}
2. Accessibility Data (Floor Plans + Site Plans):
Store in plans-spatial.json under accessibility_data:
{
"accessibility_data": {
"accessible_routes": [
{
"from": "Accessible parking",
"to": "Main entrance",
"route_width_in": 60,
"surface": "Concrete sidewalk",
"slope_pct": 2.0,
"compliant": true
}
],
"ada_restrooms": [
{
"room": "109",
"turning_radius_in": 60,
"wc_centerline_in": 18,
"grab_bars": true,
"lavatory_clearance": true,
"door_width_in": 36,
"compliant": true
}
],
"ramps": [
{"location": "Main entry", "slope": "1:12", "width_in": 44, "handrails": true, "landings": true}
],
"signage_locations": [
{"type": "tactile_braille", "location": "All room doors", "mounting_height_in": 60, "side": "latch"}
],
"accessible_parking": {
"standard_count": 2,
"van_accessible_count": 1,
"signage": true,
"access_aisle_width_ft": 5
},
"counter_heights": [
{"location": "Reception 101", "height_in": 34, "type": "transaction", "knee_clearance": true}
]
}
}
Cross-checks after elevation + accessibility extraction:
See references/plans-deep-extraction.md (Exterior Elevations section) for detailed rendering data extraction rules and material zone formatting.
This pass applies to ALL architectural and structural plan sheets that contain hatched regions or keynote/general note annotations. It is a refinement pass that improves existing Pass 5 material zone data and adds keynote intelligence that no prior pass extracts.
Why this pass exists: Pass 5 uses raw Gabor texture analysis to classify hatched regions, but construction hatching follows AIA/ANSI standards with specific patterns mapped to specific materials. This pass overlays standard-pattern matching on top of Pass 5 results to improve classification accuracy. It also extracts keynote bubbles and general note schedules — critical text-based intelligence that links numbered callouts on drawings to their full descriptions.
What to extract:
Hatch Refinement (improves Pass 5 material_zones):
For each material zone already detected by Pass 5, refine the classification:
"context": "section_cut" or "context": "plan_view" based on the sheet type detected in Pass 10/12
material_label
Store refined zones by UPDATING existing entries in plans-spatial.json → material_zones[]:
{
"zone_id": "MATZ-001",
"sheet": "A-301",
"material_type": "concrete",
"hatch_pattern": "AR-CONC",
"hatch_angle_deg": [45, 135],
"hatch_spacing_in": 0.125,
"hatch_density_lpi": 8.0,
"context": "section_cut",
"material_label": "4000 PSI CONC",
"legend_match": true,
"area_sf": 12.5,
"bbox": [120, 340, 280, 520],
"confidence": "high"
}
Keynote Extraction:
Keynotes are numbered callouts on drawings that reference a keynote schedule (either on the same sheet or on a general notes sheet). They are the primary way architects communicate specification-level information on plan sheets.
Store keynotes in plans-spatial.json → keynotes:
{
"keynotes": {
"schedule": [
{
"number": "1",
"description": "PROVIDE 5/8\" TYPE X GWB ON RATED SIDE OF PARTITION",
"spec_section": "09 29 00",
"sheet_source": "A-001"
}
],
"callouts": [
{
"keynote_number": "1",
"sheet": "A-101",
"position": [1240, 890],
"points_to": {"element_type": "wall", "element_id": "Type 2 partition"},
"leader_endpoint": [1180, 870]
}
]
},
"general_notes": [
{
"sheet": "A-001",
"note_number": "1",
"text": "ALL DIMENSIONS ARE TO FACE OF STUD UNLESS OTHERWISE NOTED",
"spec_refs": [],
"category": "dimensions"
}
]
}
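Linking callouts back to their schedule entries is a simple join on the keynote number. The sketch below assumes the `keynotes` structure shown above. `resolve_keynotes` is a hypothetical helper, and treating unmatched callouts as orphans to flag is an assumption about how the cross-check might work.

```python
def resolve_keynotes(keynotes: dict):
    """Attach schedule descriptions to callouts; collect callouts with no match."""
    schedule = {entry["number"]: entry for entry in keynotes.get("schedule", [])}
    resolved, orphans = [], []
    for callout in keynotes.get("callouts", []):
        entry = schedule.get(callout["keynote_number"])
        if entry is None:
            orphans.append(callout)  # callout references a number not in the schedule
        else:
            resolved.append({**callout,
                             "description": entry["description"],
                             "spec_section": entry.get("spec_section")})
    return resolved, orphans


keynotes = {
    "schedule": [{"number": "1", "spec_section": "09 29 00",
                  "description": "PROVIDE 5/8\" TYPE X GWB ON RATED SIDE OF PARTITION"}],
    "callouts": [{"keynote_number": "1", "sheet": "A-101"},
                 {"keynote_number": "7", "sheet": "A-101"}],
}
resolved, orphans = resolve_keynotes(keynotes)
print(len(resolved), len(orphans))  # 1 1
```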
Cross-checks after hatch refinement + keynote extraction:
legend_match: true. Zones without legend matches get confidence: "medium" at best
See references/visual-extraction-reference.md (Pass 13 methodology section) for hatch angle measurement algorithms, keynote bubble detection, and standard pattern classification tables.
Merge visual results with text-based extraction. Visual data fills gaps where PDF text extraction misses graphical content. See references/visual-extraction-reference.md for accuracy expectations and integration rules.
This skill uses domain-specific extraction guides. Read the appropriate reference(s) before extracting:
| Reference | When to Use | Contains |
|---|---|---|
| construction-document-conventions.md | Always read first — before any extraction | How drawings are organized, sheet numbering navigation, cross-reference system, scale/measurement, quantity calculation formulas, contour/grading, line types, hatch patterns, abbreviations |
| extraction-rules.md | Read second | Framework, classification, principles |
| plans-deep-extraction.md | Processing plans/drawings | Complete room/finish/door/window schedules, MEP equipment, structural specs |
| specifications-deep-extraction.md | Processing specs | All CSI divisions with exact values, weather thresholds, testing frequencies |
| schedule-deep-extraction.md | Processing CPM schedules | Activity details, sequencing logic, float, critical path, constraints |
| civil-deep-extraction.md | Processing civil/site drawings | Storm/sanitary/water systems with pipe sizes and elevations |
| compliance-deep-extraction.md | Processing geotech, safety, SWPPP | Bearing capacity, compaction, BMPs, inspection requirements |
| project-docs-deep-extraction.md | Processing contracts, sub lists, RFIs, subcontracts, POs | Contract terms, sub contacts, RFI status, submittal tracking, SC scope/exclusions, PO line items |
| pemb-deep-extraction.md | Processing PEMB manufacturer documents | Column reactions, anchor bolts, framing, erection sequence, panels, accessories |
| submittals-deep-extraction.md | Processing submittal packages | Concrete mixes, door hardware, shop drawings, MEP equipment, finishes |
| dxf-extraction.md | Processing .dxf/.dwg CAD files | Layer mapping, block attributes, hatch areas, polyline geometry, dimensions, parse_dxf.py |
| visual-extraction-reference.md | Processing plan sheet images (PDF→PNG) | OCR text extraction, line/wall detection, symbol recognition, material zones, dimensions, scale calibration |
| masterformat-reference.md | Enriching reports, QA checks, morning briefs | CSI Div 01-33: sequencing, QC issues, hold points, testing, weather limits, PEMB + healthcare overlays |
| mep-deep-extraction.md | Processing MEP plan sheets | Complete HVAC/plumbing/electrical equipment, panel schedules, fixture schedules, pipe/duct sizes, fire protection |
Read base extraction rules:
Read references/extraction-rules.md
Classify the document (Pass 1 + 2):
Read appropriate specialized reference(s):
# If plans/drawings:
Read references/plans-deep-extraction.md
# If specifications:
Read references/specifications-deep-extraction.md
# If schedule:
Read references/schedule-deep-extraction.md
# etc.
Extract comprehensively following the reference guide
Track extraction quality and structure output
CRITICAL: Process ONE document at a time. Do NOT batch-process multiple documents in a single pass — this causes context overload and extraction loops. Instead:
Read base rules: references/extraction-rules.md
Classify all documents first (Pass 1 + 2 for each) — this is a lightweight scan, not full extraction
Present classification to user for confirmation
Use AskUserQuestion to let the user choose which document to process first
Recommended processing order (earlier provides context for later):
After each document: STOP, report results, ask user what to process next
Cross-reference data across documents incrementally as each new document is processed
Report results after each document with metrics
Plans:
Specifications:
Schedule:
After extraction, the quantitative-intelligence skill uses extracted data + sheet cross-references to:
plans-spatial.json and sheet references in the respective data files for use by reports, briefs, and dashboards
A successful extraction includes:
When the user asks to see extracted project data ("show me the project intelligence", "project intel dashboard", "what data do we have"), generate a comprehensive Project Intelligence Dashboard:
project-config.json, plans-spatial.json, specs-quality.json, schedule.json, directory.json, and all log files)
{PROJECT_CODE}_Project_Intelligence.html in the user's output folder
Extended reference: Detailed examples, templates, scoring rubrics, and best practices are in references/skill-detail.md.
Four Node.js hooks in foremanos-intel/hooks/ provide automated quality guardrails during extraction. They run locally (~12ms each, zero API calls) and never block writes — only warn on stderr.
| Hook | Trigger | What It Does |
|---|---|---|
| project-brain-validator.js | PreToolUse (Write\|Edit) | Validates JSON structure, canonical top-level keys, catches empty overwrites |
| counter-reconciler.js | PostToolUse (Write) | Checks _count fields match actual array lengths |
| extraction-checkpoint.js | PreCompact | Saves extraction phase/progress before context compaction |
| data-loss-guard.js | PreToolUse (Write) | Warns if new content has fewer records than existing file |
See commands/process-docs.md "Extraction Hooks" section for installation instructions.
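The kind of check counter-reconciler.js performs can be illustrated in a few lines. The hooks themselves are Node.js, so this Python sketch is not their code. It assumes a naming convention in which a top-level `<name>_count` field sits beside a `<name>` array, which may not match the real files, and `count_mismatches` is a hypothetical helper.

```python
def count_mismatches(document: dict) -> list:
    """Compare each top-level '<name>_count' field with len(document['<name>'])."""
    problems = []
    for key, declared in document.items():
        if not key.endswith("_count"):
            continue
        array = document.get(key[: -len("_count")])
        if isinstance(array, list) and declared != len(array):
            problems.append(f"{key}={declared} but {len(array)} records found")
    return problems


print(count_mismatches({"rooms_count": 3, "rooms": [{"id": "101"}, {"id": "102"}]}))
```

In keeping with the hooks' behavior, a caller would print these messages as warnings and let the write proceed.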
Optional MCP server tools enhance the extraction pipeline when available. See references/mcp-extraction-tools.md for the full reference including PDF Tools (Phase 1 classification), PDF Display (Phase 3-4 PM review), Cloudflare D1 (extraction state persistence for 100+ doc projects), and R2 (render archive).