Skill

extract

Extracts requirement candidates from PDFs, Markdown, Word documents, text files, and URLs. Categorizes them by type, deduplicates across files, saves the results as YAML, and provides an extraction summary.

documentation

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/requirements-elicitation:extract

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Glob, Grep, Write, Skill, WebFetch

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Extract requirements from documents for systematic requirement mining.

SKILL.md

300 lines · ~1.7k tokens

Stats

Language: Python

Parent stars: 67

Parent forks: 10

Maintenance: Excellent

Last Commit: Mar 17, 2026

Extract Command

Extract requirements from documents for systematic requirement mining.

Usage

/requirements-elicitation:extract path/to/document.pdf
/requirements-elicitation:extract path/to/spec.md --domain "authentication"
/requirements-elicitation:extract https://example.com/features --type competitor
/requirements-elicitation:extract ./docs/*.md --mode full-auto

Arguments

Argument	Required	Description
path-or-url	Yes	File path, glob pattern, or URL to extract from
--domain	No	Domain name for organizing output files
--mode	No	Autonomy mode: `guided`, `semi-auto`, `full-auto` (default: `semi-auto`)
--type	No	Document type hint: `spec`, `transcript`, `regulatory`, `competitor`, `auto`
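
The skill itself is a slash command interpreted by Claude, not a Python program, but the argument contract in the table can be illustrated with a small `argparse` sketch. The names `build_parser` and `doc_type` are hypothetical, and because the table states no default for `--type`, the sketch leaves it as `None` ("not specified").

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the argument table above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="extract")
    parser.add_argument(
        "path_or_url",
        help="File path, glob pattern, or URL to extract from",
    )
    parser.add_argument(
        "--domain",
        default=None,
        help="Domain name for organizing output files",
    )
    parser.add_argument(
        "--mode",
        choices=["guided", "semi-auto", "full-auto"],
        default="semi-auto",  # default stated in the table
        help="Autonomy mode",
    )
    parser.add_argument(
        "--type",
        dest="doc_type",
        choices=["spec", "transcript", "regulatory", "competitor", "auto"],
        default=None,  # assumption: the table gives no default
        help="Document type hint",
    )
    return parser


args = build_parser().parse_args(["path/to/spec.md", "--domain", "authentication"])
print(args.mode, args.domain, args.doc_type)
```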

Supported Sources

Source Type	Examples
PDF	`document.pdf`, `spec.pdf`
Markdown	`readme.md`, `requirements.md`
Text	`notes.txt`, `transcript.txt`
Word	`document.docx`
URL	`https://docs.example.com/api`
Glob	`./docs/*.md`, `./specs/**/*.pdf`

Workflow

Step 1: Source Resolution

Parse the input to determine:

- Single file vs. multiple files (glob)
- Local file vs. URL
- Document type (auto-detect or from --type)
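
The checks above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: `ResolvedSource`, `resolve_source`, and the extension map are hypothetical names, and treating any `*`, `?`, or `[` as a glob marker is an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension map taken from the Supported Sources table.
SOURCE_TYPES = {".pdf": "pdf", ".md": "markdown", ".txt": "text", ".docx": "word"}
GLOB_CHARS = set("*?[")  # assumption: these characters mark a glob pattern


@dataclass
class ResolvedSource:
    raw: str
    kind: str                    # "url", "glob", or "file"
    source_type: Optional[str]   # None when the extension is not recognized


def resolve_source(raw: str) -> ResolvedSource:
    """Classify an input as URL, glob pattern, or single local file."""
    if urlparse(raw).scheme in ("http", "https"):
        return ResolvedSource(raw, "url", "url")
    kind = "glob" if GLOB_CHARS & set(raw) else "file"
    suffix = PurePosixPath(raw).suffix.lower()
    return ResolvedSource(raw, kind, SOURCE_TYPES.get(suffix))


for example in ("path/to/document.pdf", "./docs/*.md", "https://example.com/features"):
    print(resolve_source(example))
```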

Step 2: Load Document Extraction Skill

Invoke the requirements-elicitation:document-extraction skill to load extraction strategies.

Step 3: Process Each Document

For each document:

1. Read/Fetch Content
   - Use Read tool for local files
   - Use WebFetch for URLs
2. Assess Document
   - Determine document type if not specified
   - Choose extraction strategy
3. Extract Requirements
   - Spawn document-miner agent
   - Apply appropriate patterns
   - Capture with source attribution
4. Categorize and Deduplicate
   - Assign types and categories
   - Identify duplicates within and across documents
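
The source does not say how duplicates are identified. As one simple possibility, the sketch below groups candidates whose text is identical after normalization (lowercasing, removing punctuation, collapsing whitespace). The function names and the `id`/`text` record fields are assumptions for illustration; a real implementation may well use fuzzier matching.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_duplicate_groups(candidates: list) -> list:
    """Return lists of candidate ids that share the same normalized text."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[normalize(candidate["text"])].append(candidate["id"])
    # Only groups with more than one member are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


candidates = [
    {"id": "REQ-EXT-003", "text": "System shall authenticate users via SSO."},
    {"id": "REQ-EXT-004", "text": "System shall log all sign-in attempts"},
    {"id": "REQ-EXT-022", "text": "system shall  authenticate users via SSO"},
]
print(find_duplicate_groups(candidates))
```

Because the grouping key ignores which file a candidate came from, the same pass covers duplicates within one document and across several.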

Step 4: Save Results

Save extraction results to:

.requirements/{domain}/documents/DOC-{filename}-{timestamp}.yaml
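
A minimal sketch of filling in that path template. The helper name `output_path` is hypothetical, and the date-only `YYYYMMDD` timestamp is an assumption inferred from the example filename `DOC-requirements-20251225.yaml` shown later in this document.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def output_path(domain: str, source_name: str, now: Optional[datetime] = None) -> str:
    """Build .requirements/{domain}/documents/DOC-{filename}-{timestamp}.yaml."""
    now = now or datetime.now(timezone.utc)
    stem = PurePosixPath(source_name).stem  # filename without directory or extension
    stamp = now.strftime("%Y%m%d")          # assumption: date-only timestamp
    return f".requirements/{domain}/documents/DOC-{stem}-{stamp}.yaml"


print(output_path("project-x", "./docs/requirements.pdf", datetime(2025, 12, 25)))
```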

Step 5: Report Summary

Display extraction statistics and key findings.

Examples

Single PDF Extraction

/requirements-elicitation:extract ./docs/requirements.pdf --domain "project-x"

Output:

Extracting from: requirements.pdf
Document type: Formal Specification
Mode: semi-auto

Processing... [================] 100%

Extraction Complete:
- Total candidates: 45
- Extracted: 38
- Needs review: 7

By Type:
- Functional: 24
- Non-Functional: 10
- Constraints: 4

Saved to: .requirements/project-x/documents/DOC-requirements-20251225.yaml

Review items flagged - run /requirements-elicitation:gaps for details

Multiple Documents with Glob

/requirements-elicitation:extract ./specs/*.md --mode full-auto

Output:

Found 5 documents matching pattern

Processing:
1. api-spec.md .......... 12 requirements
2. user-stories.md ...... 18 requirements
3. constraints.md ....... 5 requirements
4. nfr-spec.md .......... 8 requirements
5. assumptions.md ....... 3 requirements

Total: 46 requirements extracted
Duplicates detected: 4 (consolidated)
Final count: 42 unique requirements

Saved to: .requirements/specs/documents/

URL Extraction (Competitor Analysis)

/requirements-elicitation:extract https://competitor.com/features --type competitor --domain "competitive-analysis"

Output:

Fetching: https://competitor.com/features
Document type: Competitor Analysis

Extraction Complete:
- Features identified: 15
- Converted to requirements: 15
- Confidence: LOW (external observation)

All items flagged for validation.

Saved to: .requirements/competitive-analysis/documents/DOC-competitor-features.yaml

Next: Validate with stakeholders using /requirements-elicitation:interview

Autonomy Modes

Guided Mode

AI: "I found this potential requirement in Section 2.1:
     'The system shall support up to 1000 concurrent users'

     Should I extract this as a Performance requirement?"

User: "Yes"

AI: "Extracted as REQ-EXT-001 (Performance/Scalability).
     Next candidate..."

Semi-Autonomous Mode

AI: [Processes document section]

    "Completed Section 2. Extracted 8 requirements:
     - 5 Functional
     - 2 Performance
     - 1 Constraint

     2 items flagged for review. Continue to Section 3?"

Fully Autonomous Mode

AI: [Processes entire document]

    "Extraction complete.

     Summary:
     - 34 requirements extracted
     - 6 flagged for review
     - 3 potential duplicates detected

     Results saved. Ready for gap analysis."
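
Read together, the three transcripts suggest that the modes differ mainly in where the assistant pauses for confirmation: per candidate, per section, or not at all. The sketch below encodes that reading; the `PAUSE_POINTS` table, the event names, and `should_pause` are hypothetical and not part of the skill.

```python
# Where each mode stops to ask the user, inferred from the transcripts above.
PAUSE_POINTS = {
    "guided": "candidate",   # confirm every potential requirement
    "semi-auto": "section",  # confirm after each document section
    "full-auto": None,       # never pause; report once at the end
}


def should_pause(mode: str, event: str) -> bool:
    """Return True if the given mode asks for confirmation at this event."""
    if mode not in PAUSE_POINTS:
        raise ValueError(f"Unknown mode: {mode}")
    return PAUSE_POINTS[mode] == event


print(should_pause("guided", "candidate"))
print(should_pause("full-auto", "section"))
```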

Output Format

Saved YAML Structure

extraction_session:
  timestamp: "2025-12-25T14:30:00Z"
  mode: semi-auto
  domain: "{domain}"

sources:
  - file: "requirements.pdf"
    type: specification
    pages: 45
    processed: true

statistics:
  total_candidates: 52
  extracted: 45
  filtered: 7
  needs_review: 8
  duplicates: 3

requirements:
  - id: REQ-EXT-001
    text: "System shall authenticate users via SSO"
    source:
      file: "requirements.pdf"
      location: "Section 3.1, page 8"
    type: functional
    category: security
    confidence: high
    needs_review: false

review_items:
  - id: REQ-EXT-015
    reason: "Vague performance target"
    original: "System should be responsive"
    suggestion: "Define specific response time"

duplicates:
  - group: [REQ-EXT-003, REQ-EXT-022]
    recommended: REQ-EXT-003
    reason: "More specific statement"
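
As an illustration of how a consumer might use this structure, the sketch below takes the mapping a YAML loader would produce and keeps only the recommended entry from each duplicate group. The function name is hypothetical, and the sample data is a trimmed stand-in for a real file rather than actual output.

```python
def unique_requirement_ids(session: dict) -> list:
    """Return requirement ids, dropping non-recommended members of duplicate groups."""
    dropped = set()
    for dup in session.get("duplicates", []):
        dropped.update(rid for rid in dup["group"] if rid != dup["recommended"])
    return [req["id"] for req in session.get("requirements", []) if req["id"] not in dropped]


# Trimmed stand-in for a loaded extraction file (top-level keys as shown above).
session = {
    "requirements": [
        {"id": "REQ-EXT-001"},
        {"id": "REQ-EXT-003"},
        {"id": "REQ-EXT-022"},
    ],
    "duplicates": [
        {"group": ["REQ-EXT-003", "REQ-EXT-022"], "recommended": "REQ-EXT-003"},
    ],
}
print(unique_requirement_ids(session))  # ['REQ-EXT-001', 'REQ-EXT-003']
```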

Integration

Follow-Up Commands

# Check for gaps after extraction
/requirements-elicitation:gaps

# Analyze meeting transcripts
/requirements-elicitation:analyze-transcript ./meetings/kickoff.md

# Consolidate all sources
/requirements-elicitation:discover "{domain}" --sources documents

# Export to specification format
/requirements-elicitation:export --to canonical

Error Handling

File Not Found

Error: File not found: ./docs/missing.pdf
Suggestion: Check path and try again

Unsupported Format

Error: Unsupported file format: .xyz
Supported: .pdf, .md, .txt, .docx, URLs

URL Fetch Failed

Error: Could not fetch URL: https://example.com/page
Reason: 404 Not Found
Suggestion: Verify URL is accessible
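
The local-file cases above could be checked before any extraction starts. This is a hedged sketch: `check_local_file` is a hypothetical helper, and checking the format before existence is an arbitrary ordering the source does not specify. The message text follows the examples above.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_SUFFIXES = (".pdf", ".md", ".txt", ".docx")


def check_local_file(path: str) -> Optional[str]:
    """Return an error message for a local path, or None if it looks usable."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        return (
            f"Error: Unsupported file format: {suffix}\n"
            "Supported: .pdf, .md, .txt, .docx, URLs"
        )
    if not Path(path).is_file():
        return f"Error: File not found: {path}\nSuggestion: Check path and try again"
    return None


print(check_local_file("./docs/report.xyz"))
```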
