Detect AI/LLM-generated text patterns in research writing. Use when: (1) Reviewing manuscript drafts before submission, (2) Pre-commit validation of documentation, (3) Quality assurance checks on research artifacts, (4) Ensuring natural academic writing style, (5) Tracking writing authenticity over time. Analyzes grammar perfection, sentence uniformity, paragraph structure, word frequency (AI-typical words like 'delve', 'leverage', 'robust'), punctuation patterns, and transition word overuse.
Detects AI-generated text patterns in academic writing by analyzing grammar perfection, sentence uniformity, word frequency, and punctuation. Use before manuscript submission, pre-commit validation, or quality assurance checks to ensure authentic writing style.
/plugin marketplace add astoreyai/ai_scientist
/plugin install research-assistant@research-assistant-marketplace
Detect patterns typical of LLM-generated text to ensure natural, human-authored academic writing. This skill helps maintain authenticity in research publications, dissertations, and documentation.
What We Check:
Red Flags:
Human Writing Typically Has:
What We Check:
Red Flags:
Human Writing Typically Has:
What We Check:
Red Flags:
Human Writing Typically Has:
High-Risk AI Words (overused by LLMs):
Verbs:
Adjectives:
Transition Words (overused):
Phrases:
Detection Criteria:
What We Check:
Red Flags:
Human Writing Typically Has:
Overall Confidence = Weighted average of:
Each metric scored 0-100, then combined with weights.
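The weighted combination can be sketched as follows. The weights mirror the `detection.weights` section of the configuration; the per-metric scoring functions themselves are assumed to exist elsewhere, and the sample scores are the ones from the report example below.

```python
# Minimal sketch of the weighted-average combination.
# Each metric score is assumed to already be on a 0-100 scale.
WEIGHTS = {
    "grammar_perfection": 0.20,
    "sentence_uniformity": 0.25,
    "paragraph_structure": 0.20,
    "ai_word_frequency": 0.25,
    "punctuation_patterns": 0.10,
}

def overall_confidence(scores: dict) -> float:
    """Combine per-metric scores (0-100) into one confidence value."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

scores = {
    "grammar_perfection": 85,
    "sentence_uniformity": 72,
    "paragraph_structure": 68,
    "ai_word_frequency": 58,
    "punctuation_patterns": 45,
}
print(overall_confidence(scores))
```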
Characteristics:
Action: ✅ Writing appears authentic, no changes needed
Characteristics:
Action: ⚠️ Review flagged sections, apply suggestions selectively
Examples of Mixed Writing:
Characteristics:
Action: 🚫 Significant revision needed, rewrite in authentic voice
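A hypothetical mapping from overall confidence to a status band, using the 0.30 and 0.70 cut-offs that appear as `warn_threshold` and `block_threshold` defaults in the configuration section (the exact band boundaries used internally are an assumption):

```python
# Hypothetical band classifier; cut-offs taken from the config defaults.
def status_band(confidence: float) -> str:
    if confidence >= 0.70:
        return "HIGH"      # significant revision needed
    if confidence >= 0.30:
        return "MEDIUM"    # review flagged sections
    return "LOW"           # writing appears authentic

print(status_band(0.89), status_band(0.35), status_band(0.08))
```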
When running AI-check analysis, generate a comprehensive report:
Overall Confidence Score: 65%
Status: MEDIUM - Possible AI assistance detected
Files Analyzed: 1
Total Words: 3,456
Recommendation: Review flagged sections
Grammar Perfection: 85% (High - suspiciously few errors)
Sentence Uniformity: 72% (High - repetitive structures)
Paragraph Structure: 68% (Medium - some variation)
AI-Typical Words: 58% (Medium - 4.2 per 1000 words)
Punctuation Patterns: 45% (Low - natural variation)
Lines 45-67 (Confidence: 82%)
Pattern: Excessive transition words + uniform sentences
AI Words: "moreover", "furthermore", "leverage", "robust"
Lines 112-134 (Confidence: 76%)
Pattern: Perfect grammar + mechanical structure
AI Words: "delve", "comprehensive", "facilitate"
High-Risk AI Words Found (per 1000 words):
• "delve" (2 occurrences) - RARELY used by humans
• "leverage" (3 occurrences) - Business jargon overuse
• "robust" (4 occurrences) - Technical overuse
• "furthermore" (6 occurrences) - Formal transition overuse
Sentence Uniformity Issues:
• 67% of sentences are 15-25 words (AI sweet spot)
• 82% of paragraphs start with transition words
• Low variation in sentence complexity
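A rough sketch of how the 15-25-word-band statistic above could be computed (the actual tokenizer and sentence splitter used by the skill are assumptions; a naive regex split stands in here):

```python
import re
import statistics

def sentence_uniformity(text: str) -> dict:
    """Share of sentences in the 15-25 word band, plus length spread.
    Naive splitter for illustration only."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    in_band = sum(1 for n in lengths if 15 <= n <= 25)
    return {
        "sentences": len(lengths),
        "pct_15_25": 100 * in_band / len(lengths),
        "stdev": statistics.pstdev(lengths) if len(lengths) > 1 else 0.0,
    }

text = ("The effect was significant. We observed a 23% increase across "
        "all conditions in the second half of the study period.")
print(sentence_uniformity(text))
```

A high `pct_15_25` with a low `stdev` is the uniformity signature flagged above.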
Paragraph Structure Issues:
• All paragraphs 4-6 sentences long
• Mechanical topic-sentence pattern throughout
Top AI-Typical Words:
1. "furthermore" - 6x (baseline: 0.5x per 1000 words)
2. "robust" - 4x (baseline: 0.8x per 1000 words)
3. "leverage" - 3x (baseline: 0.3x per 1000 words)
4. "comprehensive" - 3x (baseline: 1.2x per 1000 words)
5. "delve" - 2x (baseline: 0.1x per 1000 words)
Comparison to Human Academic Writing:
Your text: 4.2 AI-typical words per 1000
Human baseline: 1.5 AI-typical words per 1000
Ratio: 2.8x higher than human baseline
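The per-1000-word rate behind this comparison can be sketched like so. The word set is an abbreviated, illustrative merge of the config's `high_risk`, `medium_risk`, and `transitions` lists plus simple inflections; real matching would need lemmatization.

```python
# Sketch of the AI-typical-word rate per 1000 words.
AI_WORDS = {"delve", "delves", "leverage", "utilize", "utilized",
            "robust", "comprehensive", "facilitate",
            "furthermore", "moreover", "additionally"}

def ai_word_rate(text: str) -> float:
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    hits = sum(1 for w in words if w in AI_WORDS)
    return 1000 * hits / len(words)

sample = "Furthermore, the robust methodology utilized comprehensive data from the study."
print(ai_word_rate(sample))
```

Dividing the result by the human baseline of 1.5 gives the ratio reported above.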
Sentence Structure:
Why Better: Simpler words, no transition words, more direct
Word Choice:
Why Better: Active voice, common words, clearer meaning
Paragraph Variation:
Why Better: Natural flow based on content, not formula
Current: 15-25 word sentences consistently
Suggestion: Mix short (5-10), medium (15-20), long (25-35) sentences
Example:
- Short: "The effect was significant."
- Medium: "We observed a 23% increase across all conditions."
- Long: "This finding aligns with previous work showing that..."
Replace → With
- "delve into" → "examine", "explore", "investigate"
- "leverage" → "use", "apply", "employ"
- "utilize" → "use"
- "robust" → "strong", "reliable", "thorough"
- "facilitate" → "enable", "help", "allow"
- "furthermore" → "also", "next", [or remove]
- "moreover" → "additionally", "also", [or use dash]
- "comprehensive" → "complete", "thorough", "full"
- Use contractions in appropriate contexts ("it's", "we'll")
- Include domain-specific jargon naturally
- Allow informal phrasing in methods/procedures
- Use occasional sentence fragments for emphasis
- Add personal observations or interpretations
- Include field-specific colloquialisms
Current: All paragraphs follow topic-support-support-support-conclusion
Suggestion: Vary based on content
- Use single-sentence paragraphs for emphasis
- Combine related ideas into longer paragraphs
- Don't force every paragraph to have 5 sentences
- Let content determine structure, not formula
❌ "Furthermore, the results show... Moreover, the analysis reveals..."
✅ "The results show... The analysis also reveals..." [simpler transitions]
✅ "The results show... Looking closer, the analysis..." [natural bridges]
Automatic checking before git commits
# Configured in .claude/settings.json
"gitPreCommit": {
"command": "python3 hooks/pre-commit-ai-check.py",
"enabled": true
}
Behavior:
- Checks .md, .tex, .rst files
- Bypass with git commit --no-verify
Exit Codes:
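The hook's exit-code logic might look like the sketch below. This is an assumption about hooks/pre-commit-ai-check.py, not its actual source; 0 lets the commit through and 1 blocks it, following git-hook convention, with thresholds taken from the config defaults.

```python
# Hypothetical sketch of the pre-commit hook's exit-code decision.
BLOCK_THRESHOLD = 0.70  # block commit if >= 70%
WARN_THRESHOLD = 0.30   # warn if >= 30%

def decide(confidence: float) -> int:
    """Return a git-hook exit code: 0 passes, 1 blocks the commit."""
    if confidence >= BLOCK_THRESHOLD:
        print(f"BLOCKED: AI confidence {confidence:.0%} >= {BLOCK_THRESHOLD:.0%}")
        return 1
    if confidence >= WARN_THRESHOLD:
        print(f"WARNING: AI confidence {confidence:.0%} - review flagged sections")
    return 0

print(decide(0.35))
```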
Part of comprehensive QA workflow
Integrated into code/quality_assurance/qa_manager.py:
Configuration (.ai-check-config.yaml):
qa_integration:
  enabled: true
  max_confidence_threshold: 0.40
  check_manuscripts: true
  check_documentation: true
  generate_detailed_reports: true
Real-time feedback during writing
Agent checks writing incrementally:
Agent Workflow:
Manual invocation by user or agents
User Invocation:
Please run ai-check on docs/manuscript/discussion.tex and provide detailed feedback.
Agent Invocation:
I'll use the ai-check skill to verify this text before proceeding.
CLI Tool:
python tools/ai_check.py path/to/file.md
python tools/ai_check.py --directory docs/
python tools/ai_check.py --format html --output report.html
Log all AI-check runs to database for evolution tracking:
Database Schema (PostgreSQL via research-database MCP):
CREATE TABLE ai_check_history (
id SERIAL PRIMARY KEY,
file_path TEXT NOT NULL,
git_commit TEXT,
timestamp TIMESTAMP DEFAULT NOW(),
overall_confidence FLOAT,
grammar_score FLOAT,
sentence_score FLOAT,
paragraph_score FLOAT,
word_score FLOAT,
punctuation_score FLOAT,
ai_words_found JSONB,
flagged_sections JSONB
);
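The documented backend is PostgreSQL via the research-database MCP; purely for illustration, the same logging pattern is shown below against an in-memory SQLite database (JSONB columns become TEXT holding JSON strings, and the trimmed column set is an assumption).

```python
import json
import sqlite3

# Illustration only: PostgreSQL via MCP is the real backend;
# SQLite stands in here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_check_history (
        id INTEGER PRIMARY KEY,
        file_path TEXT NOT NULL,
        git_commit TEXT,
        overall_confidence REAL,
        ai_words_found TEXT
    )
""")
conn.execute(
    "INSERT INTO ai_check_history "
    "(file_path, git_commit, overall_confidence, ai_words_found) "
    "VALUES (?, ?, ?, ?)",
    ("docs/manuscript/discussion.tex", "abc1234", 0.65,
     json.dumps({"furthermore": 6, "robust": 4})),
)
row = conn.execute(
    "SELECT file_path, overall_confidence FROM ai_check_history"
).fetchone()
print(row)
```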
Track writing evolution:
File: docs/manuscript/discussion.tex
Version History:
2025-01-15: 78% confidence (HIGH - likely AI)
2025-01-18: 52% confidence (MEDIUM - revision 1)
2025-01-20: 34% confidence (LOW-MEDIUM - revision 2)
2025-01-22: 18% confidence (LOW - authentic writing)
Trend: ✅ Improving toward authentic writing
Use Cases:
.ai-check-config.yaml:

# AI-Check Skill Configuration

# Pre-Commit Hook Settings
pre_commit:
  enabled: true
  check_files: [".md", ".tex", ".rst", ".txt"]
  check_docstrings: true   # Check Python docstrings
  block_threshold: 0.70    # Block commit if >= 70%
  warn_threshold: 0.30     # Warn if >= 30%
  exclude_patterns:
    - "*/examples/*"
    - "*/tests/*"
    - "*/node_modules/*"
    - "*/.venv/*"

# Quality Assurance Integration
qa_integration:
  enabled: true
  max_confidence_threshold: 0.40  # Fail QA if >= 40%
  check_manuscripts: true
  check_documentation: true
  generate_detailed_reports: true
  track_history: true

# Detection Parameters
detection:
  # Weight each metric (must sum to 1.0)
  weights:
    grammar_perfection: 0.20
    sentence_uniformity: 0.25
    paragraph_structure: 0.20
    ai_word_frequency: 0.25
    punctuation_patterns: 0.10

  # AI-typical word lists
  ai_words:
    high_risk: ["delve", "leverage", "utilize"]
    medium_risk: ["robust", "comprehensive", "facilitate"]
    transitions: ["furthermore", "moreover", "additionally"]

  # Thresholds
  ai_words_per_1000_threshold: 3.0
  human_baseline_per_1000: 1.5

# Report Generation
reporting:
  default_format: "markdown"  # markdown, json, html
  include_suggestions: true
  include_word_frequency: true
  include_flagged_sections: true
  max_flagged_sections: 10

# Tracking
tracking:
  enabled: true
  database: "research-database-mcp"
  retention_days: 365
Create .ai-check.local.yaml for project-specific settings:
# Project-specific overrides
pre_commit:
  block_threshold: 0.60  # More lenient for early drafts

detection:
  ai_words:
    high_risk: ["delve"]  # Only flag worst offenders
Input Text:
Furthermore, this comprehensive study delves into the robust
methodologies utilized to facilitate the implementation of innovative
approaches. Moreover, the analysis demonstrates significant findings
that leverage state-of-the-art techniques. Subsequently, the results
indicate substantial improvements across all metrics. Nevertheless,
additional research is crucial to fully comprehend the implications.
AI-Check Report:
Overall Confidence: 89% (HIGH - Likely AI-generated)
Issues Detected:
- 8 high-risk AI-typical words in 60 words (13.3% of the text!)
- Every sentence starts with transition word
- Uniform sentence length (15-18 words each)
- Perfect grammar, zero natural imperfections
- Mechanical paragraph structure
AI Words Found:
- furthermore, comprehensive, delves, robust
- utilized, facilitate, innovative, leverage
- demonstrates, significant, subsequently, substantial
- nevertheless, crucial, comprehend
Recommendation: Complete rewrite recommended
Suggested Revision:
We examined the methods used in this approach. The analysis shows
clear improvements across metrics. However, more research is needed
to understand the full implications.
(25 words, 12% confidence - much more natural)
Input Text:
The experimental design followed standard protocols established in
previous work (Smith et al., 2023). We collected data from 150
participants over six months. Statistical analysis used mixed-effects
models to account for repeated measures. The results showed a
significant main effect of condition (p < 0.001).
AI-Check Report:
Overall Confidence: 35% (MEDIUM - Possible minor AI assistance)
Issues Detected:
- Slightly uniform sentence length (11-15 words)
- One AI-typical word: "significant" (statistical context acceptable)
- Otherwise natural academic writing
Recommendation: Minor revisions optional, writing appears largely authentic
Input Text:
OK so here's what we found. The effect was huge - way bigger than
expected. Participants in the experimental group scored 23% higher
on average. This wasn't just statistically significant; it was
practically meaningful.
We're still not sure why. Maybe it's the timing? Could be the
instructions were clearer. Need to run follow-ups.
AI-Check Report:
Overall Confidence: 8% (LOW - Clearly human writing)
Human Writing Indicators:
- Natural sentence variation (4-19 words)
- Informal elements ("OK so", "way bigger")
- Incomplete thoughts and questions
- Natural uncertainty expressions
- Zero AI-typical words
- Authentic voice throughout
Recommendation: Writing is authentic, no changes needed
Run Before Advisor Meetings
Use During Drafting
Pre-Submission Validation
Establish Team Standards
Code Review Integration
Track Team Writing
Pre-Submission Checklist
Demonstrating Authenticity
Not 100% Accurate
Cannot Detect All AI Usage
Domain Limitations
This skill is a tool, not a replacement for human judgment:
Problem: False positive on authentic writing
Solution: Check if the writing is overly formal. Consider field-specific norms. Adjust thresholds in the config.

Problem: AI text passing with low confidence
Solution: Update the AI-typical word lists. Check for heavily edited text. Report patterns for skill updates.

Problem: Pre-commit hook too slow
Solution: Reduce checked file types. Enable caching. Check only modified sections.

Problem: Disagreement with manual review
Solution: Generate a detailed report. Review the flagged sections specifically. Consider individual metrics, not just the overall score.
docs/skills/ai-check-reference.md

Last Updated: 2025-11-09
Version: 1.0.0
License: MIT