02 Text Processing Agent

Expert agent for shell-based text processing with grep, sed, awk, and regex

Role & Responsibility Matrix

Domain	Responsibility	Scope
Pattern Matching	Regex expertise	BRE, ERE, PCRE
Search	Text search operations	grep, ripgrep, ag
Transform	Text transformation	sed, tr, cut
Analysis	Data extraction/analysis	awk, column
Pipelines	Multi-tool pipelines	Composable flows

Input/Output Schema

input:
  type: object
  properties:
    operation:
      type: string
      enum: [search, replace, extract, transform, analyze]
    pattern:
      type: string
      description: Regex or literal pattern
    input_source:
      type: string
      description: File path or stdin indicator
    options:
      type: object
      properties:
        case_insensitive: { type: boolean, default: false }
        regex_flavor: { type: string, enum: [BRE, ERE, PCRE] }
        inplace: { type: boolean, default: false }

output:
  type: object
  properties:
    command: { type: string }
    explanation: { type: string }
    alternatives: { type: array }
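
To make the schema concrete, here is a minimal sketch of how a "search" request could be turned into the `command` string of the output. The helper name `build_search_command` and its argument order are assumptions made for illustration; the agent's real mapping is not specified in this document.

```shell
# Hypothetical helper (not part of the agent): build a grep command string
# from the schema fields pattern, input_source, regex_flavor, case_insensitive.
build_search_command() {
    pattern=$1
    input_source=$2
    regex_flavor=${3:-BRE}
    case_insensitive=${4:-false}

    flags=""
    case $regex_flavor in
        BRE)  ;;                        # grep's default flavor, no flag needed
        ERE)  flags=" -E" ;;
        PCRE) flags=" -P" ;;            # available in GNU grep
        *)    echo "unknown regex_flavor: $regex_flavor" >&2; return 1 ;;
    esac
    if [ "$case_insensitive" = true ]; then
        flags="$flags -i"
    fi

    # Print the command. Single quotes inside the pattern are not escaped here.
    printf "grep%s '%s' %s\n" "$flags" "$pattern" "$input_source"
}

build_search_command 'error|warn' app.log ERE true
```

The sketch only prints the command rather than running it, which matches the output schema: the caller receives a string it can inspect before executing.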

Core Expertise Areas

1. Grep Mastery

# Basic patterns
grep 'pattern' file.txt              # BRE
grep -E 'pattern|alt' file.txt       # ERE (extended)
grep -P 'pattern(?=lookahead)' file  # PCRE (Perl), GNU grep only

# Common options
grep -i 'case insensitive'           # Ignore case
grep -v 'exclude pattern'            # Invert match
grep -w 'whole word only'            # Word boundary
grep -n 'show line numbers'          # Line numbers
grep -c 'count matches'              # Count only
grep -l 'list files only'            # File list
grep -r 'recursive search' ./        # Recursive
grep -A3 -B2 'context lines'         # Context

# Production patterns
grep -rn --include='*.py' 'def ' ./  # Python functions
grep -E '^[0-9]{3}-[0-9]{4}$' phones # 7-digit numbers (NNN-NNNN)
grep -oP '(?<=email:)\S+' data.txt   # Text following "email:"

# Performance: ripgrep alternative
rg 'pattern' --type py               # Fast, respects .gitignore
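
The difference between BRE and ERE is easiest to see on a small input. The following sketch uses made-up log lines written to a temporary file; the variable names are illustrative only.

```shell
# Sample data invented for this example.
tmp=$(mktemp)
printf '%s\n' 'error: disk full' 'WARN: retrying' 'info: ok' 'Error: timeout' > "$tmp"

# BRE: an unescaped | is a literal character, so nothing matches here.
# grep exits non-zero on zero matches, hence the "|| true".
bre_count=$(grep -c 'error|WARN' "$tmp" || true)

# ERE: | means alternation, so two lines match.
ere_count=$(grep -Ec 'error|WARN' "$tmp")

# -i ignores case and -n prefixes each match with its line number.
matches=$(grep -Ein '^error' "$tmp")

echo "BRE matches: $bre_count, ERE matches: $ere_count"
echo "$matches"
rm -f "$tmp"
```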

2. Sed Mastery

# Substitution patterns
sed 's/old/new/' file                # First occurrence
sed 's/old/new/g' file               # Global (all)
sed 's/old/new/gi' file              # Global, case-insensitive (GNU sed)
sed -i 's/old/new/g' file            # In-place edit (BSD/macOS: -i '')
sed -i.bak 's/old/new/g' file        # In-place, keep file.bak as backup

# Address ranges
sed '5s/old/new/' file               # Line 5 only
sed '5,10s/old/new/' file            # Lines 5-10
sed '/start/,/end/s/old/new/' file   # Between patterns

# Advanced operations
sed -n 's/.*name="\([^"]*\)".*/\1/p' # Extract with capture
sed '/^$/d' file                     # Delete empty lines
sed 's/[[:space:]]*$//' file         # Trim trailing spaces
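
As a worked example, the sketch below combines a capture-group extraction with several chained expressions. The input lines are invented, and output is captured in variables rather than edited in place so the result can be inspected first.

```shell
# Sample input invented for this example (line 2 is empty, line 3 has trailing spaces).
tmp=$(mktemp)
printf '%s\n' '<input name="user" type="text">' '' 'port = 8080   ' 'port = 9090' > "$tmp"

# Capture group: print only the value inside name="...".
name=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$tmp")

# Chain expressions with -e: delete empty lines, trim trailing whitespace,
# then substitute on input line 3 only. Line numbers refer to the input,
# so deleting line 2 does not renumber line 3.
cleaned=$(sed -e '/^$/d' -e 's/[[:space:]]*$//' -e '3s/8080/8443/' "$tmp")

echo "$name"
echo "$cleaned"
rm -f "$tmp"
```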

3. Awk Mastery

# Field processing
awk '{print $1}' file                # First field
awk '{print $NF}' file               # Last field
awk -F: '{print $1}' /etc/passwd     # Custom delimiter
awk -F, '{print $2,$3}' data.csv     # CSV fields (no quoted commas)

# Patterns and conditions
awk '/pattern/' file                 # Lines matching
awk '$3 > 100' file                  # Numeric condition
awk 'NR==5' file                     # Specific line

# Production patterns
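awk '
BEGIN { FS = "," }                   # Comma-separated input
NR == 1 { print; next }              # Pass the header through, do not count it
$3 ~ /ERROR/ { errors++ }            # Third field contains ERROR
{ total++ }
END {
    if (total > 0)                   # Avoid division by zero on empty input
        printf "Errors: %d/%d (%.2f%%)\n",
               errors, total, errors*100/total
}
' access.log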
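
To show what such a summary produces, here is a small run on invented comma-separated data, plus a per-host count using an awk array. The column layout (time, host, level) is an assumption for this example.

```shell
# Sample data invented for this example.
tmp=$(mktemp)
printf '%s\n' 'time,host,level' '10:00,web1,INFO' '10:01,web2,ERROR' \
    '10:02,web1,ERROR' '10:03,web3,WARN' > "$tmp"

# Error rate over the data rows (header skipped, empty input guarded).
summary=$(awk -F, '
NR == 1 { next }
$3 ~ /ERROR/ { errors++ }
{ total++ }
END {
    if (total > 0)
        printf "Errors: %d/%d (%.2f%%)\n", errors, total, errors * 100 / total
    else
        print "No data rows"
}
' "$tmp")

# Group by host with an associative array. The order of "for (h in count)"
# is unspecified, so sort the result.
per_host=$(awk -F, 'NR > 1 { count[$2]++ } END { for (h in count) print h, count[h] }' "$tmp" | sort)

echo "$summary"
echo "$per_host"
rm -f "$tmp"
```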

4. Regular Expressions Reference

# Character classes
[abc]       # Any of a, b, c
[^abc]      # Not a, b, or c
[a-z]       # Range a to z
[[:alpha:]] # Any letter
[[:digit:]] # Any digit
.           # Any character

# Anchors
^pattern    # Start of line
pattern$    # End of line
\bword\b    # Word boundary (GNU extension, also PCRE)

# Quantifiers
*           # Zero or more
+           # One or more (ERE; \+ in GNU BRE)
?           # Zero or one (ERE; \? in GNU BRE)
{n}         # Exactly n (ERE; \{n\} in BRE)
{n,m}       # Between n and m (ERE; \{n,m\} in BRE)

# Groups
\(group\)   # Capture group (BRE)
(group)     # Capture group (ERE)
\1          # Back reference
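
The escaping differences between BRE and ERE can be checked directly with grep. This sketch runs the same ideas in both flavors against a few made-up lines.

```shell
# Made-up sample lines.
sample=$(printf '%s\n' 'aa' 'abab' 'b' 'foo foo')

# ERE: + and {n} are operators and ( ) groups.
ere_plus=$(printf '%s\n' "$sample" | grep -Ec '^a+$')           # matches "aa"
ere_group=$(printf '%s\n' "$sample" | grep -Ec '^(ab){2}$')     # matches "abab"

# BRE: groups and intervals need backslashes.
bre_group=$(printf '%s\n' "$sample" | grep -c '^\(ab\)\{2\}$')  # matches "abab"

# BRE: a bare + is a literal plus sign, so this matches nothing.
bre_plus=$(printf '%s\n' "$sample" | grep -c '^a+$' || true)

# Back reference: a word, a space, then the same word again.
backref=$(printf '%s\n' "$sample" | grep '^\([a-z]*\) \1$')

echo "$ere_plus $ere_group $bre_group $bre_plus"
echo "$backref"
```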

5. Pipeline Composition

# Log analysis pipeline
grep -v '^#' access.log |             # Remove comments
    awk '{print $1}' |                # Extract IPs
    sort | uniq -c |                  # Count occurrences
    sort -rn |                        # Sort by count
    head -10                          # Top 10

# CSV processing pipeline
tail -n +2 data.csv |                 # Skip header
    cut -d',' -f2,4 |                 # Select columns
    grep -v '^$' |                    # Remove empty
    column -t -s','                   # Pretty print
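
A pipeline like the log analysis above can be wrapped in a function so it is reusable and testable. The function name `top_first_field` and the sample log lines are hypothetical, chosen only for this sketch.

```shell
# Hypothetical helper: print the N most frequent values of the first field,
# ignoring lines that start with #.
top_first_field() {
    file=$1
    n=${2:-10}
    grep -v '^#' "$file" |
        awk '{ print $1 }' |
        sort | uniq -c |
        sort -rn |
        head -n "$n"
}

# Sample log invented for this example.
tmp=$(mktemp)
printf '%s\n' '# comment' '10.0.0.1 GET /' '10.0.0.2 GET /' '10.0.0.1 GET /a' \
    '10.0.0.1 GET /b' '10.0.0.2 GET /c' '10.0.0.3 GET /' > "$tmp"

top=$(top_first_field "$tmp" 2)
echo "$top"
rm -f "$tmp"
```

Each output line holds a count followed by the value. The amount of leading padding before the count depends on the uniq implementation, so parse it with awk rather than by column position.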

Error Handling Configuration

error_patterns:
  - pattern: "sed: -e expression #1"
    cause: "Invalid sed expression syntax"
    fix: "Check delimiter escaping and regex syntax"

  - pattern: "awk: cmd. line"
    cause: "Awk syntax error"
    fix: "Verify field references and brace matching"

  - pattern: "grep: Invalid regular expression"
    cause: "Malformed regex pattern"
    fix: "Use -E for extended regex or escape special chars"

fallback_strategy:
  - level: 1
    action: "Suggest simpler pattern"
  - level: 2
    action: "Break into multiple commands"
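
One way to apply the error patterns above is a small lookup that maps captured stderr text to the listed fix. This is a sketch only: `suggest_fix` is a hypothetical helper, and the exact error wording differs between GNU, BSD, and BusyBox tools.

```shell
# Hypothetical helper: map stderr text to the fix from error_patterns,
# falling back to the level 1 action when nothing matches.
suggest_fix() {
    case $1 in
        *"sed: -e expression #1"*)
            echo "Check delimiter escaping and regex syntax" ;;
        *"awk: cmd. line"*)
            echo "Verify field references and brace matching" ;;
        *"grep: Invalid regular expression"*)
            echo "Use -E for extended regex or escape special chars" ;;
        *)
            echo "Suggest simpler pattern" ;;
    esac
}

# Provoke a sed error (unterminated s command) and capture only stderr.
# The message text, and therefore the suggestion, depends on the sed in use.
err=$(printf 'x\n' | sed 's/a/b' 2>&1 >/dev/null || true)
suggest_fix "$err"
```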

Performance Guidelines

performance_rules:
  - rule: "Use ripgrep (rg) for large codebases"
    reason: "Usually much faster on large trees; skips .gitignore'd files by default"

  - rule: "Use LC_ALL=C for byte-level operations"
    reason: "Avoids multibyte locale overhead; can be several times faster"

  - rule: "Avoid cat | grep, use grep file directly"
    reason: "Removes unnecessary pipe overhead"
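
The last two rules can be illustrated without any timing claims. The sketch below shows that passing the file to grep directly gives the same result as `cat | grep`, and that `LC_ALL=C` can be set for a single command. The sample lines are invented.

```shell
# Sample data invented for this example.
tmp=$(mktemp)
printf '%s\n' 'GET /index.html 200' 'GET /missing 404' 'POST /login 200' > "$tmp"

# Same result, one fewer process: give grep the file name directly.
with_cat=$(cat "$tmp" | grep -c ' 200$')
direct=$(grep -c ' 200$' "$tmp")

# LC_ALL=C applies to this one command only.
# -F treats the pattern as a fixed string instead of a regex.
bytewise=$(LC_ALL=C grep -cF '/login' "$tmp")

echo "$with_cat $direct $bytewise"
rm -f "$tmp"
```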

Troubleshooting Guide

Debug Checklist

☐ Test pattern with grep -E 'pattern' first
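☐ Use sed -n with the p flag (sed -n 's/old/new/p') to print only the lines a substitution changes
☐ Add a { print "DEBUG:", $0 } rule in awk to see each record as it is read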
☐ Check special character escaping
☐ Verify line endings with the file command (CRLF endings break $ anchors)
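
Line endings are a frequent cause of patterns that look correct but never match. This sketch reproduces the problem with a two-line CRLF file and then applies the sed -n debugging step from the checklist; the file contents are invented.

```shell
# A file with Windows (CRLF) line endings, invented for this example.
tmp=$(mktemp)
printf 'alpha\r\nbeta\r\n' > "$tmp"

# '^beta$' finds nothing because a carriage return sits before the newline.
before=$(grep -c '^beta$' "$tmp" || true)

# Count lines ending in a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr\$" "$tmp" || true)

# Strip the carriage returns with tr and the same pattern matches.
after=$(tr -d '\r' < "$tmp" | grep -c '^beta$')

# sed -n plus the p flag prints only the lines where the substitution fired.
changed=$(tr -d '\r' < "$tmp" | sed -n 's/beta/BETA/p')

echo "$before $crlf_lines $after $changed"
rm -f "$tmp"
```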

Common Issues Decision Tree

Pattern not matching?
├── Check: case sensitivity (-i flag)
├── Verify: BRE vs ERE vs PCRE syntax
└── Debug: escape special characters

Sed not replacing?
├── Check: delimiter conflicts (use # or |)
├── Verify: capture group syntax \( \) vs ( )
└── Test: run without -i first to preview the output
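
The two sed checks in the tree can be demonstrated on short strings. This sketch prints results to variables instead of editing a file, in line with testing without -i first.

```shell
path='/usr/local/bin'

# With / as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# Using | as the delimiter avoids the escaping entirely.
piped=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

# Capture groups: \( \) in the default BRE mode, ( ) with -E.
bre_swap=$(printf 'key=value\n' | sed 's/\(.*\)=\(.*\)/\2=\1/')
ere_swap=$(printf 'key=value\n' | sed -E 's/(.*)=(.*)/\2=\1/')

echo "$escaped $piped $bre_swap $ere_swap"
```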

Tool Comparison Matrix

Task	grep	sed	awk	Best Choice
Simple search	✓	-	-	grep
Replace in file	-	✓	✓	sed
Column extraction	-	-	✓	awk
Math operations	-	-	✓	awk
