From nickcrew-claude-ctx-plugin
Builds, explains, and debugs regular expressions for pattern matching, validating inputs like emails/URLs/phones, and extracting data from logs/documents/text.
npx claudepluginhub nickcrew/claude-cortex

This skill uses the workspace's default tool permissions.
The Regex Master skill covers building correct, readable, and efficient regular expressions for common text processing tasks. It explains character classes, quantifiers, groups, alternation, anchors, and lookahead/lookbehind assertions. It includes a quick-reference library of battle-tested patterns for emails, URLs, dates, phone numbers, IP addresses, and log lines. Every regex is explained token by token so you understand what it does rather than just copying it.
grep, sed, awk, Python, or JavaScript

| Token | Meaning | Example | Matches |
|---|---|---|---|
| `.` | Any character except newline | `a.c` | abc, a1c, a-c |
| `\d` | Digit [0-9] | `\d+` | 42, 007 |
| `\w` | Word char [a-zA-Z0-9_] | `\w+` | hello, foo_2 |
| `\s` | Whitespace | `\s+` | space, tab, newline |
| `\D` | Non-digit | `\D` | a, -, space |
| `\W` | Non-word char | `\W` | !, @, space |
| `\S` | Non-whitespace | `\S+` | hello, 42 |
| `^` | Start of string/line | `^Hello` | Hello world |
| `$` | End of string/line | `world$` | Hello world |
| `\b` | Word boundary | `\bcat\b` | the cat sat (not catch) |
| `[abc]` | Character class | `[aeiou]` | any vowel |
| `[^abc]` | Negated class | `[^0-9]` | any non-digit |
| `[a-z]` | Range | `[a-zA-Z]` | any letter |
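As a rough illustration of the tokens in the table, the sketch below runs a few of them through Python's `re.findall`. The sample strings are made up for this example.

```python
import re

text = "the cat sat on catch-22, id foo_2 costs 42"

# \b anchors "cat" at word boundaries, so the "cat" inside "catch" is skipped.
whole_words = re.findall(r"\bcat\b", text)

# \d+ collects runs of digits; \w+ collects runs of word characters.
numbers = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)

# A negated class: runs of characters that are neither digits nor whitespace.
non_digit_runs = re.findall(r"[^0-9\s]+", "a1 b22")

print(whole_words)     # ['cat']
print(numbers)         # ['22', '2', '42']
print(non_digit_runs)  # ['a', 'b']
```

Note that `\w` treats `_` as a word character, which is why `foo_2` comes back as a single token.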

| Quantifier | Meaning | Greedy? |
|---|---|---|
| `*` | 0 or more | Yes |
| `+` | 1 or more | Yes |
| `?` | 0 or 1 (optional) | Yes |
| `{n}` | Exactly n | Yes |
| `{n,m}` | Between n and m | Yes |
| `{n,}` | n or more | Yes |
| `*?` | 0 or more | No (lazy) |
| `+?` | 1 or more | No (lazy) |
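The difference between greedy and lazy quantifiers is easiest to see side by side. This is a minimal sketch on a made-up HTML-like string, not a recommendation to parse HTML with regex.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the LAST </b>, swallowing everything in between.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy .*? stops at the FIRST </b> it can reach, giving one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

# A bounded quantifier: {2,4} accepts 2 to 4 digits, so 5 digits fail a full match.
bounded_ok = re.fullmatch(r"\d{2,4}", "1234")
bounded_too_long = re.fullmatch(r"\d{2,4}", "12345")

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```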

| Syntax | Meaning |
|---|---|
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `(?P<name>abc)` | Named capturing group (Python) |
| `(?<name>abc)` | Named capturing group (JS/PCRE) |
| `\1`, `$1` | Back-reference to group 1 |
| `a\|b` | Alternation: a or b |
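A short sketch of how these group types behave in Python's `re` module. The input strings are invented for illustration.

```python
import re

# Named groups: values are retrieved by name instead of by position.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-10-31")
year = date.group("year")
parts = date.groupdict()

# Back-reference: \1 must repeat whatever group 1 captured (a doubled word).
doubled = re.search(r"\b(\w+)\s+\1\b", "it was the the best")
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", "it was the the best")

# Capturing vs. non-capturing: findall returns the group's content when a
# capturing group is present, and the whole match when it is not.
capturing = re.findall(r"(ab)+", "ababab x ab")
non_capturing = re.findall(r"(?:ab)+", "ababab x ab")

print(parts)          # {'year': '2024', 'month': '10'}
print(fixed)          # it was the best
print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['ababab', 'ab']
```

The last two lines show why `(?:...)` is often the better default: an unintended capturing group changes what `findall` returns.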

| Syntax | Meaning |
|---|---|
| `(?=abc)` | Positive lookahead: followed by abc |
| `(?!abc)` | Negative lookahead: NOT followed by abc |
| `(?<=abc)` | Positive lookbehind: preceded by abc |
| `(?<!abc)` | Negative lookbehind: NOT preceded by abc |
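Lookarounds check context without consuming it, so the surrounding text never appears in the match. The sketch below shows one example per row of the table, using made-up inputs. Python's `re` requires lookbehind patterns to have a fixed width, which these do.

```python
import re

# Positive lookahead: digits followed by "px"; the unit is not part of the match.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: words starting with "foo" that do not continue with "bar".
not_foobar = re.findall(r"foo(?!bar)\w*", "foobar foobaz food")

# Positive lookbehind: digits immediately preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $45, qty 12")

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
quantities = re.findall(r"(?<!\$)\b\d+", "cost $45, qty 12")

print(px_values)   # ['10', '30']
print(not_foobar)  # ['foobaz', 'food']
print(prices)      # ['45']
print(quantities)  # ['12']
```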

| Flag | Effect |
|---|---|
| `i` / `re.IGNORECASE` | Case-insensitive |
| `m` / `re.MULTILINE` | `^`/`$` match line boundaries |
| `s` / `re.DOTALL` | `.` matches newlines |
| `g` | Global: find all matches (JS) |
| `x` / `re.VERBOSE` | Allow whitespace/comments in pattern |
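To make the effect of each flag concrete, here is a small sketch that runs the same patterns with and without flags. The two-line sample text and the `phone` pattern are assumptions for this example.

```python
import re

text = "Error: disk\nerror: net\n"

# Without flags, ^ only matches at the start of the string, and case matters.
no_flags = re.findall(r"^error", text)

# IGNORECASE + MULTILINE: ^ matches at each line start, in any letter case.
with_flags = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)

# By default . does not cross a newline; DOTALL lets it.
dot_default = re.search(r"disk.error", text)
dot_all = re.search(r"disk.error", text, re.DOTALL)

# VERBOSE ignores pattern whitespace and allows inline comments.
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
""", re.VERBOSE)

print(no_flags)    # []
print(with_flags)  # ['Error', 'error']
```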

| Pattern | Regex |
|---|---|
| Email (simple) | `^[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}$` |
| URL | `https?://[\w.-]+(?:/[\w./?=#&%-]*)?` |
| IPv4 | `\b(?:\d{1,3}\.){3}\d{1,3}\b` |
| Date (YYYY-MM-DD) | `\d{4}-(?:0[1-9]\|1[0-2])-(?:0[1-9]\|[12]\d\|3[01])` |
| Time (HH:MM:SS) | `(?:[01]\d\|2[0-3]):[0-5]\d:[0-5]\d` |
| US Phone | `\+?1?\s?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}` |
| ZIP code (US) | `\d{5}(?:-\d{4})?` |
| Credit card | `\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b` |
| Hex color | `#(?:[0-9a-fA-F]{3}\|[0-9a-fA-F]{6})\b` |
| UUID | `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}` |
| Semantic version | `\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?` |
| HTML tag (simple) | `<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>` |
| Slug | `^[a-z0-9]+(?:-[a-z0-9]+)*$` |
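Several of these patterns check shape only, not value: the IPv4 pattern, for instance, also accepts `999.1.1.1`. One possible way to handle that, sketched below, is to use the regex to find candidates and then check each octet's range in code. The helper name `find_ipv4` is hypothetical.

```python
import re

# Same shape-only pattern as in the table: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4(text: str) -> list:
    """Return dotted quads from text whose octets are all in 0..255."""
    candidates = IPV4_CANDIDATE.findall(text)
    return [
        candidate
        for candidate in candidates
        if all(int(octet) <= 255 for octet in candidate.split("."))
    ]


found = find_ipv4("from 192.168.1.1 and 999.1.1.1 to 10.0.0.255")
print(found)  # ['192.168.1.1', '10.0.0.255']
```

Keeping the range check in code tends to be easier to read than encoding `0-255` in the regex itself.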

1. Define what to match precisely in plain English first
2. Build the regex incrementally
3. Anchor appropriately
   - Use `^...$` for full-string validation (email, phone, slug).
   - Use `\b` for word-boundary matching to avoid partial matches.
4. Choose capturing vs. non-capturing groups
   - Use `(?:...)` for grouping without capturing when you don't need the group's value.
   - Use named groups such as `(?P<year>\d{4})` for readability and reliable extraction.
5. Handle edge cases explicitly
   - Optional parts: `(?:\.com)?`, where the `?` makes the group optional.
   - Unicode: use the `re.UNICODE` flag (already the default for `str` patterns in Python 3) or `\p{L}` (PCRE2) for non-ASCII letters.
   - Multiline input: use the `s` / `DOTALL` flag if `.` should match newlines.
6. Test thoroughly
7. Document the regex
   - Use verbose mode (`re.VERBOSE` in Python) to add inline comments for complex patterns.

Input: "Write a regex to validate email addresses."
Output:
```python
import re

# Email validation regex
# Allows: local part with letters, digits, dots, plus, hyphen
# Requires: @ symbol
# Domain: one or more dot-separated labels of letters, digits, hyphens
# TLD: 2+ letters
EMAIL_REGEX = re.compile(
    r'^[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}$',
    re.IGNORECASE
)

# Verbose version explaining each token:
EMAIL_REGEX_VERBOSE = re.compile(r'''
    ^                # Start of string
    [\w.+-]+         # Local part: word chars, dots, plus, hyphen (1 or more)
    @                # Literal @ symbol
    [\w-]+           # First domain label: word chars and hyphens (1 or more)
    (?:\.[\w-]+)*    # Further labels (subdomains), each preceded by a dot
    \.               # Literal dot (escaped)
    [a-zA-Z]{2,}     # TLD: 2 or more letters
    $                # End of string
''', re.VERBOSE | re.IGNORECASE)


def validate_email(email: str) -> bool:
    # fullmatch() also rejects a trailing newline, which match() with $ accepts
    return bool(EMAIL_REGEX.fullmatch(email))


# Tests
assert validate_email("alice@example.com")
assert validate_email("alice.smith+tag@sub.domain.org")
assert not validate_email("alice@example.c")    # TLD too short
assert not validate_email("@example.com")       # Missing local part
assert not validate_email("alice@.com")         # Missing domain
assert not validate_email("alice example.com")  # Missing @
```
What it matches / rejects:
✅ alice@example.com
✅ alice.smith+filter@sub.domain.co.uk
✅ user123@my-company.io
❌ alice@example.c (TLD must be 2+ chars)
❌ alice @example.com (spaces not allowed)
❌ alice@@example.com (double @)
❌ @example.com (empty local part)
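Note: The full address syntax in RFC 5321 is far more complex than this pattern. The regex covers the common address shapes, but it rejects some valid forms (quoted local parts, IP-literal domains) and accepts some invalid ones (such as consecutive dots). For production, combine it with a verification email.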
Input: Parse Apache Combined Log Format lines:
192.168.1.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"
Output:
```python
import re

LOG_PATTERN = re.compile(r'''
    ^
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})   # Client IP address
    \s-\s                             # Separator " - "
    (?P<user>\S+)                     # Remote user (- if none)
    \s
    \[(?P<timestamp>[^\]]+)\]         # Timestamp in brackets
    \s
    "(?P<method>[A-Z]+)               # HTTP method
    \s(?P<path>[^\s"]+)               # Request path
    \s(?P<protocol>HTTP/[\d.]+)"      # Protocol
    \s
    (?P<status>\d{3})                 # HTTP status code
    \s
    (?P<size>\d+|-)                   # Response size (or - if none)
    (?:\s"(?P<referer>[^"]*)")?       # Optional referer
    (?:\s"(?P<user_agent>[^"]*)")?    # Optional user agent
    $
''', re.VERBOSE)


def parse_log_line(line: str) -> dict | None:  # "dict | None" needs Python 3.10+
    match = LOG_PATTERN.match(line.strip())
    if not match:
        return None
    data = match.groupdict()
    # Convert numeric fields; a size of "-" means no body was sent
    data['status'] = int(data['status'])
    data['size'] = int(data['size']) if data['size'] != '-' else 0
    return data


# Test
line = '192.168.1.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"'
result = parse_log_line(line)
print(result)
# {
#   'ip': '192.168.1.1', 'user': 'frank',
#   'timestamp': '10/Oct/2024:13:55:36 -0700',
#   'method': 'GET', 'path': '/index.html',
#   'protocol': 'HTTP/1.1', 'status': 200,
#   'size': 2326, 'referer': 'http://example.com/',
#   'user_agent': 'Mozilla/5.0'
# }
```
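The parser above leaves `timestamp` as a string. If a real datetime is needed, one option is `datetime.strptime` with a format matching the bracketed field, sketched below. `APACHE_TIME_FORMAT` and `parse_timestamp` are hypothetical names, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime

# Day/abbreviated month/year:hour:minute:second, then a +HHMM/-HHMM UTC offset.
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_timestamp(raw: str) -> datetime:
    """Convert a timestamp such as '10/Oct/2024:13:55:36 -0700' to an aware datetime."""
    return datetime.strptime(raw, APACHE_TIME_FORMAT)


ts = parse_timestamp("10/Oct/2024:13:55:36 -0700")
print(ts.isoformat())  # 2024-10-10T13:55:36-07:00
```

Because `%z` is part of the format, the result carries its UTC offset and can be compared safely with timestamps from other time zones.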
JavaScript equivalent for the same pattern:
```javascript
const LOG_PATTERN = /^(\d{1,3}(?:\.\d{1,3}){3}) - (\S+) \[([^\]]+)\] "([A-Z]+) ([^\s"]+) (HTTP\/[\d.]+)" (\d{3}) (\d+|-)(?: "([^"]*)")?(?: "([^"]*)")?$/;

function parseLogLine(line) {
  const m = line.match(LOG_PATTERN);
  if (!m) return null;
  const [, ip, user, timestamp, method, path, protocol, status, size, referer, userAgent] = m;
  return { ip, user, timestamp, method, path, protocol, status: +status, size: size === '-' ? 0 : +size, referer, userAgent };
}
```
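Best practices:

- Prefer specific character classes (`\d`, `[a-z]`) over `.` + filter to avoid surprise matches
- Use lazy quantifiers (`+?`, `*?`) when you need the shortest match, not the longest
- Compile patterns once (`re.compile()`) in performance-critical code

Common mistakes:

- Using `.*` (greedy) when you want the shortest match: use `.*?` (lazy)
- Forgetting to escape `.`: unescaped `.` matches any character
- Using `^`/`$` without the `m` flag when multiline mode is needed (they match string boundaries by default, not line boundaries)
- Catastrophic backtracking from nested quantifiers such as `(a+)+` on long non-matching input
- Forgetting anchors: unanchored `\d+` matches the digits in `abc123def`
- Using capturing groups `()` when non-capturing `(?:)` is sufficient, which wastes memory

Tips:

- `re.fullmatch()` in Python is safer than `re.match()` for validation: no partial matches
- JavaScript supports named groups, e.g. `/(?<year>\d{4})-(?<month>\d{2})/`; access via `match.groups.year`
- Use `sed -E` or `grep -P` for extended/PCRE regex on the command line
- Atomic groups `(?>...)` (PCRE) prevent backtracking, which is useful for catastrophic backtracking prevention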
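The point about `re.fullmatch()` versus `re.match()` can be checked directly. This sketch reuses the slug and ZIP shapes from the pattern table, with made-up inputs.

```python
import re

SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# match() only anchors at the start, so a valid prefix is enough to succeed.
prefix_hit = SLUG.match("my-post!!")

# fullmatch() requires the whole string to fit the pattern.
full_miss = SLUG.fullmatch("my-post!!")
full_hit = SLUG.fullmatch("my-post")

# Even with ^...$, match() accepts a trailing newline, because $ also
# matches just before a newline at the end of the string.
zip_with_newline = re.match(r"^\d{5}$", "12345\n")
zip_strict = re.fullmatch(r"\d{5}", "12345\n")

print(prefix_hit.group(0))           # my-post
print(full_miss)                     # None
print(zip_with_newline is not None)  # True
print(zip_strict)                    # None
```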