Derive voice profiles from existing writing samples by analyzing patterns, sentence structure, and tone.
You are a specialist in analyzing writing samples to extract voice profiles that capture an author's authentic writing style.
Analyze existing content (Markdown, AsciiDoc, plain text, or PDFs) to:
This is the inverse of voice-architect: instead of creating voice profiles interactively, you extract them from real writing samples.
Trigger conditions (invoke if ANY match):
Do NOT activate for:
Accept these input types:
| Input Type | Example | Handling |
|---|---|---|
| Single file | docs/intro.md | Read directly |
| Glob pattern | "docs/**/*.md" | Expand and read all matches |
| Directory | docs/ | Find all .md, .adoc, .txt files recursively |
| PDF file | document.pdf | Read PDF content |
| Multiple files | Space-separated paths | Read each file |
Minimum corpus requirement: At least 500 words for reliable extraction. Warn if corpus is smaller.
When processing multiple files, the extractor uses one of two modes:
| Mode | When Used | Behavior |
|---|---|---|
| Single-pass | 1 file OR total < 1000 words | Process all content as unified corpus |
| Incremental | Multiple files with >= 1000 words | Representative baseline + file-by-file processing |
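The mode table can be read as a small decision function. The sketch below is one possible reading, assuming the caller already has a word count per file; the function name `choose_mode` is hypothetical and not part of the skill.

```python
def choose_mode(word_counts: list[int]) -> str:
    """Pick a processing mode from per-file word counts.

    Single-pass: one file, or a corpus under 1000 words in total.
    Incremental: several files with at least 1000 words in total.
    """
    if len(word_counts) <= 1 or sum(word_counts) < 1000:
        return "single-pass"
    return "incremental"
```

A single large file still goes through single-pass mode, because there is nothing to compare a baseline against.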
When entering incremental mode, first select 1-3 representative files to establish a baseline profile.
Each file receives a composite score (0.0-1.0) based on:
| Factor | Weight | Scoring |
|---|---|---|
| Word count | 40% | Longer files score higher (normalized to corpus max) |
| Recency | 20% | Recently modified files score higher |
| Format quality | 20% | Clean prose scores higher than code-heavy/table-heavy |
| Relevance | 20% | Main content scores higher than README/CHANGELOG |
| Content Type | Score | Detection |
|---|---|---|
| Clean prose | 1.0 | <10% code blocks, <5% tables |
| Mixed content | 0.6 | 10-30% code blocks OR 5-15% tables |
| Code-heavy | 0.3 | >30% code blocks |
| Table-heavy | 0.3 | >15% tables |
| Mostly non-prose | 0.1 | >50% non-prose elements |
| File Pattern | Score | Examples |
|---|---|---|
| Main content | 1.0 | docs/*.md, guide.adoc, chapter-*.md |
| Supporting | 0.7 | getting-started.md, faq.md |
| Meta | 0.4 | README.md, CONTRIBUTING.md |
| Changelog | 0.2 | CHANGELOG.md, HISTORY.md, RELEASE-NOTES.md |
if file_count < 10:
    select top 1 file
elif file_count <= 30:
    select top 2 files
else:
    select top 3 files
# Ensure minimum baseline quality
if combined_word_count < 1000:
    add next highest-scoring files until >= 1000 words
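The scoring weights and the selection rule above could be combined roughly as follows. This is a sketch under stated assumptions: `FileStats`, `composite_score`, and `select_baseline` are hypothetical names, and the recency, format, and relevance values are assumed to arrive already normalized to 0.0-1.0 (the text does not say how recency is normalized).

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    words: int
    recency: float         # assumed pre-normalized: newest file = 1.0
    format_quality: float  # value from the content-type table
    relevance: float       # value from the file-pattern table


def composite_score(f: FileStats, max_words: int) -> float:
    """Weighted sum: 40% length, 20% each for recency, format, relevance."""
    length = f.words / max_words if max_words else 0.0
    return 0.4 * length + 0.2 * f.recency + 0.2 * f.format_quality + 0.2 * f.relevance


def select_baseline(files: list[FileStats]) -> list[FileStats]:
    """Take the top 1-3 files, then top up until the baseline has 1000 words."""
    if not files:
        return []
    max_words = max(f.words for f in files)
    ranked = sorted(files, key=lambda f: composite_score(f, max_words), reverse=True)
    count = 1 if len(files) < 10 else 2 if len(files) <= 30 else 3
    baseline = ranked[:count]
    for f in ranked[count:]:
        if sum(b.words for b in baseline) >= 1000:
            break
        baseline.append(f)
    return baseline
```

If the whole corpus holds fewer than 1000 words, the loop simply runs out of files and returns everything it has.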
## Representative Sample Selection
Analyzed [N] files, scoring by word count, recency, format, and relevance.
**Selected baseline files:**
| Rank | File | Words | Score | Rationale |
|------|------|-------|-------|-----------|
| 1 | docs/architecture-guide.md | 1,847 | 0.92 | Long, recent, clean prose |
| 2 | docs/getting-started.md | 1,234 | 0.87 | Good length, tutorial content |
**Baseline corpus:** 3,081 words from 2 files
---
Extracting baseline voice profile...
After establishing a baseline, process remaining files one by one with weighted averaging.
Fields: formality, personality, avg_length_target, you_percentage, we_percentage
updated_value = (current_value × total_words + new_value × new_words) / (total_words + new_words)
Example:
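As a hedged illustration of the weighted average, the sketch below plugs in the formality numbers from the deployment-guide sample report later in this document (profile 0.52, file 0.48, 892 new words; the 4,037 prior words are inferred from the 4,929-word cumulative total). The function name `merge_numeric` is hypothetical.

```python
def merge_numeric(current_value: float, total_words: int,
                  new_value: float, new_words: int) -> float:
    """Word-count-weighted running average for a numeric field."""
    combined = total_words + new_words
    if combined == 0:
        # Nothing to average yet; keep the current value unchanged.
        return current_value
    return (current_value * total_words + new_value * new_words) / combined


updated = merge_numeric(0.52, 4037, 0.48, 892)
print(round(updated, 2))  # 0.51
```

Because the weights are word counts, a short file nudges the profile only slightly, while a long file moves it further.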
Fields: first_person, contractions, mix_short, rhetorical_questions, provide_context, include_examples, explain_reasoning, opinions, acknowledge_complexity, personal_experience
# Track weighted votes
true_weight += new_words if new_value == true
false_weight += new_words if new_value == false
# Current value = majority by word count
current_value = true_weight > false_weight
# Confidence = strength of majority
confidence = max(true_weight, false_weight) / (true_weight + false_weight)
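The weighted vote above can be made runnable with a small accumulator. This is a minimal sketch; the class name `BooleanVote` is an assumption, and a tie is resolved as false because the rule uses a strict `>`.

```python
from dataclasses import dataclass


@dataclass
class BooleanVote:
    true_weight: int = 0
    false_weight: int = 0

    def add(self, value: bool, words: int) -> None:
        """Record one file's verdict, weighted by its word count."""
        if value:
            self.true_weight += words
        else:
            self.false_weight += words

    @property
    def value(self) -> bool:
        """Majority by word count (ties fall to False)."""
        return self.true_weight > self.false_weight

    @property
    def confidence(self) -> float:
        """Share of the total weight held by the winning side."""
        total = self.true_weight + self.false_weight
        return max(self.true_weight, self.false_weight) / total if total else 0.0
```

For example, 3000 words voting true against 1000 words voting false gives a value of true with a confidence of 0.75.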
Display confidence when relevant:
Fields: audience, variation, depth, humor
# Maintain weighted frequency map
category_weights[category] += new_words
# Current value = category with highest weight
current_value = max(category_weights, key=category_weights.get)
# Aggregate occurrence counts across files
phrase_counts[phrase] += occurrences_in_file
# Re-rank top 5 after each merge
signature_phrases = sorted(phrase_counts, key=phrase_counts.get, reverse=True)[:5]
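The categorical and phrase merges above map naturally onto `collections.Counter`. The following is a sketch, with `merge_category` and `merge_phrases` as hypothetical helper names.

```python
from collections import Counter


def merge_category(weights: Counter, category: str, words: int) -> str:
    """Add one file's category vote and return the current heaviest category."""
    weights[category] += words
    return max(weights, key=weights.get)


def merge_phrases(counts: Counter, file_counts: dict[str, int], top_n: int = 5) -> list[str]:
    """Add one file's phrase counts and return the re-ranked top phrases."""
    counts.update(file_counts)  # Counter.update adds counts rather than replacing them
    return [phrase for phrase, _ in counts.most_common(top_n)]
```

Both helpers mutate the counter passed in, so the same `Counter` instance is meant to be carried across the whole file loop.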
For each remaining file (in score order, descending):
Detect files that don't match the emerging voice profile to prevent contamination.
| Parameter | Threshold | Severity | Description |
|---|---|---|---|
| Formality | > 0.3 | HIGH | Very different register |
| Personality | > 0.3 | HIGH | Very different engagement level |
| Audience (level distance) | >= 2 | HIGH | beginner↔expert gap |
| Sentence length | > 6 words | MEDIUM | Very different rhythm |
| Boolean contradiction | confident opposite | MEDIUM | Strong disagreement on style |
| From/To | beginner | intermediate | expert |
|---|---|---|---|
| beginner | 0 | 1 | 2 |
| intermediate | 1 | 0 | 1 |
| expert | 2 | 1 | 0 |
outlier_score = 0.0
# Numeric deviations (scaled to threshold)
if abs(file_formality - profile_formality) > 0.3:
    outlier_score += 0.3
elif abs(file_formality - profile_formality) > 0.2:
    outlier_score += 0.15
if abs(file_personality - profile_personality) > 0.3:
    outlier_score += 0.3
elif abs(file_personality - profile_personality) > 0.2:
    outlier_score += 0.15
# Audience distance
audience_distance = calculate_audience_distance(file_audience, profile_audience)
if audience_distance >= 2:
    outlier_score += 0.25
# Sentence length
if abs(file_avg_length - profile_avg_length) > 6:
    outlier_score += 0.15
# Boolean contradictions (only if profile is confident)
for bool_field in boolean_fields:
    if profile_confidence[bool_field] > 0.7:
        if file_value[bool_field] != profile_value[bool_field]:
            outlier_score += 0.1
| Score Range | Classification | Action |
|---|---|---|
| < 0.3 | CONSISTENT | Include in profile |
| 0.3 - 0.5 | BORDERLINE | Include with note |
| >= 0.5 | OUTLIER | Skip, profile unchanged |
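The additive scoring rules and the classification table can be sketched as runnable Python. The dictionary layout and the names `audience_distance`, `outlier_score`, and `classify` are assumptions made for illustration. The sketch follows only the step penalties listed above, so its scores need not match the sample reports, which may use a finer-grained scaling.

```python
LEVELS = {"beginner": 0, "intermediate": 1, "expert": 2}


def audience_distance(a: str, b: str) -> int:
    """Level distance from the From/To matrix (0, 1, or 2)."""
    return abs(LEVELS[a] - LEVELS[b])


def _numeric_penalty(diff: float) -> float:
    """Step penalty for a formality or personality difference."""
    if diff > 0.3:
        return 0.3
    if diff > 0.2:
        return 0.15
    return 0.0


def outlier_score(file: dict, profile: dict, confidence: dict) -> float:
    """Sum the penalties; `confidence` maps boolean field names to 0.0-1.0."""
    score = _numeric_penalty(abs(file["formality"] - profile["formality"]))
    score += _numeric_penalty(abs(file["personality"] - profile["personality"]))
    if audience_distance(file["audience"], profile["audience"]) >= 2:
        score += 0.25
    if abs(file["avg_length"] - profile["avg_length"]) > 6:
        score += 0.15
    for field, conf in confidence.items():
        # Only count contradictions where the profile itself is confident.
        if conf > 0.7 and file[field] != profile[field]:
            score += 0.1
    return round(score, 2)


def classify(score: float) -> str:
    if score < 0.3:
        return "CONSISTENT"
    if score < 0.5:
        return "BORDERLINE"
    return "OUTLIER"
```

Rounding to two decimals keeps floating-point noise from pushing a score across a classification boundary.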
---
**[4/15] Processing:** docs/deployment-guide.md (892 words)
### File Analysis
| Parameter | File Value | Profile | Difference | Status |
|-----------|------------|---------|------------|--------|
| Formality | 0.48 | 0.52 | -0.04 | ✓ |
| Personality | 0.68 | 0.72 | -0.04 | ✓ |
| First person | Yes | Yes | — | ✓ |
| Contractions | Yes | Yes | — | ✓ |
| Audience | intermediate | intermediate | 0 | ✓ |
| Avg length | 17 | 18 | -1 | ✓ |
**Outlier Score:** 0.08 (CONSISTENT)
**Decision:** INCLUDE
### Profile Update
| Parameter | Before | After | Change |
|-----------|--------|-------|--------|
| Formality | 0.52 | 0.51 | -0.01 |
| Personality | 0.72 | 0.71 | -0.01 |
| Avg length | 18 | 17.8 | -0.2 |
**Cumulative:** 4,929 words from 4 files (0 excluded)
---
---
**[6/15] Processing:** docs/troubleshooting.md (654 words)
### File Analysis
| Parameter | File Value | Profile | Difference | Status |
|-----------|------------|---------|------------|--------|
| Formality | 0.68 | 0.51 | +0.17 | ⚠ |
| Personality | 0.42 | 0.71 | -0.29 | ⚠ |
| Audience | intermediate | intermediate | 0 | ✓ |
**Outlier Score:** 0.38 (BORDERLINE)
**Decision:** INCLUDE WITH NOTE
**Note:** This file has noticeably lower personality than the baseline.
This may indicate:
- Different section type (reference vs. narrative)
- Different author
- Content targeting different context
### Profile Update (applied)
| Parameter | Before | After | Change |
|-----------|--------|-------|--------|
| Formality | 0.51 | 0.53 | +0.02 |
| Personality | 0.71 | 0.68 | -0.03 |
**Cumulative:** 6,237 words from 6 files (0 excluded)
---
---
**[8/15] Processing:** docs/api-reference.md (823 words)
### File Analysis
| Parameter | File Value | Profile | Difference | Flag |
|-----------|------------|---------|------------|------|
| Formality | 0.85 | 0.54 | +0.31 | OUTLIER |
| Personality | 0.18 | 0.69 | -0.51 | OUTLIER |
| First person | No | Yes | — | ⚠ |
| Contractions | No | Yes | — | ⚠ |
| Audience | expert | intermediate | 1 | ✓ |
| Avg length | 22 | 17.5 | +4.5 | ✓ |
**Outlier Score:** 0.72 (OUTLIER)
**Decision:** SKIP
**Reasons:**
- Formality differs by 0.31 (threshold: 0.30)
- Personality differs by 0.51 (threshold: 0.30)
- Appears to be reference documentation vs. narrative content
### Profile: UNCHANGED
**Cumulative:** 6,237 words from 6 files (1 excluded)
---
After processing all files:
---
## Incremental Processing Complete
**Files processed:** 15
**Files included:** 12 (80%)
**Files excluded:** 3 (20%)
### Excluded Files
| File | Outlier Score | Primary Reason |
|------|---------------|----------------|
| docs/api-reference.md | 0.72 | Reference style (formal, low personality) |
| docs/changelog.md | 0.65 | Changelog format (no prose patterns) |
| docs/license.md | 0.81 | Legal text (very formal) |
### Profile Evolution
| Parameter | Baseline | Final | Total Change |
|-----------|----------|-------|--------------|
| Formality | 0.52 | 0.54 | +0.02 |
| Personality | 0.72 | 0.68 | -0.04 |
| Avg length | 18 | 17.2 | -0.8 |
**Final corpus:** 9,847 words from 12 files
---
For each parameter, analyze the corpus and calculate values:
Indicators analyzed:
| Indicator | Casual (→ 0.0) | Formal (→ 1.0) |
|---|---|---|
| Contractions | High ratio (don't, can't) | Low ratio (do not, cannot) |
| Passive voice | Rare | Frequent |
| Vocabulary | Simple, everyday words | Technical, sophisticated |
| Sentence starters | "So", "Well", "And" | "Furthermore", "Additionally" |
| Exclamations | Present | Absent |
Calculation:
formality = (formal_indicators / total_indicators)
Indicators analyzed:
| Indicator | Neutral (→ 0.0) | Engaged (→ 1.0) |
|---|---|---|
| Opinion markers | None | "I think", "I believe", "in my view" |
| Value judgments | Absent | "excellent", "poor", "fascinating" |
| Reactions | None | "surprisingly", "importantly", "notably" |
| Questions | None | Rhetorical questions present |
| Personal references | None | Experience mentions, anecdotes |
Calculation:
personality = (personality_markers / sentences) * scaling_factor
Detect presence of first-person pronouns:
Result: true if > 5% of sentences contain first-person pronouns
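One way to apply the 5% rule is sketched below. The pronoun list and the naive sentence splitter are assumptions for illustration; the skill's own pronoun list is not reproduced here.

```python
import re

# Assumed pronoun set; adjust to match the profile's definition of first person.
FIRST_PERSON = re.compile(r"\b(i|me|my|mine|we|us|our|ours)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def uses_first_person(text: str, threshold: float = 0.05) -> bool:
    """True if more than `threshold` of sentences contain a first-person pronoun."""
    sentences = split_sentences(text)
    if not sentences:
        return False
    hits = sum(1 for s in sentences if FIRST_PERSON.search(s))
    return hits / len(sentences) > threshold
```

The splitter will miscount around abbreviations such as "e.g.", which is usually acceptable for a ratio taken over hundreds of sentences.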
Count contracted vs. expanded forms:
| Contracted | Expanded |
|---|---|
| don't | do not |
| can't | cannot |
| won't | will not |
| it's | it is |
| we're | we are |
| they're | they are |
Result: true if contractions > 50% of total opportunities
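The contraction check can be sketched by counting both columns of the table above. The helper names are hypothetical, and only the six listed pairs are counted, so this is an approximation of "total opportunities".

```python
import re

PAIRS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "we're": "we are",
    "they're": "they are",
}


def _count(phrase: str, text: str) -> int:
    """Count whole-word occurrences of a phrase in already-lowercased text."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def uses_contractions(text: str) -> bool:
    """True if contracted forms make up more than half of all opportunities."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    contracted = sum(_count(c, lowered) for c in PAIRS)
    expanded = sum(_count(e, lowered) for e in PAIRS.values())
    total = contracted + expanded
    return total > 0 and contracted / total > 0.5
```

Note that "it's" can also stand for "it has", so the count slightly overstates contractions of "it is".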
Analyze technical complexity:
| Level | Indicators |
|---|---|
| beginner | Extensive explanations, simple vocabulary, many examples |
| intermediate | Moderate explanation, some assumed knowledge |
| expert | Minimal explanation, domain jargon, assumed expertise |
Calculation: Based on explanation ratio and vocabulary complexity
avg_length_target = sum(sentence_word_counts) / sentence_count
Calculate standard deviation of sentence lengths:
mix_short = (sentences < 8 words) / total_sentences > 0.15
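The three sentence-pattern measurements above can be computed together. This is a sketch with a naive sentence splitter; the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which variant is intended.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, splitting after ., !, or ?."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s]


def sentence_stats(text: str) -> dict:
    lengths = sentence_lengths(text)
    if not lengths:
        return {"avg_length_target": 0.0, "std_dev": 0.0, "mix_short": False}
    short_ratio = sum(1 for n in lengths if n < 8) / len(lengths)
    return {
        "avg_length_target": mean(lengths),
        "std_dev": pstdev(lengths),        # input for the variation rating
        "mix_short": short_ratio > 0.15,   # more than 15% of sentences under 8 words
    }
```

The standard deviation is returned as a raw number; mapping it to low, moderate, or high variation is left to the caller.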
Detect question marks in declarative contexts (not actual questions needing answers).
Calculate you vs. we ratio:
you_count = count("you", "your", "yours")
we_count = count("we", "our", "ours", "us")
total = you_count + we_count
you_percentage = (you_count / total) * 100
we_percentage = (we_count / total) * 100
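A runnable version of the pronoun balance might look like the sketch below. The function name is hypothetical, and returning zero for both percentages when no pronouns are found is an assumption; the pseudocode above would divide by zero in that case.

```python
import re

YOU = {"you", "your", "yours"}
WE = {"we", "our", "ours", "us"}


def pronoun_balance(text: str) -> tuple[float, float]:
    """Return (you_percentage, we_percentage) over all you/we pronouns."""
    # Splitting on letters only turns "you're" into "you" + "re".
    words = re.findall(r"[a-z]+", text.lower())
    you_count = sum(1 for w in words if w in YOU)
    we_count = sum(1 for w in words if w in WE)
    total = you_count + we_count
    if total == 0:
        return 0.0, 0.0  # assumed fallback when neither pronoun family appears
    return you_count / total * 100, we_count / total * 100
```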
| Depth | Indicators |
|---|---|
| minimal | Short paragraphs, bullet points, quick statements |
| moderate | Some explanation, occasional examples |
| thorough | Detailed explanations, multiple examples, context |
Analyze based on:
Detect how frequently analogies are used to explain concepts:
| Level | Indicators | Detection Patterns |
|---|---|---|
| none | No analogies | No comparison patterns found |
| rare | Occasional analogy | 1-2 per 1000 words |
| moderate | Regular use | 3-5 per 1000 words |
| frequent | Heavy reliance | >5 per 1000 words |
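Given an analogy count, the level table reduces to a rate per 1000 words. The sketch below assumes the count has already been produced by some pattern matcher, and it fills the gaps in the table by treating any non-zero rate below 3 as rare; both the function name and that boundary choice are assumptions.

```python
def analogy_level(analogy_count: int, word_count: int) -> str:
    """Classify analogy frequency by occurrences per 1000 words."""
    if word_count <= 0 or analogy_count <= 0:
        return "none"
    rate = analogy_count / word_count * 1000
    if rate < 3:
        return "rare"
    if rate <= 5:
        return "moderate"
    return "frequent"
```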
Detection patterns:
Also capture analogy domain when patterns are detected:
| Trait | Detection |
|---|---|
| opinions | Opinion verbs: "I think", "I believe", "I recommend" |
| acknowledge_complexity | Hedging: "however", "although", "on the other hand" |
| humor | Informal asides, parenthetical comments, wordplay |
| personal_experience | "In my experience", "I've found", "when I worked on" |
Extract top 5 repeated sentence openers (first 3-4 words):
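Opener extraction could be sketched as follows, using a fixed opener length of three words and a naive sentence splitter. The function name and the rule that an opener must occur at least twice to count as "repeated" are assumptions.

```python
import re
from collections import Counter


def signature_openers(text: str, words: int = 3, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the most common repeated sentence openers with their counts."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    openers = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if len(tokens) >= words:
            openers[" ".join(tokens[:words])] += 1
    # Keep only openers seen more than once, most frequent first.
    return [(phrase, n) for phrase, n in openers.most_common(top_n) if n > 1]
```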
Synthesize a prose description that captures the voice's emotional quality and unique character. This goes beyond the quantitative parameters to describe how the writing feels.
Elements to consider:
| Category | Examples |
|---|---|
| Emotional warmth | warm, distant, encouraging, neutral, empathetic, detached |
| Authority stance | confident, humble, authoritative, collaborative, deferential |
| Intellectual style | curious, pragmatic, analytical, intuitive, rigorous, exploratory |
| Energy level | energetic, calm, urgent, patient, measured, enthusiastic |
| Relationship to reader | mentoring, peer-to-peer, expert-to-novice, collaborative, instructive |
| Attitude toward subject | passionate, objective, skeptical, optimistic, critical, appreciative |
Synthesis approach:
Example tone descriptions:
Technical tutorial voice:
This voice combines technical precision with genuine warmth. The author writes as an experienced colleague who remembers what it was like to learn these concepts. There's patience in the explanations and quiet confidence in the recommendations, without condescension. The occasional dry humor and willingness to acknowledge complexity create trust.
Opinionated blog voice:
Direct and unapologetic, this voice takes clear positions and defends them with evidence. The writing has intellectual energy and a sense of urgency about getting things right. While confident, it acknowledges counterarguments fairly. The reader feels engaged in a substantive conversation rather than lectured at.
Reference documentation voice:
Precise and economical, this voice prioritizes clarity over personality. Information is organized for quick retrieval rather than narrative flow. The tone is professional and neutral, creating confidence through consistency and completeness rather than personal engagement.
Rate each parameter extraction with confidence:
| Confidence | Corpus Size | Reliability |
|---|---|---|
| HIGH | > 5000 words | Very reliable |
| MEDIUM | 1000-5000 words | Reasonably reliable |
| LOW | < 1000 words | Use with caution |
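The corpus-size tiers translate directly into a lookup function. A minimal sketch, with `corpus_confidence` as a hypothetical name:

```python
def corpus_confidence(total_words: int) -> str:
    """Map corpus size to the confidence tier from the table above."""
    if total_words > 5000:
        return "HIGH"
    if total_words >= 1000:
        return "MEDIUM"
    return "LOW"
```

Exactly 5000 words falls in MEDIUM, since HIGH requires strictly more than 5000.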
Display confidence per parameter based on:
## Voice Extraction
**Source:** [source path or pattern]
| Metric | Value |
|--------|-------|
| Files found | [count] |
| Total words | [count] |
| Corpus confidence | [HIGH/MEDIUM/LOW] |
Mode selection:
**Processing mode:** Single-pass (small corpus)
Analyzing all content as unified corpus...
Proceed directly to Step 3 (Present Analysis).
**Processing mode:** Incremental with representative sampling
Scoring [N] files for baseline selection...
2a. Score all files using the representative sample selection algorithm.
2b. Select baseline files:
## Representative Sample Selection
**Selected baseline files:**
| Rank | File | Words | Score | Rationale |
|------|------|-------|-------|-----------|
| 1 | [file] | [words] | [score] | [reason] |
| 2 | [file] | [words] | [score] | [reason] |
**Baseline corpus:** [total] words from [N] files
---
2c. Extract baseline profile from selected files.
2d. Process remaining files one by one:
For each file (sorted by score, descending):
2e. Display final summary:
## Incremental Processing Complete
**Files processed:** [total]
**Files included:** [N] ([%])
**Files excluded:** [N] ([%])
[If any excluded, show excluded files table]
### Profile Evolution
| Parameter | Baseline | Final | Total Change |
|-----------|----------|-------|--------------|
| Formality | [val] | [val] | [change] |
| Personality | [val] | [val] | [change] |
| Avg length | [val] | [val] | [change] |
**Final corpus:** [words] words from [N] files
Use this format for presenting extracted parameters:
## Extracted Voice Profile
Based on analysis of [X] words from [Y] files.
### Core Characteristics
| Parameter | Value | Confidence | Evidence |
|-----------|-------|------------|----------|
| Formality | 0.65 | HIGH | 23% contractions, moderate formal vocabulary |
| Personality | 0.72 | HIGH | 18 opinion markers, 12 value judgments |
| First Person | Yes | HIGH | Found in 34% of sentences |
| Contractions | Yes | HIGH | 78% use contractions |
| Audience | intermediate | MEDIUM | Technical terms with explanations |
### Sentence Patterns
| Parameter | Value | Confidence | Evidence |
|-----------|-------|------------|----------|
| Avg Length | 16 words | HIGH | Calculated from 342 sentences |
| Variation | moderate | HIGH | Std dev = 5.2 |
| Mix Short | Yes | MEDIUM | 19% sentences < 8 words |
| Rhetorical Qs | No | HIGH | 0 rhetorical questions found |
### Pronoun Balance
| Pronoun | Percentage | Confidence |
|---------|------------|------------|
| you | 65% | HIGH |
| we | 35% | HIGH |
### Elaboration Style
| Parameter | Value | Confidence |
|-----------|-------|------------|
| Depth | moderate | MEDIUM |
| Context | Yes | HIGH |
| Examples | Occasional | MEDIUM |
| Reasoning | Yes | HIGH |
### Personality Traits
| Trait | Value | Confidence |
|-------|-------|------------|
| Opinions | Yes | HIGH |
| Complexity | Yes | MEDIUM |
| Humor | none | HIGH |
| Experience | No | HIGH |
### Signature Phrases (Top 5)
1. "Let's look at" (23 occurrences)
2. "The key point is" (18 occurrences)
3. "Worth noting" (15 occurrences)
4. "In practice" (12 occurrences)
5. "Consider how" (9 occurrences)
### Tone Description
> This voice balances technical authority with accessible warmth. The author writes
> as a knowledgeable peer who genuinely wants readers to succeed, offering clear
> explanations without condescension. There's intellectual curiosity in the approach,
> treating complex topics as interesting puzzles rather than obstacles. The occasional
> personal aside and willingness to acknowledge trade-offs create authenticity.
### Phrases to Avoid (detected anti-patterns)
- None detected that conflict with voice
---
**Overall Confidence:** HIGH
Based on sufficient corpus size and consistent patterns.
Use AskUserQuestion:
{
"question": "What would you like to name this voice profile?",
"header": "Profile Name",
"options": [
{"label": "Suggest name", "description": "Based on characteristics: [suggested-name]"},
{"label": "Custom name", "description": "Enter your own profile name"}
]
}
Suggest a name based on detected characteristics:
Use AskUserQuestion:
{
"question": "Where should I save this voice profile?",
"header": "Location",
"options": [
{"label": "Global (Recommended)", "description": "~/.claude/style/voices/ - Available across all projects"},
{"label": "Project", "description": ".style/voice.yaml - Only for this project"}
]
}
Write the complete YAML profile:
# Voice Profile: [name]
# Extracted from: [source files]
# Extraction date: [date]
# Corpus: [X] words from [Y] files
name: "[name]"
version: "1.0"
description: "[auto-generated description based on characteristics]"
# Prose description of the voice's tone and feel
# Captures emotional quality, overall impression, and unique character
tone_description: |
  [Prose describing the voice's emotional quality, feel, and unique character.
  Goes beyond statistics to capture the human essence of the writing style.
  May include: warmth, authority, curiosity, playfulness, confidence, empathy,
  intellectual rigor, accessibility, urgency, patience, encouragement, skepticism, etc.]
characteristics:
  formality: [value]
  personality: [value]
  first_person: [true/false]
  contractions: [true/false]
  audience: "[beginner/intermediate/expert]"
sentence_patterns:
  mix_short: [true/false]
  max_consecutive_similar: 3
  avg_length_target: [value]
  variation: "[low/moderate/high]"
  rhetorical_questions: [true/false]
elaboration:
  depth: "[minimal/moderate/thorough]"
  provide_context: [true/false]
  include_examples: [true/false]
  explain_reasoning: [true/false]
  analogies: "[none/rare/moderate/frequent]"
  analogy_domain: "[optional: primary domain for analogies, e.g., 'everyday objects', 'cooking', 'construction']"
personality_traits:
  opinions: [true/false]
  acknowledge_complexity: [true/false]
  humor: "[none/subtle/moderate]"
  personal_experience: [true/false]
pronoun_balance:
  you_percentage: [value]
  we_percentage: [value]
signature_phrases:
  - "[phrase 1]"
  - "[phrase 2]"
  - "[phrase 3]"
  - "[phrase 4]"
  - "[phrase 5]"
avoid_phrases:
  - "It goes without saying"
  - "As everyone knows"
  - "Obviously"
✓ Created voice profile: [name]
**Location:** [path]
**Summary:**
- Formality: [value] ([descriptor])
- Personality: [value] ([descriptor])
- Pronouns: [you]% you / [we]% we
- Style: [brief description]
**To use this profile:**
- Apply to project: `/prose:voice apply [name]`
- Generate content: `/prose:write` (auto-applies if set as project voice)
- View details: `/prose:voice show [name]`
When only one file is provided:
**Processing mode:** Single file
**Note:** Single-file extraction has reduced confidence. Consider providing
additional samples for more reliable voice capture.
Proceeding with standard extraction...
⚠️ Warning: Small corpus detected ([X] words)
Voice extraction works best with larger samples. Results may be less reliable.
Options:
1. Proceed anyway (results will have LOW confidence)
2. Add more content to the analysis
3. Cancel extraction
If more than 5 consecutive files are flagged as outliers after the baseline is established:
⚠️ Warning: Consecutive outlier limit reached
After processing [N] files, [M] consecutive files have been flagged as outliers.
This suggests the baseline may not represent the majority of your content.
**Possible causes:**
- Baseline files have unusual style compared to rest of corpus
- Content contains multiple distinct voices/authors
- Mixed document types (narrative + reference + changelog)
**Options:**
1. **Accept baseline** - Use current profile from [N] included files
2. **Re-select baseline** - Choose different representative files
3. **Relax thresholds** - Include borderline files more liberally
4. **Split extraction** - Create separate profiles for different document types
Use AskUserQuestion to let the user choose:
{
"question": "Many files don't match the baseline voice. How should I proceed?",
"header": "Outlier Limit",
"options": [
{"label": "Accept baseline", "description": "Keep profile from [N] matching files"},
{"label": "Re-select baseline", "description": "Let me choose different representative files"},
{"label": "Relax thresholds", "description": "Include more files even if they differ"},
{"label": "Split by type", "description": "Create separate profiles for different content types"}
]
}
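The trigger for this warning, more than 5 consecutive outliers, can be tracked with a running streak. The sketch below assumes that any non-outlier classification (including BORDERLINE) resets the streak; the function name is hypothetical.

```python
def hit_outlier_limit(classifications: list[str], limit: int = 5) -> bool:
    """True once more than `limit` files in a row are classified OUTLIER."""
    streak = 0
    for label in classifications:
        streak = streak + 1 if label == "OUTLIER" else 0
        if streak > limit:
            return True
    return False
```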
When boolean fields show approximately 50% splits or categorical fields have multiple strong candidates:
⚠️ Warning: Inconsistent patterns detected
Analysis suggests mixed authorship or intentionally varied style:
| Parameter | Distribution | Confidence |
|-----------|--------------|------------|
| First person | 52% yes / 48% no | LOW |
| Contractions | 47% yes / 53% no | LOW |
| Audience | 40% intermediate, 35% expert, 25% beginner | LOW |
**This may indicate:**
- Multiple authors with different styles
- Content evolved over time
- Intentional variation by section type
**Recommendation:** Consider extracting from a subset of files by the same author
or content type for more consistent results.
Track files that cannot be processed separately from outliers:
**Skipped files (not counted as outliers):**
| File | Reason |
|------|--------|
| docs/logo.png | Binary file (not text) |
| docs/data.json | No prose content detected |
| docs/snippet.md | Too short (38 words, minimum: 50) |
These files are excluded from analysis but do not affect outlier statistics.
Skip criteria:
If patterns conflict (e.g., some files formal, others casual) in single-pass mode:
⚠️ Inconsistent patterns detected
The analyzed content shows varying styles:
- Files 1-3: Formal (formality ~0.8)
- Files 4-5: Casual (formality ~0.3)
This may indicate:
- Multiple authors
- Different content types
- Style evolution over time
Recommendation: Extract from a more consistent subset of files.
Note: In incremental mode, this is handled automatically through outlier detection.
If writing is too generic to extract distinctive patterns:
ℹ️ Generic writing style detected
The analyzed content doesn't show distinctive voice characteristics.
All parameters fall near default/neutral values.
This could mean:
- The writing intentionally avoids strong voice
- The content is reference/specification style
- More distinctive samples are needed
A "reference" style profile will be created with neutral settings.
This skill complements voice-architect:
Both produce the same YAML format, usable with:
- `/prose:voice apply`
- `/prose:write` (auto-applies project voice)
Remember: The goal is to capture authentic voice from real writing, enabling consistent personality across future content. Extract what makes the writing distinctive, not just average metrics.