Applies computational methods to humanities research: text mining, NLP, corpus linguistics, GIS, network analysis, stylometry, OCR, and data visualization. Use for distant reading, mapping historical events, authorship attribution, or digitizing documents.
Install: npx claudepluginhub alterlab-ieu/alterlab-academic-skills --plugin alterlab-writing-tools
Slash command: /alterlab-writing-tools:alterlab-digital-humanities
Digital humanities (DH) applies computational methods to the study of human culture, history, language, and society. It is not the replacement of humanistic inquiry with algorithms but the augmentation of interpretive scholarship with tools that can reveal patterns invisible to unaided reading, connect dispersed archives, visualize historical processes, and make cultural heritage accessible to broader audiences.
This skill covers the major computational methods used in humanities research: text mining and natural language processing (topic modeling with LDA and BERTopic, sentiment analysis, named entity recognition), corpus linguistics (concordance, collocation, frequency analysis, keyness), digital archiving and metadata standards (Dublin Core, TEI XML), geographic information systems (GIS) for historical research, network analysis of historical figures and literary characters, stylometry and computational authorship attribution, optical character recognition (OCR) workflows for digitizing historical texts, digital scholarly editions, data visualization for humanities data, distant reading as theorized by Franco Moretti, cultural analytics as developed by Lev Manovich, and the software ecosystem for humanities computing (the Python libraries spaCy and NLTK, plus the no-code tools Voyant Tools and AntConc).
The skill is designed for humanities scholars who want to integrate computational methods into their research -- whether they are analyzing Victorian novels, mapping colonial trade networks, studying the evolution of political rhetoric, or building digital archives of endangered languages. No prior programming experience is assumed, though some methods require basic Python or R skills. For each method, the skill describes the intellectual rationale, practical implementation, available tools (from no-code to full programming), and critical perspectives on the method's limitations.
Use this skill when you need to:
Natural language processing (NLP) provides computational tools for analyzing text at scales impossible for human readers. In humanities research, NLP is not a replacement for close reading but a complement that can identify patterns across thousands or millions of texts, guide the selection of passages for close analysis, and test hypotheses about language change, genre conventions, and cultural trends.
Topic modeling uses unsupervised machine learning to discover latent thematic structure in document collections. The two dominant approaches are Latent Dirichlet Allocation (LDA) and BERTopic.
Latent Dirichlet Allocation (LDA):
LDA (Blei, Ng, & Jordan, 2003) models each document as a mixture of topics, and each topic as a distribution over words. It is a bag-of-words model -- word order does not matter.
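To make the bag-of-words assumption concrete, here is a minimal sketch in plain Python. The two sentences are made-up examples, and `bag_of_words` is a hypothetical helper, not part of any LDA library. It shows that reordering the words leaves the count vector unchanged, and the count vector is all a bag-of-words model sees.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings but identical word counts
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

same_bag = (a == b)
print(same_bag)
```

This is why LDA topics are best read as vocabularies that tend to co-occur, not as statements about who did what to whom.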
LDA workflow:
Example: LDA topic from a corpus of 19th-century British novels
Topic 7 (labeled "Domestic Life"):
Top words: room, house, door, table, fire, chair, window, sat,
morning, evening, bed, garden, dinner, tea, kitchen
Top documents: Cranford (Gaskell), Middlemarch (Eliot),
North and South (Gaskell)
Interpretation: This topic captures domestic settings and daily
routines. Its prevalence increases in novels by women authors
and in novels published after 1850, suggesting a shift toward
domestic realism in mid-Victorian fiction.
Choosing the number of topics (k):
BERTopic:
BERTopic (Grootendorst, 2022) uses transformer-based sentence embeddings (e.g., from BERT-family models) to create document representations, reduces their dimensionality (with UMAP by default), clusters them using HDBSCAN, and extracts topic representations using c-TF-IDF. Unlike LDA, it captures semantic similarity between documents rather than relying only on word co-occurrence counts.
BERTopic advantages over LDA:
BERTopic Python implementation:
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
# `documents` is a list of strings and `timestamps` a parallel list of dates
# (one per document); both must be prepared before running this code.
# Use a sentence transformer model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
# Initialize and fit BERTopic
topic_model = BERTopic(
    embedding_model=embedding_model,
    min_topic_size=10,
    nr_topics="auto",
)
topics, probs = topic_model.fit_transform(documents)
# Inspect topics (returns a pandas DataFrame)
print(topic_model.get_topic_info())
# Visualize topic distribution (returns a Plotly figure)
topic_model.visualize_topics()
# Track topics over time
topics_over_time = topic_model.topics_over_time(
    documents, timestamps
)
topic_model.visualize_topics_over_time(topics_over_time)
Sentiment analysis classifies text by emotional valence (positive, negative, neutral) or more specific emotional categories. In humanities research, it is used to study emotional arcs in novels, shifts in political rhetoric, audience reception in reviews, and emotional expression across historical periods.
Approaches to sentiment analysis:
| Approach | How It Works | Best For | Limitations |
|---|---|---|---|
| Lexicon-based (VADER, AFINN, NRC) | Counts words from sentiment dictionaries | Quick analysis, transparent | Misses context, sarcasm, domain-specific usage |
| Machine learning (Naive Bayes, SVM) | Trained on labeled examples | Domain-specific tasks | Requires labeled training data |
| Transformer-based (BERT, RoBERTa) | Fine-tuned language models | High accuracy, context-aware | Computationally expensive, may need fine-tuning |
Cautions for humanities research:
Example: Sentiment arc analysis of a novel
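import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
def split_text(text, window_size=1000):
    """Split text into consecutive chunks of window_size words."""
    words = text.split()
    return [" ".join(words[i:i + window_size])
            for i in range(0, len(words), window_size)]
# novel_text: the full text of the novel as one string, loaded beforehand
# Initialize VADER
sid = SentimentIntensityAnalyzer()
# Split novel into chunks (e.g., 1000-word windows)
chunks = split_text(novel_text, window_size=1000)
# Calculate sentiment for each chunk
sentiments = []
for chunk in chunks:
    scores = sid.polarity_scores(chunk)
    sentiments.append(scores["compound"])
# Plot the sentiment arc
plt.plot(range(len(sentiments)), sentiments)
plt.xlabel("Narrative Position")
plt.ylabel("Sentiment (VADER compound)")
plt.title("Emotional Arc: Pride and Prejudice")
plt.axhline(y=0, color="gray", linestyle="--")
plt.show()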
Named entity recognition (NER) identifies and classifies entities mentioned in text: people, places, organizations, dates, monetary values, and similar expressions. In humanities research, NER enables automated extraction of historical actors, geographic references, and temporal markers from large corpora.
NER tools for humanities:
| Tool | Language | Strengths | Notes |
|---|---|---|---|
| spaCy | Python | Fast, accurate, multiple languages | Best general-purpose NER |
| NLTK | Python | Educational, well-documented | Older, less accurate than spaCy |
| Stanza (Stanford NLP) | Python | Research-grade, many languages | Good for non-English texts |
| Flair | Python | State-of-the-art, flexible | Can fine-tune for historical text |
| BookNLP | Python/Java | Designed for literary texts | Character identification, coreference |
Fine-tuning NER for historical texts:
Most pre-trained NER models are trained on modern text (news articles, Wikipedia) and often perform poorly on historical text with archaic spelling, different naming conventions, and unfamiliar entities. Fine-tuning on manually annotated historical text can substantially improve accuracy.
import spacy
from spacy.training import Example
# Load base model
nlp = spacy.load("en_core_web_sm")
# Prepare training data (manually annotated historical text).
# Each entity is (start, end, label) in character offsets, with end exclusive.
TRAIN_DATA = [
    ("Mr. Darcy arrived at Pemberley in the autumn of 1811.",
     {"entities": [(0, 9, "PERSON"), (21, 30, "LOC"), (48, 52, "DATE")]}),
    ("The East India Company dispatched three vessels from Calcutta.",
     {"entities": [(4, 22, "ORG"), (53, 61, "LOC")]}),
]
# Fine-tune the NER component
# (simplified -- production code needs more examples and proper training loop)
Corpus linguistics analyzes large, structured text collections to study language patterns. It provides empirical evidence for claims about language use, change, and variation that would be impossible to verify by intuition alone.
A concordance displays every occurrence of a search term in its immediate context (typically 5-10 words on each side), creating a Key Word in Context (KWIC) view. This reveals patterns of usage, collocates, and semantic prosody.
Example: KWIC concordance for "liberty" in 18th-century political texts
...the natural LIBERTY of mankind is to be free from...
...that civil LIBERTY consists in the security of...
...enemies of LIBERTY who would enslave the nation...
...religious LIBERTY and freedom of conscience...
...took up arms for LIBERTY against tyrannical oppression...
Patterns visible: "liberty" collocates with "natural," "civil," "religious" -- different conceptual frames for the same word.
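A KWIC view like the one above can be produced with a few lines of plain Python. The sketch below is illustrative only: `kwic` is a hypothetical helper, the tokenizer is a simple regular expression, and the sample sentences are invented. Dedicated tools such as AntConc add sorting, wildcards, and much better tokenization.

```python
import re

def kwic(text, keyword, window=5):
    """Return (left context, keyword, right context) for each hit."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text for illustration
sample = ("The natural liberty of mankind is to be free. "
          "Civil liberty consists in the security of property.")

lines = kwic(sample, "liberty", window=3)
for left, kw, right in lines:
    # Right-align the left context so the keyword forms a column
    print(f"{left:>30} {kw.upper()} {right}")
```

Sorting the hits by the word immediately to the left or right of the keyword is usually the next step, since it groups recurring patterns such as "civil liberty" together.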
Collocation analysis identifies words that co-occur with a target word more frequently than chance would predict. Statistical measures include Mutual Information (MI), t-score, log-likelihood, and Log Dice.
Collocation measures compared:
| Measure | Favors | Best For |
|---|---|---|
| MI (Mutual Information) | Rare, exclusive collocates | Finding fixed phrases |
| t-score | Frequent collocates | Common usage patterns |
| Log-likelihood (G2) | Statistically significant collocates | Balanced analysis |
| Log Dice | Stable across corpus sizes | Comparing corpora |
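To show how these measures differ in practice, the sketch below computes MI, t-score, and logDice for one adjacent word pair, using commonly cited textbook formulas. It makes simplifying assumptions: only directly adjacent words count as co-occurrences (real tools use a configurable window), expected frequency is f(node) x f(collocate) / N, and the token list is a made-up toy corpus. `collocation_scores` is a hypothetical helper name.

```python
import math
from collections import Counter

def collocation_scores(tokens, node, collocate):
    """Score an adjacent (node, collocate) pair with MI, t-score, and logDice."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(node, collocate)]
    if observed == 0:
        return None  # the pair never occurs; the measures are undefined
    expected = unigrams[node] * unigrams[collocate] / n
    return {
        "observed": observed,
        "expected": expected,
        "MI": math.log2(observed / expected),
        "t_score": (observed - expected) / math.sqrt(observed),
        "logDice": 14 + math.log2(
            2 * observed / (unigrams[node] + unigrams[collocate])
        ),
    }

# Toy corpus (invented) of 8 tokens
tokens = "civil liberty and natural liberty and civil law".split()
scores = collocation_scores(tokens, "civil", "liberty")
print(scores)
```

On a corpus this small the numbers are meaningless as evidence; the point is only to show which counts feed each measure. MI divides by the expected frequency, so it rewards rare, exclusive pairs, whereas the t-score grows with the raw observed count.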
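Word frequency counts how often each word appears in a corpus. Raw frequency is the simple count; normalized frequency (e.g., occurrences per million words) allows comparison across corpora of different sizes; relative frequency expresses the count as a share of all tokens in the corpus.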
Keyness compares word frequencies between two corpora to identify words that are statistically over- or under-represented in one corpus relative to the other. This reveals what is distinctive about a text or collection.
Example: Keyness analysis comparing male- vs. female-authored Victorian novels
Words overrepresented in female-authored novels:
she, her, room, mother, child, dress, felt, tears, home
Words overrepresented in male-authored novels:
he, his, money, business, gentleman, sir, political, war
Interpretation: Keyness analysis reveals gendered thematic emphases
in Victorian fiction, with female authors more frequently writing about
domestic spaces and emotional states, and male authors more frequently
addressing public life and commerce. However, these are statistical
tendencies, not absolute divisions -- individual authors cross these
patterns in interesting ways.
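Keyness is often computed with the log-likelihood statistic (G2). The sketch below implements the two-corpus formulation commonly used in corpus linguistics, together with per-million normalization. The counts and corpus sizes are hypothetical values chosen for illustration, not results from a real corpus, and the function names are my own.

```python
import math

def per_million(count, corpus_size):
    """Normalized frequency: occurrences per million tokens."""
    return count / corpus_size * 1_000_000

def log_likelihood(count_a, size_a, count_b, size_b):
    """G2 keyness of one word, comparing corpus A against corpus B."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts of one word in two 1-million-word corpora
g2_unequal = log_likelihood(1500, 1_000_000, 900, 1_000_000)
g2_equal = log_likelihood(100, 1_000_000, 100, 1_000_000)
print(round(g2_unequal, 1), g2_equal)
```

A G2 above 3.84 corresponds to p < 0.05 for a chi-square distribution with one degree of freedom. With large corpora many words clear that bar, so it is worth reporting an effect size alongside the significance value.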
Corpus linguistics software:
| Tool | Type | Cost | Best For |
|---|---|---|---|
| AntConc | Desktop application | Free | Concordance, collocation, keyness |
| Voyant Tools | Web-based | Free | Quick visualization, no installation |
| Sketch Engine | Web-based | Paid (free for academics) | Large corpora, SketchDiff |
| CQPweb | Web-based | Free (institutional) | Corpus query language |
| NLTK | Python library | Free | Programmable analysis |
| quanteda | R package | Free | Statistical text analysis |
Dublin Core is a 15-element metadata standard used widely in digital archives and libraries. It provides a simple, universal vocabulary for describing digital resources.
The 15 Dublin Core elements:
| Element | Description | Example |
|---|---|---|
| Title | Name of the resource | "Letter from Thomas Jefferson to John Adams" |
| Creator | Entity primarily responsible | "Jefferson, Thomas" |
| Subject | Topic of the resource | "American politics; Enlightenment philosophy" |
| Description | Account of the resource | "Personal letter discussing agrarian policy..." |
| Publisher | Entity making resource available | "Library of Congress" |
| Contributor | Entity contributing to the resource | "Adams, John (recipient)" |
| Date | Date associated with the resource | "1812-06-11" |
| Type | Nature or genre | "Text; Correspondence" |
| Format | Physical or digital format | "image/tiff; 2 pages" |
| Identifier | Unambiguous reference | "loc.gov/item/mtjbib024567" |
| Source | Derived-from resource | "Thomas Jefferson Papers, Series 1" |
| Language | Language of the resource | "en" |
| Relation | Related resources | "Reply to Adams letter of 1812-05-28" |
| Coverage | Spatial or temporal coverage | "Monticello, Virginia; 1812" |
| Rights | Rights information | "Public domain" |
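As an illustration of how such a record might be serialized, the sketch below builds a small Dublin Core XML fragment with Python's standard library. The `dc` namespace URI is the standard Dublin Core Elements 1.1 namespace; the `<record>` wrapper element and the `dublin_core_record` helper are assumptions made for this example, since real repositories wrap records in their own schema (for instance OAI-PMH).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (element, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record

# Values taken from the example column of the table above
record = dublin_core_record([
    ("title", "Letter from Thomas Jefferson to John Adams"),
    ("creator", "Jefferson, Thomas"),
    ("date", "1812-06-11"),
    ("language", "en"),
    ("rights", "Public domain"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

All Dublin Core elements are optional and repeatable, so a list of pairs (rather than a dictionary) lets the same element appear more than once, for example two `subject` entries.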
The Text Encoding Initiative (TEI) provides an XML-based standard for encoding literary, historical, and linguistic texts with rich structural and interpretive markup. TEI is the standard for digital scholarly editions.
TEI document structure:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Letter from Mary Shelley to Leigh Hunt</title>
        <author>Shelley, Mary Wollstonecraft, 1797-1851</author>
        <editor>Digital editor name</editor>
      </titleStmt>
      <publicationStmt>
        <publisher>Digital Archive Name</publisher>
        <date>2026</date>
        <availability>
          <licence target="https://creativecommons.org/licenses/by/4.0/">
            CC-BY 4.0
          </licence>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <repository>Bodleian Library</repository>
            <idno>MS. Shelley c.1, f.234</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <opener>
        <dateline><placeName>Genoa</placeName>,
          <date when="1823-02-15">15 February 1823</date>
        </dateline>
        <salute>My dear <persName ref="#hunt">Hunt</persName>,</salute>
      </opener>
      <p>I write to you in great haste, having just received
        your letter from <placeName ref="#london">London</placeName>.
        The news of <persName ref="#byron">Lord Byron</persName>'s
        departure for <placeName ref="#greece">Greece</placeName>
        has left us all in a state of considerable anxiety.</p>
      <closer>
        <salute>Yours most affectionately,</salute>
        <signed><persName ref="#mshelley">Mary Shelley</persName></signed>
      </closer>
    </body>
  </text>
</TEI>
Key TEI elements for humanities encoding:
| Element | Purpose | Example Use |
|---|---|---|
| `<persName>` | Personal name | Tagging historical figures |
| `<placeName>` | Place name | Geographic references |
| `<date>` | Date (with @when for normalization) | Temporal references |
| `<note>` | Editorial annotation | Footnotes, commentary |
| `<app>` and `<rdg>` | Apparatus (textual variants) | Critical editions |
| `<del>` and `<add>` | Deletions and additions | Manuscript editing |
| `<unclear>` | Uncertain reading | Damaged or illegible text |
| `<gap>` | Omitted material | Lost or censored text |
| `<choice>` | Alternative encodings | Original/regularized spelling |
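One payoff of this markup is that tagged names can be pulled out programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` to list `<persName>` and `<placeName>` elements from a tiny TEI fragment. The fragment is an invented example modelled on the letter above, and `tagged_names` is a hypothetical helper. The key detail is that TEI elements live in the TEI namespace, so queries must be namespace-aware.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented TEI fragment for illustration
sample_tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>News of <persName ref="#byron">Lord Byron</persName>'s departure for
    <placeName ref="#greece">Greece</placeName> reached
    <persName ref="#hunt">Hunt</persName> in
    <placeName ref="#london">London</placeName>.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(sample_tei)

def tagged_names(root, element):
    """Return (ref, text) pairs for every tei:<element> in the document."""
    return [
        (el.get("ref"), "".join(el.itertext()).strip())
        for el in root.iterfind(f".//tei:{element}", TEI_NS)
    ]

people = tagged_names(root, "persName")
places = tagged_names(root, "placeName")
print(people)
print(places)
```

The extracted `ref` values can then serve as node identifiers for network analysis or as keys for geocoding place names.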
Geographic Information Systems (GIS) enable spatial analysis of historical data -- mapping events, tracking movements, analyzing spatial patterns, and overlaying historical information on geographic space.
Common GIS applications in humanities:
GIS tools for humanities:
| Tool | Type | Cost | Best For |
|---|---|---|---|
| QGIS | Desktop | Free | Full GIS functionality, open source |
| ArcGIS | Desktop + cloud | Paid (free for students) | Industry standard, extensive tools |
| Google Earth Pro | Desktop | Free | Visualization, KML import |
| Palladio | Web-based | Free | Network + map visualization for humanities |
| Mapbox | Web + API | Free tier | Custom interactive web maps |
| Leaflet | JavaScript library | Free | Lightweight web maps |
| kepler.gl | Web-based | Free | Large-scale geospatial data visualization |
Example: Georeferencing a historical map
Workflow in QGIS:
1. Load the scanned historical map as a raster layer
2. Add a modern basemap (OpenStreetMap) for reference
3. Identify Ground Control Points (GCPs) -- locations identifiable
on both the historical map and modern basemap
4. Place at least 4 GCPs (more is better, spread across the map)
5. Choose a transformation type:
- Linear: 2 GCPs minimum (shift and scale only, no rotation)
- Helmert: 2 GCPs minimum (shift, rotate, scale)
- Polynomial 1: 3 GCPs minimum (affine transformation)
- Polynomial 2: 6 GCPs minimum (handles distortion)
- Thin Plate Spline: many GCPs (flexible, handles local distortion)
6. Run the transformation and inspect the result
7. Save the georeferenced map with spatial reference metadata
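To illustrate what the first-order polynomial (affine) option computes, the sketch below derives the six affine coefficients from exactly three GCPs in plain Python. It is a teaching sketch under stated assumptions, not what QGIS runs internally: the Georeferencer fits more than three points by least squares and reports residuals. The pixel and map coordinates are hypothetical, and the helper names are my own.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("GCPs are collinear; choose better-spread points")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution

def fit_affine(gcps):
    """gcps: three ((pixel_x, pixel_y), (map_x, map_y)) pairs.

    Returns a function mapping pixel coordinates to map coordinates via
    map_x = a*px + b*py + c and map_y = d*px + e*py + f.
    """
    matrix = [[px, py, 1.0] for (px, py), _ in gcps]
    a, b, c = solve3(matrix, [mx for _, (mx, _) in gcps])
    d, e, f = solve3(matrix, [my for _, (_, my) in gcps])
    return lambda px, py: (a * px + b * py + c, d * px + e * py + f)

# Hypothetical GCPs: scanned-map pixel positions and projected map coordinates
gcps = [((0, 0), (500000.0, 4200000.0)),
        ((1000, 0), (502000.0, 4200000.0)),
        ((0, 1000), (500000.0, 4198000.0))]

to_map = fit_affine(gcps)
center = to_map(500, 500)
print(center)
```

Three points determine the affine transform exactly, which leaves no way to detect a misplaced GCP. That is one reason the workflow above recommends placing more points than the minimum.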
Network analysis reveals patterns of connection, influence, and community structure among historical actors. Applied to correspondence networks, co-appearance in documents, intellectual citation, or organizational membership, it can reveal hidden structures in historical social worlds.
Types of historical networks:
| Network Type | Nodes | Edges | Example |
|---|---|---|---|
| Correspondence | People | Letters exchanged | Republic of Letters network |
| Co-occurrence | People | Mentioned in same document | Colonial administration officials |
| Citation | Texts/authors | One cites another | Intellectual influence networks |
| Kinship | People | Family relations | Dynastic networks |
| Trade | Places/merchants | Commercial exchange | Mediterranean trade network |
| Organizational | People/orgs | Membership/affiliation | Reform movement networks |
Building a historical network from archival sources:
Step 1: Define nodes and edges
- What counts as a node? (person, place, text, organization)
- What counts as an edge? (letter, co-occurrence, citation, transaction)
- Is the edge directed or undirected?
- What edge attributes to record? (date, type, weight)
Step 2: Extract data from sources
- Manual extraction from archival documents
- Semi-automated extraction using NER on digitized texts
- Structured databases (EMLO for early modern letters, SNAP for prosopography)
Step 3: Create edge list
Format: Source, Target, Weight, Date, Type
"Jefferson", "Adams", 1, "1812-06-11", "letter"
"Jefferson", "Madison", 1, "1812-06-15", "letter"
Step 4: Analyze in Gephi, NetworkX, or igraph
- Calculate centrality measures
- Detect communities (Louvain, modularity)
- Visualize with meaningful layout (ForceAtlas2, geographic)
- Filter by time period for temporal analysis
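Steps 3 and 4 can be prototyped without any network library. The sketch below reads an edge list in the format shown above, filters it by year, and computes degree centrality (the share of other nodes each node is connected to, ignoring direction). The third edge is an invented row added so the example has a reply, and the helper names are hypothetical. For real projects, NetworkX or igraph provide this and many more measures.

```python
import csv
import io
from collections import defaultdict

# Edge list in the Source, Target, Weight, Date, Type format (last row invented)
edge_csv = """source,target,weight,date,type
Jefferson,Adams,1,1812-06-11,letter
Jefferson,Madison,1,1812-06-15,letter
Adams,Jefferson,1,1812-07-03,letter
"""

def load_edges(text):
    """Parse the CSV edge list into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_by_year(edges, year):
    """Keep only edges whose ISO date starts with the given year."""
    return [e for e in edges if e["date"].startswith(str(year))]

def degree_centrality(edges):
    """Number of distinct neighbours divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for e in edges:
        neighbors[e["source"]].add(e["target"])
        neighbors[e["target"]].add(e["source"])
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

edges = load_edges(edge_csv)
centrality = degree_centrality(filter_by_year(edges, 1812))
print(centrality)
```

Storing dates in ISO format (YYYY-MM-DD) keeps temporal filtering trivial, which matters when comparing the same network across periods.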
Stylometry uses statistical analysis of writing style to attribute authorship of anonymous or disputed texts. It relies on the principle that authors have measurable stylistic habits -- especially in function words, sentence length, and vocabulary richness -- that are unconscious and therefore difficult to imitate.
Key stylometric features:
| Feature | Description | Why It Works |
|---|---|---|
| Function word frequencies | the, of, and, to, a, in, is, it | Unconscious, content-independent |
| Word length distribution | Average and variance of word lengths | Reflects vocabulary preferences |
| Sentence length | Average and variance | Reflects syntactic habits |
| Vocabulary richness | Type-token ratio, hapax legomena | Lexical diversity |
| Character n-grams | Sequences of n characters | Captures sub-word patterns |
| POS tag n-grams | Sequences of part-of-speech tags | Syntactic patterns |
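Two of the vocabulary richness features in the table are easy to compute directly. The sketch below calculates the type-token ratio and the number of hapax legomena for a short string, using naive whitespace tokenization as a simplifying assumption. `lexical_richness` is a hypothetical helper name.

```python
from collections import Counter

def lexical_richness(text):
    """Return token count, type count, type-token ratio, and hapax count."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    hapax = [word for word, count in counts.items() if count == 1]
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": len(counts) / len(tokens),
        "hapax": len(hapax),
    }

richness = lexical_richness("the cat sat on the mat")
print(richness)
```

Type-token ratio falls as texts get longer, so it should only be compared across samples of equal length (for example, fixed-size windows).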
Stylometry tools:
| Tool | Language | Method | Best For |
|---|---|---|---|
| Stylo (R package) | R | Delta, PCA, cluster analysis | Literary stylometry |
| JGAAP | Java | Multiple classifiers | General authorship attribution |
| PyDelta | Python | Burrows Delta variants | Python-based workflows |
| Signature | Web-based | Visualization | Quick exploration |
Burrows Delta method:
Delta (Burrows, 2002) is one of the most widely used stylometric methods. It measures the "distance" between texts based on z-scores of the most frequent words:
Algorithm:
1. Select the n most frequent words across all texts (typically 100-500)
2. For each word, calculate z-scores across all texts
3. For each pair of texts, calculate the mean absolute difference
of z-scores (this is Delta)
4. The candidate author whose text has the smallest Delta to the
anonymous text is the most likely author
Variants:
- Classic Delta (Burrows, 2002): Mean absolute z-score difference
- Cosine Delta (Würzburg group): Cosine distance on z-scores
- Eder Delta: Emphasis on very frequent words
- Argamon Linear Delta: Manhattan distance
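The four algorithm steps for Classic Delta can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: tokenization is naive whitespace splitting, z-scores use the population standard deviation across all texts (implementations differ on this), and the three toy "texts" are invented and far too short for real attribution. The function names are hypothetical; for research use, the stylo package implements Delta and its variants.

```python
import statistics
from collections import Counter

def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def burrows_delta(known, disputed, n_mfw=5):
    """known: {label: text}. Returns {label: Delta to the disputed text}."""
    all_texts = dict(known, DISPUTED=disputed)
    # Step 1: the n most frequent words across all texts
    pooled = Counter(" ".join(all_texts.values()).lower().split())
    vocab = [w for w, _ in pooled.most_common(n_mfw)]
    freqs = {label: relative_freqs(t, vocab) for label, t in all_texts.items()}
    # Step 2: z-score each word's frequency across the texts
    z = {label: [] for label in freqs}
    for i in range(len(vocab)):
        column = [freqs[label][i] for label in freqs]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column)
        for label in freqs:
            z[label].append((freqs[label][i] - mean) / sd if sd else 0.0)
    # Step 3: mean absolute z-score difference to the disputed text
    return {
        label: statistics.mean(
            abs(a - b) for a, b in zip(z[label], z["DISPUTED"])
        )
        for label in known
    }

# Invented toy texts: AuthorA favours "the"/"and", AuthorB favours "a"/"or"
known = {
    "AuthorA": "the cat and the dog and the bird sat in the sun",
    "AuthorB": "a cat or a dog or a bird sat on a mat",
}
disputed = "the fox and the hen and the owl ran in the wood"

deltas = burrows_delta(known, disputed)
# Step 4: the smallest Delta marks the most likely candidate
best = min(deltas, key=deltas.get)
print(deltas, best)
```

Note that the disputed text shares no content words with either candidate; the attribution rests entirely on function-word habits, which is the core idea behind Delta.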
Example: Stylometric analysis in R (stylo package)
library(stylo)
# Place texts in corpus/ subdirectory
# Filename format: AuthorName_TextTitle.txt
# Run cluster analysis
results <- stylo(
  gui = FALSE,
  corpus.dir = "corpus",
  corpus.lang = "English",
  mfw.min = 100,                  # Minimum most frequent words
  mfw.max = 500,                  # Maximum most frequent words
  mfw.incr = 100,                 # Increment
  analysis.type = "CA",           # Cluster Analysis
  distance.measure = "wurzburg",  # Cosine Delta
  write.png.file = TRUE
)
Optical Character Recognition (OCR) converts images of text (scanned documents, photographs of manuscripts, historical newspapers) into machine-readable text. OCR quality is critical for all downstream text analysis.
OCR tools comparison:
| Tool | Type | Best For | Languages | Historical Text |
|---|---|---|---|---|
| Tesseract | Open source | General purpose | 100+ | Moderate (needs training) |
| Kraken | Open source | Historical/non-Latin scripts | Many | Excellent (designed for it) |
| Transkribus | Freemium platform | Handwritten text (HTR) | Many | Excellent |
| ABBYY FineReader | Commercial | High-volume production | Many | Good |
| Google Cloud Vision | API | Large-scale, cloud | Many | Good |
| Amazon Textract | API | Structured documents | English primarily | Moderate |
OCR workflow for historical documents:
1. IMAGE PREPARATION
- Scan at 300-400 DPI minimum (600 DPI for small text)
- Use grayscale or binary (not color unless needed)
- Deskew rotated pages
- Crop to text area
- Binarize (convert to black and white) using adaptive thresholding
2. OCR PROCESSING
- Select appropriate engine and language model
- For historical text: use period-appropriate training data if available
- Process page by page
- Maintain page/document structure
3. POST-PROCESSING
- Spell-check against period-appropriate dictionaries
- Correct common OCR errors (rn -> m, cl -> d, etc.)
- Validate against spot-checks of original images
- Preserve original line/page breaks in metadata
4. QUALITY ASSESSMENT
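- Character Error Rate (CER): character-level edits (insertions, deletions, substitutions) needed to match the ground truth, as a % of ground-truth characters
- Word Error Rate (WER): the same calculation at the word level, as a % of ground-truth words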
- Acceptable CER for research: < 5% (ideally < 2%)
- Always report OCR quality in publications using the data
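The quality metrics in step 4 can be computed with a short script once a hand-corrected ground-truth sample exists. The sketch below uses the edit-distance definitions of CER and WER (edits divided by the length of the reference), which is one common convention. The sample strings are invented and reproduce the "rn" for "m" confusion mentioned in step 3; the function names are my own.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference, ocr_output):
    """Character Error Rate relative to the reference transcription."""
    return edit_distance(reference, ocr_output) / len(reference)

def wer(reference, ocr_output):
    """Word Error Rate relative to the reference transcription."""
    ref_words = reference.split()
    return edit_distance(ref_words, ocr_output.split()) / len(ref_words)

# Invented example: the OCR engine read "m" as "rn"
reference = "the modern man"
ocr_output = "the rnodern man"

print(f"CER: {cer(reference, ocr_output):.1%}")
print(f"WER: {wer(reference, ocr_output):.1%}")
```

A single character confusion counts as two character edits here but spoils a whole word, which is why WER is usually much higher than CER on the same page.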
Tesseract command-line example:
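# Basic OCR (writes output.txt)
tesseract input.tiff output -l eng
# Page segmentation mode 6: treat the page as a single uniform block of text
# (use --psm 4 for a single column of variable-sized text)
tesseract input.tiff output -l eng --psm 6
# With a custom trained model for historical English. "eng_hist" is a placeholder
# name: an eng_hist.traineddata file must be installed in the tessdata directory.
# --oem 1 selects the LSTM engine.
tesseract input.tiff output -l eng_hist --psm 6 --oem 1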
Digital editions present primary texts with critical apparatus, annotations, and multimedia in a digital environment. They go beyond digitized facsimiles by adding scholarly interpretation, textual variants, and interactive features.
Components of a digital scholarly edition:
Digital edition platforms:
| Platform | Type | Best For |
|---|---|---|
| Edition Visualization Technology (EVT) | Open source | TEI-based critical editions |
| Versioning Machine | Open source | Parallel text comparison |
| TextGrid | Platform | German-language editions |
| FromThePage | Web platform | Collaborative transcription |
| Scripto | Plugin (Omeka) | Crowdsourced transcription |
| IIIF (protocol) | Standard | Interoperable image delivery |
Visualization in the humanities serves both analytical and communicative purposes -- revealing patterns in data and presenting arguments visually.
Humanities-specific visualization tools:
| Tool | Best For | Output |
|---|---|---|
| Palladio | Historical data (maps, networks, timelines) | Interactive web |
| Gephi | Network visualization | Static images, interactive (via plugins) |
| Voyant Tools | Text visualization (word clouds, trends, contexts) | Interactive web |
| StoryMapJS | Narrative maps | Interactive web |
| TimelineJS | Chronological narratives | Interactive web |
| Flourish | General data storytelling | Interactive web |
| RAWGraphs | Unconventional chart types | SVG export |
| D3.js | Custom interactive visualizations | Web (requires JavaScript) |
| matplotlib/seaborn | Statistical plots | Static images |
Visualization principles for humanities data:
Distant reading, as theorized by Franco Moretti (2005, 2013), proposes that we can understand literary history not only by close reading individual texts but by analyzing large numbers of texts through quantitative and computational methods. Instead of reading a few canonical works closely, distant reading examines hundreds or thousands of texts to reveal patterns of genre, form, theme, and cultural evolution.
Key distant reading methods:
Moretti's key arguments:
Cultural analytics, developed by Lev Manovich (2020), applies computational analysis to large collections of cultural artifacts -- images, video, music, design, social media, and other digital media. It extends distant reading beyond text to the full spectrum of human cultural production.
Cultural analytics methods:
Python tools for cultural analytics:
| Library | Purpose |
|---|---|
| OpenCV | Image processing, feature extraction |
| Pillow (PIL) | Image manipulation |
| scikit-image | Scientific image analysis |
| face_recognition | Face detection and recognition |
| ImageAI | Object detection |
| matplotlib / seaborn | Visualization |
| plotly | Interactive visualization |
spaCy -- Industrial-strength NLP:
import spacy
# Load English model (install once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
# Process text
doc = nlp("Mary Shelley wrote Frankenstein in Geneva in 1816.")
# Named entities
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
# Typical output (exact labels vary by model and version):
# Mary Shelley -> PERSON
# Frankenstein -> WORK_OF_ART
# Geneva -> GPE
# 1816 -> DATE
# Part-of-speech tags and dependency labels
for token in doc:
    print(f"{token.text}: {token.pos_} ({token.dep_})")
# Sentence segmentation, dependency parsing, lemmatization
NLTK -- Natural Language Toolkit:
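import nltk
from nltk import FreqDist
from nltk.corpus import gutenberg
from nltk.text import Text
# One-time downloads: the Gutenberg corpus, plus stopwords (used by collocations())
nltk.download("gutenberg", quiet=True)
nltk.download("stopwords", quiet=True)
# Load a Gutenberg text
text = gutenberg.words("austen-emma.txt")
# Frequency distribution (includes punctuation tokens; filter with str.isalpha if unwanted)
fdist = FreqDist(text)
print(fdist.most_common(20))
# Concordance
emma = Text(text)
emma.concordance("marriage", width=80, lines=10)
# Collocations
emma.collocations()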
Voyant Tools (no-code option):
Voyant Tools (voyant-tools.org) provides browser-based text analysis with no programming required:
AntConc (desktop corpus tool):
AntConc (laurenceanthony.net/software/antconc) provides: