Efficiently process multiple documents with batch optimization
/plugin marketplace add SpillwaveSolutions/spacy-nlp-agentic-skill
/plugin install spacy-nlp@spacy-nlp-agentic-skill

Efficient batch processing using nlp.pipe optimization.
/batch-nlp --input docs/ --output results/ --task entities
/batch-nlp --input texts.jsonl --output out.json --batch-size 100
/batch-nlp --input corpus.txt --output analysis.json --workers 4
```python
import spacy

nlp = spacy.load("en_core_web_sm")

# SLOW - don't do this: one model call per document
for text in texts:
    doc = nlp(text)
    process(doc)

# FAST - 5-10x speedup: spaCy batches the work internally
for doc in nlp.pipe(texts, batch_size=50):
    process(doc)
```
| Technique | Speedup |
|---|---|
| nlp.pipe() | 5-10x |
| Disable components | 2-3x |
| Multiprocessing | 2-4x |
| Combined | 10-40x |
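The techniques in the table combine in a single call: stream with `nlp.pipe`, tune `batch_size`, pass `disable=` to skip components, and raise `n_process` for multiprocessing. A minimal sketch, assuming only that spaCy is installed; it uses `spacy.blank("en")` plus a sentencizer so no trained model download is needed (in real use you would `spacy.load` a trained pipeline and disable heavy components such as `"parser"` or `"lemmatizer"`):

```python
import spacy

# Assumption: bare spaCy install; spacy.blank avoids a model download.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

texts = ["One sentence. Two sentences.", "Another document."] * 500

# Stream Docs in batches; raise n_process above 1 for multiprocessing
# on large corpora (the 2-4x row in the table).
docs = list(nlp.pipe(texts, batch_size=100, n_process=1))
print(len(docs))                 # 1000 Docs

# disable= skips the listed components entirely for this call
# (the 2-3x row in the table).
fast = list(nlp.pipe(texts, batch_size=100, disable=["sentencizer"]))
print(len(list(docs[0].sents)))  # 2 - sentencizer ran in the first pass
```

Disabling is per-call here, so the same `nlp` object can still do full processing elsewhere.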
```
[████████████░░░░░░] 67% | 10,050/15,000
Speed: 245 docs/sec | ETA: 20s
```
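The ETA in the status line above is just remaining work divided by throughput. A sketch of the arithmetic using the sample numbers (not the tool's own code):

```python
# Values taken from the sample status line above
total_docs, done_docs, docs_per_sec = 15_000, 10_050, 245

remaining = total_docs - done_docs              # 4,950 docs left
eta_seconds = remaining / docs_per_sec          # ~20.2 s, shown as "ETA: 20s"
percent = round(100 * done_docs / total_docs)   # 67

print(percent, round(eta_seconds))
```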