Remove Hebrew vowel points (nikkud) and/or cantillation marks (te'amim) from Hebrew text. Use when the user asks for "unvocalized", "consonantal", "without nikkud", "ktiv male", "strip vowels", "remove te'amim/trope", or asks to clean Hebrew text fetched from Sefaria (which ships with full nikkud and often te'amim) for plain reading, search, or copy-paste.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin jewish-texts-reference

This skill uses the workspace's default tool permissions.
Two paths — pick based on environment:
- Local regex: a Python regular expression over the explicit Unicode ranges listed below.
- Dicta API: the public backend of https://removenikud.dicta.org.il/. Use when the user explicitly asks for "Dicta", or when handling unusual edge cases (Yiddish, mixed corpora) where you want their tool's behaviour rather than a raw regex.

find-text call where the user wants the consonantal text only

| Range | Block | Default action |
|---|---|---|
| U+0591–U+05AF | Cantillation marks (te'amim / trope) | Remove |
| U+05B0–U+05BD | Nikkud (sheva, vowels, dagesh, meteg, etc.) | Remove |
| U+05BF | Rafe | Remove |
| U+05C1, U+05C2 | Shin/sin dot | Remove |
| U+05C4, U+05C5 | Upper/lower dot (rare, masoretic) | Remove |
| U+05C7 | Qamats qatan | Remove |
Preserve these — they are punctuation, not diacritics:
- U+05BE maqaf (־)
- U+05C0 paseq (׀)
- U+05C3 sof pasuk (׃)
- U+05C6 nun hafukha (׆)
- U+05F3, U+05F4 geresh, gershayim (׳, ״)

Confirm scope: strip nikkud only, te'amim only, or both (default for "unvocalized").
Run via `python3 -c` in Bash. Use explicit `\u` escapes: the literal-character form silently includes punctuation code points and strips them too. The character class below has deliberate gaps at U+05BE / U+05C0 / U+05C3 / U+05C6 to preserve maqaf, paseq, sof pasuk, and nun hafukha:
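```python
import re

# Te'amim (cantillation) only: U+0591 to U+05AF
TEAMIM = r'[\u0591-\u05AF]'

# Nikkud only: vowels, dagesh, sheva, meteg, rafe, shin/sin dot, upper/lower dot, qamats qatan
NIKKUD = r'[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]'

# Both, in one pass. Note the gaps that preserve Hebrew punctuation
# (U+05BE, U+05C0, U+05C3, U+05C6 are deliberately absent).
BOTH = r'[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]'


def strip(text, mode='both'):
    """Remove the marks selected by mode: 'nikkud', 'teamim', or 'both'."""
    pattern = {'nikkud': NIKKUD, 'teamim': TEAMIM, 'both': BOTH}[mode]
    return re.sub(pattern, '', text)
```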
Verified: applied to בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ (Genesis 1:1), BOTH mode yields בראשית ברא אלהים את השמים ואת הארץ׃ — sof pasuk preserved.
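As a cross-check on hand-written character classes, the pattern can also be derived from the code-point table and tested against the punctuation that must survive. The following is a minimal sketch rather than part of the skill itself: `REMOVE_RANGES`, `PRESERVE`, and `build_pattern` are illustrative names, and the ranges are copied from the table above.

```python
import re

# Inclusive code-point ranges to remove, copied from the table.
REMOVE_RANGES = {
    'teamim': [(0x0591, 0x05AF)],
    'nikkud': [(0x05B0, 0x05BD), (0x05BF, 0x05BF), (0x05C1, 0x05C2),
               (0x05C4, 0x05C5), (0x05C7, 0x05C7)],
}
REMOVE_RANGES['both'] = REMOVE_RANGES['teamim'] + REMOVE_RANGES['nikkud']

# Punctuation that must survive: maqaf, paseq, sof pasuk, nun hafukha,
# geresh, gershayim.
PRESERVE = [0x05BE, 0x05C0, 0x05C3, 0x05C6, 0x05F3, 0x05F4]


def build_pattern(mode='both'):
    """Compile a character class from the ranges for the given mode."""
    chars = ''.join(
        chr(lo) if lo == hi else f'{chr(lo)}-{chr(hi)}'
        for lo, hi in REMOVE_RANGES[mode]
    )
    return re.compile(f'[{chars}]')


# No preserved code point may match any of the three patterns.
for mode in REMOVE_RANGES:
    pattern = build_pattern(mode)
    leaked = [hex(cp) for cp in PRESERVE if pattern.search(chr(cp))]
    assert not leaked, (mode, leaked)

# Vocalized "al-pnei" with a maqaf and one cantillation mark (U+05A3),
# written with escapes so that no editor can reorder the combining marks.
sample = '\u05E2\u05B7\u05DC\u05BE\u05E4\u05BC\u05B0\u05E0\u05B5\u05A3\u05D9'
stripped = build_pattern('both').sub('', sample)
assert stripped == '\u05E2\u05DC\u05BE\u05E4\u05E0\u05D9'  # maqaf survives
```

Building the class from integers at runtime avoids pasting combining marks into source code at all, which is where reordering problems tend to creep in.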
Normalise whitespace afterwards: `re.sub(r' {2,}', ' ', text).strip()`.
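The whitespace step collapses runs of ASCII spaces only, so line breaks between verses are left in place. A small sketch of that behaviour (the `normalise_spaces` wrapper name is illustrative, not from the skill):

```python
import re


def normalise_spaces(text):
    """Collapse runs of two or more spaces, then trim both ends."""
    return re.sub(r' {2,}', ' ', text).strip()


print(repr(normalise_spaces('  a  b   c\nd  e ')))
# → 'a b c\nd e'
```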
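Don't use NFD-decompose-and-drop-combining-marks. It removes every combining mark, so it cannot separate nikkud from te'amim, and it also strips diacritics from any non-Hebrew text in the same string. Always use the explicit ranges above.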
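One way the over-stripping shows up is in mixed Latin and Hebrew input. The sketch below contrasts the two approaches; `strip_nfd` is a hypothetical name for the discouraged technique, and the sample string is made up for illustration.

```python
import re
import unicodedata

# Explicit ranges: te'amim plus nikkud, with gaps for Hebrew punctuation.
BOTH = re.compile(r'[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]')


def strip_nfd(text):
    """Discouraged: decompose, then drop every combining mark."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


# "café" followed by a vocalized "shalom", written with escapes.
mixed = 'caf\u00E9 \u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD'

# The NFD approach also removes the acute accent from the Latin word.
assert strip_nfd(mixed) == 'cafe \u05E9\u05DC\u05D5\u05DD'

# The explicit ranges touch only the Hebrew marks.
assert BOTH.sub('', mixed) == 'caf\u00E9 \u05E9\u05DC\u05D5\u05DD'
```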
Public, no auth, no key. The frontend at removenikud.dicta.org.il calls this directly:
```bash
curl -sS -X POST https://remove-nikud-2-0.loadbalancer2.dicta.org.il/api \
  -H 'Content-Type: application/json' \
  -d '{"task":"remove_nikud","data":"<HEBREW TEXT>"}'
# => {"results":"<stripped text>"}
```
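The same request can be issued from Python with only the standard library. This is a sketch under stated assumptions: the URL, the payload fields, and the `results` key are taken from the curl example above, while the function names and the 10-second timeout are illustrative choices, not documented behaviour of the service.

```python
import json
import urllib.request

# Endpoint as given in the curl example above.
DICTA_URL = 'https://remove-nikud-2-0.loadbalancer2.dicta.org.il/api'


def build_request(text):
    """Build the POST request without sending it."""
    payload = json.dumps({'task': 'remove_nikud', 'data': text}).encode('utf-8')
    return urllib.request.Request(
        DICTA_URL,
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def dicta_strip(text, timeout=10):
    """Send the text to Dicta and return the stripped result.

    Assumes the response is JSON with a 'results' key, as in the curl
    example; the timeout value is an arbitrary choice.
    """
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        body = json.load(resp)
    return body['results']
```

Keeping request construction separate from the network call makes the payload easy to inspect or test offline before anything is sent.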
Notes:
Return the cleaned text in a fenced code block so RTL rendering doesn't reflow the user's terminal. Cite which path was used.