[[analysis-letter-tokenizer]]
=== Letter Tokenizer

A tokenizer of type `letter` that divides text at non-letters. In other
words, it defines tokens as maximal strings of adjacent letters. This
works reasonably well for most European languages, but poorly for some
Asian languages, where words are not separated by spaces.
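
The behaviour described above can be approximated with a short Python
sketch. This is an illustration only, not the tokenizer's actual
implementation: the function name `letter_tokenize` is hypothetical, and
the sketch assumes that Python's `str.isalpha` is a close enough stand-in
for the tokenizer's notion of a "letter".

```python
from itertools import groupby


def letter_tokenize(text):
    """Split text into maximal runs of adjacent letters.

    Hypothetical helper for illustration; it assumes str.isalpha
    approximates the letter test used by the real tokenizer.
    """
    # groupby clusters consecutive characters that share the same
    # isalpha() result; only the letter runs are kept as tokens.
    return [
        "".join(run)
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


if __name__ == "__main__":
    # Digits, hyphens, commas, and apostrophes all act as boundaries.
    print(letter_tokenize("The 2 QUICK Brown-Foxes jumped, it's"))
    # Text without spaces between words comes back as a single token.
    print(letter_tokenize("我喜欢搜索引擎"))
```

Under these assumptions, the English sentence is split at every digit and
punctuation mark, while the Chinese sentence, which contains no
non-letter characters, is returned as one long token. That is why this
approach is a poor fit for languages that do not separate words with
spaces.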