author | scole <scole@pkgsrc.org> | 2020-08-13 20:52:08 +0000 |
---|---|---|
committer | scole <scole@pkgsrc.org> | 2020-08-13 20:52:08 +0000 |
commit | b07548647c6d46aaebfeef8c155cdf784f9e87e8 (patch) | |
tree | c06d7ce614ed4894cd6746e47a3d746758d5135b /textproc/split-thai/DESCR | |
parent | a95644a520c61d223d2bfb0c621815d947551513 (diff) | |
download | pkgsrc-b07548647c6d46aaebfeef8c155cdf784f9e87e8.tar.gz |
Add split-thai 0.1, a set of utilities for splitting Thai UTF8 text by word boundaries
Diffstat (limited to 'textproc/split-thai/DESCR')
-rw-r--r-- | textproc/split-thai/DESCR | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/textproc/split-thai/DESCR b/textproc/split-thai/DESCR
new file mode 100644
index 00000000000..5dba7ad4417
--- /dev/null
+++ b/textproc/split-thai/DESCR
@@ -0,0 +1,6 @@
+A collection of utilities to split Thai Unicode UTF-8 text by word
+boundaries, also known as word tokenization. The utilities use emacs,
+swath, and a c++ icu-project program. All use dictionary-based word
+splitting.
+
+Also included is merged dictionary file of thai words.