author scole <scole@pkgsrc.org> 2020-08-13 20:52:08 +0000
committer scole <scole@pkgsrc.org> 2020-08-13 20:52:08 +0000
commit b07548647c6d46aaebfeef8c155cdf784f9e87e8
tree c06d7ce614ed4894cd6746e47a3d746758d5135b /textproc/split-thai/DESCR
parent a95644a520c61d223d2bfb0c621815d947551513
Add split-thai 0.1, a set of utilities for splitting Thai UTF8 text by word boundaries
Diffstat (limited to 'textproc/split-thai/DESCR')
-rw-r--r-- textproc/split-thai/DESCR 6
1 files changed, 6 insertions, 0 deletions
diff --git a/textproc/split-thai/DESCR b/textproc/split-thai/DESCR
new file mode 100644
index 00000000000..5dba7ad4417
--- /dev/null
+++ b/textproc/split-thai/DESCR
@@ -0,0 +1,6 @@
+A collection of utilities to split Thai Unicode UTF-8 text by word
+boundaries, also known as word tokenization. The utilities use emacs,
+swath, and a c++ icu-project program. All use dictionary-based word
+splitting.
+
+Also included is a merged dictionary file of Thai words.
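
The DESCR above mentions dictionary-based word splitting. As a rough illustration of that general idea only, here is a minimal greedy longest-match sketch in Python. It is not the code used by the emacs, swath, or ICU utilities in this package; the function name split_words and the three-word dictionary are hypothetical, chosen for the example.

```python
def split_words(text, dictionary):
    """Greedy longest-match segmentation (illustrative sketch only).

    At each position, take the longest dictionary word that starts there.
    If no word matches, emit a single character so the scan always advances.
    """
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical miniature dictionary; a real one holds many thousands of words.
    words = {"สวัสดี", "ภาษา", "ไทย"}
    print(" ".join(split_words("สวัสดีภาษาไทย", words)))
```

Greedy longest match is the simplest dictionary strategy and can pick a wrong boundary when a long word overlaps the start of the next one, which is one reason practical tokenizers tend to use more elaborate matching.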