path: root/textproc/hunspell
Age | Commit message | Author | Files | Lines
2019-08-11 | Bump PKGREVISIONs for perl 5.30.0 | wiz | 1 | -1/+2
2018-11-16 | Update hunspell to 1.7.0. | bsiegert | 5 | -24/+22
Bump ABI_DEPENDS in bl3.mk.

New features and bug fixes by Laszlo Nemeth, supported by FSF.hu Foundation:

• No annoying suggestion times any more, especially in languages with compound word handling and complex morphology. With balanced multi-level time limits, suggestions are now guaranteed within half a second, not seconds (nor dozens of seconds or more in extreme cases), for longer misspellings, too.
• Add SPELLML support for run-time dictionary extension with optional affixation of user words. See the new "Grammar By" feature of language-specific user dictionaries of LibreOffice 6.0:
  News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking
  Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
  Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
• Improved, highly customizable suggestions on level of dictionary words: pronunciations and typical misspellings defined by optional "ph:" fields of the dictionary words are used not only in n-gram suggestions, but also as elements of the REP replacement list, getting the highest priority in normal suggestions and giving the best suggestions for short words, too. More information: see "ph:" in man 5 hunspell.
• Handling multiple word suggestions is much easier. Like in a traditional spelling dictionary, for example, to get the correct suggestion "a lot" for the typical misspelling "alot" in first place, it's now enough to put the following line in the dic(tionary) file:
  a lot
• Limit compound overgeneration by dictionary based word pairs: now it's possible to filter bad compound words by listing the correct word pairs with space in the dictionary, as in a traditional spelling dictionary.
• Clean-up suggestion:
  □ no n-gram and compound word suggestions if a "good" suggestion exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
  □ word pairs are always suggested if they exist in the dic file
  □ word pairs have top priority in suggestions, and these are the only suggestions if there is no other good suggestion
  □ dictionary word pairs separated by dash instead of space are also handled specially in two-word suggestion (depending on the language)
• Limit bad suggestions by improved n-gram suggestion rules: don't suggest capitalized dictionary words for lower case misspellings in n-gram suggestions, except
  □ PHONE usage, or
  □ in the case of German, where not only proper nouns are capitalized, or
  □ the capitalized word has special pronunciation
  and don't suggest if the difference of lengths of misspellings and suggestions is 5 or more characters.
• Extend dotless i and dotted I rules to Crimean Tatar language: allow dotted I in dictionary, and disable bad capitalization of i.
• BREAK: extended recursive word breaking algorithm to handle words or words with suffixes when they already contain word break characters. For example, "e-mail" is a dictionary word with a word break character, and it wasn't accepted before in compounds in some languages.
• FORBIDDENWORD precedes BREAK: now it's possible to forbid compound forms recognized by BREAK word breaking by adding the bad compounds to the dictionary with FORBIDDENWORD flags.
• Lower limit for "doubletwochars" suggestion algorithm: one of the typical misspellings recognized by the Hunspell suggestion mechanism is syllable duplication. Along with the old pattern ABABA -> ABA, for example nutrITITIon -> nutrITIon, now also the simpler ABAB -> AB pattern is recognized in non-starting position, for example, regretTETEd -> regretTEd.
• Lower limit for longswapchar and movechar: recognized only for max. 4-character distances to avoid slow and bad suggestions.
• Fix compound handling for new Hungarian orthography reform
• Allow suggestion search for prefix + two suffixes: remove artificial performance limit to get correct suggestions for relatively simple misspellings in Hungarian, etc., when the word form contains prefix and both derivative and inflectional suffixes, too: lefikszálása -> lefixálása

Improvements for command-line Hunspell:

• Remove false alarms during checking OpenDocument (ODF) documents by ignoring <text:span> elements. (LibreOffice creates a lot of <text:span> elements also within words during text reediting, which often resulted in a huge amount of broken words before this fix.)
• List filenames during filtering multiple files in command-line. Examples:
  $ hunspell -l *.odt
  a.odt: mispelling
  b.odt: egzample
  $ hunspell -l -G *.odt
  a.odt: good
  b.odt: words
• Dictionary search by option -D doesn't wait for the standard input (fixed by Siva Mahadevan)

Other improvements:

• makealias dictionary compression: add option --minimize-diff to reuse free positions of alias lists to create minimal and readable diffs for alias compressed dictionaries stored in revision control systems, such as the dictionaries of LibreOffice.
• Brazilian-Portuguese translation by Rafael Fontenelle
• Catalan translation by robert dot buj at gmail
• Minor bug fixes by several contributors, see git log
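The "doubletwochars" item above describes a pattern rule, which can be illustrated with a small sketch. This is not Hunspell's implementation; the function name and the choice to return a set of candidates are assumptions made for illustration. It only shows the idea of dropping one copy of a repeated two-character chunk (ABAB -> AB) when the repetition does not start at the beginning of the word.

```python
def double_syllable_candidates(word: str) -> set[str]:
    """Return candidates with one duplicated two-character chunk removed.

    Hypothetical helper: illustrates the ABAB -> AB idea only.
    """
    candidates = set()
    # Start at index 1 because the release note limits the ABAB -> AB
    # pattern to non-starting positions.
    for i in range(1, len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            # Drop the first copy of the repeated chunk.
            candidates.add(word[:i] + word[i + 2:])
    return candidates


print(double_syllable_candidates("regretteted"))
print(double_syllable_candidates("nutritition"))
```

A real spell checker would still have to look each candidate up in the dictionary before offering it as a suggestion.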
2018-10-26 | hunspell: Simplify distfile handling (NFC) | leot | 1 | -4/+2
GITHUB_PROJECT by default is already PKGBASE, no need to reinitialize it. Reuse PKGVERSION_NOREV for GITHUB_TAG. Remove commented out WRKSRC while here.
2018-10-23 | Update hunspell to 1.6.2. | bsiegert | 10 | -109/+75
1.6.2
Library changes: none. Same as 1.6.1.
Command line tool:
- Added German translation
- Fixed bug with wrong output encoding, not respecting system locale.

1.6.1
Library changes:
- Performance improvements in suggest()
- Fixes regressions for Hungarian related to compounding.
- Fixes regressions for Korean related to ICONV.
Command line tool:
- Added Tajik translation
- Fix regarding searching of OOo dicts installed in user folder
Manpages:
- Fix microsoft-cp1251 to cp1251. Dicts should not use the first.
- Typos.

1.6.0
Changes in the library:
- Performance improvement in ngsuggest(), suggestions should be faster.
- Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
- MAXWORDLEN can be set during build time with -D defines.
- Fix crash when word with 102 consecutive X is spelled.
Changes in the command line tool:
- -D shows all loaded dictionaries instead of only the first.
- -D properly lists all available dictionaries on Windows.

1.5.4
Fixes bug related to the Hungarian dictionary and the command COMPOUNDSYLLABLE

1.5.3
Remove an unneeded #include header in the public hunspell.hxx

1.5.2
Fixes backward compatibility with 1.4 at API level. Now it should be complete.

1.5.1
- Lots of stability fixes
- Fixed compilation errors on various systems (Windows, FreeBSD)
- Small performance improvement compared to 1.4.0
- Added new API with C++ types (string, vector), yet full API backward compatibility with 1.4 is kept

1.4.1
Past begin() iterator decrement error: VS Debug build threw error on decrement past begin.

1.4.0
New release that strips out fixed length buffers from large parts of the library.
Note: dictmgr.hxx header is dropped
2018-10-19 | Rename analyze, munch and unmunch tools. | bsiegert | 4 | -7/+50
These names are way too generic to go into bin/, and folks on the mailing list agreed. Now they have a "hunspell-" prefix. Bump revision.
2018-08-22 | Recursive bump for perl5-5.28.0 | wiz | 1 | -2/+2
2018-08-07 | hunspell: Specify C++03. | jperkin | 1 | -2/+2
2018-05-23 | hunspell: for wide character support, use ncursesw. | wiz | 2 | -5/+6
The configure script checks for the library name and accepts only ncursesw. Bump PKGREVISION.
2018-01-25 | hunspell: Fix clang -Wreserved-user-defined-literal error. | jperkin | 2 | -1/+17
2017-01-04 | Use the curses framework. | roy | 2 | -15/+7
2016-07-09 | Bump PKGREVISION for perl-5.24.0 for everything mentioning perl. | wiz | 1 | -2/+2
2015-11-04 | Add SHA512 digests for distfiles for textproc category | agc | 1 | -1/+2
Problems found locating distfiles:
  Package cabocha: missing distfile cabocha-0.68.tar.bz2
  Package convertlit: missing distfile clit18src.zip
  Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz

Otherwise, existing SHA1 digests verified and found to be the same on the machine holding the existing distfiles (morden). All existing SHA1 digests retained for now as an audit trail.
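Keeping a SHA1 digest next to a newly added SHA512 digest means computing both over the same distfile contents. The sketch below shows one way to do that in a single pass with Python's standard hashlib module. The function name and the dictionary layout of the result are assumptions for illustration; this is not the tool pkgsrc itself uses.

```python
import hashlib
import io


def distfile_digests(stream, chunk_size=65536):
    """Compute SHA1 and SHA512 hex digests of a binary stream in one pass."""
    sha1 = hashlib.sha1()
    sha512 = hashlib.sha512()
    # Read fixed-size chunks so large distfiles need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        sha512.update(chunk)
    return {"SHA1": sha1.hexdigest(), "SHA512": sha512.hexdigest()}


# An in-memory stream stands in for an opened distfile here.
print(distfile_digests(io.BytesIO(b"abc")))
```

For a real file, the stream would come from open(path, "rb").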
2015-08-18 | Bump all packages that depend on curses.bui* or terminfo.bui* since they | wiz | 1 | -2/+2
might incur ncurses dependencies on some platforms, and ncurses just bumped its shlib. Some packages were bumped twice now, sorry for that.
2015-08-17 | Bump PKGREVISION for ncurses shlib bump. | wiz | 1 | -2/+2
2015-06-12 | Recursive PKGREVISION bump for all packages mentioning 'perl', | wiz | 1 | -1/+2
having a PKGNAME of p5-*, or depending such a package, for perl-5.22.0.
2015-06-07 | Update to 1.3.3, provided by David H. Gutteridge in PR 49949: | wiz | 3 | -12/+8
Change log:

2014-06-02 Németh László <nemeth at numbertext dot org>:
* escape spaces in paths of ODF files

2014-05-28 Németh László <nemeth at numbertext dot org>:
* add long path/Unicode path support in WIN32 environment:
  - hunspell#233 (reported by mahak gark) and LibreOffice fdo#48017
* flat ODF support, eg.:
    hunspell doc.fodt
    cat doc.fodt | hunspell -l -O
* new options:
  - -X (XML) input format
  - -O (ODF or flat ODF) input format
  - --check-apostrophe: check and force Unicode apostrophe usage (ASCII or Unicode apostrophe has to be in the WORDCHARS section of the affix file)
* fix ODF support:
  - break 1-line XML of ODT documents at </style:style>, too, not only at </text:p> (limiting tokenization problems, when fgets stops within an XML tag)
  - show ODF file path on the UI instead of the temporary file
* fix XML support:
  - ', ", &, < and > in replacements converted to XML entities
  - recognize &apos at tokenization, depending on WORDCHARS
  - &apos; in tokens converted to ' before spell checking and in the output of the pipe interface
* better apostrophe usage:
  - WORDCHARS with only one of the Unicode or ASCII apostrophes results in extended word tokenization: both of them will be part of the words (if they are inside: eg. word's, but not words').
  - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries (eg. English dictionaries), or for UTF-8 dictionaries with only ASCII apostrophe support (eg. French dictionaries).
* updated manual:
  - hunspell.4 renamed to hunspell.5, see hunspell#241 reported by Cristopher Yeleighton
  - updated translations
  - note about long/Unicode paths in WIN32 (hunspell.3)

2014-04-25 Németh László <nemeth at numbertext dot org>:
* OpenDocument support, eg.
    hunspell *.odt
    hunspell -l *.odt
* always load default personal dictionary (fix filtering bad words - reduce this word list - using it as a personal dictionary workflow)
* fix parsing/URL recognition problem (bad tokens with apostrophes)

2013-07-25 pchang9@cs.wisc.edu
* moz#897255 Wasted work in line_uniq
* moz#897780 Wasted work in SuggestMgr::twowords

2013-07-25 Caolán McNamara <caolanm at LibO>:
* hunspell#167 layout problems with long lines - based on the original fix by xorho adapted to HEAD
* rhbz#925562 upgrade config.guess for aarch64

2013-07-24 pchang9@cs.wisc.edu
* moz#896301 Wasted work in SfxEntry::checkword
* moz#896844 Wasted work in AffixMgr::defcpd_check

2013-06-13 Konstantin Khlebniko
* #49 HashMgr::add_word computes wrong size for struct hentry

2013-06-13 Ville Skyttä
* #53 Man page syntax fixes

2013-04-19 John Thomson <john thomson at SIL>
* win_api: add remove() of Hunspell API (hun#3606435)

2013-04-19 Rouslan Solomokhin <at sf.net>
* fix crash in suggestions for 99-character long words by extending arrays of SuggestMgr::forgotchar_* (hun#3595024, also http://crbug.com/130128), thanks also to Paweł Hajdan for reporting the patch

2013-04-01 Caolán McNamara <caolanm at LibO>:
* hunspell: -Werror=undef

2013-03-13 Caolán McNamara <caolanm at LibO>:
* rhbz#918938 crash in interaction with danish thesaurus

2012-09-18 Németh László <nemeth at numbertext dot org>:
* src/hunspell/affixmgr.*:
  - fix morphological analysis of compound words (hun#3544994, reported by Dávid Nemeskey, fdo#55045)

2012-06-29 Caolán McNamara <caolanm at LibO>:
* fix various coverity warnings

2012-01-10 Ehsan Akhgari <ehsan at mozilla dot com>
* moz#710940 Firefox Crash [@ AffixMgr::parse_file(char const*, char const*) ]

2011-12-16 Jared Wein <jwein at mozilla dot com>
* moz#710967 Incorrect argument passed to strncmp in AffixMgr::parse_convtable

2011-12-06 Caolán McNamara <caolanm at LibO>:
* rhbz#759647 fixed tempname of hunSPELL.bak collides with other users when multiple edits in one dir

2011-10-13 Caolán McNamara <caolanm at LibO>:
* moz#694002 crash in hunspell affixmgr on exit with bad .aff
* leak in hunspell affixmgr with bad .aff

2011-09-19 Caolán McNamara <caolanm at LibO>:
* make libparsers.a not installed thanks to Tomáš Chvátal

2011-06-23 Caolán McNamara <caolanm at LibO>:
* fix some windows compiler warnings

2011-05-24 Németh László <nemeth at numbertext dot org>:
* src/hunspell/affixmgr.*: allow twofold suffixes in compounds by extended version of Arno Teigseth's patch, see hun#3288562.
  - new option for this feature: COMPOUNDMORESUFFIXES

2011-02-16 Németh László <nemeth at numbertext dot org>:
* src/*/Makefile.am: fix library versioning, the problem reported by Rene Engerhald and Simon Brouwer.
* man/hunspell.4: new version based on the revised version of Ruud Baars
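The "fix XML support" entry above says that ', ", &, < and > in replacements are converted to XML entities. As a rough illustration of that conversion (not Hunspell's own C++ code), the sketch below uses Python's standard xml.sax.saxutils.escape. The function name xml_safe is an assumption for illustration.

```python
from xml.sax.saxutils import escape


def xml_safe(replacement: str) -> str:
    """Escape the five XML special characters in a suggestion string."""
    # escape() handles &, < and > itself; the two quote characters are
    # passed as extra entities.
    return escape(replacement, {"'": "&apos;", '"': "&quot;"})


print(xml_safe("R&D's <b>"))
```

Escaping & first matters: otherwise the ampersands introduced by the other entities would be escaped a second time. The library call takes care of that ordering.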
2015-02-20 | Update MASTER_SITES (for all of hunspell-*), using MASTER_SITE_OPENOFFICE. | mef | 1 | -2/+3
Previous one was no longer in DNS record. Thanks wiz for review (at pkgsrc-users).
2014-10-05 | Fix ``Please add a line "# used by foo/bar/Makefile" here.'' warnings. | wiz | 1 | -3/+10
2014-05-29 | Bump for perl-5.20.0. | wiz | 1 | -2/+2
Do it for all packages that * mention perl, or * have a directory name starting with p5-*, or * depend on a package starting with p5- like last time, for 5.18, where this didn't lead to complaints. Let me know if you have any this time.
2013-08-27 | solaris fix for wide-curses build of hunspell | richard | 1 | -1/+2
2013-05-31 | Bump all packages for perl-5.18, that | wiz | 1 | -2/+2
a) refer 'perl' in their Makefile, or b) have a directory name of p5-*, or c) have any dependency on any p5-* package Like last time, where this caused no complaints.
2012-10-25 | Drop superfluous PKG_DESTDIR_SUPPORT, "user-destdir" is default these days. | asau | 2 | -6/+2
2012-10-03 | Bump all packages that use perl, or depend on a p5-* package, or | wiz | 1 | -1/+2
are called p5-*. I hope that's all of them.
2012-02-13 | hunspell shlib name change -> recursive bump | wiz | 1 | -1/+2
2012-02-13 | Update to 1.3.2: | wiz | 5 | -21/+25
2011-02-02: Hunspell 1.3.2 release:
- fix library versioning
- improved manual

2011-02-02: Hunspell 1.3.1 release:
- bug fixes

2011-01-26: Hunspell 1.2.15/1.3 release:
- new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
- bug fixes

2011-01-21:
- new features: FORCEUCASE and WARN, see manual
- new options: -r to filter potential mistakes (rare words signed by flag WARN in the dictionary)
- limited and optimized suggestions

2011-01-06: Hunspell 1.2.14 release:
- bug fix

2011-01-03: Hunspell 1.2.13 release:
- bug fixes
- improved compound handling and other improvements supported by OpenTaal Foundation, Netherlands

2010-07-15: Hunspell 1.2.12 release

2010-05-06: Hunspell 1.2.11 release:
- Maintenance release bug fixes

2010-04-30: Hunspell 1.2.10 release:
- Maintenance release bug fixes

2010-03-03: Hunspell 1.2.9 release:
- Maintenance release bug fixes and warnings
- MAP support for composed characters or character sequences
2011-04-22 | recursive bump from gettext-lib shlib bump. | obache | 1 | -1/+2
2010-02-19 | Added LICENSE information. | heinz | 1 | -1/+2
2009-10-29 | Support for OO.o dicts without readme files. | ahoka | 2 | -3/+13
2009-09-22 | More patching to get package building with Sun Studio. | sketch | 2 | -1/+22
2009-08-25 | Get rid of now unnecessary EXTRACT_OPTS_ZIP. | wiz | 1 | -2/+1
2009-06-14 | Convert @exec/@unexec to @pkgdir or drop it. | joerg | 1 | -2/+1
2009-06-14 | Remove @dirrm entries from PLISTs | joerg | 2 | -7/+2
2009-03-20 | Simplify and speed up buildlink3.mk files and processing. | joerg | 1 | -13/+6
This changes the buildlink3.mk files to use an include guard for the recursive include. The use of BUILDLINK_DEPTH, BUILDLINK_DEPENDS, BUILDLINK_PACKAGES and BUILDLINK_ORDER is handled by a single new variable BUILDLINK_TREE. Each buildlink3.mk file adds a pair of enter/exit marker, which can be used to reconstruct the tree and to determine first level includes. Avoiding := for large variables (BUILDLINK_ORDER) speeds up parse time as += has linear complexity. The include guard reduces system time by avoiding reading files over and over again. For complex packages this reduces both %user and %sys time to half of the former time.
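The commit message above says the enter/exit markers in BUILDLINK_TREE can be used to reconstruct the tree and to determine first level includes. The sketch below illustrates that idea with a depth counter. The token format is an assumption (a bare package name as the enter marker, the same name prefixed with "-" as the exit marker), and the package names in the example are hypothetical; the real logic lives in pkgsrc's make infrastructure, not in Python.

```python
def first_level_includes(tree: list[str]) -> list[str]:
    """Return the packages entered at nesting depth 0 of a marker list.

    Assumed format: "pkg" enters a package, "-pkg" leaves it.
    """
    depth = 0
    first_level = []
    for token in tree:
        if token.startswith("-"):
            depth -= 1  # exit marker closes the current package
        else:
            if depth == 0:
                first_level.append(token)  # a direct include
            depth += 1  # enter marker opens a nested level
    return first_level


# Hypothetical tree: hunspell pulls in iconv and ncurses; gettext-lib
# pulls in iconv again.
tree = ["hunspell", "iconv", "-iconv", "ncurses", "-ncurses", "-hunspell",
        "gettext-lib", "iconv", "-iconv", "-gettext-lib"]
print(first_level_includes(tree))
```

Because the markers are simply appended, a single flat variable is enough to carry the whole dependency tree.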
2009-02-08 | Update to 1.2.8. | ahoka | 4 | -11/+23
No longer needs ncurses (at least on NetBSD 5.0).

Official changelog:

2008-11-01: Hunspell 1.2.8 release:
- Default BREAK feature and better hyphenated word suggestion to accept and fix (compound) words with hyphen characters by spell checker instead of by word breaking code of OpenOffice.org. With this feature it's possible to accept hyphenated compound words, such as "scot-free", where "scot" is not a correct English word.
- ICONV & OCONV: input and output conversion tables for optional character handling or using special inner format. Example:
    # Accepting de facto replacements of the Romanian comma acuted letters
    SET UTF-8
    ICONV 4
    ICONV ş ș
    ICONV ţ ț
    ICONV Ş Ș
    ICONV Ţ Ț
  Typical usage of ICONV/OCONV is to manage an inner format for a segmental writing system, like the Ethiopic script of the Amharic language.
- Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like sandhi feature of Telugu and other writing systems.
- SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and Norwegian compound word forms, like tillåta (till|låta) and bussjåfør (buss|sjåfør)
- wordforms: word generator script for dictionary developers (Hunspell version of unmunch).
- bug fixes

2008-08-15: Hunspell 1.2.7 release:
- FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can strip full words, not only one less characters.
- COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd etc.).
- optimized suggestions:
  - modified 1-character distance suggestion algorithms: search a TRY character in all positions instead of all TRY characters in a character position (it can give more readable suggestion order, also better suggestions in the first positions, when TRY characters are sorted by frequency.) For example, suggestions for "moze":
      ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
      maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
  - extended compound word checking for better COMPOUNDRULE related suggestions, for example English ordinal numbers: 121323th -> 121323rd (it needs also a th->rd REP definition).
- bug fixes

2008-07-15: Hunspell 1.2.6 release:
- bug fix release (fix affix rule condition checking of sk_SK dictionary, iconv support in stemming and morphological analysis of the Hunspell utility, see also Changelog)

2008-07-09: Hunspell 1.2.5 release:
- bug fix release (fix affix rule condition checking of en_GB dictionary, also morphological analysis by dictionaries with two-level suffixes)

2008-06-18: Hunspell 1.2.4-2 release:
- fix GCC compiler warnings

2008-06-17: Hunspell 1.2.4 release:
- add free_list() for C, C++ interfaces to deallocate suggestion lists
- bug fixes

2008-06-17: Hunspell 1.2.3 release:
- extended XML interface to use morphological functions by standard spell checking interface, spell() and suggest(). See hunspell.3 manual page.
- default dash suggestions for compound words: newword -> new word and new-word
- new manual pages: hunspell.3, hzip.1, hunzip.1.
- bug fixes
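The 1.2.7 note on 1-character distance suggestions is about loop order: try each TRY character across all positions, rather than all TRY characters at each position. The sketch below contrasts the two orders on a toy example. The TRY string and the word list are made up for illustration, and the function is a simplified stand-in rather than Hunspell's algorithm, so the output is not meant to reproduce the suggestion lists quoted in the changelog.

```python
def one_char_suggestions(word, try_chars, dictionary, char_outer):
    """Collect dictionary words reachable by replacing one character.

    char_outer=True  -> loop over TRY characters first (1.2.7-style order)
    char_outer=False -> loop over positions first (older order)
    """
    positions = range(len(word))
    if char_outer:
        pairs = [(pos, ch) for ch in try_chars for pos in positions]
    else:
        pairs = [(pos, ch) for pos in positions for ch in try_chars]
    found = []
    for pos, ch in pairs:
        if ch == word[pos]:
            continue  # replacing a character with itself changes nothing
        candidate = word[:pos] + ch + word[pos + 1:]
        if candidate in dictionary and candidate not in found:
            found.append(candidate)
    return found


TRY = "aeorltd"  # hypothetical frequency-sorted TRY characters
WORDS = {"ooze", "doze", "maze", "more", "mote", "mole"}  # toy dictionary
print(one_char_suggestions("moze", TRY, WORDS, char_outer=False))
print(one_char_suggestions("moze", TRY, WORDS, char_outer=True))
```

With the character loop outermost, candidates built from the most frequent TRY characters come first regardless of where in the word the replacement happens.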
2009-01-26 | Make package build using Sun Studio. | sketch | 3 | -1/+38
2008-07-20 | fix PLIST | abs | 1 | -1/+4
2008-07-19 | Add skeleton makefiles for handling OO.org supplied dictionaries. | ahoka | 3 | -2/+55
While here: change my email address.
2008-04-18 | Supports DESTDIR. | joerg | 1 | -1/+3
2008-04-14 | Update to 1.2.2. | wiz | 4 | -12/+17
pkgsrc change: buildlink3.mk: Bump API_DEPENDS, since shlib name changed. No dependencies in pkgsrc.

Release notes:

2008-04-12: Hunspell 1.2.2 release:
- extended dictionary (dic file) support to use multiple base and special dictionaries.
- new and improved options of command line hunspell:
  -m: morphological analysis or flag debug mode (without affix rule data it signs the flag of the affix rules)
  -s: stemming mode
  -D: list available dictionaries and search path
  -d: support extra dictionaries by comma separated list. Example:
      hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
- forbidding in personal dictionary (with asterisk, / signs affixation)
- optional compressed dictionary format "hzip" for aff and dic files. Usage:
    hzip example.aff example.dic
    mv example.aff example.dic /tmp
    hunspell -d example
    hunzip example.aff.hz >example.aff
    hunzip example.dic.hz >example.dic
- new affix compression tool "affixcompress": compression tool for large (millions of words) dictionaries.
- support encrypted dictionaries for closed OpenOffice.org extensions or other commercial programs
- improved manual
- bug fixes

2007-11-01: Hunspell 1.2.1 release:
- new memory efficient condition checking algorithm for affix rules
- new morphological functions:
  - stem() for stemming
  - analyze() for morphological analysis
  - generate() for morphological generation
- new demos:
  - analyze: stemming, morphological analysis and generation
  - chmorph: morphological conversion of texts
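The 1.2.2 notes mention personal dictionary entries where an asterisk forbids a word and a slash signs affixation. The sketch below parses such lines under that reading. It assumes one entry per line, a leading "*" for a forbidden word, and "word/model" for a word that should be affixed like the model word; the function name and the returned dictionary layout are illustrative only, and the hunspell manual pages remain the authority on the actual format.

```python
def parse_personal_entry(line: str) -> dict:
    """Split one personal dictionary line into word, model and forbidden flag.

    Assumed syntax: "*word" forbids a word, "word/model" borrows the
    affixation of a model word.
    """
    entry = line.strip()
    forbidden = entry.startswith("*")
    if forbidden:
        entry = entry[1:]
    # partition() returns an empty model when there is no slash.
    word, _, model = entry.partition("/")
    return {"word": word, "model": model or None, "forbidden": forbidden}


for sample in ["foo", "Cyrillic/Apple", "*alot"]:
    print(parse_personal_entry(sample))
```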
2008-02-22 | + Rename the "ncursesw" option to "wide-curses" and get rid of the | jlam | 2 | -11/+10
"ncurses" option. "wide-curses" now just toggles whether we use wide or narrow curses, which is a much simpler knob for users. Bump the PKGREVISION to 2.
2007-10-09 | Remove trailing spaces. | martti | 1 | -2/+2
2007-09-21 | Explicitly include iconv dependency as the package would like to use it. | joerg | 1 | -1/+3
Bump revision.
2007-09-11 | Fix default path to dictionaries. Ride import. | wiz | 1 | -1/+9
2007-09-11 | Add buildlink3.mk file. | wiz | 1 | -0/+19
2007-09-11 | Initial import of hunspell-1.12.2, packaged for pkgsrc-wip by Adam Hoka, | wiz | 5 | -0/+92
updated to latest version by me:

Hunspell is the default spell checker of OpenOffice.org office suite and expectant spell checker of Mozilla Firefox and Thunderbird.

Main features:
* Unicode support.
* Conditional and multiple affixes for languages with rich morphology.
* Extended compound word support.
* Morphological analysis (in custom item and arrangement style).
* Hunspell is based on MySpell and works also with MySpell dictionaries.
* GPL/LGPL/MPL tri-license