path: root/textproc
Age | Commit message | Author | Files | Lines
2008-04-14Add buildlink3.mk file for link-grammar.wiz1-0/+19
2008-04-14+ link-grammar.wiz1-1/+2
2008-04-14Import link-grammar-4.3.4 as textproc/link-grammar.wiz6-0/+145
This package contains a *patched* version of the final original release of the Link Grammar Parser. It has been patched to fix a few bugs, add a few enhancements, and, in general, make the Link Grammar Parser easier to use. This version includes Java bindings.
2008-04-14Update to 1.4.0.wiz8-56/+69
Change default backend to hunspell. aspell support is now a (disabled by default) option. Add some other options. Remove aspell from includes in buildlink3.mk. The backends are abstracted into dynamically loaded modules and don't need to be pulled in by buildlink3.mk. Release notes: Voikko (Finnish) language support. Zemberek (Turkish) language support. Better support for Unicode in the personal dictionaries. Personal dictionaries offer better suggestions. OpenOffice's dictionaries are used on Windows. Aspell works on Windows. This release can use a system-wide Hunspell/Myspell installation on Unix-like platforms. Hunspell 1.2.1 and NET bindings are required. This release has more lax language matching rules. It uses XDG's data-dirs spec for locating dictionaries. There are many unit tests and bugfixes.
2008-04-14Update to 1.2.2.wiz4-12/+17
pkgsrc change: buildlink3.mk: Bump API_DEPENDS, since shlib name changed. No dependencies in pkgsrc.
Release notes:
2008-04-12: Hunspell 1.2.2 release:
- extended dictionary (dic file) support to use multiple base and special dictionaries.
- new and improved options of command line hunspell:
  -m: morphological analysis or flag debug mode (without affix rule data it signs the flag of the affix rules)
  -s: stemming mode
  -D: list available dictionaries and search path
  -d: support extra dictionaries by comma separated list. Example:
      hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
- forbidding in personal dictionary (with asterisk, / signs affixation)
- optional compressed dictionary format "hzip" for aff and dic files. Usage:
      hzip example.aff example.dic
      mv example.aff example.dic /tmp
      hunspell -d example
      hunzip example.aff.hz >example.aff
      hunzip example.dic.hz >example.dic
- new affix compression tool "affixcompress": compression tool for large (millions of words) dictionaries.
- support encrypted dictionaries for closed OpenOffice.org extensions or other commercial programs
- improved manual
- bug fixes
2007-11-01: Hunspell 1.2.1 release:
- new memory efficient condition checking algorithm for affix rules
- new morphological functions:
  - stem() for stemming
  - analyze() for morphological analysis
  - generate() for morphological generation
- new demos:
  - analyze: stemming, morphological analysis and generation
  - chmorph: morphological conversion of texts
2008-04-13Update textproc/ruby-xslt to 0.9.6. Changes from version 0.9.5 includejlam3-8/+7
plugging some severe memory leaks.
2008-04-12Convert to use PLIST_VARS instead of manually passing "@comment "jlam6-38/+37
through PLIST_SUBST to the plist module.
2008-04-11In substitution, avoid space on CONF_DIR= to make it valid /bin/sh.gdt1-4/+5
PYTHON_PATCH two more scripts. PKGREVISION++
2008-04-11update to 0.8.0drochner3-8/+9
changes: * Add new attributes for hidden (NoDisplay) and default section (DocDefaultSection) to the .document file parsing * Increase strictness of parsing in line with the spec. * omf files now return (approximate) fd.o categories * Add new requirement to define I_KNOW_RARIAN_0_8_IS_UNSTABLE before use * rarian.h is now a general inclusion guard and main functions have moved to rarian-main.h * Update example program to use new features * Bug fixes pkgsrc note: While 0.8.0 is marked unstable, it is required by the upcoming 2.22 gnome release. There are no compatibility problems for clients installing help files, but build of yelp (the help browser) will be broken until it is updated.
2008-04-10Introduce variable MECAB_CHARSET for default charset of MeCab.obache2-2/+10
close PR 38040.
2008-04-07Add & enable ruby-fastercsvseb1-1/+2
2008-04-07Initial import of ruby-fastercsv as version 1.2.3 into the NetBSDseb4-0/+58
Packages Collection. FasterCSV is intended as a complete replacement to the Ruby CSV standard library. It is significantly faster and smaller while still being pure Ruby code. It also strives for a better interface.
2008-04-07Add p5-String-Random.he1-1/+2
2008-04-07Import p5-String-Random, which generates random strings.he4-0/+24
2008-04-06Remove MASTER_SITES since it now uses RubyGems' MASTER_SITES.taca1-2/+1
2008-04-05Update dictem to 0.82, per maintainer update request in PR 38339.obache2-10/+12
Notes: - FIX for emacs-22 (insert-string is replaced with insert) - Minor fixes in README
2008-04-04Add and enable new ruby-* packages.jlam1-1/+11
2008-04-04RUBY_REPLACE_DIRS is relative to ${WRKSRC} so no need to give the absolutejlam1-2/+2
path to these directories.
2008-04-04Install as a gem using the pkgsrc rubygem.mk framework instead ofjlam14-101/+168
directly into site_ruby.
2008-04-04DESTDIR support. Fix permissions.joerg3-2/+24
2008-04-04DESTDIR supported.joerg1-1/+2
2008-04-04Initial import of ruby18-xslt-0.9.5 as textproc/ruby-xslt.jlam4-0/+58
Ruby/XSLT is a simple XSLT class based on libxml <http://xmlsoft.org/> and libxslt <http://xmlsoft.org/XSLT/>.
2008-04-04Update ruby-xmlparser to version 0.6.81. Changes from version 0.6.8.1jlam5-147/+87
include: + Install as a Ruby gem. * Fix bug in openInputStream().
2008-04-04Update ruby-rttool to version 1.0.2.0. Changes from version 1.0.2jlam4-52/+82
are only that this now installs as a gem, but the gem has a slightly different version number.
2008-04-04Initial import of ruby18-rison-1.2.1 as textproc/ruby-rison.jlam4-0/+43
Ruby-rison is a pure Ruby parser for Rison, a data serialization format optimized for compactness in URIs. Rison is a slight variation of JSON that looks vastly superior after URI encoding. Rison still expresses exactly the same set of data structures as JSON, so data can be translated back and forth without loss or guesswork.
2008-04-04Update ruby-maruku to version 0.5.8. Changes from version 0.5.6 include:jlam3-72/+241
+ Install as a Ruby gem. * Fixed bugs: * Fix bug in which links `<http://..>` at beginning of lines could sometimes be mistaken for HTML. * Empty cells in table are now allowed. * Now this is accepted (Maruku did not like the "." inside the link) [a. b] is a link. [a. b]: http://site.com/ * Fix bug about double-encoding of ampersands in code blocks. * Fixed compatibility bug with Ruby 1.8.6 patchlevel 110.
2008-04-04Initial import of ruby18-markaby-0.5 as textproc/ruby-markaby.jlam4-0/+41
Markaby is a templating language for Ruby, with a plugin for Rails, which allows you to write HTML templates in pure-Ruby (a la Builder.)
2008-04-04Update ruby-libxml to version 0.5.4. Changes from version 0.3.8.4 include:jlam3-20/+157
+ Install as a Ruby gem. * Added XML::Reader, a set of bindings to the xmlTextReader API. * Other changes were made, but they were done on a branch with no changelog available.
2008-04-04Initial import of ruby18-json-pure-1.1.2 as textproc/ruby-json-pure.jlam4-0/+136
This is an implementation of the JSON specification according to RFC 4627. You can think of it as a low fat alternative to XML, if you want to store data to disk or transmit it over a network rather than use a verbose markup language. The JSON generator escapes all non-ASCII and control characters with \uXXXX escape sequences and supports UTF-16 surrogate pairs in order to be able to generate the whole range of Unicode code points. This means that generated JSON text is encoded as UTF-8 (because ASCII is a subset of UTF-8) and at the same time avoids decoding problems for receiving endpoints that don't expect UTF-8 encoded texts. This package is a pure Ruby variant that relies on the iconv and the stringscan extensions, which are both part of the Ruby standard library.
2008-04-04Initial import of ruby18-json-1.1.2 as textproc/ruby-json.jlam5-0/+169
This is an implementation of the JSON specification according to RFC 4627. You can think of it as a low fat alternative to XML, if you want to store data to disk or transmit it over a network rather than use a verbose markup language. The JSON generator escapes all non-ASCII and control characters with \uXXXX escape sequences and supports UTF-16 surrogate pairs in order to be able to generate the whole range of Unicode code points. This means that generated JSON text is encoded as UTF-8 (because ASCII is a subset of UTF-8) and at the same time avoids decoding problems for receiving endpoints that don't expect UTF-8 encoded texts. This package is the fast C extension variant, which is partly implemented in C and comes with its own Unicode conversion functions and a parser generated by the Ragel State Machine Compiler.
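The \uXXXX escaping described in the two JSON package descriptions above can be illustrated with a minimal Python sketch. This is not the ruby-json code; the function name `escape_non_ascii` is hypothetical, and the sketch covers only the non-ASCII and control-character part of the job (it does not escape quotes or backslashes as a full JSON generator would).

```python
def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII or control character as \\uXXXX.

    Hypothetical helper for illustration only; not part of ruby-json.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Outside the Basic Multilingual Plane: emit a UTF-16
            # surrogate pair (high surrogate, then low surrogate).
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
        elif cp < 0x20 or cp > 0x7E:
            # Control characters and everything above printable ASCII.
            out.append(f"\\u{cp:04x}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("caf\u00e9"))      # one BMP character escaped
    print(escape_non_ascii("\U0001D11E"))     # one character, two escapes
```

Because the output contains only ASCII characters, it is valid UTF-8 and also survives a receiver that assumes plain ASCII, which is the property the description relies on.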
2008-04-04Update ruby-hpricot to version 0.6. Changes from version 0.5.140 include:jlam4-116/+75
+ Install as a Ruby gem. * Hpricot for JRuby * Inline Markaby for Hpricot documents. * XML tags and attributes are no longer downcased like HTML is. * new syntax for grabbing everything between two elements using a Range in the search method: (doc/("font".."font/br")) or in nodes_at like so: (doc/"font").nodes_at("*".."br"). Only works with either a pair of siblings or a set of a parent and a sibling. * Ignore self-closing endings on tags (such as form) which are containers. Treat them like open parent tags. * Escaping of attributes. * Element#raw_attributes gives unescaped data. Element#attributes gives escaped. * Added: Elements#attr, Elements#remove_attr, Elements#remove_class. * Added: Traverse#preceding, Traverse#following, Traverse#previous, Traverse#next.
2008-04-04Initial import of ruby18-haml-1.8.2 as textproc/ruby-haml.jlam4-0/+177
Haml is a markup language that's used to cleanly and simply describe the XHTML of any web document without the use of inline code, using indentation rather than closing tags and allowing Ruby to be embedded with ease. Haml functions as a replacement for inline page templating systems such as PHP, ASP, and ERB, the templating language used in most Ruby on Rails applications. However, Haml avoids the need for explicitly coding XHTML into the template, because it itself is a description of the XHTML, with some code to generate dynamic content.
2008-04-04Update ruby-ferret to version 0.11.6. Changes from version 0.11.4jlam3-49/+232
include: + Install as a Ruby gem. * Fixed major bug in term vectors which was in turn affecting highlighting * Fixed memory leak in PerFieldAnalyzer * Fixed range query highlighter * Fixed memory alignment issues on Solaris * Added :use_keywords option to query parser so you can now turn off keywords so a search for OR will work * multiple other bug fixes
2008-04-04Update ruby-feed-normalizer to version 1.5.1. Changes from version 1.3.0jlam3-28/+38
include: + Install as a Ruby gem. * Add support for new fields: * Atom 0.3: issued is now available through entry.date_published. * RSS: feed.skip_hours, feed.skip_days, feed.ttl * All: entry.last_updated, this is an alias to entry.date_published for RSS. * Rewrite relative links in content * Handle CDATA sections consistently across all formats. * Prevent SimpleRSS from doing its own escaping. * Reparse Time classes * Support content:encoded. Accessible via Entry#content. * Support categories. Accessible via Entry#categories. * Introduces a new parsing feature 'loose parsing'. * Add support for applicable dublin core elements. (dc:date and dc:creator) * Feeds can now be dumped to YAML. * Reduced the greediness of a regexp that was removing html comments.
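The last item above, reducing the greediness of a regexp that removes HTML comments, refers to a common pitfall. The log does not show the actual feed-normalizer pattern, so the two patterns below are assumptions chosen only to illustrate the difference between a greedy and a non-greedy match in Python.

```python
import re

# Assumed patterns for illustration; not the ones used by feed-normalizer.
GREEDY = re.compile(r"<!--.*-->", re.DOTALL)   # runs to the LAST "-->"
LAZY = re.compile(r"<!--.*?-->", re.DOTALL)    # stops at the FIRST "-->"

SAMPLE = "<p>a</p><!-- one --><p>b</p><!-- two --><p>c</p>"


def strip_comments(pattern: re.Pattern, text: str) -> str:
    """Remove every match of pattern from text."""
    return pattern.sub("", text)


if __name__ == "__main__":
    # The greedy pattern also swallows the markup between the two comments.
    print(strip_comments(GREEDY, SAMPLE))
    # The non-greedy pattern removes only the comments themselves.
    print(strip_comments(LAZY, SAMPLE))
```

With the greedy pattern, the paragraph sitting between the two comments is lost along with them; the non-greedy form keeps it.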
2008-04-04Initial import of ruby18-diff-lcs-1.1.2 as textproc/ruby-diff-lcs.jlam4-0/+47
Diff::LCS is a port of Perl's Algorithm::Diff that uses the McIlroy-Hunt longest common subsequence (LCS) algorithm to compute intelligent differences between two sequenced enumerable containers.
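To show what a longest common subsequence is, here is a minimal Python sketch. It uses the textbook dynamic-programming table, not the McIlroy-Hunt algorithm that Diff::LCS is described as using, and the function name `lcs` is chosen for illustration only.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b as a list.

    Textbook O(len(a) * len(b)) dynamic programming; an illustration,
    not the McIlroy-Hunt algorithm.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one subsequence.
    result = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    print("".join(lcs("abcde", "ace")))
```

A diff tool treats the elements in the common subsequence as unchanged and reports everything else as insertions or deletions.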
2008-04-04Initial import of ruby18-coderay-0.7.4.215 as textproc/ruby-coderay.jlam4-0/+83
CodeRay is a fast syntax highlighter for Ruby and other languages. It produces colorful, valid XHTML. CodeRay's design goal: simple, beautiful code highlighting for your board/wiki/blog/doc/website.
2008-04-04Initial import of ruby18-builder-2.1.2 as textproc/ruby-builder.jlam4-0/+51
Builder provides a simple way to programmatically create XML markup and data structures within Ruby.
2008-04-04Initial import of ruby18-bluecloth-1.0.0 as textproc/ruby-bluecloth.jlam4-0/+49
BlueCloth is a Ruby implementation of Markdown, a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
2008-03-28Add dependencies which I managed to miss earlier. Bump PKGREVISION.kleink1-1/+5
2008-03-20fix RE vulnerabilities (CVE-2007-(4770|4771)), patch from redhatdrochner8-3/+322
via Gentoo bug #208001, bump PKGREVISION
2008-03-15Add DESTDIR support, fixing normal installation as well.joerg4-21/+38
2008-03-15Python itself is supposed to be DESTDIR safe.joerg1-1/+3
2008-03-14Update to 2.0:wiz2-6/+6
iso-codes 2.0 ------------- Tobias Toedter <t.toedter@gmx.net> Sat, 8 March 2008 [ General ] * Recommend the Translation Project in README for adding or updating translations. [ ISO-639 ] * Update to 2008-04-05 (Alsatian, alsacien, suisse alémanique added as alternative names) [ ISO-639 translations ] * Thai by Theppitak Karoonboonyanan. * French by C. Perrier (consistent capitalization) * Vietnamese by Clytie Siddall (TP) [ ISO-3166 translations ] * Traditional Chinese by Tetralet. Closes: #464858 * Korean by Changwoo Ryu (TP) * Thai by Theppitak Karoonboonyanan. Closes: #465560 * Galician by Jacobo Tarrío. Closes: #469091 * Hebrew by Lior Kaplan * Indonesian by Arief S Fitrianto. Closes: #469947 [ ISO-3166-2 ] * Fixed country codes for Belarus, Jersey and Serbia, thanks to Kamal Mostafa. * Removed geographical_region elements from Cuba, United Kingdom and USA. [ ISO-15924 translations ] * Thai by Theppitak Karoonboonyanan. Closes: #465242 [ ISO-4217 translations ] * Estonian by Ain Vagula (TP)
2008-03-13Support user-destdir installation.jlam1-6/+8
2008-03-13Update namazu package to 2.0.18.taca4-17/+19
Overview of Changes in Namazu 2.0.18 - Mar 12, 2008 * Add 'Charset' directive. * "charset" was added to "ContentType" of the example in conf/namazurc-sample. * "charset" was added to the response header in Error messages for namazu.cgi. * Add HTML, BODY tags in Error messages for namazu.cgi. * '\'', '(', ')' is converted into "&#39;", "&#40;", "&#41;" respectively. * Add po/{de, pl}.po files. (But, it doesn't translate.) * Change charset from SJIS to Shift_JIS in po/ja_SJIS.po. * Change soname (LTVERSION 8:0:1) * pltests/env.pl: The checked environment variable and version of the checked Perl module is added. * pltests/mknmz-8.pl.in: The confirmation whether the index has been updated is added. * pltests/namazu-cgi-12.pl.in: Add new test. * tests/mknmz-9: Expand test file. * filter/hnf.pl: Correspondence GRP and bug fix. * conf/*.win32: Add new files.
2008-03-13Update to 2.9:wiz6-69/+71
Version 2.9 * language definition for C (not C++) files * language definition for properties files * language definition for KDE desktop and ini files * language definition for lsm files (Linux Software Map) * language definition for rpm spec files * language definition for Haxe files (thanks to Jos Hirth) * style.defaults for associating a style for an element (whose style is not specified) to the style of another element * highlight some KDE programming files (e.g., .rc, .kcfg, etc.) * correctly highlight for less when filenames contain paths * fixed a bug in file inclusion of langdef files * fixed compilation problems for fileutil.cc (thanks to Adrian Reber) * xml elements are correctly recognized when containing . (thanks to Toby White) * references for xhtml output files Version 2.8 * lang definition for slang (by John E. Davis) * correctly handle words in ' and ` regular expression strings * the right delimiter of a delim element can refer to marked subexpressions in the left delimiter * fixed the definition of C-style comments which are not nested * improved perl syntax highlighting * javascript regular expression highlighting * padding character for line numbers can be specified (thanks to Roger Nilsson) * removed non standard % make rules Version 2.7 * fixed language association for log files. 
* use standard sed arguments * check that the ctags program supports the options used by source-highlight and disable ctags tests if it does not * removed some memory leaks from scanners and parsers * fixed regular expression highlighting strings in perl (thanks to Elias Pipping) * regexp language element * infer script languages also checking for the env specification * improved error reporting for lang definition files * ` ` syntax for regular expressions that permits backreferences and conditionals * explicit naming for subexpressions syntax * added a program, check-regexp, for checking regular expressions on the command line * fix html tag definitions * fix ruby regexp definition * --doc option and references for docbook output * xhtml output with non fixed font Version 2.6 * language definition for makefiles * language definition for css files * language definition for m4 files * fixed some problems in xml.lang * fixed some problems in sh.lang * the ctags found during configure is correctly used in makefiles * --quiet option showing no progress information (thanks to C. Michael Pilato) * handle direct color specifications in double quotes in style files * in style files can specify formatting options for more than one element on the same line * accept css specifications as style specifications (limited support) * handle background color for some output formats (e.g., xhtml) * in style files the background color (for the entire output) can be specified * fixed a problem in configure script on some BSD systems (thanks to Thomas Klausner)
2008-03-12update to 0.12.2drochner3-7/+8
changes: -Added DOCTYPE declarations to HTML and XHTML pages -Added EXSLT set to the excluded prefixes -translation updates
2008-03-12+ WordNet.wiz1-1/+2
2008-03-12Initial import of WordNet-3.0:wiz4-0/+171
WordNet is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
2008-03-11Put back a couple of IRIX conditionals the way they used to behave,tnn1-2/+2
e.g. match IRIX 5.x but not 6.x. Some of these may indeed apply to 6.x too, but let's be conservative. PR pkg/38224.