path: root/graphics/tesseract
Age | Commit message | Author | Files | Lines
2008-05-30 | Update to 2.03: | wiz | 5 | -22/+53
  January 23 2008 - V2.02
    Improvements to clustering, training and classifier.
    Major internationalization improvements for large-character-set languages, e.g. Kannada.
    Removed some compiler warnings.
    Added multipage tiff support for training and running.
    Updated graphics output to talk to new java-based viewer.
    Added ability to save n-best lists.
    Added leptonica support for more file types.
    Improved Init/End to make them safe.
    Reduced memory use of dictionaries.
    Added some new APIs to TessBaseAPI.
  April 21 2008 - V2.02 (again)
    Fixed namespace collisions with jpeg library (INT32).
    Portability fixes for Windows for new code.
    Updates to autoconf system for new code.
  April 22 2008 - V2.03
    Fixed crash introduced in 2.02.
    Fixed lack of tessembedded.cpp in distribution.
    Added test for leptonica header files and conditional test for lib.
2007-11-29 | Update to 2.01: | wiz | 3 | -7/+9
  August 27 2007 - V2.01
    Fixed UTF8 input problems with box file reader.
    Fixed various infinite loops and crashes in dawg code.
    Removed include of config_auto.h from host.h.
    Added automatic wctype encoding to unicharset_extractor.
    Fixed dawg table too full error.
    Removed svn files from tarball.
    Added new functions to tessdll.
    Increased maximum utf8 string in a classification result to 8.
2007-07-28 | Update to 2.00, provided by Rumko on pkgsrc-users. | wiz | 7 | -73/+67
  July 02 2007 - V2.00
    Converted internal character handling to UTF8.
    Trained with 6 languages.
    Added unicharset_extractor, wordlist2dawg.
    Added boxfile creation mode.
    Added UNLV regression test capability.
    Fixed problems with copyright and registered symbols.
    Fixed extern "C" declarations problem.
2007-05-18 | Initial import of tesseract-1.04b from pkgsrc-wip (packaged by heinz@ and myself): | wiz | 9 | -0/+396
  This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO OUTPUT FORMATTING, and NO UI. It can only process an image of a single column and create text from it. It can detect fixed pitch vs proportional text.
  Having said that, in 1995, this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows.
  Another current limitation is that it only recognizes English and its character set is only US-ASCII. Training code IS included in the open source release however, and will be included in a future release.