path: root/graphics/tesseract
Age | Commit message | Author | Files | Lines
2007-07-28 | Update to 2.00, provided by Rumko on pkgsrc-users. | wiz | 7 | -73/+67
July 02 2007 - V2.00
Converted internal character handling to UTF8.
Trained with 6 languages.
Added unicharset_extractor, wordlist2dawg.
Added boxfile creation mode.
Added UNLV regression test capability.
Fixed problems with copyright and registered symbols.
Fixed extern "C" declarations problem.
2007-05-18 | Initial import of tesseract-1.04b from pkgsrc-wip (packaged by heinz@ and myself): | wiz | 9 | -0/+396
This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO OUTPUT FORMATTING, and NO UI. It can only process an image of a single column and create text from it. It can detect fixed pitch vs proportional text. Having said that, in 1995, this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Another current limitation is that it only recognizes English and its character set is only US-ASCII. Training code IS included in the open source release however, and will be included in a future release.