path: root/textproc/xapian-omega
Age  [Author, Files, Lines]  Commit message
2017-01-01  [schmonz, 2 files, -7/+7]  Update to 1.4.2. From the changelog:
documentation:
  * Replace the auto-generated list of supported MIME types with an auto-generated table showing the extensions that are mapped to each MIME type by default. Partly addresses #569, reported by catkin.
indexers:
  * omindex: Add support for indexing Markdown files (extension .md or .markdown, MIME type text/markdown, using "markdown" to convert to HTML).
testsuite:
  * Add support for "make installcheck" to run tests against the installed version.
build system:
  * configure: Fail with a clear error with xapian-core < 1.4.0.
portability:
  * Fix GCC -Wimplicit-fallthrough warning.
  * Add missing <ctime> for time_t.
  * Avoid snprintf() for formatting fixed-width integers: it results in warnings about possible output truncation with GCC 7 (which can't actually happen due to the limited input range), and it's a bit heavyweight for this job anyway.
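The last portability item replaces snprintf() for small, bounded integers. A minimal sketch of that idea follows: write the digits directly, right to left, into a zero-filled buffer. This is not the code Omega adopted; `append_fixed_width` and `format_date` are hypothetical names, and the helper assumes the value fits in `width` digits (the "limited input range" the note mentions).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append `value` to `out` as exactly `width` decimal
// digits, zero-padded on the left. Assumes value < 10^width; any higher
// digits would be silently dropped.
void append_fixed_width(std::string& out, unsigned value, unsigned width) {
    std::string digits(width, '0');
    for (unsigned i = width; i > 0; --i) {
        digits[i - 1] = static_cast<char>('0' + value % 10);
        value /= 10;
    }
    out += digits;
}

// Example use: build a YYYYMMDD string from its (bounded) parts.
std::string format_date(unsigned year, unsigned month, unsigned day) {
    std::string out;
    out.reserve(8);
    append_fixed_width(out, year, 4);
    append_fixed_width(out, month, 2);
    append_fixed_width(out, day, 2);
    return out;
}
```

Because the output width is fixed by the caller, there is no format string to parse and no truncation for the compiler to warn about.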
2016-11-07  [wiz, 1 file, -2/+1]  Recursive bump for xapian shlib major bump.
2016-11-07  [schmonz, 5 files, -27/+35]  Update to 1.4.1. From the changelog:
  * omindex:
    + Also index leafname with _ and & replaced by spaces. Literal spaces are often avoided in filenames, and "hello_world.txt" ought to be searchable via "hello" and "world". Partly addresses #618, reported by Julien Pfefferkorn.
    + Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup table approach we already use for HTML tags and built-in MIME content-types, rather than a std::map, which makes it faster while using less memory.
    + Avoid using the shell to run most external commands, as it's unnecessary overhead. For the built-in filters, the only cases which now use a shell are where we run two unzip commands. For user-specified commands, a simple and slightly conservative test is used, which should avoid a shell in most common cases where it isn't needed. Notably, environment variables set before the command are handled.
    + Track files which couldn't be indexed in the user metadata and skip them by default on subsequent runs, to avoid the cost of repeatedly running a filter on a file it can't handle. Run omindex with --retry-failed to retry such files.
    + Overhaul the "per-site" terms:
      - 'H' prefix is hostname as before, except that if the term would be > 240 bytes (unlikely but possible) the end is hashed in the same way as for 'U' prefix terms.
      - 'P' terms are now added for every directory level, not just the start URL's path.
      - A new 'J' prefix term is added with the start URL (less any trailing '/'), which means all files indexed from a particular "site" are now indexed by one term. See #376.
    + Add 'skip' pseudo-mimetype which extensions can be mapped to; such files are then reported and skipped (to complement the existing 'ignore' pseudo-mimetype, which causes files with the specified extension to be quietly ignored).
    + Treat a command of 'true' specially as meaning make the text extraction a no-op (as actually running /bin/true effectively would). This provides a way to index some file types by meta-data only. Fixes #519, reported by Brian Burton.
    + Add support for wildcard mimetypes */* and *. Combined with filter command ``true`` for indexing by meta-data only, you can specify a fallback of indexing by meta-data only using ``--filter '*:true'``. From a suggestion by Brian Burton on xapian-discuss.
    + Index message/rfc822 and message/news. These are individually saved email messages and news articles.
    + Index archived web page formats MAFF and MHTML.
    + Handle .xla, yet another XL extension.
    + Handle metadata in LibreOffice HTML export (dcterms.subject, dcterms.description, dcterms.creator and dcterms.contributor).
    + Use zlib's gzopen() instead of invoking "gzip -dc" for compressed Abiword documents.
    + Add support for %f in the command passed to --filter, to allow specifying commands where the input file is not the final argument. Fixes #570, reported by Charles Atkinson.
    + Allow --filter to handle commands which produce output in a temporary file rather than on stdout.
    + Allow --filter to specify the character set of the output the filter produces.
    + Handle application/vnd.ms-excel, text/x-perl and application/x-dvi via default --filter settings instead of hardcoded cases (now possible thanks to the new abilities that --filter has).
    + Add support for specifying a MIME subtype of '*' in --filter arguments.
    + Add --track-ctime option to allow omindex to pick up changes to file ownership and permissions.
    + Index terms from the leafname with an 'F' prefix, rather than treating them as more body text. (Fixes #633, reported by Emmanuel Garette)
    + The starting URL wasn't previously URL encoded. In 1.2.18, a minimally intrusive fix was implemented. Since 1.3.2, we encode the starting URL as we do for the rest of the filename.
    + Don't assume .doc is application/msword but let libmagic decide, since .doc files may actually be RTF, and sometimes people use .doc for plain-text documentation.
    + Add support for indexing 'topic' and 'created date' meta-data for OpenDocument format and HTML.
    + Index "topic" for PDF documents.
    + Commit changes and exit, rather than skipping the current file, on most unexpected errors reading directories or initialising libmagic; otherwise we can end up deleting a lot of database entries on errors like EHOSTDOWN when indexing network mounts.
    + Add --opendir-sleep=SECS option to allow working around problems with indexing files on Microsoft DFS shares.
    + If we get ENOTDIR trying to index a file, skip it quietly (unless in verbose mode) as we already do if we get ENOENT, since ENOTDIR is what we get if the file and the directory it was in got removed between us getting the filename and trying to open it.
    + Handle ENOENT, ENOTDIR and EACCES from readdir().
    + If we've already opened the file (as we often will have if using a modern libmagic with magic_descriptor() available), then use fstat() on that fd rather than stat()/lstat() on the pathname.
    + Pass error message string and errno value in ReadError exceptions.
    + Report strerror(errno) if we can't read a file.
    + Filtering via text/html now handles HTML documents which specify a charset.
    + Add support for indexing Microsoft Publisher files using pub2xhtml.
    + Restrict the length of what we consider to be an extension, currently to 7 characters or whatever the longest extension in the mime_map is if it is longer.
    + Avoid '//' in temporary filenames (cosmetic only).
    + Extend --filter to handle commands which produce HTML on stdout.
    + By default, don't report an error if a file is deleted (or renamed) between us reading the directory entry for it and trying to read the file itself. In --verbose mode, the situation is still reported, but now with a specific message.
    + If omindex receives any of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM, kill any active external filter child process, then handle the signal as we did before. If setpgid() is available, put each external filter in its own process group and kill the whole process group when we get a signal.
    + Use magic_descriptor() if the version of libmagic we're building against is new enough to have it. This eliminates an extra opening of a file being indexed in certain cases.
    + Use rst2html to handle .rst and .rest files.
    + Index title with an 'S' prefix rather than no prefix.
    + If the document with the highest existing docid before the run was updated, we were reporting it as "added", but now we correctly report it as "updated".
    + Catch and report std::exception explicitly, so failing to allocate memory is no longer reported as "Unknown exception".
  * omindex-list: New tool to list URLs of all the documents in a database (or list of databases) indexed by omindex.
  * The HTML parser now explicitly handles <APPLET>, <OBJECT> and <TR>.
  * Use a generated compact and efficient table to convert HTML tag names to enum codes: this is both faster and smaller than the approach we were using, with the benefit that the table is auto-generated.
  * Always use our built-in conversion code for the character sets it can handle (previously we'd use iconv if available; now we only use iconv for other character sets). This gives us more consistent results, and in particular means we now handle BOMs better (at least when using GNU iconv).
  * A lot of data labelled as "iso-8859-1" is actually "windows-1252". The two only differ in characters which are control characters in iso-8859-1, so assume the latter when we see the former.
  * scriptindex:
    + Remove special error handling case noting that index=nopos was replaced with indexnopos; this was removed in 1.1.0, so there's been enough time to upgrade.
omega:
  * Add support for sorting by more than one value, e.g. SORT=+1,-2.
  * Add $msizelower and $msizeupper, which provide access to the lower and upper bounds on the number of matches.
  * Add support for $set{weighting,coord}.
  * Add weightingpurefilter option. Normally a query consisting only of filter terms won't have relevance weights calculated. This new option allows you to specify a weighting scheme to use for such queries, with the same values supported as for the existing weighting option. For example, $set{weightingpurefilter,coord} will weight such queries by how many filter terms match each document.
  * $filters now includes DATEVALUE, which means that after upgrading to 1.4.1 we'll force the first page when reloading or changing page starting from existing URLs. However, the exact same existing URL could be for a search without the date filter where we do want to force the first page, so there's an inherent ambiguity there. Forcing the first page in this case seems the least problematic side-effect.
  * Implement $match command for OmegaScript. Patch from Richhiey Thomas.
  * Add optional prefix argument to $terms.
  * $snippet now uses MSet::snippet() instead of the Snipper class.
  * Add $contains{STRING1,STRING2}. Contributed by Ayush Gupta.
  * Add support for negated boolean filter terms, specified by CGI parameter "N".
  * Support a direction prefix on SORT: '+' for ascending, '-' for descending. SORTREVERSE set to non-0 now flips the direction. Fixes #697, reported by Andy Chilton.
  * Add options argument to $transform.
  * Cache compiled regexps used in $transform.
  * Add $ord OmegaScript command, which returns the Unicode codepoint for the first character of a UTF-8 string.
  * Add $chr OmegaScript command, which returns the UTF-8 string for a given Unicode codepoint.
  * Add $csv OmegaScript command, which escapes a string for use as a field in a CSV file ("always quote" mode inspired by patch from Gaurav Arora).
  * New $filters encoding which avoids collisions. We also compare CGI parameter xFILTERS to what $filters would have returned in previous releases, so that on upgrades old format serialised filters are handled correctly.
  * Fix $jsonarray not to prepend ']' to the first array element.
  * Skip weighting scheme setup for a pure date range query: it won't be weighted anyway, so we can avoid having to parse weighting scheme parameters, etc.
  * Use value ranges when date range filtering by value. This should be more efficient than a MatchDecider, and will automatically take advantage of any future value range optimisations in xapian-core.
  * Add default_db and default_template config options. These allow the default database name and default template to be set via the config file, rather than being stuck with the respective defaults of "default" and "query". Fixes #310, reported by Marco Hennigs.
  * Add support for non-exclusive filters. Fixes #234, reported by Thomas Viehmann.
  * Fix handling of multiple P.<prefix> fields: previously only the first seen was used. These fields are also now taken into account when deciding if the query has changed. $query now returns an OmegaScript list with one entry for each CGI parameter passed.
  * Allow setting query expansion scheme to "bo1".
  * Make $json and $jsonarray force the text to be valid UTF-8, since otherwise the output isn't valid JSON.
  * Check that parameters to $set{weighting,bm25 ...} and $set{weighting,trad ...} are converted OK. Based on patch from Aarsh Shah.
  * Add support to $set{weighting,...} for bb2, dlh, dph, ifb2, ineb2, inl2, lm and pl2 when we're built against a xapian-core which is new enough to have these schemes.
  * Add $snippet to generate a snippet of text tailored to the search.
  * Add new $json and $jsonarray OmegaScript commands to support producing JSON output.
  * Add $truncate command, which truncates a string after a word.
  * Add support for $set{weighting,tfidf} to allow the new TfIdfWeight weighting scheme to be used.
  * DEFAULTOP now defaults to AND rather than OR, since that matches what pretty much every search engine does these days. Closes ticket#512.
  * Allow mapping a query string prefix to more than one term prefix (which xapian-core has supported since 1.0.4).
  * Add support for search inputs for multiple probabilistic prefixes, with support for per-prefix stemmers.
  * Drop legacy support for handling '.' separated terms in xP: that changed in Omega 0.9.7, more than 5 years ago now.
  * Remove support for OLDP CGI parameter, which was superseded by xP approximately a decade ago, and isn't even documented!
  * Drop special handling for R-prefixed terms in $prettyterm: we stopped generating these in Xapian 1.0.
templates:
  * Lower case all HTML tags, attributes and values; explicitly close <option> tags. Patches from Vivek Pal and Nirmal Singhania.
  * Migrate Omega templates to HTML5. Patch from Nirmal Singhania.
  * templates/query: Remove stray double quote from generated URL for spelling suggestion when THRESHOLD is set. Patch from Nirmal Singhania.
  * templates/opensearch: Change response feeds to support OpenSearch 1.1. Patch from Nirmal Singhania.
  * templates/query: Fix setting of prefix map for P: in 1.3.2, this failed to also search in the subject. Now it also searches in the subject and topic.
  * templates/query:
    + We now map unprefixed queries to include S-prefixed terms, to match the change in omindex to prefixing terms from the title with S. You may want to make the same update to your own templates.
    + Set up prefixes for 'author:' and 'title:'.
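The new $csv command escapes a string for use as one CSV field. The sketch below shows the conventional technique, assuming RFC 4180 rules rather than Omega's exact implementation: quote the field when it contains a comma, double quote, CR or LF (or always, in an "always quote" mode), and double any embedded double quote. `csv_escape` and its `always_quote` flag are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Omega's source): escape `field` for use as one
// CSV field. Following RFC 4180, the field is quoted if it contains a comma,
// double quote, CR or LF, or if always_quote is set; inside a quoted field
// each embedded double quote is written twice.
std::string csv_escape(const std::string& field, bool always_quote) {
    const bool need_quotes =
        always_quote || field.find_first_of(",\"\r\n") != std::string::npos;
    if (!need_quotes) return field;
    std::string out = "\"";
    for (char ch : field) {
        if (ch == '"') out += '"';  // "" stands for a literal quote
        out += ch;
    }
    out += '"';
    return out;
}
```

Quoting every field costs a couple of bytes per value but means consumers never have to guess which values were quoted.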
2016-07-09  [wiz, 1 file, -1/+2]  Bump PKGREVISION for perl-5.24.0 for everything mentioning perl.
2016-04-30  [schmonz, 2 files, -7/+7]  Update to 1.2.23. From the changelog:
documentation:
  * Update links to Xapian website and trac to use https, which is now supported, thanks to James Aylett.
indexers:
  * Fix HTML/XML entity decoding to be O(n) not O(n²): processing HTML/XML with a lot of entities is now much faster.
templates:
  * Remove unused country code to name maps. These were intended as examples, but they aren't very useful as such, and really just bloat the templates needlessly.
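The entity-decoding fix is about complexity: rewriting the buffer in place (erase plus insert per entity) shifts the remainder of the string each time, which is quadratic when entities are frequent. A minimal sketch of a linear approach, under stated assumptions, appends to a fresh output string in one pass and bounds the look-ahead for ';'. It only knows four named entities plus numeric references; `decode_entities` and `append_utf8` are hypothetical names, not Omega's parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Append the UTF-8 encoding of code point `cp` (caller ensures <= 0x10FFFF).
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Single left-to-right pass; each input byte is examined a bounded number
// of times, so the whole decode is O(n).
std::string decode_entities(const std::string& in) {
    const std::string::size_type max_entity_len = 12;  // bounds the ';' search
    std::string out;
    out.reserve(in.size());
    std::string::size_type i = 0;
    while (i < in.size()) {
        if (in[i] != '&') {
            out += in[i++];
            continue;
        }
        // Only look a short way ahead for ';', so a run of bare '&'s can't
        // make us rescan the rest of the input each time.
        const std::string::size_type limit = std::min(in.size(), i + max_entity_len);
        std::string::size_type end = i + 1;
        while (end < limit && in[end] != ';') ++end;
        if (end == limit) {  // no ';' nearby: '&' is literal text
            out += in[i++];
            continue;
        }
        const std::string name = in.substr(i + 1, end - i - 1);
        unsigned long cp = 0;
        bool ok = true;
        if (name == "amp") cp = '&';
        else if (name == "lt") cp = '<';
        else if (name == "gt") cp = '>';
        else if (name == "quot") cp = '"';
        else if (name.size() > 1 && name[0] == '#') {
            const bool hex = (name[1] == 'x' || name[1] == 'X');
            const char* digits = name.c_str() + (hex ? 2 : 1);
            ok = hex ? std::isxdigit(static_cast<unsigned char>(*digits))
                     : std::isdigit(static_cast<unsigned char>(*digits));
            if (ok) {
                char* rest = nullptr;
                cp = std::strtoul(digits, &rest, hex ? 16 : 10);
                ok = *rest == '\0' && cp > 0 && cp <= 0x10FFFF;
            }
        } else {
            ok = false;
        }
        if (!ok) {  // unknown or malformed entity: keep the text unchanged
            out += in[i++];
            continue;
        }
        append_utf8(out, cp);
        i = end + 1;
    }
    return out;
}
```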
2016-01-13  [schmonz, 2 files, -8/+7]  Update to 1.2.22. From the changelog:
documentation:
  * Stop maintaining ChangeLog files. They make merging patches harder, and stop 'git cherry-pick' from working as it should. The git repo history should be sufficient for complying with GPLv2 2(a).
  * Clarify help text for omindex --mime-type option.
  * docs/omegascript.rst:
    + Fix documentation of $last to say it's the MSet index *one beyond* the end of the current page. Reported by Andrew Chilton.
    + Clarify that $split and $substr work in bytes. Previously we said "characters", which could be taken as meaning they work with UTF-8 characters.
    + Update documentation for $filters: it was missing these CGI parameters from the list of those serialised: COLLAPSE, DOCIDORDER, SORT, SORTREVERSE, SORTAFTER.
    + Explicitly note that users can use $setmap to create their own maps.
  * docs/overview.rst:
    + SVG extraction is built-in too.
    + Expand paragraph about command `false`. Note the versions where explicit support was added, and that this will also work with any version on Unix, where `false` is a command.
    + Document `cdb_dir`.
  * docs/cgiparams.rst: Document behaviour if xDB is not set.
  * Change "characters" to "bytes" in a few places to clarify that we don't mean Unicode code points.
indexers:
  * omindex:
    + Add '--title-size' option.
    + Handle .oft the same way as .msg: it's some sort of template email, and has essentially the same format.
omega:
  * Make $querydescription ensure the match has been run, so that it includes filters.
  * Avoid $allterms, $cgilist, $filterterms and $terms being O(n²) in the number of items in the returned list.
  * If xFILTERS is not set, don't force the first page, as that's unhelpful if someone fails to set it in their template.
  * When environment variable SERVER_PROTOCOL is set to INCLUDED (as it is when we're being included in a page), we already suppress the HTTP headers, but now we suppress the blank line after the header too.
  * Support option flag_cjk_ngram if built against xapian-core >= 1.2.22.
testsuite:
  * Add test coverage for parsing of HTML entities.
build system:
  * Fix error reporting if PCRE isn't installed. Fixes #693, reported by lhz7370.
portability:
  * Avoid warning when building with glibc >= 2.21.
  * Don't provide our own implementation of sleep() under __WIN32__ if there already is one: mingw provides one, and in some situations it seems to clash with ours. Reported to xapian-discuss by John Alveris.
  * Stop trying to use O_STREAMING: the patch to implement it was never merged into the Linux kernel, and I can't find any evidence that other platforms implement it. The constant value O_STREAMING used now seems to be used for the part of O_SYNC which isn't covered by O_DSYNC, which seems likely to hurt performance if anything.
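The documentation now says $split and $substr work in bytes rather than characters. A small sketch makes the distinction concrete: in UTF-8 a string's byte length and its code point count differ, and a byte offset can land in the middle of a character. `utf8_length` is a hypothetical helper that assumes valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by counting every byte that is not a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char ch : s) {
        if ((ch & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For "café" stored as UTF-8, size() is 5 bytes but utf8_length() is 4, and taking the first 4 bytes cuts the "é" in half.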
2015-11-04  [agc, 1 file, -1/+2]  Add SHA512 digests for distfiles for textproc category
Problems found locating distfiles:
  Package cabocha: missing distfile cabocha-0.68.tar.bz2
  Package convertlit: missing distfile clit18src.zip
  Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz
Otherwise, existing SHA1 digests verified and found to be the same on the machine holding the existing distfiles (morden). All existing SHA1 digests retained for now as an audit trail.
2015-06-12  [wiz, 1 file, -1/+2]  Recursive PKGREVISION bump for all packages mentioning 'perl',
having a PKGNAME of p5-*, or depending on such a package, for perl-5.22.0.
2015-05-23  [schmonz, 4 files, -18/+19]  Update to 1.2.21. From the changelog:
documentation:
  * docs/overview.rst: Document 'E' prefixed boolean terms for filtering by extension (see #668, reported by bramvdh).
  * docs/encodings.rst: Add a document about character encoding, as suggested by James Aylett in #550.
  * docs/cgiparams.rst: Improve wording of docs for SORT parameter.
  * docs/omegascript.rst: Update documentation references to DATE1, DATE2 and DAYSMINUS, which were renamed in 0.6.x and whose compatibility aliases were removed in 1.0.0.
indexers:
  * omindex:
    + outlookmsg2html: Fix handling of message/rfc822 subparts.
    + Ignore extensions .msi and .msp, which are Microsoft installer files, but which libmagic sometimes incorrectly identifies as application/msword.
    + Interpret a command of "false" in "--filter" as meaning to ignore files with that MIME type.
omega:
  * $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by catkin and terencz.)
  * $prettyurl now leaves the query and fragment parts of the URL alone and won't decode an escaped "/" (omindex doesn't create URLs with any of these, so we only risk breaking other URLs which have them).
  * Drop compilation date and time from output when run from the command line: they prevent reproducible builds, and the version number is sufficient information.
  * Handle CGI parameter [=0 as [=1.
templates:
  * templates/query: When listing matching terms, don't make the commas italic.
  * templates/query: Eliminate blank line before <html>.
  * templates/xml: Add XML declaration.
  * templates/godmode: Specify charset utf-8 in the content-type.
  * templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS, which were renamed in 0.6.x and whose compatibility aliases were removed in 1.0.0.
build system:
  * Link test programs with libtool's '-no-install' or '-no-fast-install', like we already do in xapian-core, which means that libtool doesn't need to generate shell script wrappers for them on most platforms.
  * configure: Prefer pkg-config for determining the flags needed to compile and link with PCRE, as this will just work when cross-compiling (at least under MXE).
  * configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s() declared in stdlib.h.
  * Enable automake option 'subdir-objects' to avoid a warning from newer automake.
portability:
  * Add spaces between literal strings and macros which expand to literal strings, for C++11 compatibility.
  * Remove 'register' as it's deprecated and clang spits out warnings because of that. Any modern compiler likely just ignores it as an optimisation hint anyway.
  * Avoid doing link tests with libmagic in configure, as they fail on mingw due to not automatically picking up libraries which libmagic itself depends on.
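The C++11 portability item concerns string literals placed directly before a macro. Under C++11 lexing, `"omega "PACKAGE_VERSION` with no space is a single user-defined string literal whose suffix is the identifier PACKAGE_VERSION, so the macro is not expanded as a separate token; compilers typically diagnose this (GCC, for example, warns via -Wliteral-suffix). The sketch below shows the spaced form; the PACKAGE_VERSION value here is a hypothetical stand-in for the configure-generated macro.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a configure-generated macro (value chosen for illustration).
#define PACKAGE_VERSION "1.2.21"

// With a space between the literal and the macro, PACKAGE_VERSION is a
// separate token: it expands to "1.2.21", and the two adjacent string
// literals are concatenated into one.
static const char version_string[] = "omega " PACKAGE_VERSION;
```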
2014-11-17  [schmonz, 2 files, -6/+6]  Update to 1.2.19. From the changelog:
documentation:
  * docs/overview.rst: Note that pdftotext is part of poppler as well as xpdf. (Noted by Paul Wise)
2014-07-06  [schmonz, 3 files, -11/+10]  Update to 1.2.18. From the changelog:
indexers:
  * omindex:
    + Work around libmagic returning a MIME content-type of "Composite Document File V2 Document[...]" or "application/CDFV2-corrupt" by returning a more suitable filetype based on looking at the file's extension.
    + The starting URL wasn't previously URL encoded. In 1.3.2, this will be fixed by URL encoding it as we do for the rest of the path; for the 1.2 branch we only URL encode it if it contains a character <= 31 or at least one of '#', '%', ':' or '?'. This avoids a one-off reindex of every document in the database in cases which work OK in practice.
    + When we skip a file because it exceeds the configured size limit, include that size limit in the message.
omega:
  * Add support for setting the query expansion scheme to use.
portability:
  * Don't compile in unixperm.cc: it isn't currently used, and it fails to build with mingw. (Fixes #635, reported by Alexis Denis)
  * Fix warning when built with GCC 4.7.2 using -Os.
  * Remove unused inline function, fixing compiler warning.
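The 1.2-branch rule for the starting URL is a conditional: leave it alone unless it contains a byte <= 31 or one of '#', '%', ':' or '?', and only then percent-encode it. A minimal sketch of that decision follows. The trigger set comes from the changelog; exactly which bytes `percent_encode` converts (the trigger set, space, DEL and non-ASCII) is an assumption for illustration, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Bytes whose presence triggers encoding under the 1.2.18 rule.
static bool triggers_encoding(unsigned char ch) {
    return ch <= 31 || ch == '#' || ch == '%' || ch == ':' || ch == '?';
}

// Percent-encode a string. The set of bytes encoded here is an assumption.
static std::string percent_encode(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char ch : s) {
        if (triggers_encoding(ch) || ch == ' ' || ch >= 0x7F) {
            out += '%';
            out += hex[ch >> 4];
            out += hex[ch & 0x0F];
        } else {
            out += static_cast<char>(ch);
        }
    }
    return out;
}

// Leave the start URL untouched unless it contains a trigger byte, so URLs
// that already worked keep producing the same terms (no mass reindex).
std::string encode_start_url(const std::string& url) {
    for (unsigned char ch : url) {
        if (triggers_encoding(ch)) return percent_encode(url);
    }
    return url;
}
```

Note the trade-off the changelog describes: "/a b/" is left as-is because it contains no trigger byte, which preserves existing terms at the cost of not being fully encoded.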
2014-05-29  [wiz, 1 file, -1/+2]  Bump for perl-5.20.0.
Do it for all packages that
  * mention perl, or
  * have a directory name starting with p5-*, or
  * depend on a package starting with p5-
like last time, for 5.18, where this didn't lead to complaints. Let me know if you have any this time.
2014-02-20  [schmonz, 3 files, -10/+11]  Update to 1.2.17. From the changelog:
documentation:
  * docs/overview.html: Add Abiword as an example use of --filter, based on patch from Frank J Bruzzaniti (fixes #383).
portability:
  * Fix "no previous declaration" warning on platforms which don't have mkdtemp().
indexers:
  * omindex:
    + Fix off-by-one when finding documents to delete, which would sometimes cause omindex to fail to delete documents from the database when they weren't refound during an index update.
    + Decode dates in xlsx files.
    + Ignore extensions 'adm', 'cur' and 'ico' by default.
    + Group-readable files which are owner-readable but not world-readable should still get a "readable by owner" term added. Reported by Emmanuel Garette.
build system:
  * Compress source tarballs with xz instead of gzip.
  * configure: Sync compiler warning flag machinery against xapian-core. The changes are special handling for clang, passing -fshow-column where supported, and handling for new warning flags in GCC 4.6 and 4.7.
2013-06-04  [schmonz, 4 files, -25/+25]  Update to 1.2.15. From the changelog:
Omega 1.2.15 (2013-04-16):
omega:
  * Don't pointlessly link utf8convert.o into the omega CGI.
Omega 1.2.14 (2013-03-14):
indexers:
  * omindex:
    + Correct "max" -> "min" when reserving space for shared strings in .xlsx files. This just means we now reserve a more appropriate amount of space to start with.
    + Ignore .com files by default.
Omega 1.2.13 (2013-01-09):
indexers:
  * omindex:
    + Extracting text using external filters now works for filenames containing a newline character; previously the newline got lost during escaping for the shell.
    + Fix segfault when -F option without a ':' is passed.
    + Skip a file if we get a read error while calculating the MD5 checksum (used for duplicate detection); previously we used a checksum of the file up to that point.
    + Avoid rereading SVG and Atom files when we calculate their MD5 checksums.
    + Improve --help output and man page, most notably:
      - Say explicitly that --sample-size accepts the same formats as --max-size.
      - Note default size limit on files to index is unlimited.
    + When generating a sample for a CSV file, limit the size we pre-allocate to the CSV file size if that's smaller than the requested sample size, in case the user sets that limit very high.
omega:
  * Fix to decode %-encoded character at the end of the query string.
Omega 1.2.12 (2012-06-27):
  No changes since 1.2.11 except to bump the version: this release was made to fix an incorrect library version information update in xapian-core 1.2.11.
Omega 1.2.11 (2012-06-26):
indexers:
  * Change HTML parser's handling of multiple <body> tags and of text outside of <body> to match the behaviour of modern web browsers. (ticket#599)
  * omindex:
    + Add command line option to control the size of the document sample stored. Patch from Mihai Bivol.
    + Rework .xlsx parsing to substitute the shared strings into the positions they are used in, so that the sample actually matches what appears in the spreadsheet, and to index calculated cell contents.
    + Improve handling of headers and footers in OpenDocument documents.
    + pdftotext outputs a formfeed between each page, which messes up our "empty body" check, so trim any trailing formfeeds before this check.
Omega 1.2.10 (2012-05-09):
indexers:
  * Add support for CDATA to HTML/XML parser.
  * omindex:
    + Add --max-size option, based on patch from ndaley in ticket#587.
    + Add support for Atom feed files, patch from Mihai Bivol in ticket#595.
    + If the document with the highest existing docid before the run was updated, we were reporting it as "added", but now we correctly report it as "updated". (Backported from 1.3.0)
    + Catch and report std::exception explicitly, so failing to allocate memory is no longer reported as "Unknown exception". (Backported from 1.3.0)
Omega 1.2.9 (2012-03-08):
documentation:
  * docs/overview.html:
    + Document that libmagic is used to determine the MIME type if the extension isn't known. Partly addresses ticket#569.
    + We now limit time as well as CPU and memory for external filters.
indexers:
  * Our HTML parser now ignores sections bracketed by <!--UdmComment--> and <!--/UdmComment-->, like we already do for <!--htdig_noindex-->.
  * omindex: Add more extensions to the default ignore list: bin dat db fon jar lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
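The 1.2.11 pdftotext note describes a subtle "empty body" bug: a PDF with no extractable text still yields a string of formfeeds, so a plain emptiness test fails. A minimal sketch of the fix, assuming the extracted text is held in a std::string (`trim_trailing_formfeeds` is a hypothetical name):

```cpp
#include <cassert>
#include <string>

// pdftotext emits a formfeed ('\f') between pages, so a PDF with no text
// can still produce a non-empty string of formfeeds. Trimming them first
// makes a simple empty() check reliable.
void trim_trailing_formfeeds(std::string& text) {
    const std::string::size_type last = text.find_last_not_of('\f');
    // npos means the string is all formfeeds (or empty): erase everything.
    text.erase(last == std::string::npos ? 0 : last + 1);
}
```

Only trailing formfeeds are removed, so page breaks between real content survive for later processing.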
2013-05-31  [wiz, 1 file, -2/+2]  Bump all packages for perl-5.18 that
a) refer to 'perl' in their Makefile, or
b) have a directory name of p5-*, or
c) have any dependency on any p5-* package
Like last time, where this caused no complaints.
2012-10-25  [asau, 1 file, -3/+1]  Drop superfluous PKG_DESTDIR_SUPPORT; "user-destdir" is the default these days.
2012-10-03  [wiz, 1 file, -2/+2]  Bump all packages that use perl, or depend on a p5-* package, or
are called p5-*. I hope that's all of them.
2012-03-03  [wiz, 1 file, -2/+2]  Recursive bump for pcre-8.30* (shlib major change)
2012-01-27  [sbd, 1 file, -1/+3]  Add missing sysutils/file buildlink
Bump PKGREVISION
2012-01-10  [schmonz, 5 files, -21/+32]  Update to 1.2.8. Changelog since 1.0.18 is way too long and highlights
aren't obvious. Lots of bug fixes.
2011-12-03  [sbd, 1 file, -1/+2]  Recursive bump for textproc/xapian buildlink additions.
2010-02-16  [wiz, 5 files, -123/+7]  Update to 1.0.18.
The rlimit issue addressed in patches ac, ad and ae was already addressed in release 1.0.11, so remove them.
Omega 1.0.18 (2010-02-14):
indexers:
  * Make the default charset "utf-8" not "UTF-8", as we lower case explicitly specified character sets to compare to see if we need to reparse. Previously XML documents which explicitly specified their character set as UTF-8 would cause a needless restart of the parser.
  * omindex:
    + Increase the wdf boost for the document title from 2 to 5, since 2 isn't really enough.
  * scriptindex:
    + Don't abort with "Unknown Exception" if indexing is disallowed or we hit </body> for a document which had an overridden character set. Fixes ticket#410.
Omega 1.0.17 (2009-11-18):
indexers:
  * omindex:
    + On Linux, change the memory limit on external filters to use _SC_PHYS_PAGES, since _SC_AVPHYS_PAGES excludes pages used by the OS cache and so will often report a really low value. Fixes Debian bug#548987 and ticket#358.
    + Fix likely crash when reading output from external filter program if read() is interrupted by a signal.
    + Fix potential crash when indexing PostScript files (fixed by using delete[] (not delete) for array allocated by new[]).
testsuite:
  * utf8converttest: Charset "8859_1" isn't understood by Solaris libiconv, and isn't a standard charset name, so just test it when using our built-in converter and GNU libc.
portability:
  * Fix build failure on Mac OS X 10.6.
  * Also check for socketpair() in -lxnet if it isn't found without, which enables resource limits on external filter programs called by omindex on Solaris, and possibly some other platforms. Fixes ticket#412.
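The 1.0.17 memory-limit fix turns on the difference between two sysconf() queries: _SC_PHYS_PAGES counts all physical pages, while _SC_AVPHYS_PAGES counts only currently free ones, and memory held as OS page cache isn't "free", so the latter can be misleadingly small. A minimal sketch of reading total physical memory follows; how Omega then derives the filter's limit from it isn't shown in the changelog, and `physical_memory_bytes` is a hypothetical name. These sysconf names are widespread extensions (glibc, Solaris) rather than baseline POSIX, hence the #if guard.

```cpp
#include <cassert>
#include <unistd.h>

// Total physical memory in bytes, or -1 if it can't be determined.
// Uses _SC_PHYS_PAGES (all pages), not _SC_AVPHYS_PAGES (free pages only).
long long physical_memory_bytes() {
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
    const long pages = sysconf(_SC_PHYS_PAGES);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (pages > 0 && page_size > 0)
        return static_cast<long long>(pages) * page_size;
#endif
    return -1;
}
```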
2009-09-10  [schmonz, 3 files, -11/+11]  Update to 1.0.16. From the changelog:
* Fix cross-site scripting vulnerability in reporting of exceptions (CVE-2009-2947).
2009-08-27  [schmonz, 3 files, -12/+12]  Update to 1.0.15. From the changelog:
general:
  * omegascript.vim: The list of OmegaScript commands in the vim mode was rather out of date, and a few commands were misclassified. Fix both problems and avoid future recurrences by automatically generating those lists from the command list in query.cc.
documentation:
  * omegascript.html: Document that $date uses UTC. (ticket#314)
templates:
  * query: Link to "xapian.org" rather than "www.xapian.org".
  * inc/toptermsjs: Use double quotes rather than single quotes for parameter values on the <script> tag.
portability:
  * omindex: Implement correct handling of paths when calling external filter programs on Microsoft Windows.
2009-07-23  [schmonz, 2 files, -7/+7]  Update to 1.0.14:
indexers:
  * omindex: Make sure that output is flushed after every message, not just after some of them.
portability:
  * Avoid infinite loop in omindex and scriptindex when reading files under Cygwin with automatic end of line translation enabled. This same bug can also manifest on Unix platforms if the file is truncated by another process while being read.
2009-07-18  [schmonz, 3 files, -11/+11]  Update to 1.0.13. From the changelog:
  * omindex:
    + If the filter program needed for a file format isn't installed, report this explicitly when skipping subsequent files with the extension, instead of misleadingly reporting "Unknown extension".
    + Make -s actually work as a short form for --stemmer (as documented by "omindex --help" and "man omindex").
    + Drop the copyright info from the output of --version, as it's perennially out of date and we don't report it for any other Xapian programs.
  * scriptindex:
    + Add new "valuenumeric" action to add a document value using Xapian::sortable_serialise() to allow numeric sorting (ticket#260).
2009-07-07  [joerg, 1 file, -1/+3]  user-destdir support
2009-06-14  [joerg, 1 file, -2/+1]  Convert @exec/@unexec to @pkgdir or drop it.
2009-06-14  [joerg, 1 file, -4/+1]  Remove @dirrm entries from PLISTs
2009-05-01  [schmonz, 1 file, -1/+2]  Needs zlib.
2009-04-20  [schmonz, 1 file, -1/+2]  Meant to add LICENSE in previous (gnu-gpl-v2).
2009-04-20  [schmonz, 6 files, -35/+22]  Update to 1.0.12. From the changelog:
  * $log now retries a partial write, or one interrupted by a system call.
  * cgiparams.html: Note the technique of using a stub database file to allow a default of searching over multiple databases.
  * omindex:
    + Add support for indexing Microsoft Office 2007 formats and XPS files (bug#290).
    + Fix the extraction of metadata from OpenDocument formats.
    + Fix "-l", which would previously always cause a segmentation fault if used ("--depth-limit" wasn't affected).
  * Fix to compile when RLIMIT_AS isn't available (as on NetBSD and OpenBSD). Instead use RLIMIT_VMEM or RLIMIT_DATA if either is available, else don't try to limit the memory the filter process can use.
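The $log item describes the standard "write everything" loop: write() may accept fewer bytes than requested, or fail with EINTR when a signal interrupts it, and in both cases the caller should try again rather than give up. A minimal POSIX sketch of that loop follows; `write_all` is a hypothetical helper, not Omega's code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Write all `len` bytes of `buf` to `fd`, resuming after a partial write
// and retrying when write() fails with EINTR. Returns false on any other
// error (errno is left set by write()).
bool write_all(int fd, const char* buf, std::size_t len) {
    while (len > 0) {
        const ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted before writing: retry
            return false;                  // genuine error
        }
        buf += n;  // skip past what was written and loop for the rest
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```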
2009-01-07  [wiz, 3 files, -20/+11]  Update to 1.0.10:
Omega 1.0.10 (2008-12-23):
build system:
  * This release now uses newer versions of the autotools (autoconf 2.62 -> 2.63; automake 1.10.1 -> 1.10.2). The newer autoconf fixes a regression in autoconf 2.62 (and so Omega 1.0.7) with detecting the endianness of some platforms.
Omega 1.0.9 (2008-10-31):
documentation:
  * docs/overview.html: Document HTML parsing a bit, including robots meta and htdig_noindex.
omega:
  * omega: Catch std::exception and report what its what() method returns.
  * omega: Remove undocumented and non-functional support for numeric sorting via CGI parameter SORT=#<slot> (SORT=<slot> works as before).
build system:
  * configure: Sync warning flag handling changes from xapian-core to eliminate many warnings from GCC 4.3.
Omega 1.0.8 (2008-09-04):
documentation:
  * Fix a few typos and improve wording in a few places.
indexers:
  * omindex:
    + If the character encoding is specified using <meta http-equiv=...> in an HTML document, then reparse the document if it isn't the encoding we're already using, so that any preceding <title> is converted correctly (bug#292).
    + Convert text from meta tag parameters to UTF-8 (bug#293).
    + Handle <meta charset="..."> (new in HTML 5).
    + Fix bug in HTML tag parameter parsing which was probably just a small performance penalty in real world cases, but could perhaps result in parsing bogus extra parameters in carefully contrived situations.
portability:
  * Add missing <signal.h>, noted on FreeBSD by Henrik Brix Andersen.
2008-07-31  [schmonz, 1 file, -2/+3]  Add missing dependency on Perl, found by joerg's bulk build. Bump
PKGREVISION.
2008-07-27  [schmonz, 6 files, -2/+144]  Fix build on NetBSD (4.0, at least): include <signal.h> and avoid
RLIMIT_AS on systems without it. Also fix path to Perl interpreter in installed scripts, and as a result, bump PKGREVISION.
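Avoiding RLIMIT_AS where it doesn't exist is later spelled out in 1.0.12 as a fallback order: RLIMIT_AS, else RLIMIT_VMEM, else RLIMIT_DATA, else no memory limit. A minimal sketch of that selection with the preprocessor follows. `FILTER_MEM_RLIMIT` and `limit_filter_memory` are hypothetical names, and the assumption is that it runs in the filter's child process after fork() and before exec(). It lowers only the soft limit and never requests more than the existing hard limit, so an unprivileged process can call it.

```cpp
#include <cassert>
#include <sys/resource.h>

// Choose the best available resource for capping a filter's memory.
#if defined(RLIMIT_AS)
#  define FILTER_MEM_RLIMIT RLIMIT_AS
#elif defined(RLIMIT_VMEM)
#  define FILTER_MEM_RLIMIT RLIMIT_VMEM
#elif defined(RLIMIT_DATA)
#  define FILTER_MEM_RLIMIT RLIMIT_DATA
#endif

// Set the soft memory limit to `bytes` (capped at the hard limit).
// Returns false if no suitable limit exists or the call fails.
bool limit_filter_memory(rlim_t bytes) {
#ifdef FILTER_MEM_RLIMIT
    struct rlimit rl;
    if (getrlimit(FILTER_MEM_RLIMIT, &rl) != 0) return false;
    rl.rlim_cur = (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
                      ? rl.rlim_max
                      : bytes;
    return setrlimit(FILTER_MEM_RLIMIT, &rl) == 0;
#else
    (void)bytes;
    return false;  // no memory limit available on this platform
#endif
}
```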
2008-07-26  [schmonz, 6 files, -0/+131]  Initial import of Omega, which operates on a set of Xapian databases.
Each database is created and updated separately using either omindex or scriptindex. You can search these databases (or any other Xapian database with suitable contents) via a web front-end provided by omega, a CGI application. A search can also be done over more than one database at once.