path: root/textproc
Age / Commit message / Author / Files / Lines
2019-12-25Bulk builds with ruby25 don't seem to install jquery.js.schmonz2-8/+3
2019-12-25jo: fix PLIST; bump revisionadam2-7/+7
2019-12-25py-xapian: fix PLIST for Python 3.x/Sphinx 2; mark as self-conflictingadam2-12/+14
2019-12-23textproc/libxlsxwriter: update to 0.8.9sjmulder3-9/+9
Changes:
- Added support for default hyperlink style in worksheet_write_url().
- Added support for hyperlink in images, see worksheet_insert_image_opt().
- Fixed several worksheet_write_url() edge cases.
2019-12-21Remove non-ASCII characters for Python 3.6.joerg2-1/+35
2019-12-19Uses Python 2 syntax.joerg1-1/+3
2019-12-19Add missing errno.hjoerg2-1/+14
2019-12-19Package makefile is spelled with lower case m.joerg1-1/+3
2019-12-19Fix compatibility with regex restrictions in current Perl. Bumpjoerg3-3/+20
revision.
2019-12-19grep: Avoid conflict with sys/limits.h guard.jperkin2-1/+19
2019-12-18Resolve conflict with STL.joerg2-1/+17
2019-12-18Fix rpath in DSO, don't test on runtime due to $DESTDIR.joerg3-2/+21
2019-12-18Update to 3.54 (during the freeze, for the bugfixes). From the changelog:schmonz6-28/+30
- fixed default colour output in BBCode (https://gitlab.com/saalen/highlight/issues/134)
- fixed corner case in sh.lang
- fixed syntax tests with UTF-8 input (https://gitlab.com/saalen/highlight/issues/123)
- added support for Bash in outhtml_codefold.lua plug-in
- added ballerina.lang
- added block strings to java.lang
- added author hints in themes and language definitions
- added C++20 reserved words in c.lang
- added editorconfig file and validated all files accordingly (thanks to Tristano Ajmone)
- CLI: fixed `--list-scripts` with `-d` or HIGHLIGHT_DATADIR env variable (https://gitlab.com/saalen/highlight/issues/139)
- GUI W32: replaced multibyte path trace window by startup hint if NtfsDisable8dot3NameCreation is set
- GUI: removed AsciiDoc instruction lines from the README popup window
2019-12-18textproc/word2vec: Fix a typo in COMMENTminskim1-2/+2
2019-12-17Update to 1.4.14. From the changelog:schmonz2-7/+6
documentation: * Improve omindex --help docs for --duplicates. * Document that $log will start to return an error message in 1.5.0, and that one can wrap it using a $if with no action now to be future-proof. indexers: * Add built-in support for iso-8859-15 so we can handle it without iconv. This charset is a variant of iso-8859-1 with 8 characters changed, most notably including the euro currency symbol. It's the most commonly seen charset we didn't have built-in support for. * Optimise converting us-ascii to UTF-8 to do nothing, like we already do when converting UTF-8 to UTF-8. * scriptindex: + Add new 'gap' action which provides a way to leave a gap in the term positions between fields to prevent phrases and positional operators from matching across fields. omega: * Fix error handling in $lookup. We now check for errors from cdb_init() and cdb_get(). We've never checked for errors from cdb_init(), while for cdb_get() this bug was introduced by a warning fix in 1.2.20. templates: * Future-proof use of $log against changes in 1.5.0.
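The note above that iso-8859-15 is iso-8859-1 with 8 characters changed can be checked independently of Omega. The following is a minimal sketch that uses Python's standard codecs as a stand-in; Omega's own built-in converter is C++ and is not shown in this log, and the function name `charset_differences` is hypothetical.

```python
# Compare Python's iso8859-1 and iso8859-15 codecs byte by byte.
# This only illustrates the charset difference; it is not Omega's code.
def charset_differences():
    """Map each byte value that decodes differently to (latin-1 char, latin-9 char)."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso8859-1")
        latin9 = raw.decode("iso8859-15")
        if latin1 != latin9:
            diffs[value] = (latin1, latin9)
    return diffs


if __name__ == "__main__":
    for value, (old, new) in sorted(charset_differences().items()):
        print(f"0x{value:02X}: {old!r} -> {new!r}")
```

Running the script lists the byte values whose meaning changes, including 0xA4, which becomes the euro sign.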
2019-12-17Reset PKGREVISION for xapian update.schmonz2-4/+2
2019-12-17Update to 1.4.14. From the changelog:schmonz3-12/+12
API: * Xapian::QueryParser: Handle "" inside a quoted phrase better. In a quoted boolean term, "" is treated as an escaped ", so handle it in a compatible way for quoted phrases. Previously we'd drop out of the phrase and start a new phrase. Fixes #630, reported by Austin Clements. * Xapian::Stem: The constructor which takes a stemmer name now takes an optional second bool parameter - if this is true, then an unknown stemmer name falls back to using the "none" stemmer instead of throwing an exception. This allows simply constructing a stemmer from an ISO language code without having to worry about whether there's a stemmer for that language, and without having to handle an exception if there isn't. * Xapian::Stem: Fix a bug with handling 4-byte UTF-8 sequences which potentially affects most of the stemmers. None of the stemmers work in languages where 4-byte UTF-8 sequences are part of the alphabet, but this bug could result in invalid UTF-8 sequences in terms generated from text containing high Unicode codepoints such as emoji, which can cause issues (for example, in some language bindings). Fix synced from Snowball git post 2.0.0. Reported by Ilari Nieminen in https://github.com/snowballstem/snowball/issues/89. * Xapian::Stem: Add a new is_none() method which tests if this is a "none" stemmer. * Xapian::Weight: The total length of all documents is now made available to Xapian::Weight subclasses, and this is now used by DLHWeight, DPHWeight and LMWeight. To maintain ABI compatibility, internally this still fetches the average length and the number of documents, multiplies them, then rounds the result, but in the next release series this will be handled directly. * Xapian::Database::locked() on an inmemory database used to always return false, but an inmemory Database is always actually a WritableDatabase underneath, so now we always report true in this case because it really is always locked for writing.
* Fix write one past end of std::vector on certain QueryParser parser errors. This is undefined behaviour, but the write was always into reserved space, so in practice we'd actually get away with it (it was noticed because it triggers an error when running under ubsan and using libc++). Reported by Germán M. Bravo. * MSet::get_matches_estimated(): Improve rounding of result - a bug meant we would almost always round down. * Optimise test for UTF-8 continuation character. Performing a signed char comparison shaves an instruction or two on most architectures. * Database::get_revision(): Return revision 0 for a Database with no shards rather than throwing InvalidOperationError. * DPHWeight: Avoid dividing by 0 when searching a sharded database when one shard is empty. The result wasn't used in this case, but it's still undefined behaviour. Detected by UBSan. testsuite: * Fix failing multi_glass_remoteprog_glass tests on x86. When the tests are run under valgrind, remote servers should be run using the runsrv wrapper script, but this wasn't happening for remote servers in multi-databases - now it is. Also, previously runsrv only used valgrind for the remote for an x86 build that didn't use SSE, but it seems there are x87 instructions in libc that are affected by valgrind not providing excess precision, so do this for x86 builds which use SSE too. Together these changes fix failures of topercent2, xor2, tradweight1 under backend multi_glass_remoteprog_glass on x86. * Fix C++ One-Definition Rule (ODR) violation in testsuite code. Two different source files linked into apitest were each defining a different `struct test`. Wrap each in an anonymous namespace to localise it to the file it is defined and used in. This was probably harmless in practice, unless trying to build with Link-Time Optimisation or similar (which is how it was detected). * Test all language codes in stemlangs1.
The testsuite hardcodes a list of supported language codes which hadn't been updated since 2008. * Improve DateRangeProcessor test coverage. * The "singlefile" test harness backend manager now creates databases by compacting the corresponding underlying backend database (creating it first if need be) rather than always creating a temporary database to compact. * Enable compaction testcases for multi and singlefile test harness backends. * Add generated database support for remoteprog and remotetcp test harness backends. Implemented by Tanmay Sachan. * Add test harness support for running testcases using a multi database comprised of one local and one remote shard, or two remote shards. Implemented by Tanmay Sachan. * Check if removing existing multi stub failed. Previously if removing an existing stub failed, the test harness would create a temporary new stub and then try to rename it over the old one, which will always fail on Microsoft Windows. * Wait for xapian-tcpsrv processes to finish before moving on to the next testcase under __WIN32__ like we already do on POSIX platforms. matcher: * Handle pruning under a positional check. This used to be impossible, but since 1.4.13 it can happen as we now hoist AND_NOT to just below where we hoist the positional checks. The code on master already handles pruning here so this bug is specific to the RELEASE/1.4 branch. Fixes #796, reported by Oliver Runge. * When searching with collapsing over multiple shards, at least some of which are remote, uncollapsed_upper_bound could be too low and uncollapsed_lower_bound too high. This was causing assertion failures in testcases msize1 and msize2 under test harness backends multi_glass_remoteprog_glass and multi_remoteprog_glass. * Internally we no longer calculate a bogus total_term_count as the sum of total_length * doc_count for all shards. Instead we just use the sum of total_length, which gives the total number of term occurrences. 
This change should improve the estimated collection_freq values for synonyms. * Several places where we might divide zero by zero in a database where wdf was always zero have been fixed. * Optimise OP_AND_NOT better. We now combine its left argument with other connected and-like subqueries, and gather up and hoist the negated subqueries and apply them together above the combined and-like subqueries, just below any positional filters. * Optimise OP_AND_MAYBE better. We now combine its left argument with other connected and-like subqueries, and gather up and hoist the optional subqueries and apply them together above the combined and-like subqueries and any hoisted positional filters. * Treat all BoolWeight queries as scaled by 0 - we can optimise better if we know the query is unweighted. build system: * configure: Stop using AC_FUNC_MEMCMP. The autoconf manual marks it as "obsolescent", and it seems clear that nobody's relying on it as we're missing the "'AC_LIBOBJ' replacement for 'memcmp'" which it would try to use if needed. glass backend: * Allow zlib compression to reduce size by one byte. We were specifying an output buffer size one byte smaller than the input, but it appears zlib won't use the final byte in the buffer, so we actually need to pass the input size as the output buffer size. * Only try to compress Btree item values > 18 bytes, which saves CPU time without sacrificing any significant size savings. remote backend: * Fix match stats when searching with collapsing over multiple shards and at least some shards are remote. Bug discovered by Tanmay Sachan's test harness improvements. * Ignore orphaned remote protocol replies which can happen when searching with a remote shard if an exception is thrown by another shard. Bug discovered by Tanmay Sachan's test harness improvements. * Wait for xapian-progsrv child to exit when a remote Database or WritableDatabase object is closed under __WIN32__ like we already do for POSIX platforms. 
documentation: * HACKING: Replace release docs with pointer to the developer guide where they are now maintained. * Correct documentation of initial messages in replication protocol. tools: * quest: Report bounds and estimate of number of matches. * xapian-delve: Improve output when database revision information is not available. We now specially handle the cases of a DB with multiple shards and a backend which doesn't support get_revision(). portability: * Eliminate 2 uses of atoi(). These are potentially problematic in a multithreaded application if setlocale() is called by another thread at the same time. See #665. * Don't check __GNUC__ in visibility.h as the configure probe before defining XAPIAN_ENABLE_VISIBILITY checks that the visibility attributes work. This probably makes no difference in practice, as all compilers we're aware of which support symbol visibility also define __GNUC__. * Document Sun C++ requires --disable-shared. Closes #631. * Fix warning from GCC 9 with -Wdeprecated-copy (which is enabled by -Wextra) if a reference to an Error object is thrown. * Suppress GCC warning in our API headers when compiling code using Xapian with GCC and -Wduplicated-branches. * Mark some internal classes as final (following GCC -Wsuggest-final-types suggestions to allow some method calls to be devirtualised). * Fix to build with --enable-maintainer-mode and Perl < 5.10, which doesn't have the `//=` operator. It's unlikely developers will have such an old Perl, but the mingw environment on appveyor CI does. The use of `//=` was introduced by changes in 1.4.10.
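The changelog entry about testing for a UTF-8 continuation character with a signed char comparison can be illustrated as follows. This is a sketch of the general trick, not Xapian's actual code: continuation bytes have the bit pattern 10xxxxxx (0x80 to 0xBF), which as a signed 8-bit value is the range -128 to -65, so a single comparison against -64 replaces the mask-and-compare. The helper names are hypothetical, and Python is used only to model the C++ arithmetic.

```python
def is_continuation_masked(byte: int) -> bool:
    """Conventional test: the top two bits must be exactly 10."""
    return (byte & 0xC0) == 0x80


def as_signed_char(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a two's-complement signed char."""
    return byte - 256 if byte >= 128 else byte


def is_continuation_signed(byte: int) -> bool:
    """Single-comparison test: 0x80..0xBF is -128..-65 when viewed as signed."""
    return as_signed_char(byte) < -64


if __name__ == "__main__":
    # The two tests agree on every possible byte value.
    agree = all(
        is_continuation_masked(b) == is_continuation_signed(b) for b in range(256)
    )
    print("tests agree on all bytes:", agree)
```

In C or C++ the signed reinterpretation is a cast with no runtime cost, which is where the saving described in the changelog would come from.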
2019-12-16Drop php71 supporttaca1-2/+2
Drop php71 support mechanically.
2019-12-15grep: reset PKGREVISION after updatewiz1-2/+1
2019-12-15textproc/grep: update to 3.3rhialto2-7/+7
* Noteworthy changes in release 3.3 (2018-12-20) [stable]

** Bug fixes

  Some uses of \b in the C locale and with the DFA matcher would fail, e.g., the following would print nothing (it should print the input line):

    echo 123-x|LC_ALL=C grep '.\bx'

  Using a multibyte locale, using certain regexp constructs (some ranges, backreferences), or forcing use of the PCRE matcher via --perl-regexp (-P) would avoid the bug. [bug introduced in grep 2.3]

* Noteworthy changes in release 3.2 (2018-12-20) [stable]

** Changes in behavior

  The --files-without-match (-L) option now causes grep to succeed when a file is listed, instead of when a line is selected. This resembles what git-grep does.

** Bug fixes

  The --recursive (-r) option no longer fails on MS-Windows. [bug introduced in grep 2.11]

** Improvements

  An over-30x performance improvement when many 'or'd expressions share a common prefix, thanks to improvements in gnulib's dfa.c, by Norihiro Tanaka. See gnulib commits v0.1-2110-ge648401be, v0.1-2111-g4299106ce, v0.1-2117-g617a60974

  An additional 3-23% speed-up when searching large files, via increased initial buffer size.

  grep now diagnoses stack overflow. Before grep-2.6, the included regexp code would detect it. Since 2.6, grep defaulted to using glibc's regexp, which lost that capability.
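To see why the input line in the 3.3 bug report should match, the same pattern can be tried in a different regex engine. The sketch below uses Python's `re` module purely as an independent reference for how `\b` behaves; it does not exercise grep's DFA matcher, and the helper name `find_match` is hypothetical.

```python
import re

# Same pattern as in the grep bug report: any character, a word boundary, then 'x'.
PATTERN = re.compile(r".\bx")


def find_match(line: str):
    """Return the matched text, or None when the line does not match."""
    found = PATTERN.search(line)
    return found.group(0) if found else None


if __name__ == "__main__":
    # '-' is a non-word character and 'x' is a word character,
    # so there is a word boundary between them and the line matches.
    print(find_match("123-x"))
    # Between '3' and 'x' there is no boundary, so this line does not match.
    print(find_match("123x"))
```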
2019-12-15textproc/php-wddx: wddx is dropped from php74taca1-1/+2
Wddx extension for PHP is dropped from PHP 7.4.
2019-12-15py-yaml: updated to 5.2adam2-7/+7
5.2: * Repair incompatibilities introduced with 5.1. The default Loader was changed, but several methods like add_constructor still used the old default https://github.com/yaml/pyyaml/pull/279 -- A more flexible fix for custom tag constructors https://github.com/yaml/pyyaml/pull/287 -- Change default loader for yaml.add_constructor https://github.com/yaml/pyyaml/pull/305 -- Change default loader for add_implicit_resolver, add_path_resolver * Make FullLoader safer by removing python/object/apply from the default FullLoader https://github.com/yaml/pyyaml/pull/347 -- Move constructor for object/apply to UnsafeConstructor * Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff https://github.com/yaml/pyyaml/pull/276 -- Fix logic for quoting special characters * Other PRs: https://github.com/yaml/pyyaml/pull/280 -- Update CHANGES for 5.1
2019-12-14Convert some drobilla.net packages to waf.mk.nia4-57/+16
This removes a lot of do-X: targets and redundant python runtime deps.
2019-12-14sord: Update to 0.16.4nia2-10/+9
sord (0.16.4) stable; * Update build system
2019-12-14serd: Update to 0.30.2nia2-10/+9
serd (0.30.2) stable;
* Fix GCC 4 build
* Fix colliding blank nodes when parsing TriG
* Fix missing parse error messages
* Fix parsing "a" abbreviation without padding whitespace
* Fix parsing TriG graphs with several squashed trailing dots
* Fix resolving some URIs against base URIs with no trailing slash
* Improve build system and CI integration
* Improve documentation
2019-12-13Revbump all Go packages after Go 1.12.14 update.bsiegert13-26/+26
2019-12-12lowdown: update to 0.4.6.fcambus2-7/+7
ChangeLog: lowdown 0.4.6: - Make sure stdint is everywhere. lowdown 0.4.5: - Use BSD.lv's mandoc.css. - Use BSD.lv's style and clean up. - Newest sblg. - Add stdint. lowdown 0.4.4: - Use Makefile.configure's macros. - Bring up to speed w/other bsd.lv sites. - Newest oconfigure. - Document list-start for commonmark. - All outputs process start list item for commonmark. - Capture list start iff commonmark. lowdown 0.4.3: - Add atom version feed. - Use new oconfigure. - Bound reading into the value buffer. lowdown 0.4.2: - Document maths. - Remove MATHEXP ("mathexp") facility. - Disable MATHEXP mode. This offers nothing, but ends up complicating the user experience. Stick with only one "math mode" and document it properly. - Have math blocks reported in -Ttree and accumulate the content in a text node. - Begin incorporating better math mode. - Fix pdfhref links in lists. - In nroff mode, stretch table to fit width and embolden header. - Add table documentation. - Typo found by Anton Lindqvist.
2019-12-12textproc/ruby-builder30: remove packagetaca4-47/+0
Remove the ruby-builder30 package; it was kept only for Ruby on Rails 3, and the rails3-related packages are already gone.
2019-12-12textproc/Makefile: remove ruby-builder30taca1-2/+1
2019-12-12py-regex: updated to 2019.12.9adam2-7/+7
2019.12.9: Unknown changes
2019-12-11py-colored: updated to 1.4.2adam2-7/+7
version 1.4.2: - Python 3 compatibility - setup.py file Version 1.4.1: - UnsupportedOperation: fileno when detecting TTY
2019-12-11oniguruma: updated to 6.9.4adam2-7/+7
Release 6.9.4 (Almost same as Release Candidate 3)
* NEW API: RegSet (set of regexes)
* Fixed CVE-2019-19012
* Fixed CVE-2019-19203 (Does not affect UTF-8, UTF-16 and UTF-32 encodings)
* Fixed CVE-2019-19204 (Affects only PosixBasic, Emacs and Grep syntaxes)
* Fixed CVE-2019-19246
* Fixed some problems (found by libFuzzer test)
2019-12-11fmtlib: updated to 6.1.2adam2-7/+7
6.1.2: Fixed ABI compatibility with libfmt.so.6.0.0. Fixed handling types convertible to std::string_view. Made CUDA test an opt-in enabled via the FMT_CUDA_TEST CMake option. Fixed sign conversion warnings.
2019-12-11ruby-nokogiri: update to 1.10.7.tsutsui3-11/+11
Upstream changes (from CHANGELOG.md):

## 1.10.7 / 2019-12-03

### Bug

* [MRI] Ensure the patch applied in v1.10.6 works with GNU `patch`. [#1954]

## 1.10.6 / 2019-12-03

### Bug

* [MRI] Fix FreeBSD installation of vendored libxml2. [#1941, #1953] (Thanks, @nurse!)

## 1.10.5 / 2019-10-31

### Security

[MRI] Vendored libxslt upgraded to v1.1.34 which addresses three CVEs for libxslt:

* CVE-2019-13117
* CVE-2019-13118
* CVE-2019-18197

More details are available at #1943.

### Dependencies

* [MRI] vendored libxml2 is updated from 2.9.9 to 2.9.10
* [MRI] vendored libxslt is updated from 1.1.33 to 1.1.34
2019-12-11py-tabulate: updated to 0.8.6adam2-7/+7
0.8.6: Bug fixes. Stop supporting Python 3.3, 3.4.
2019-12-11py-sphinxcontrib-websupport: updated to 1.1.2adam3-8/+18
Release 1.1.2: * sphinxcontrib-websupport doesn't work with Sphinx 2.0 Release 1.1.1: * sphinxcontrib-websupport doesn't work with Sphinx 2.0.0b1
2019-12-10textproc/guile-json: Update to version 3.3.0ng02-7/+7
Changelog extracted from Changelog file: bump version to 3.3.0 builder: use string instead of bytevector when throwing exception Add info to json invalid exception builder: add #:validate key argument to skip validation json-builder: throw sensible error warning parser: make sure empty array slots are considered invalid added unit tests for scheme object validations validate scheme object when building JSON document bump version to 3.2.0 builder: small simplification add a case for building the JSON of empty JSON objects builder: document the use of symbols and numbers as JSON object keys tests: added unit tests for invalid numbers builder: don't allow complex numbers, inf and nan bump version to 3.1.0
2019-12-08py-sphinx: updated to 2.2.2adam3-8/+11
Release 2.2.2:

Incompatible changes
* For Python security reasons, parallel mode is disabled on macOS with Python 3.8+

Bugs fixed
* LaTeX: the 2019-10-01 LaTeX release breaks :file:`sphinxcyrillic.sty`
* i18n: French, Hindi, Chinese, Japanese and Korean translation messages were broken
* parallel build causes AttributeError on macOS with Python 3.8
2019-12-07fmtlib: updated to 6.1.1adam3-9/+8
6.1.1:
- Fixed shared library build on Windows.
- Added a missing decimal point in exponent notation with trailing zeros.
- Removed deprecated format_arg_store::TYPES.

6.1.0:
- {fmt} now formats IEEE 754 float and double using the shortest decimal representation with correct rounding by default.
- Made the fast binary to decimal floating-point formatter the default, simplified it and improved performance. {fmt} is now 15 times faster than libc++'s std::ostringstream, 11 times faster than printf and 10% faster than double-conversion on dtoa-benchmark.
- {fmt} no longer converts float arguments to double. In particular this improves the default (shortest) representation of floats and makes fmt::format consistent with std::format.
- Made floating-point formatting output consistent with printf/iostreams.
- Added support for 128-bit integers.
- The overload of print that takes text_style is now atomic, i.e. the output from different threads doesn't interleave.
- Made compile time in the header-only mode ~20% faster by reducing the number of template instantiations. wchar_t overload of vprint was moved from fmt/core.h to fmt/format.h.
- Added an overload of fmt::join that works with tuples.
- Changed formatting of octal zero with prefix from "00" to "0".
- The locale is now passed to ostream insertion (<<) operators.
- Locale-specific number formatting now uses grouping.
- Fixed handling of types with deleted implicit rvalue conversion to const char**.
- Enums are now mapped to correct underlying types instead of int.
- Enum classes are no longer implicitly converted to int.
- Added basic_format_parse_context for consistency with C++20 std::format and deprecated basic_parse_context.
- Fixed handling of UTF-8 in precision.
- {fmt} can now be installed on Linux, macOS and Windows with Conda using its conda-forge package.
- Added a CUDA test.
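The idea of a "shortest decimal representation with correct rounding" mentioned for 6.1.0 can be illustrated without {fmt} itself. The sketch below uses Python's `repr` for floats, which also produces the shortest string that round-trips to the same double; it is an analogy only and does not reproduce {fmt}'s algorithm or API. The helper names are hypothetical.

```python
def shortest_repr(value: float) -> str:
    """Shortest decimal string that parses back to exactly the same double."""
    return repr(value)


def seventeen_digits(value: float) -> str:
    """Fixed 17 significant digits, always enough to round-trip a double."""
    return format(value, ".17g")


if __name__ == "__main__":
    for value in (0.1, 1 / 3, 2.5):
        short = shortest_repr(value)
        # Both strings identify the same double; the shortest form is just tidier.
        assert float(short) == value
        assert float(seventeen_digits(value)) == value
        print(short, "vs", seventeen_digits(value))
```

Both forms recover the original value when parsed, so the shortest form loses no information while avoiding digits such as the trailing `...0001` that the fixed-precision form shows for 0.1.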
2019-12-06tex-*: add TEXLIVE_UNVERSIONED=yesmarkd13-13/+26
2019-12-05py-validators: updated to 0.14.1adam2-7/+7
0.14.1: - Updated domain validator regex to not allow numeric only TLDs - Allow for idna encoded domains
2019-12-03py-jsonschema: updated to 3.2.0adam2-9/+13
v3.2.0 * Added a ``format_nongpl`` setuptools extra, which installs only ``format`` dependencies that are non-GPL
2019-12-03textproc/ripgrep: Update to 11.0.2minskim2-384/+316
Breaking changes since 0.10.0:

- ripgrep has tweaked its exit status codes to be more like GNU grep's. Namely, if a non-fatal error occurs during a search, then ripgrep will now always emit a 2 exit status code, regardless of whether a match is found or not. Previously, ripgrep would only emit a 2 exit status code for a catastrophic error (e.g., regex syntax error). One exception to this is if ripgrep is run with -q/--quiet. In that case, if an error occurs and a match is found, then ripgrep will exit with a 0 exit status code.

- Supplying the -u/--unrestricted flag three times is now equivalent to supplying --no-ignore --hidden --binary. Previously, -uuu was equivalent to --no-ignore --hidden --text. The difference is that --binary disables binary file filtering without potentially dumping binary data into your terminal. That is, rg -uuu foo should now be equivalent to grep -r foo.

- The avx-accel feature of ripgrep has been removed since it is no longer necessary. All uses of AVX in ripgrep are now enabled automatically via runtime CPU feature detection. The simd-accel feature does remain available (only for enabling SIMD for transcoding), however, it does increase compilation times substantially at the moment.

See the release announcement for the complete list.
https://github.com/BurntSushi/ripgrep/releases
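The exit status rules described in the first breaking change can be summarised as a small decision function. This is a sketch of the behaviour as worded in the release notes, not ripgrep's implementation (which is written in Rust); the function and parameter names are hypothetical, and the 0/1 statuses for match and no match are assumed from the GNU grep convention the notes refer to.

```python
def exit_status(matched: bool, error: bool, quiet: bool = False) -> int:
    """Model of the exit status rules described in the release notes.

    matched: at least one match was found
    error:   a non-fatal error occurred during the search
    quiet:   -q/--quiet was given
    """
    if error:
        # Exception from the notes: with --quiet, a match wins over the error.
        if quiet and matched:
            return 0
        return 2
    # Assumed GNU grep convention: 0 for a match, 1 for no match.
    return 0 if matched else 1


if __name__ == "__main__":
    for matched in (True, False):
        for error in (True, False):
            for quiet in (True, False):
                status = exit_status(matched, error, quiet)
                print(f"matched={matched} error={error} quiet={quiet} -> {status}")
```

A script that treats status 1 as "no match" and 2 as "something went wrong" should therefore expect 2 more often than with 0.10.0, since an error now takes priority over a successful match unless --quiet is used.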
2019-12-02textproc/py-jsonschema: Add a missing dependencyminskim1-1/+3
2019-12-02textproc/Makefile: Add word2vecminskim1-1/+2
2019-12-02textproc/word2vec: Import version 0.1cminskim7-0/+114
word2vec is an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts. Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous Bag-of-Words or the Skip-Gram neural network architectures.
2019-11-30textproc/libxlsxwriter: Update to 0.8.8sjmulder3-9/+9
Changes:
- Added option to allow a user defined, or overridden, image description used with `worksheet_insert_image()`. By default it uses the filename as the description.
- Added Windows portable version of `fopen` to handle utf8 filenames when working with images.
- Added an option to allow chart fonts to be rotated to 270 deg to give a stacked orientation. Also added support for East Asian vertical chart fonts.
- Refactored struct types used in public APIs to remove or document hidden fields. NOTE: This change introduces backward incompatible API changes. However, it should minimize any future changes of this nature.
2019-11-27jsoncpp: updated to 1.9.2adam4-15/+35
1.9.2: Medium size pre-release containing lots of build fixes.
2019-11-26Add textproc/guile-commonmark version 0.1.2ng05-1/+44
Guile-commonmark is a library for parsing CommonMark, a fully specified variant of Markdown.
2019-11-26py-lxml: updated to 4.4.2adam2-7/+7
4.4.2: Bugs fixed * ``ElementInclude`` incorrectly rejected repeated non-recursive includes as recursive.