author | fcambus <fcambus@pkgsrc.org> | 2020-03-06 08:18:31 +0000 |
---|---|---|
committer | fcambus <fcambus@pkgsrc.org> | 2020-03-06 08:18:31 +0000 |
commit | 294c9e5627a312b28f9df5622996fb93a76fc2c9 (patch) | |
tree | 91fae1b4d1c296c67f685ab7eb65cf4baa22d885 /textproc/miller | |
parent | 57e7e9c16d3b99723116b89c38775d04779c1d55 (diff) | |
download | pkgsrc-294c9e5627a312b28f9df5622996fb93a76fc2c9.tar.gz |
miller: update to 5.6.2.
ChangeLog:
v5.6.2
Bug fixes:
#271 fixes a corner-case bug with more than 100 CSV/TSV files with
headers of varying lengths.
Documentation:
The new http://johnkerl.org/miller/doc/whyc-details.html is an
elaboration on http://johnkerl.org/miller/doc/whyc.html which answers
a question posed by @BurntSushi on Reddit a couple years ago which
I did not address in detail at the time.
v5.6.1
The only change is that http://johnkerl.org/miller/doc is now
more mobile-friendly. All build artifacts are the same as at
https://github.com/johnkerl/miller/releases/tag/v5.6.0
v5.6.0
The new system DSL function allows you to run arbitrary shell commands
and store their output in field values. Some example usages are documented
here. This is in response to issues #246 and #209.
There is now support for ASV and USV file formats. This is in response
to issue #245.
The new format-values verb allows you to apply numerical formatting
across all record values. This is in response to issue #252.
Documentation:
The new DKVP I/O in Python sample code now works for Python 2 as
well as Python 3.
There is a new cookbook entry on doing multiple joins. This is in
response to issue #235.
Bugfixes:
The toupper, tolower, and capitalize DSL functions
are now UTF-8 aware, thanks to @sheredom's marvelous
https://github.com/sheredom/utf8.h. The internationalization page
has also been expanded. This is in response to issue #254.
#250 fixes a bug using in-place mode in conjunction with verbs
(such as rename or sort) which take field-name lists as arguments.
#253 fixes a bug in the label verb when one or more names are common
between old and new.
#251 fixes a corner-case bug when (a) input is CSV; (b) the last
field ends with a comma and no newline; (c) input is from standard
input and/or --no-mmap is supplied.
v5.5.0
The new positional-indexing feature resolves #236 from @aborruso. You
can now get the name of the 3rd field of each record via $[[3]], and
its value by $[[[3]]]. These are both usable on either the left-hand
or right-hand side of assignment statements, so you can more easily
do things like renaming fields programmatically within the DSL.
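To make the distinction between $[[3]] (field name) and $[[[3]]] (field value) concrete, here is a small Python model of a record. It is only an illustration of the semantics described above, not Miller code; the helper names name_at, value_at, and rename_at are hypothetical.

```python
# Illustrative Python model of positional field access; not Miller code.
# A record is modeled as an insertion-ordered dict (field name -> value).

def name_at(record, position):
    """Name of the field at a 1-based position, analogous to $[[3]]."""
    return list(record.keys())[position - 1]

def value_at(record, position):
    """Value of the field at a 1-based position, analogous to $[[[3]]]."""
    return list(record.values())[position - 1]

def rename_at(record, position, new_name):
    """Return a copy with the field at `position` renamed, order preserved."""
    old_name = name_at(record, position)
    return {(new_name if k == old_name else k): v for k, v in record.items()}

record = {"a": "pan", "b": "wye", "x": "0.35"}
print(name_at(record, 3))             # x
print(value_at(record, 3))            # 0.35
print(rename_at(record, 3, "ratio"))  # {'a': 'pan', 'b': 'wye', 'ratio': '0.35'}
```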
There is a new capitalize DSL function, complementing the
already-existing toupper. This stems from #236.
There is a new skip-trivial-records verb, resolving #197. Similarly,
there is a new remove-empty-columns verb, resolving #206. Both are
useful for data-cleaning use-cases.
Another pair is #181 and #256. While Miller uses mmap internally
(and invisibly) to get approximately a 20% performance boost over
not using it, this can cause out-of-memory issues with reading either
large files, or too many small ones. Now, Miller automatically avoids
mmap in these cases. You can still use --mmap or --no-mmap if you
want manual control of this.
There is a new --ivar option for the nest verb which complements
the already-existing --evar. This is from #260 thanks to @jgreely.
There is a new keystroke-saving urandrange DSL function:
urandrange(low, high) is the same as low + (high - low) *
urand(). This arose from #243.
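The identity above can be written out as a short Python sketch. Here random.random() stands in for Miller's urand(), on the assumption that both return a uniform value in [0, 1); this is not Miller's implementation.

```python
import random

def urandrange(low, high):
    """Uniform random number between low and high.

    random.random() stands in for Miller's urand() (assumed uniform on [0, 1)).
    """
    return low + (high - low) * random.random()

# Example: a sample between 10 and 20.
sample = urandrange(10, 20)
print(10 <= sample <= 20)  # True
```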
There is a new -v option for the cat verb which writes a low-level
record-structure dump to standard error.
There is a new -N option for mlr which is a keystroke-saver for
--implicit-csv-header --headerless-csv-output.
Documentation:
The new FAQ entry
http://johnkerl.org/miller/doc/faq.html#How_to_escape_'%3F'_in_regexes%3F
resolves #203.
The new FAQ entry
http://johnkerl.org/miller/doc/faq.html#How_can_I_filter_by_date%3F
resolves #208.
#244 fixes a documentation issue while highlighting the need for #241.
Bugfixes:
There was a SEGV using nest within then-chains, fixed in response
to #220.
Quotes and backslashes weren't being escaped in JSON output with
--jvquoteall; reported on #222.
v5.4.0
The new clean-whitespace verb resolves #190 from @aborruso. Along with
the new functions strip, lstrip, rstrip, collapse_whitespace, and
clean_whitespace, there is now both coarse-grained and fine-grained
control over whitespace within field names and/or values. See the
linked-to documentation for examples.
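As a rough picture of coarse-grained versus fine-grained control, the Python sketch below models two of the new functions. It assumes that collapse_whitespace reduces each run of whitespace to a single space and that clean_whitespace is collapse followed by strip; Miller's exact definition of whitespace may differ.

```python
import re

def collapse_whitespace(s):
    """Replace each run of whitespace with one space (assumed semantics)."""
    return re.sub(r"\s+", " ", s)

def clean_whitespace(s):
    """Collapse interior runs, then strip both ends (assumed semantics)."""
    return collapse_whitespace(s).strip()

print(repr(collapse_whitespace("  a   b  ")))  # ' a b '
print(repr(clean_whitespace("  a   b  ")))     # 'a b'
```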
The new altkv verb resolves #184 which was originally opened via an
email request. This supports mapping value-lists such as a,b,c,d to
alternating key-value pairs such as a=b,c=d.
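The pairing can be sketched in Python as follows. This is an illustration only: the function name mirrors the verb, and the sketch deliberately rejects odd-length input rather than guess how Miller handles a leftover value.

```python
def altkv(values):
    """Pair up a flat value list: [a, b, c, d] -> {a: b, c: d}.

    Odd-length input is rejected here; that is a choice of this sketch,
    not a statement about Miller's behavior.
    """
    if len(values) % 2 != 0:
        raise ValueError("this sketch only handles an even number of values")
    return dict(zip(values[0::2], values[1::2]))

print(altkv(["a", "b", "c", "d"]))  # {'a': 'b', 'c': 'd'}
```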
The new fill-down verb resolves #189 by @aborruso. See the linked-to
documentation for examples.
The uniq verb now has a uniq -a which resolves #168 from @sjackman.
The new regextract and regextract_or_else functions resolve #183
by @aborruso.
The new ssub function arises from #171 by @dohse, as a simplified way
to avoid escaping characters which are special to regular-expression
parsers.
There are new localtime functions in response to #170 by
@sitaramc. However note that as discussed on #170 these do
not undo one another in all circumstances. This is a non-issue
for timezones which do not do DST. Otherwise, please use with
disclaimers: localdate, localtime2sec, sec2localdate, sec2localtime,
strftime_local, and strptime_local.
Builds:
Windows build-artifacts are now available in Appveyor at
https://ci.appveyor.com/project/johnkerl/miller/build/artifacts,
and will be attached to this and future releases. This resolves #167,
#148, and #109.
Travis builds at https://travis-ci.org/johnkerl/miller/builds now
run on OSX as well as Linux.
An Ubuntu 17 build issue was fixed by @singalen on #164.
Documentation:
put/filter documentation was confusing as reported by @NikosAlexandris
on #169.
The new FAQ entry
http://johnkerl.org/miller-releases/miller-head/doc/faq.html#How_to_rectangularize_after_joins_with_unpaired?
resolves #193 by @aborruso.
The new cookbook entry
http://johnkerl.org/miller/doc/cookbook.html#Options_for_dealing_with_duplicate_rows
arises from #168 from @sjackman.
The unsparsify documentation had some words missing as reported by
@tst2005 on #194.
There was a typo in the cookbook page
http://johnkerl.org/miller/doc/cookbook.html#Full_field_renames_and_reassigns
as fixed by @tst2005 in #192.
Bugfixes:
There was a memory leak for TSV-format files only as reported by
@treynr on #181.
Dollar signs in regular expressions were not being escaped properly
as reported by @dohse on #171.
v5.3.0
Comment strings in data files: mlr --skip-comments allows
you to filter out input lines starting with #, for all file
formats. Likewise, mlr --skip-comments-with X lets you specify
the comment-string X. Comments are only supported at start of data
line. mlr --pass-comments and mlr --pass-comments-with X allow you
to forward comments to program output as they are read.
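The two behaviors can be modeled in a few lines of Python. This is a sketch of the described semantics (comments recognized only at the start of a line), not Miller's reader; the function names are hypothetical.

```python
def skip_comments(lines, prefix="#"):
    """Drop lines that start with the comment prefix."""
    return [line for line in lines if not line.startswith(prefix)]

def split_comments(lines, prefix="#"):
    """Separate data lines from comment lines so comments can be forwarded."""
    data = [line for line in lines if not line.startswith(prefix)]
    comments = [line for line in lines if line.startswith(prefix)]
    return data, comments

lines = ["# generated file", "a=1,b=2", "a=3,b=4 # not a comment"]
print(skip_comments(lines))  # ['a=1,b=2', 'a=3,b=4 # not a comment']
```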
The count-similar verb lets you compute cluster sizes by cluster
labels.
While Miller DSL arithmetic gracefully overflows from 64-bit integer
to double-precision float (see also here), there are now the
integer-preserving arithmetic operators .+ .- .* ./ .// for those
times when you want integer overflow.
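The difference between the two operator families can be illustrated with addition in Python. The sketch assumes that "integer overflow" for .+ means two's-complement wrap-around in signed 64 bits; the function names are hypothetical and this is not Miller's implementation.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def add_auto(a, b):
    """Model of '+': stay integer if the sum fits in 64 bits, else go to float."""
    result = a + b
    if INT64_MIN <= result <= INT64_MAX:
        return result
    return float(result)

def add_wrapping(a, b):
    """Model of '.+': wrap around in signed 64-bit two's complement (assumed)."""
    result = (a + b) & 0xFFFFFFFFFFFFFFFF
    return result - 2 ** 64 if result > INT64_MAX else result

print(type(add_auto(INT64_MAX, 1)).__name__)   # float
print(add_wrapping(INT64_MAX, 1) == INT64_MIN)  # True
```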
There is a new bitcount function: for example, echo x=0xf0000206 |
mlr put '$y=bitcount($x)' produces x=0xf0000206,y=7.
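The same count can be checked in Python. This is an independent cross-check of the example value for non-negative integers, not Miller's implementation.

```python
def bitcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

print(bitcount(0xf0000206))  # 7
```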
Issue 158: mlr -T is an alias for --nidx --fs tab, and mlr -t is an
alias for mlr --tsvlite.
The mathematical constants π and e have been renamed from PI and
E to M_PI and M_E, respectively. (It's annoying to get a syntax
error when you try to define a variable named E in the DSL, when
A through D work just fine.) This is a backward incompatibility,
but not enough of one to justify calling this release Miller 6.0.0.
Documentation:
As noted here, while Miller has its own DSL there will always be
things better expressible in a general-purpose language. The new page
Sharing data with other languages shows how to seamlessly share data
back and forth between Miller, Ruby, and Python. SQL-input examples
and SQL-output examples contain detailed information on the interplay
between Miller and SQL.
Issue 150 raised a question about suppressing numeric conversion. This
resulted in a new FAQ entry How do I suppress numeric conversion?,
as well as the longer-term follow-on issue 151 which will make
numeric conversion happen on a just-in-time basis.
To my surprise, csvlite format options weren’t listed in mlr --help
or the manpage. This has been fixed.
Documentation for auxiliary commands has been expanded, including
within the manpage.
Bugfixes:
Issue 159 fixes regex-match of literal dot.
Issue 160 fixes out-of-memory cases for huge files. This is an old
bug, as old as Miller, and is due to inadequate testing of huge-file
cases. The problem is simple: Miller prefers memory-mapped I/O
(using mmap) over stdio since mmap is fractionally faster. Yet as
any processing (even mlr cat) steps through an input file, more and
more pages are faulted in -- and, unfortunately, previous pages are
not paged out once memory pressure increases. (This despite gallant
attempts with madvise.) Once all processing is done, the memory is
released; there is no leak per se. But the Miller process can crash
before the entire file is read. The solution is equally simple: to
prefer stdio over mmap for files over 4GB in size. (This 4GB threshold
is tunable via the --mmap-below flag as described in the manpage.)
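The selection rule described for issue 160 amounts to a size comparison, sketched below in Python. The function name is hypothetical, "4GB" is read here as 4 GiB, and the behavior at exactly the threshold is an assumption; Miller's actual logic lives in its C sources.

```python
FOUR_GB = 4 * 1024 ** 3  # assumption: "4GB" taken as 4 GiB

def choose_reader(file_size, mmap_below=FOUR_GB):
    """Pick an I/O strategy: mmap for files below the threshold, else stdio."""
    return "mmap" if file_size < mmap_below else "stdio"

print(choose_reader(10 * 1024 ** 2))  # mmap
print(choose_reader(8 * 1024 ** 3))   # stdio
```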
Issue 161 fixes a CSV-parse error (with error message "unwrapped
double quote at line 0") when a CSV file starts with the UTF-8
byte-order-mark ("BOM") sequence 0xef 0xbb 0xbf and the header line
has double-quoted fields. (Release 5.2.0 introduced handling for
UTF-8 BOMs, but missed the case of double-quoted header line.)
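For readers unfamiliar with the failure mode, this Python sketch shows the general idea of removing the three BOM bytes before the CSV parser sees a double-quoted header. It uses Python's csv module purely for illustration and says nothing about how Miller's own parser is structured.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"

def read_csv_bytes(data):
    """Parse CSV bytes, dropping a leading UTF-8 BOM if one is present."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))

rows = read_csv_bytes(b'\xef\xbb\xbf"a","b"\n1,2\n')
print(rows)  # [['a', 'b'], ['1', '2']]
```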
Issue 162 fixes a corner case doing multi-emit of aggregate variables
when the first variable name is a typo.
The Miller JSON parser used to error with Unable to parse JSON data:
Line 1 column 0: Unexpected 0x00 when seeking value on empty input,
or input with trailing whitespace; this has been fixed.
Diffstat (limited to 'textproc/miller')
-rw-r--r-- | textproc/miller/Makefile | 4 |
-rw-r--r-- | textproc/miller/distinfo | 10 |
2 files changed, 7 insertions, 7 deletions
diff --git a/textproc/miller/Makefile b/textproc/miller/Makefile
index 54c24d0f5d9..ec3463a4507 100644
--- a/textproc/miller/Makefile
+++ b/textproc/miller/Makefile
@@ -1,6 +1,6 @@
-# $NetBSD: Makefile,v 1.15 2019/03/28 23:52:09 leot Exp $
+# $NetBSD: Makefile,v 1.16 2020/03/06 08:18:31 fcambus Exp $
 
-DISTNAME= mlr-5.2.2
+DISTNAME= mlr-5.6.2
 PKGNAME= ${DISTNAME:S/mlr/miller/}
 CATEGORIES= devel
 MASTER_SITES= ${MASTER_SITE_GITHUB:=johnkerl/}
diff --git a/textproc/miller/distinfo b/textproc/miller/distinfo
index 203b15f13e3..8dd8a690673 100644
--- a/textproc/miller/distinfo
+++ b/textproc/miller/distinfo
@@ -1,6 +1,6 @@
-$NetBSD: distinfo,v 1.14 2017/08/14 21:22:55 wiz Exp $
+$NetBSD: distinfo,v 1.15 2020/03/06 08:18:31 fcambus Exp $
 
-SHA1 (mlr-5.2.2.tar.gz) = 1b130238401ae30096d984961af0e1f88d583a1a
-RMD160 (mlr-5.2.2.tar.gz) = 8147e4ff0a7125ece80246b35e0b54c1c8c50833
-SHA512 (mlr-5.2.2.tar.gz) = 1f6843fb08e3e3c59912e673636fc7d52246ab9a49a0df25c4b11a17ed7576e0c27e10c06f164a9df8e4b30d8f1715088161187b8126fecc84ef50774dcf7b93
-Size (mlr-5.2.2.tar.gz) = 1191162 bytes
+SHA1 (mlr-5.6.2.tar.gz) = 4a3fb995a65a9960bb2e53bd565081d491aba8b1
+RMD160 (mlr-5.6.2.tar.gz) = 51e6d16ca6d012e47d8cad29d643c7da943a0535
+SHA512 (mlr-5.6.2.tar.gz) = d5c984c1db045219c79564251193ec4887582987cde980df6705e10e246d230d92fd9197e2c207545133f96e7cd292fc1eb494e8c57384d6ba0a90a83c4f1dd9
+Size (mlr-5.6.2.tar.gz) = 1280257 bytes