
 




TTK Ciar's Journal: Quick interlude: select

I just posted the latest iteration of the "select" utility (for unixy systems) here.

"select" selects rows and columns from STDIN much as the SQL "SELECT" statement selects rows and columns from a table (more or less). It can parse tab-delimited text, CSV, hash, XML (poorly), and JSON, and it can emit any of these formats to STDOUT, even if the format coming in on STDIN is different. That also makes it handy for simply translating between text formats.
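The command-line syntax of "select" isn't shown here, so the following is only a minimal Python sketch of the same idea, not the utility itself. It assumes tab-delimited input whose first line is a header, filters rows with a predicate (playing the role of SQL's WHERE), keeps only the requested columns (the SELECT column list), and re-emits the result as JSON lines or CSV. The function names `select_rows`, `to_json_lines`, and `to_csv` are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Iterable, Sequence


def select_rows(
    lines: Iterable[str],
    columns: Sequence[str],
    where: Callable[[dict], bool] = lambda row: True,
) -> list[dict]:
    """Parse tab-delimited text (first line assumed to be a header),
    keep rows for which `where` is true, and project onto `columns`."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [{col: row[col] for col in columns} for row in reader if where(row)]


def to_json_lines(rows: list[dict]) -> str:
    """Emit one JSON object per row."""
    return "\n".join(json.dumps(row) for row in rows)


def to_csv(rows: list[dict], columns: Sequence[str]) -> str:
    """Emit rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample input: read TSV, write JSON and CSV.
    sample = "id\tname\tsize\n1\talpha\t10\n2\tbeta\t250\n3\tgamma\t40\n"
    picked = select_rows(
        io.StringIO(sample), ["name", "size"], where=lambda r: int(r["size"]) > 20
    )
    print(to_json_lines(picked))
    print(to_csv(picked, ["name", "size"]), end="")
```

Keeping parsing (rows as dicts) separate from emitting is what lets any input format be paired with any output format: each new format only needs one reader and one writer.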

At The Internet Archive, I was living in data-format hell. Our third-party contributors were pushing metadata at us in all manner of formats, the (very powerful) Petabox department refused to deal with anything other than XML, and the Collections department (where I worked for the last couple of years) had a whole lot of existing tools that used hash format, while I was personally transitioning from hash to JSON. I already had a "selectcol" utility that could read tab and hash formats, so I evolved it in the direction I so very desperately needed.

At my current position at Discovery Mining, I've needed to translate between even more data formats, and the need to select rows and columns from large tabulated datasets has become even more pressing, so I forked "select" into a public-domain copy and a proprietary copy. Improvements I develop on my own time will go into both, while improvements made during company hours will go into DM's copy. Since I'll be actively using this tool to get real work done, its development is likely to stay lively, so if you find the need, feel free to check on its status from time to time. I'll try to push out changes at least every other month.

Another recent update: prefix, which is mostly a timestamping tool. I'll write some documentation when I get around to it. Without arguments, it simply loops over STDIN, prepending a timestamp to each line and emitting the result to STDOUT. With arguments, it can prepend or append many other things to its input to generate realtime-annotated output. Yeah, I'll write some documentation.
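Since prefix has no documentation yet, here is a hypothetical Python sketch of just its no-argument behavior as described: read STDIN line by line and write each line to STDOUT with a timestamp in front. The timestamp format ("YYYY-MM-DD HH:MM:SS" plus a space) and the names `prefix_lines` and `main` are assumptions, not the tool's actual output or interface.

```python
import sys
import time
from typing import Callable, Iterable, Iterator, TextIO


def prefix_lines(
    lines: Iterable[str],
    clock: Callable[[], float] = time.time,
    converter: Callable[[float], time.struct_time] = time.localtime,
    fmt: str = "%Y-%m-%d %H:%M:%S",
) -> Iterator[str]:
    """Yield each input line with a timestamp prepended.

    The clock is read as each line arrives, so the stamp reflects when
    the line was seen (the assumed "realtime" annotation)."""
    for line in lines:
        stamp = time.strftime(fmt, converter(clock()))
        yield f"{stamp} {line}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Copy stdin to stdout, timestamping every line."""
    for out in prefix_lines(stdin):
        stdout.write(out)
        stdout.flush()  # flush per line so downstream readers see output promptly
```

Calling `main()` from a script's entry point turns this into a filter, e.g. `long_running_job | python prefix_sketch.py` (the script name is hypothetical). The `clock` and `converter` parameters exist only so the stamping can be tested deterministically.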

-- TTK
