Comment Re:On Perl and command-line utilities (Score 1) 267
Suso Banderas should follow up on this goal to implement the ability to simultaneously operate on columns of data.
A month ago, I wrote an awk script which calculates mean, standard deviation, variance, min, max, sum, and count (see below) for a given stream of numbers.
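#!/bin/nawk -f
# One-pass mean, standard deviation, variance, min, max, sum, and count
# for the first column of the input.
# NOTE: the match pattern after "$1 ~" was lost from the post; the numeric
# test below is an assumption, not the original pattern.
$1 ~ /^[-+]?[0-9]*\.?[0-9]+/ {
    N++;
    x = $1 + 0;                        # force numeric comparison
    if (N>1) {
        if (min>x) {min=x};
        if (x>max) {max=x};
        sumx = x + sumx;
        oldavgx = avgx;
        avgx = avgx + (x-avgx)/N;      # running mean
        # running sample variance (N-1 denominator)
        varx = (N-2)/(N-1)*varx + N*(avgx - oldavgx)^2;
    }
    else {
        # first value seeds every statistic
        min = x;
        max = x;
        sumx = x;
        avgx = x;
        varx = 0;
    }
}
END { print avgx,sqrt(varx),varx,min,max,sumx,N }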
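For anyone checking the variance line: it is a one-pass recurrence for the sample variance (N-1 denominator). As a sketch, with $\bar{x}_N$ for the running mean (avgx) and $s_N^2$ for the running variance (varx) after $N$ values, symbols that are mine and not part of the script:

```latex
\begin{align*}
\bar{x}_N &= \bar{x}_{N-1} + \frac{x_N - \bar{x}_{N-1}}{N}, \\
s_N^2 &= \frac{N-2}{N-1}\, s_{N-1}^2 + \frac{(x_N - \bar{x}_{N-1})^2}{N}
       = \frac{N-2}{N-1}\, s_{N-1}^2 + N\,(\bar{x}_N - \bar{x}_{N-1})^2,
       \qquad N \ge 2.
\end{align*}
```

The last step uses $\bar{x}_N - \bar{x}_{N-1} = (x_N - \bar{x}_{N-1})/N$. For the two values 1 and 3 this gives a mean of 2 and a variance of 2, which matches the direct calculation.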
This took me very little time to write, and it covers about half of numutils' scope. The numutils package should shift its focus away from calculating means and bounds.
This is my suggestion: for each utility, determine what numutils does that is a pain to accomplish in awk or perl, and focus on those areas.
Some of these scripts are excellent examples of what can be accomplished in Perl, though. And better commented than most.
Personally, I'm interested in finding something that would compute the median and percentiles for a given stream of input data. I was excited to see "numutils" but was dismayed not to find the variance. I would like to see an open source version of something like the NAG utilities such as nag_summary_stats_1var or nag_5pt_summary_stats.
I guess I'm just waiting for the Commons-Math Jakarta Mathematics Library project to get released.
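In the meantime, a rough stopgap for the median and percentiles is to sort the stream and pick a rank in awk. This is only a sketch: the function name percentile is hypothetical, it assumes one number per line in the first column and an integer percentile argument, and it uses the nearest-rank definition, so for an even count the "median" is the lower of the two middle values rather than their average.

```shell
# Hypothetical helper: nearest-rank percentile of the first column on stdin.
# Usage: percentile P   (P is an integer from 1 to 100)
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }                          # input arrives sorted, keep every value
        END {
            if (NR == 0) exit 1                 # no data, nothing to report
            rank = int(p * NR / 100)            # nearest rank = ceil(p * N / 100)
            if (rank * 100 < p * NR) rank++
            if (rank < 1) rank = 1
            print v[rank]
        }'
}
```

For example, `percentile 50 < data.txt` prints the median of a hypothetical data.txt. Unlike the one-pass script, this holds the whole stream in memory, which is the price of an exact percentile.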