Comment: Re:I agree Python (Score 3, Informative)

Ask Slashdot: Best Rapid Development Language To Learn Today?

I've gotten a lot of mileage out of Python for cleaning and pre-processing CSV and JSON datasets, using the obviously named "csv" and "json" modules. ... However, if you are doing very much manipulation of tabular data, I'd recommend learning a bit of SQL too.

You may want to look into pandas as a middle ground. It's great for sucking in tabular or CSV data and then applying statistical analysis tools to it. It has a native "dataframe" object which is similar to a database table, and has efficient merge, join, and groupby semantics. If you have a ton of data then a database and SQL is the right answer, but for a decent range of use cases in between, pandas is extremely powerful and effective.
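To give a feel for the merge and groupby semantics mentioned above, here is a minimal sketch. The two small tables and their column names are made up for illustration, and the in-memory strings stand in for real CSV files.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two CSV files on disk.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,25.0\n"
    "2,11,40.0\n"
    "3,10,15.0\n"
)
customers_csv = io.StringIO(
    "customer_id,region\n"
    "10,north\n"
    "11,south\n"
)

orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# SQL-style inner join on the shared key column.
joined = orders.merge(customers, on="customer_id", how="inner")

# Roughly: SELECT region, SUM(amount), COUNT(*) FROM joined GROUP BY region
summary = joined.groupby("region")["amount"].agg(["sum", "count"])
print(summary)
```

The same two steps in SQL would be a JOIN followed by a GROUP BY, which is why the dataframe feels like a database table.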

Comment: Re:Programming language in 2 hours ? Yeah, right. (Score 1)

Ask Slashdot: Best Rapid Development Language To Learn Today?

Because Ruby is my preference and I am more familiar with it, I can tell you that it is in continuous development, and bytecode-compiled versions are available (JRuby, which uses the JVM, and others). I do not know about Python in this respect because I haven't used it nearly as much.

Python has the default implementation, CPython, which compiles Python to bytecode that it then interprets; there's also Jython, which compiles to JVM bytecode, and IronPython, which compiles to Microsoft's CLR. There's also Cython (which works best with extra type annotations), which compiles to C and thence to machine code, and numba, which does JIT compilation via LLVM. Finally there's PyPy, which is a Python JIT compiler/interpreter written in a restricted subset of Python.
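If you want to see the CPython bytecode step for yourself, the standard library's dis module will show it. A small sketch follows; the function add_one is just a made-up example, and the exact instruction names vary between CPython versions.

```python
import dis


def add_one(x):
    return x + 1


# The compiled bytecode lives on the function's code object as raw bytes.
raw_bytecode = add_one.__code__.co_code

# dis decodes those bytes into named instructions.
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]
print(opnames)
```

Calling dis.dis(add_one) prints the same information as a formatted listing.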

Comment: Re:worthless top five phrases (Score 2)

Algorithm Distinguishes Memes From Ordinary Information

So they mined the journal for words and phrases... meh, those aren't memes

They are memes in the sense that they are specifically finding words and phrases that are frequently inherited by papers (where "descendant" is determined by citation links), and rarely appear spontaneously (i.e. without appearing in any of the papers cited by a paper). An important feature is that their method used zero linguistic information, didn't bother pruning out stopwords, and indeed didn't do any preprocessing other than simple tokenisation by whitespace and punctuation. Managing to come out with nouns and complex phrases under such conditions is actually very impressive. You should try actually reading the paper.

Comment: Re:now if only people can stop calling netmemes me (Score 1)

Algorithm Distinguishes Memes From Ordinary Information

But the writers of TFA are still misusing the word

Actually no, they are not. By using citations to create a directed graph of papers, they are specifically looking for words or phrases that are highly likely to be inherited by descendant documents and that appear spontaneously (i.e. without being used in any of the cited documents) much less often. They really are interested in the heritability of words and phrases.
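The inherited-versus-spontaneous idea can be sketched in a few lines. To be clear, this is my own simplified illustration and not the paper's exact formula: the toy corpus, the function names, and the smoothing constant are all assumptions.

```python
import re

# Hypothetical toy corpus: paper id -> text, and paper id -> ids of papers it cites.
texts = {
    "A": "graphene sheets show high mobility",
    "B": "we study graphene transport",
    "C": "the results show new transport regimes",
    "D": "graphene devices show promise",
    "E": "an unrelated study of the lattice",
}
cites = {"A": [], "B": ["A"], "C": ["E"], "D": ["B"], "E": []}


def tokens(text):
    # Split on whitespace and punctuation only; no stopword removal or stemming.
    return set(re.findall(r"\w+", text.lower()))


def propagation_score(term, texts, cites, smoothing=1.0):
    """Ratio of how often a term is inherited to how often it appears spontaneously."""
    carries = {paper: term in tokens(text) for paper, text in texts.items()}
    inherited = exposed = spontaneous = unexposed = 0
    for paper, references in cites.items():
        if any(carries[ref] for ref in references):
            # At least one cited paper uses the term.
            exposed += 1
            inherited += carries[paper]
        else:
            # No cited paper uses the term, so any use here is spontaneous.
            unexposed += 1
            spontaneous += carries[paper]
    sticking = inherited / (exposed + smoothing)
    sparking = (spontaneous + smoothing) / (unexposed + smoothing)
    return sticking / sparking


graphene_score = propagation_score("graphene", texts, cites)
show_score = propagation_score("show", texts, cites)
print(graphene_score, show_score)
```

In this toy corpus "graphene" travels along citation links while "show" turns up regardless of what a paper cites, so the first scores higher than the second.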

Comment: Re:We should all like this Bitcoin *concept* (Score 1)

This Whole Bitcoin Thing Could Be Big, Says Bank of America

You are solely focused on bitcoin as an investment opportunity rather than its intrinsic utility.

Sure, but as far as intrinsic utility is concerned it doesn't matter when I get involved with bitcoin ... well, in fact it does: right now the price instability and general uncertainty mean it is far better to not get involved, wait for all this nonsense to sort itself out and join the game once everything is settled, stable, and bitcoin is actually being used purely for its intrinsic utility. In other words, it's better for me to ignore it for a few more years at least.

Comment: Re:A link between DPR and an early Bitcoiner (Score 5, Insightful)

Study Suggests Link Between Dread Pirate Roberts and Satoshi Nakamoto

I think the more interesting part is the fact that some decent mathematicians (in this case Adi Shamir among others) are setting about pulling the entire bitcoin transaction graph and doing some serious data-mining on it. The reported result sounds like a mildly interesting finding that happened to pop up in the first pass.

Given the advanced tools available these days for graph mining (largely developed for social network analysis among other things) I suspect some rather more interesting results may start coming out soon. What may seem hard to track on an individual basis may fall somewhat more easily to powerful analysis tools that get to make use of the big picture. I bet there's some interesting info on cliques and exchanges that could be teased out by serious researchers with some decent compute power at their disposal. Pseudonymity may be even weaker than you might think.

Comment: Re:a skeptic says "wow bitcoin is serious ". Hope (Score 2)

195K Bitcoin Transaction

Try pricing in Zimbabwean dollars - you'll see the same problem.

Well, you won't anymore: the Zimbabwe dollar was discontinued and the country now uses US dollars as its currency, because price volatility made continued use of the Zimbabwe dollar as a currency effectively impossible.

Now, Zimbabwe had inflation, not deflation, but the issue of volatility is the same: it makes things ultimately unworkable if it gets too high (even if it moves in a predictable way). When prices change significantly* by the minute and transactions take several minutes to complete, trouble may set in.

* significantly here means, say, a double digit percentage change in price every minute. Bitcoin is a long way from that currently, but is headed in that direction.
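To put a rough number on that, here is a back-of-the-envelope sketch. The 10% per minute and the 10 minute completion time are illustrative assumptions, not measurements of anything.

```python
def price_drift(pct_change_per_minute, minutes):
    """Compound a fixed per-minute percentage change over a number of minutes."""
    return (1 + pct_change_per_minute / 100) ** minutes


# Assumed figures: 10% change per minute, 10 minutes for a transaction to complete.
factor = price_drift(10, 10)
print(f"price moves by a factor of {factor:.2f} while the transaction completes")
```

At that rate the price has more than doubled between agreeing on an amount and the payment going through, which is the point where pricing anything becomes unworkable.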

Comment: Re:yet another programming language (Score 1)

Stephen Wolfram Developing New Programming Language

Being primarily a mathematician and not a computer scientist or engineer I've used Maple, Mathematica, Matlab, Magma and R. I've also programmed in Python, Perl, C, and Java and dabbled in things like Lisp and Haskell.

All the "math" programs on that list are terrible programming languages; they work great as interactive environments for doing (potentially symbolic) computation, but writing code in them? Ugh. If I actually have to write scientific computing code it's going to be in Python using numpy and sympy, or C if I need performance.

The different math programs all have their strengths and weaknesses: Matlab kicks the crap out of the others for anything numerical or linear algebra related, both for ease of expression and performance; R has far more capabilities statistically than any of the others -- data frames as a fundamental data type make that clear; Magma is incomparable for the breadth and power of its algebra, and none of the others come remotely close; Mathematica and Maple are ... well, sort of a poor jack of all trades that do most things but none of it very well.

Comment: Re:Remove CTRL + C as well (Score 1)

Middle-Click Paste? Not For Long

Especially in an environment like Gnome 3 where the preferred method of working is full screen apps. Drag and drop to what?

I'm not really sure full screen is "the preferred method" in gnome 3 (I use gnome 3 and never full screen apps). Anyway, presuming you want to drag and drop, you can drag to the Activities corner, which will take you to the Exposé-style overview from which you can select any window and drop into it. I had never done this until just now, when I tried it to see if it works; it does, and it is quite smooth (hover over a window for a second to have it restore as the front window if you want to drop to a particular location within the window).

Comment: Re:FUCK OFF (Score 3, Insightful)

Middle-Click Paste? Not For Long

I try 'desktops' from time to time but they don't really give me much beyond managing windows. you know, the thing that fvwm does well enough and with 1/10 the memory and cpu.

A lot of 'desktops' these days are things you don't see immediately: the toolkits, internationalization/localization, canvases, setting centralization and management, advanced font handling, notification plumbing etc. that most GUI applications make use of these days (from one desktop or another). Presuming you're using apps other than xterm (and perhaps you are not) you are actually making use of most of this stuff; the part of the 'desktop' you're not using is simply the window manager and the panels which are, ultimately, the tip of the iceberg.

Comment: Re:interesting take. (Score 3, Insightful)

Mozilla Labs Experiment Distills Your History Into Interests

It could work; it's not sending any data that couldn't be extracted from your history anyway (which they are largely getting now via blanket tracking) so it's not especially detrimental to the user.

Well, depending on how much you are blocking cookies and trying to keep information out of the hands of advertisers and other internet douchebags, you may feel differently.

Mozilla has said this is something you can opt out of, so it's no worse than blocking cookies etc. (and, in fact, is probably easier).

How about you develop tools to keep my information out of the hands of those 3rd parties? Instead they just seem to be looking to become yet another broker of your information.

Looked at the right way, this is almost exactly that. Presume for a moment that it works (a big if) and advertisers take to using this instead of pervasive tracking. Now we're in a place where we have a single central point of data release to advertisers; you can turn it off; you can potentially drop in a plug-in that publishes a hand-crafted/approved list of "interests" instead of mining your history for it; etc. If it works it does give more control to users over their privacy.

The reality is that information is currency these days, and people will mine for this sort of data because it is valuable. You won't have much luck just blocking everything because the incentives to find a way around whatever blocks are put in place are high. So, assuming information is going to be given, trying to give the user more control over what information is handed over seems like a good thing. I doubt this particular plan will actually work, but I expect something along these lines will happen eventually.

Comment: Re:interesting take. (Score 3, Interesting)

Mozilla Labs Experiment Distills Your History Into Interests

It makes sense if advertising companies were nice people, but please never turn this on by default. Most likely they will just add the info that you supply them to their trove of tracking data.

It could work; it's not sending any data that couldn't be extracted from your history anyway (which they are largely getting now via blanket tracking) so it's not especially detrimental to the user. On the other hand it is essentially doing, on the client side and ahead of time, the data mining and summarisation that the advertisers are going to have to do anyway. Getting your product to do some of your compute work for you may be enough of a carrot to get advertisers to end up taking this in preference to all the raw data collected by pervasive tracking.

Comment: Re:Gawd (Score 5, Insightful)

Love and Hate For Java 8

And it doesn't mean Java doesn't have serious flaws. There's something deeply ingrained in Java that encourages over-engineering. But every language has its pitfalls.

I don't think there's much in Java the language that encourages over-engineering; it's more in the community that surrounds Java. It's in the tutorials, and books, and code examples and discussion groups. It's in the frameworks and libraries.

The reality is that a "language" is as much shaped by the community that grows up around it as by the actual language itself. Perl doesn't have to be particularly unreadable, but the culture that grew up around perl in the late 90's, obsessed with cute hacks, fewest keystrokes, and self created obscurity, created a state where anyone learning perl was immersed in that culture and came out writing a lot of unreadable stuff. It is my understanding that since many of those programmers left perl for other languages, perl has been remade as "Modern Perl", which is largely the same core language, just with a different culture and libraries, and is quite readable.

Conversely python can be made quite diabolical (just throw together chains of nested list comprehensions and single character variables for example), but because it grew up with a culture of "one obvious way to do it" and readability, most code you'll see tends to eschew such things and strives to read like pseudo-code. Again, there's not that much inherent in the language; it's the cultural conventions surrounding the language that enforce much of that.
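As a small illustration of the difference, here is the same operation written both ways. Both functions are made up for this example; they keep the squares of the odd numbers in each row and drop rows that end up empty.

```python
# Terse style: nested comprehensions and single-character names.
def f(m):
    return [q for q in [[x * x for x in r if x % 2] for r in m] if q]


# Conventional style: descriptive names and one step per line.
def odd_squares_by_row(rows):
    """Square the odd values in each row, skipping rows with no odd values."""
    result = []
    for row in rows:
        squares = [value * value for value in row if value % 2 == 1]
        if squares:
            result.append(squares)
    return result


sample = [[1, 2, 3], [2, 4], [5]]
print(f(sample) == odd_squares_by_row(sample))
```

Nothing in the language stops you writing the first version; it's just that almost nobody in the community would let it through code review.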

Java fell in with the Enterprise crowd, and consequently found itself immersed in a culture obsessed with design patterns and over-engineering. Had things gone a little differently with, say, in-browser applets somehow becoming the primary driving force for Java (let's assume they ran better, say) then I doubt Java would be known for over-engineering.

Comment: Re:Honesty? (Score 1)

How Climate Scientists Parallel Early Atomic Scientists

No, spatial variation is treated as spatial variation, but the central limit theorem still applies with respect to the mean temperature over the spatial variation. Temporal variation is treated as temporal variation, but when determining the mean over a time period the central limit theorem still applies and gives greater accuracy for more measurements over that period. Etc. Apply a little bit of common sense.
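If you doubt the point about more measurements giving a more accurate mean, a quick simulation shows it. This is only a sketch under made-up assumptions: a true value of 15.0 and independent measurement errors drawn uniformly from -3 to +3.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=2000, seed=0):
    """Standard deviation of the mean of `sample_size` noisy measurements."""
    rng = random.Random(seed)
    means = [
        # Assumed model: true value 15.0 plus uniform error in [-3, 3].
        statistics.fmean(15.0 + rng.uniform(-3.0, 3.0) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


spread_4 = spread_of_sample_means(4)
spread_100 = spread_of_sample_means(100)
print(spread_4, spread_100)
```

Each individual measurement is off by up to 3 either way, yet the mean of 100 of them is far more tightly clustered than the mean of 4, with the spread shrinking roughly as one over the square root of the number of measurements.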

