
Comment Re:CRC (Score 0) 440

Other commenters have observed your big-O inefficiency here. I had the same initial approach in my script http://gnosis.cx/bin/find-duplicate-contents (that I've mentioned in another post), but changed it when I realized it took hundreds of times as long to run as the correct hash-and-sort approach.

The problem is that if you have many files of the same size, you need to do a pairwise comparison of every pair from among them. This multiplies the operations enormously in what I found to be typical cases on my filesystem. For example, if you have 50 files that all are *exactly* 1GB in size, you need to do "50 choose 2" comparisons, i.e. 1225 of them. If the files wind up differing in the first few bytes (few meaning even first few megabytes), that's still possibly cheaper than hashing the whole files. However, if genuine duplicates exist in many of those cases--which is more typical--you wind up having to read through the entire identical content many times.

In contrast, if when you find same-sized files you commit to making exactly one read of each of them (but indeed, the entirety of them), you can store an MD5 or other hash of the whole thing, and then just compare those short MD5 sums. In actual testing and refinement, I have found this to be a hell of a lot cheaper (as in tens or hundreds of times cheaper in my actual test cases). Of course, the actual answer depends on what files of what sizes you actually have, and it is easy to construct artificial cases in which my approach (I didn't invent it, of course) loses. But not in my real-life experience.
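For anyone who wants to see the shape of the idea, here is a minimal sketch of the size-then-hash approach described above. It is not the actual find-duplicate-contents script; the function names `file_digest` and `find_duplicates` and the 1 MB chunk size are my own choices for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it exactly once in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and MD5 digest both match."""
    # Pass 1: group by size, which needs only a stat() per file.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size, one full read each.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

With 50 same-sized files this does 50 full reads and then compares short digests, rather than up to 1225 pairwise content comparisons. A collision-paranoid version could do a final byte-for-byte check within each group.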

Comment My own script (feel free to change) (Score 2) 440

My home-rolled solution to exactly this problem is: http://gnosis.cx/bin/find-duplicate-contents.

This script is efficient algorithmically and has a variety of options to work incrementally and to optimize common cases. It's not excessively user-friendly, possibly, but the --help screen gives reasonable guidance. And the whole thing is short and readable Python code (which doesn't matter for speed, since the expensive steps like MD5 are callouts to fast C code in the standard library).

Comment Re:CUZ MOTHERFUCKERS WILL STEAL NO MATTER WHAT !! (Score 1) 272

This comment by cptdondo is actually quite good. But the reference to "Grandma Moses" (http://en.wikipedia.org/wiki/Grandma_Moses) is just plain weird. I suspect that s/he heard the name somewhere without knowing who it was, and took it as a generic rhetorical figure rather than the name of a specific American artist... unless I'm missing something subtle about Anna Mary Robertson Moses' views on copyright. :-).

Comment Re:Ho ho ho, that's rich. (Score 0) 388

Estonia is a shining example of likely large-scale fraud and hacking in internet voting. The online system was used by around 30% of voters, and the online vote was in favor of the ruling party at a far higher rate than in the ballots cast and counted on paper. This is pretty damn strong evidence that fraud occurred, and in any country of over 1.5 million people, the motivation for fraud would be even stronger, its scale even greater, and its occurrence even more likely. There is hardly a clearer example of why online voting is a HORRIBLE idea that compromises democracy.

Comment The really scary thing (Score 1) 114

I'm not actually much concerned about Iran's nuclear program. Deterrence and MAD actually worked pretty well during the Cold War, and if Iran had nukes (which there isn't any evidence they are actually developing, but there's just enough of a hint of that to have some possible deterrent effect) the chance of Israel launching a war of aggression would be less.

But it scares the shit out of me to think that Iran is running WINDOWS on sensitive installations, for Uranium processing (even for reactors it is not exactly *safe* material) or other important security/safety functions. If this stupidity exists elsewhere in the world, we live in a VERY SCARY world (like most of the people in the world, probably, I don't live that many miles from a nuclear plant).

Comment Base rate fallacy (Score 5, Informative) 998

It is worth reading http://en.wikipedia.org/wiki/Base_rate_fallacy (or other articles on the topic). If the article is correct that 2.4% of new cars sold are hybrids (which sounds reasonable) then the base rate expectation for a "random person" buying a hybrid is low. If the probability of a previous owner of a hybrid buying one next time is 35%, that's still more than 14 times the base rate expectation.
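The arithmetic is trivial, but spelled out as a quick sketch it looks like this. The two percentages are the ones quoted from the article; the variable names are just mine.

```python
base_rate = 0.024    # share of new cars sold that are hybrids (per the article)
repeat_rate = 0.35   # share of hybrid owners who buy a hybrid again (per the article)

# How many times more likely a prior hybrid owner is to buy a hybrid
# than the base rate alone would predict.
ratio = repeat_rate / base_rate
print(f"Repeat hybrid purchases run at {ratio:.1f}x the base rate")
```

So a 35% repeat rate, which sounds low in isolation, is between 14 and 15 times what the base rate alone would predict.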

Now clearly, car buying habits are hardly Monte Carlo style distributions. There is a considerably greater "loyalty" to specific cars than just the random assignment of an available vehicle to a driver. Most of that is probably pretty closely tied to income and socio-economic status. Occupational and regional effects obviously matter too. But consistency in brand or style in repeated car purchases is most certainly far lower than 100%.

It is not at all clear from the evidence given whether hybrid-loyalty is greater or less than other types. For example, I *just* bought a Honda Insight (which seems a lot less common than Toyota Prius, despite what seem to be even more favorable reviews; name recognition does seem big here). Like literally days ago, so I'm probably not good evidence in any direction about next vehicle purchase. But prior to that (and still), my partner and I own an Audi A4--a brand that probably sells no more than 2.4% of cars in the US (i.e. the brand as a whole, not the specific model which must be lower still). Even if a hybrid were out of consideration and I could only consider a conventional gasoline engine, I think there's much less than 35% chance I'd choose an Audi for my next car. Not because I have any particular criticism of Audi, but just because there are lots of other choices, even given similar driving patterns and socio-economic status. I could buy a Saab, or Volvo, or Acura, or maybe on the slightly pricier side a BMW, Mercedes, or Lexus, or slightly downscale a Buick or Lincoln, or even a VW which comes from the same factory. All of these are pretty comparable, and brand loyalty might lean my decision slightly, but there's a long way to go from the base rate--even of only "semi-luxury sedans"--to get to 35% brand retention.

Twitter

How Will You React To Twitter's Regional Censorship Plan? 181

Despite (and probably partly because of) its much-touted role as a communications link in the Arab Spring protest movements of the last year, Twitter announced a few days ago that it could be (which I take to mean "will be, and probably are") selectively blocking tweets based on local governments' requests. This AP story (as carried by stuff.co.nz) gives an overview of the negative reaction this move has drawn; unsurprisingly, there's talk of a boycott. The EFF has what seems to be a fair look at the reality of Twitter take-downs, noting that for various reasons they remove certain content already, but not as much as some parties would like; VentureBeat looks at the thousands of take-down notices the company received last year. If you use Twitter, does the recently announced region-specific blocking change what you'll use it for?

Comment Re:More paid disinformation from Apple? (Score 1) 463

It *is* true that I find a few faults with the Kindle Fire in my early use of it. Most of these could indeed be fixed with software updates, and I think at least some of them *will* be. In part this is just that the Amazon Marketplace has far fewer apps than the Android Marketplace... I confess I have not tried side-loading applications, and am not sure how hard that would/will be.

* I'd really like a native GMail app like I have on my Android phone. The mobile website is kind of OK, but it completely depends on Wifi access, unlike the phone app which caches the latest emails.
* Possibly a really native Facebook app would be nicer than the webpage too, although the latest update to my Android phone FB app looks ever more similar to the web page anyway.
* Multi-user would definitely be nice, but I think this is unlikely to happen.
* I think the Words-to-Go app is really nice as a PDF reader (Acrobat is OK, but I don't like it as well in the Kindle/Android version). However, in a slightly annoying inconsistency in user interface, the latest Books/Documents/Apps/Webpage/Music/etc. all appear on the "top shelf" but getting to a PDF document (or the various other formats, mostly MS-Office related) requires the different (more desktop-like) process of launch-application/open-recent-file. The logical thing would be to let the documents read by applications like this get "top shelf" icons too. Possibly the same is true of the latest picture/video/whatever that I viewed in Gallery (which is also a launch-then-run procedure like a desktop).

I think I might prefer an external volume rocker like some users have said, and I'm not in love with the placement of the on/sleep/off button. But those are minor issues, and I don't *hate* either choice.

Comment More paid disinformation from Apple? (Score 0) 463

I am extremely happy with my Kindle Fire, far more than I would be with an iPad if someone had given me one for free. The form factor is right for the airplane and for reading in bed, much more useful for what I want it for than a 10" tablet would be.

It's true that, exactly like the iPad, the iPhone, every Android phone, every other Android tablet, HP's ill-fated WebOS tablet, and most default OSX, Linux, and Windows installations with auto-login enabled, etc., there is no privacy protection. It's a single user device, and anyone who sees the device can pretty easily determine what its user was doing on it recently (and in general). That is indeed perhaps a weakness, and I wouldn't mind having Android devices (especially tablets, but perhaps phones also) be multiuser (likewise for the iWhatever stuff).

During my most recent plane trip with my Kindle Fire, which unlike an iPad fits in my pocket, I:

* Read a variety of documents sent to the device from web pages using the Firefox Readability plugin
* Read some PDF documents
* Read (part of) some books that I purchased from Amazon
* Watched a video that I downloaded directly onto the device from a 3rd party website (in anticipation of flight)
* Listened to some music I had put locally onto the device
* Played a few moves of Words with Friends before takeoff
* Played Plants vs. Zombies while in flight
* Checked GMail and Facebook and Google+ quickly before takeoff (using Wifi connection to hotspot)

In every respect that I can see, not least including price, but even more so including Freedom, the Kindle Fire is a far better device than the iPad is.


Comment Technical incompetence of parties (Score 3, Interesting) 332

I can actually see a reasonable discovery purpose in looking at the contents of FB pages, and that is mentioned in the article. For example, if the parties have made comments about how responsible they might be in a custodial role (something suggested in article), that could be germane.

But FB isn't really a walled garden anymore. Now there is a quite good "export my data" functionality within it. A reasonable judge's order would simply be for exchange of that downloaded data, which will contain all the relevant background that might exist with past posts. Obviously, this is contingent on parties not deleting old posts first, but other posters have already noted how doing that would be spoliation of evidence (and if parties would do that, they could equally do so with a live account after passwords were shared).

I do recognize that the article mentioned "dating sites" too. Those sites may still be walled gardens, and may well not provide easy data export capabilities. For those, the only way to look at relevant posts/emails/profiles/etc. might indeed be password sharing. Of course, who knows what general data policies those sites have--i.e. are messages automatically deleted after N days, and archives inaccessible to users? Access to password may or may not reveal the full history of site usage.

Comment Re:Not necessarily. (Score 1) 1040

I would put it in exactly the opposite way. A GUI is an efficient way of doing a small number of specialized tasks. For more general requirements, a (good) CLI always wins... and wins by many orders of magnitude. That said, editing graphics and videos is absolutely one area where a GUI often wins. That's mostly because our interaction and understanding of those data formats is inherently visual, and most work with them is interactive. There's usually no systematic or programmatic way to describe what we want done... and in fact, what we want emerges out of interactions with the data.

On the other hand, for things that can be systematized--even on an ad hoc basis--a CLI might be hundreds or thousands of times easier. I preview pictures like everyone. I also often, for example, look for data and patterns in files. To do that, I can do things like (off the cuff):

        % find /home -name '*.csv' -exec grep -h "Date: 2011-1[01]" {} + | cut -d, -f1,6 > interesting.data

If I happen to have a thousand files in various directories that match the name pattern, and many of them have records for dates in Oct-Nov 2011 with a field I care about, what exactly might I do in a GUI?! Spend hours and hours hunting for the files, opening each, copying the data, etc? I could, but that would suck. Or I could also certainly write a *program* in some language other than bash to walk the directories, open the files, etc. But that's much less interactive and more clumsy than just doing it with one command (albeit, some other programming languages let you express something similar with little more text than the bash line... however, those are still matters of typing the right text, not of clicking on icons and dragging a mouse around).
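To make the comparison concrete, here is roughly what the "write a program" alternative might look like in Python. It is only a sketch of the job described above (find CSV files, keep lines matching the date pattern, emit fields 1 and 6); the function name `extract_matching_fields` and its parameters are hypothetical, and it splits on bare commas the way cut does rather than parsing quoted CSV.

```python
import os
import re


def extract_matching_fields(root, out_path,
                            pattern=r"Date: 2011-1[01]", fields=(0, 5)):
    """Walk root, and for each line of each *.csv file that matches pattern,
    write the selected comma-separated fields (0-based) to out_path."""
    regex = re.compile(pattern)
    with open(out_path, "w") as out:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                if not name.endswith(".csv"):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as fh:
                    for line in fh:
                        if not regex.search(line):
                            continue
                        cols = line.rstrip("\n").split(",")
                        # Keep only the fields that exist on short lines.
                        picked = [cols[i] for i in fields if i < len(cols)]
                        out.write(",".join(picked) + "\n")
```

It works, but it is a couple of dozen lines saved to a file instead of one line typed at a prompt, which is rather the point.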

Now clearly for some frequently repeated tasks, makers of applications and operating systems add in special menus and toolbars to do complex tasks like the above line. But those menus, dialogs, etc. always wind up being less flexible than the command-line and missing future uses that are easy to express with commonplace command-line simple tools (like find, grep, cut, etc).

Comment Simpler explanation (Score 1) 185

Most of the comments below--and to a large degree the source article--seem to implicitly assume that all discussion of programming languages happens on Stack Overflow. There probably is some difference in the average experience level of programmers of various languages. But it's also almost certainly the case that OTHER websites also discuss programming languages. For example, someone interested in finding a solution to a Python puzzle might well go to the Python Cookbook (http://code.activestate.com/recipes/langs/python/) rather than Stack Overflow. Similarly, to varying degrees, for all the other languages mentioned (with various sites appropriate to each). All this really amounts to is that "Stack Overflow is a good place to find info on languages X, Y, Z; but not so good for A, B, C" ... and this effect is somewhat self-reinforcing, as users of the "underrepresented languages" look elsewhere for help.

The mere distribution of specialization on various websites says nothing at all about the quality, difficulty, breadth of use, or much anything else about the languages themselves.

Comment Too old and/or too stupid? (Score 1) 772

Well... at 46 yo, I've learned probably about a dozen new programming languages since I was a spry 40 yo. OK, a number of those I've "learned" relatively superficially... I wish I had more opportunities to get my hands dirtier on a daily basis with all the languages I've only played with a little in the last few years, but I get paid quite a bit to do relatively few things.

Actually, over the last few years, I've *written* at least two languages. Not quite programming languages, but one markup language and one, well "annotated grammar description" I guess you'd call it. Yes, I know that NIH syndrome is a bad thing, but there's a reason why I wrote what I did (trust me). On the shelves near me I have books on about a dozen PLs that I either haven't worked with at all, or have touched passingly; it wouldn't be true to say I'm actively reading all of those books, but I certainly glance at them.
