
Comment: Re:What's the problem? (Score 1) 167

by swillden (#49497259) Attached to: Social Science Journal 'Bans' Use of p-values

There really aren't any good ways to measure those other effects. If you knew how your experiment was biased, you'd try and fix it.

Randomized sampling goes a long way, but only if you have a large enough population. This is one of the problems of social sciences. A randomized 10% subsample from 100 subjects ain't gonna cut it. A randomized subsample from 10,000,000 people isn't going to get funded.

Why wouldn't a randomized subsample from 10M people get funded? The required sample size doesn't grow as the population does.
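To put rough numbers on that, here is a minimal sketch using Cochran's sample-size formula with the finite population correction. The 3% margin of error, the 95% confidence level (z = 1.96) and the worst-case proportion p = 0.5 are illustrative assumptions, not figures from the thread.

```python
import math

def required_sample_size(population, margin=0.03, z=1.96, p=0.5):
    """Cochran's sample size with the finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size needed for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)        # shrink it for a finite population
    return math.ceil(n)

for population in (1_000, 100_000, 10_000_000):
    print(population, required_sample_size(population))
```

Under these assumptions the required sample levels off at a little over a thousand respondents, whether the population is a hundred thousand or ten million.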

Comment: Re:What's the problem? (Score 4, Insightful) 167

by swillden (#49495247) Attached to: Social Science Journal 'Bans' Use of p-values

Actually, p-values are about CORRELATION. Maybe *you* aren't well-positioned to be denigrating others as not statistical experts.

I may be responding to a troll here, but, no, the GP is correct. P-values are about probability. They're often used in the context of evaluating a correlation, but they needn't be. Specifically, a p-value is the probability that a result at least as strong as the one observed (which may be a correlation) would arise purely from the luck of the random draw if there were no real effect in the population. Good sampling techniques can't eliminate the possibility that your random sample just happens to be non-representative, and the p-value measures how easily that kind of bad luck alone could have produced what you saw. A p-value of 0.05 means that, with no real effect, a sample this misleading would turn up about 5% of the time.

The problem with p-values is that they only describe one way that the experiment could have gone wrong, but people interpret them to mean overall confidence -- or, even worse -- significance of the result, when they really only describe confidence that the sample wasn't biased due to bad luck in random sampling. It could have been biased because the sampling methodology wasn't good. It could have been meaningless because it finds an effect which is real, but negligibly small. It could be meaningless because the experiment was just badly constructed and didn't measure what it thought it was measuring. There could be lots and lots of other problems.

There's nothing inherently wrong with p values, but people tend to believe they mean far more than they do.
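As a rough illustration of the "real, but negligibly small" case, here is a sketch of a two-sided z-test on a difference in means. The effect size of 0.01 standard deviations, the equal group sizes and the unit variance are all assumptions chosen for the example.

```python
import math

def two_sample_p_value(effect_in_sd, n_per_group):
    """Two-sided p-value of a z-test for a difference in means,
    assuming two equal-sized groups with unit standard deviation."""
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_in_sd / standard_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))

for n in (100, 10_000, 1_000_000):
    print(n, two_sample_p_value(0.01, n))
```

With a million subjects per group the p-value comes out far below any usual threshold, even though the effect itself is still only a hundredth of a standard deviation. The p-value says nothing about whether that matters.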

Comment: Re:Does it report seller's location and ID? (Score 2) 139

by swillden (#49490627) Attached to: Google Helps Homeless Street Vendors Get Paid By Cashless Consumers

The phone then reports this seller's ID to some central server. Does it also report geolocation data?

I seriously doubt it. I don't see how location reporting for a payment transaction in which location data is irrelevant could possibly pass Google's privacy policy review process. Collection of data not relevant to the transaction is not generally allowed[*], and if the data in question is personally identifiable (mappable to some specific individual), then a really compelling reason for collection is required, as well as tight internal controls on how the data is managed and who has access. I don't see what could possibly justify it in this case, and I can see a lot of risk in collecting it.

FYI, Google product teams have to develop privacy design docs for all new products, and the designs have to be reviewed by the privacy team (or their delegates) and pass the privacy review before they can be launched. Although Google set these processes up before the FTC settlement, I believe they became part of the consent decree and are now mandated by the FTC and validated in regular audits, so Google can't skip or violate them without potentially-significant consequences.

Disclaimer: I'm not a Google spokesperson and this is not an official statement. It is my personal perspective on the process and requirements. However, I'm a Google engineer who's been involved in launching privacy-sensitive products, so I think my perspective is accurate. I also do security reviews of Google projects, which sometimes touches on privacy issues (though privacy review is separate from security review, as it should be).

[*] Just to head off a likely riposte: No, StreetView Wifi collection and the Safari do-not-track workaround are not counterexamples. They predated the privacy review processes and, as I understand it, were part of the motivation for establishing the processes.

Comment: Re:Not fully junk (Score 1) 295

In fact, by decapitating this girl and digging her brain out of her skull, they've guaranteed she is forever dead.

As opposed to what? Cremation? Burial in a box at temperatures well above freezing? You can't seriously argue that this approach makes it less likely that she could be repaired and restarted at some point in the future than typical corpse disposal methods.

Comment: Re:Push technology is for phones, not computers (Score 1) 197

by swillden (#49475799) Attached to: Chrome 42 Launches With Push Notifications

There is ZERO reason to have this on desktop PC's even for things like IM programs.

Why? Do you like having to keep browser tabs open for your IM, e-mail, calendar, etc? Or to use some extension or plugin? I always keep gmail (actually, inbox) and calendar tabs pinned, but with push notifications I might not have to bother with that any more.

Comment: Re:Debt (Score 2) 207

by swillden (#49467701) Attached to: Linux Getting Extensive x86 Assembly Code Refresh

Thats not really the debt though. The debt is when you get a giant wad of funding and dont take the time to greenfield your cludge app.

No, that is debt, and in many cases it is the biggest single source of debt, as Darinbob said. Failing to rewrite your kludge app is just failing to pay down your debt. Whether or not the moment you get a giant wad of funding is the right time to do that depends on the context.

Comment: Re:The Cloud (Score 1) 442

Encryption algorithms are constantly being tested and broken

Bah.

Oh, there's no doubt that cryptographers are continually creating and breaking new ciphers. But there are well-proven ciphers that have stood unbroken for decades. DES, for example, although the key size is now too small, has stood strong for 40 years, and in fact if you don't mind applying it three times is still proof against anyone in the world, as long as you keep the keys safe.

I have little doubt that AES will do just as well, and its much larger key size means that we won't face brute force becoming tractable, not unless actual breaks in the algorithm are discovered.
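For a sense of scale, here is a back-of-the-envelope sketch of exhaustive key search. The rate of 10^12 guesses per second is an assumption picked for illustration, and 112 bits is the commonly quoted effective strength of three-key triple DES.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(key_bits, keys_per_second=1e12):
    """Years to try every key of the given size at a fixed guessing rate.
    The default rate is an illustrative assumption, not a measured figure."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR

for name, bits in (("DES", 56), ("3DES (effective)", 112), ("AES-128", 128)):
    print(f"{name}: {years_to_search(bits):.3g} years")
```

At that assumed rate the 56-bit DES keyspace falls in well under a year, while the 112-bit and 128-bit keyspaces take many orders of magnitude longer than anyone can wait.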

From the NSA and other governmental entities deliberately weakening the tools we use to encrypt

The fundamental algorithms, like AES, are relied upon by governmental entities, which the NSA is tasked with securing. We do have evidence that the NSA has been working in various ways to compromise implementations, but they can't really do that with ciphers, because if you modify a cipher, it won't pass the standard test vectors. The NSA can try to introduce things like side channel attacks, but if the NSA is actively monitoring your machine while you encrypt and decrypt (which is needed to exploit a side channel), you should just give up and flee to Russia now.

Really, if you use standard, widely-trusted cryptographic algorithms and are careful with how you generate and manage your keys, no one is going to get your data. Not even the NSA. Generation can be a little tricky to get right, but there are lots of good solutions. Probably the best is to randomly (e.g. with diceware) select a string of a dozen or so dictionary words, then use them with a standard key derivation function to produce a key, then use that to encrypt your data.
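A minimal sketch of that recipe, using only the Python standard library. The tiny word list is a hypothetical stand-in (a real diceware list has 7776 words), and PBKDF2-HMAC-SHA256 with 600,000 iterations is just one reasonable choice of key derivation function, picked here as an assumption.

```python
import hashlib
import math
import secrets

def make_passphrase(wordlist, word_count=12):
    """Pick words uniformly at random with a cryptographically secure RNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(word_count))

def derive_key(passphrase, salt, iterations=600_000):
    """Stretch the passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )

# Hypothetical stand-in list; substitute a full 7776-word diceware list.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "haze"]

passphrase = make_passphrase(WORDS)
salt = secrets.token_bytes(16)  # not secret; keep it next to the ciphertext
key = derive_key(passphrase, salt)

bits = 12 * math.log2(7776)     # entropy of 12 words from a full diceware list
print(len(key), "byte key, about", round(bits), "bits of passphrase entropy")
```

The derived key would then go to a standard cipher. The salt has to be stored with the encrypted data, since the same passphrase and salt are needed to re-derive the key later.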

If that sounds complicated... it's really not. "gpg -c" (GnuPG's symmetric, passphrase-based mode) does a fine job. Run that on your files and upload them to the cloud. Since you're apparently happy with the security of your home machines (ha!), you can store a copy of the passphrase on the machine that does the backups, so you can automate everything. So that you can recover your data in the event of a fire, write down the passphrase on a piece of paper and store it somewhere offsite and secure. Give it to a friend or family member that you trust, or put it in a bank safe deposit box.

Encryption, cloud storage and an offsite copy of the decryption key are the way to go here.

If you want a nicely-automated version of all of that, you might want to look into the services provided by leastauthority.com.

Comment: Re: Tabs vs Spaces (Score 1) 427

by swillden (#49460425) Attached to: Stack Overflow 2015 Developer Survey Reveals Coder Stats

Once you verify there are no tabs in your source code (which is trivial, e.g. grep -l $'\t' in bash, or grep -lP '\t' with GNU grep; a plain grep -l '\t' won't match a tab character), then any indentation or alignment problems are visible and the needed fix obvious (insert or remove spaces). The problem with mixed tabs and spaces is that since both are invisible whitespace characters, you can't easily see which is which, which makes finding lines with improper mixtures difficult. You can write a regexp to identify lines with tabs that follow non-tab characters, then go fix them up, and you can even do things like create commit hooks to find such problems and deny commit, but it's much less effort to simply disallow tabs from the outset. In theory it seems like it's roughly the same problem (identifying files with tabs vs identifying files with tabs that follow non-tabs), but in practice if you simply ban tabs you almost never end up with files that contain them. Trying to enforce the "correct" ordering of tab and non-tab characters always results in work.
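Both checks described above are easy to sketch. The function and pattern names here are made up for illustration: the strict mode flags any tab at all, and the lenient mode flags only tabs that follow a non-tab character on the same line.

```python
import re

ANY_TAB = re.compile(r"\t")
# A tab that appears after at least one non-tab character on the same line.
TAB_AFTER_NON_TAB = re.compile(r"[^\t\n]\t")

def find_tab_problems(text, allow_leading_tabs=False):
    """Return 1-based line numbers that break the chosen tab policy."""
    pattern = TAB_AFTER_NON_TAB if allow_leading_tabs else ANY_TAB
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]

sample = "int main() {\n\treturn 0;\n    int x;\t// aligned with a tab\n}\n"
print(find_tab_problems(sample))                           # prints [2, 3]
print(find_tab_problems(sample, allow_leading_tabs=True))  # prints [3]
```

The strict check is a single character class. The lenient one also has to catch spaces followed by tabs inside the indentation, which is exactly the kind of mixture that is hard to spot by eye.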

In addition to that, the point of mixing tabs and spaces is to provide variable indentation size, to match developer preference... but even when that works it creates yet another problem: disagreements about line wrapping. Developers who use larger tab sizes naturally need to wrap lines earlier (shorter lines), and those who use smaller tab sizes want to wrap them later (longer lines). This pretty much guarantees inconsistent line wrapping.

This, of course, all assumes that you're doing manual formatting. The best way to handle formatting is to use a tool (e.g. clang-format) and ban all manual formatting. If you do that, then you can actually have the tool do the tabs-and-spaces mixture thing (e.g. for clang-format "UseTab: ForIndentation"), and, assuming the tool works, your tabs and spaces will always be applied correctly. But if you're using a tool like that, then there's really no point in mixing tabs and spaces, because any developer who wants a different tab size can simply run the tool with their preferred settings (including line wrapping), work on the code, then before submitting run the tool with the project settings. Ideally, the preferred settings should be applied by an editor load hook, and the project settings applied by a commit hook.

In short, with proper tool support, mixing tabs and spaces provides no value. Without proper tool support, it creates more hassle than it's worth, particularly since it really doesn't take long for a developer to get used to any given indent size.

Finally, among the whole space of developer code formatting preferences, indentation level is a trivial thing. If you want your team's code to have a consistent look (and you do... it really matters for maintainability), then you're going to have to specify a LOT of other things that different developers will disagree on, and which their editors can't paper over. So they're going to have to suck it up and accept some team style guidelines anyway. May as well include indentation level.