Comment: Re:Issues (Score 1) 312

by leaen (#45979371) Attached to: Why Standard Deviation Should Be Retired From Scientific Use

O(log(n)) for each and every insertion, yes... when you are doing n insertions, that becomes O(n log n). If you are trying to compute the mean deviation at every step as well, you are looking at O(n^2 log n),

No, you just failed data structures class. Insertion takes O(log(n)), including the bookkeeping needed to find the mean deviation in O(log(n)) time, which gives O(n log n) total time. All you need to calculate the deviation is the sum and number of elements above the mean and the sum and number of elements below the mean. I explained it in more detail in the parent post: there is a standard data structure that can calculate the sum of elements in a given range in O(log(n)) time and supports insertion in O(log(n)) time.
A quick Google query found the following implementation:

because you cannot compute the mean deviation without revisiting *every* element you've collected so far, regardless of how they are stored or sorted.

Repeating a lie does not make it true.

Comment: Re:Issues (Score 1) 312

by leaen (#45976243) Attached to: Why Standard Deviation Should Be Retired From Scientific Use

Collecting the data alone is a log(n) step... and can be worse if you are trying to keep the data sorted while you collect it.

Use red-black trees; these keep the data sorted with an O(log(n)) worst-case bound for insertion.

How can you calculate the mean deviation at any time without revisiting all of the data points that you have collected so far? How can you do it in any time better than O(n)? Calculating standard deviation takes O(1) and does not require reexamining the data at all if you've been keeping track of the right things during data collection (which still takes O(n)).

That is a typical exam question for a data structures class. You maintain a red-black tree, and in each node you keep the sum and count of the elements in its subtree (you need to update these on rotations, and that's it). As a red-black tree has logarithmic height, you can easily find the sum of elements greater than a given number in logarithmic time: just do a binary search and add up the sums of the subtrees whose smallest element is greater than the searched value, plus the nodes on the search path that are themselves greater.
Once you have that, the sum of absolute deviations from the mean is given by the following expression
(sum_greater(mean) - count_greater(mean) * mean) + (count_less(mean) * mean - sum_less(mean))
and you can get each term in O(log(n)) time. Dividing by the number of elements gives the mean absolute deviation.
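
A minimal sketch of this idea in Python. It assumes a treap (a randomized search tree) swapped in for the red-black tree, only because it is shorter to write; that gives expected rather than worst-case O(log(n)) per operation. The names DeviationTracker, insert and mean_abs_deviation are made up for illustration, and the expression above is divided by the element count to get a mean.

```python
import random


class Node:
    """Treap node augmented with the size and sum of its subtree."""
    __slots__ = ("key", "prio", "left", "right", "count", "total")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.left = None
        self.right = None
        self.count = 1
        self.total = key


def _update(node):
    # Recompute the subtree count and sum from the children.
    node.count = 1
    node.total = node.key
    for child in (node.left, node.right):
        if child is not None:
            node.count += child.count
            node.total += child.total


def _split(node, key):
    """Split into (keys < key, keys >= key)."""
    if node is None:
        return None, None
    if node.key < key:
        left, right = _split(node.right, key)
        node.right = left
        _update(node)
        return node, right
    left, right = _split(node.left, key)
    node.left = right
    _update(node)
    return left, node


def _merge(a, b):
    """Merge two treaps where every key in a is <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        _update(a)
        return a
    b.left = _merge(a, b.left)
    _update(b)
    return b


class DeviationTracker:
    def __init__(self):
        self.root = None

    def insert(self, x):
        left, right = _split(self.root, x)
        self.root = _merge(_merge(left, Node(x)), right)

    def _below(self, x):
        """(count, sum) of keys strictly less than x, along one root-to-leaf path."""
        count, total = 0, 0
        node = self.root
        while node is not None:
            if node.key < x:
                # This node and its whole left subtree are below x.
                count += 1 + (node.left.count if node.left else 0)
                total += node.key + (node.left.total if node.left else 0)
                node = node.right
            else:
                node = node.left
        return count, total

    def mean_abs_deviation(self):
        if self.root is None:
            raise ValueError("no data inserted yet")
        n, s = self.root.count, self.root.total
        mean = s / n
        c_lo, s_lo = self._below(mean)
        # Keys >= mean; keys equal to the mean contribute zero.
        c_hi, s_hi = n - c_lo, s - s_lo
        return ((s_hi - c_hi * mean) + (c_lo * mean - s_lo)) / n
```

Calling insert(x) for each new value and then mean_abs_deviation() after every insertion keeps the total cost at expected O(n log n) for n values.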

Comment: Re: Basic Statistics (Score 1) 312

by leaen (#45975029) Attached to: Why Standard Deviation Should Be Retired From Scientific Use

Bzzt. Mathematically correct but practically wrong. Any real or simulated dataset from which you would want to compute a standard deviation will have the property that it will be a list of (most likely) double-precision floating-point numbers that is finite in size. This data defines a distribution that always has a finite first and second moment, so you will get a number that you can confidently call the standard deviation of the data. Even if it comes from a physical process with a nonsense distribution like a Cauchy distribution, the standard deviation you compute will give you a bound on the spread of your data. If it's Gaussian, you can go back to your statistics class and say that 95% of the data will be within two SD's, etc. If it's not, you can use the Chebyshev rule to say that at least 75 percent of the data will be in two SD's, 89% will be within 3 SD's, etc, which is much coarser information, but is still reasonable to look at for worst-case analysis.

Yes, but it is also useless when you have enough data to get a reasonable mean estimate. You do not need the Chebyshev inequality to get confidence intervals; just compute the 2.5th and 97.5th percentiles for a 95% interval. By the Glivenko–Cantelli theorem the empirical distribution function converges to the true one regardless of the distribution, and these percentiles are not sensitive to outliers.
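
A minimal sketch of the percentile approach, assuming Python's statistics.quantiles from the standard library (3.8 or later) with its default interpolation method. The function name empirical_95_interval is made up here.

```python
import statistics


def empirical_95_interval(data):
    """Return the (2.5th, 97.5th) percentiles of the sample."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(data, n=40)
    return cuts[0], cuts[-1]


sample = list(range(1, 1001))
low, high = empirical_95_interval(sample)
inside = sum(low <= x <= high for x in sample)
print(low, high, inside / len(sample))
```

No mean or standard deviation is computed at any point, so a handful of extreme values only moves the interval if they displace the order statistics near the two cut points.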

Comment: Re:Issues (Score 1) 312

by leaen (#45974355) Attached to: Why Standard Deviation Should Be Retired From Scientific Use

Incorrect... you need one pass to collect the data, and a second pass to compute the mean deviation. Both passes are O(n). You do not need to do a second pass to compute the standard deviation, it can be calculated in O(1) time based on data collected in the first pass. If you are only computing this once, doing two O(n)'s is just O(n), but if you are wanting to continually recalculate the mean as you add more elements to your data set, then the difference between them becomes much larger... mean deviation with data collection ends up being quadratic with the amount of data collected, while standard deviation with data collection remains linear with the amount of data collected.

Still incorrect; you need to know data structures for that. When you use a red-black tree in which each node maintains the sum of the elements below that node, you can compute the sum of elements in an arbitrary interval in O(log(n)) time. That cuts the complexity from quadratic to O(n log n).
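
For comparison, the constant-time standard deviation update that the quoted post refers to can be sketched with Welford's method. This is one common way to do it, not necessarily what the quoted poster had in mind, and the class name RunningStd is hypothetical.

```python
import math


class RunningStd:
    """Population standard deviation with O(1) work per new value."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old and the new mean, which keeps the update stable.
        self.m2 += delta * (x - self.mean)

    def std(self):
        if self.n == 0:
            raise ValueError("no data added yet")
        return math.sqrt(self.m2 / self.n)


rs = RunningStd()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    rs.add(value)
print(rs.std())
```

The mean deviation has no such fixed-size summary, which is why it needs the augmented tree and O(log(n)) per update instead.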

Comment: Re:Good Stuff (Score 1) 92

by leaen (#45645045) Attached to: New Superconductor Theory May Revolutionize Electrical Engineering

Well, superconductors killed my dad, so I'm looking for an immediate ban. If you don't like that, you can just say that directly to distraught face of my poor widowed mother. Superconductors also stole all of the insurance money and repeatedly raped my sister. Well, she called it rape, but really there was no resistance.

Sorry, we are superconductors. Resistance is futile.

Comment: Re:I Used a Popular Online Tax Service... (Score 1) 237

by leaen (#45471407) Attached to: Ask Slashdot: Can You Trust Online Tax Software?

1. Buy a stock that you expect to decrease in value in the short term, but to make money in the long term. You pay, say, $10,000.

2. It drops to $5,000. Sell, you can mark off the $5,000 loss on your taxes.

3. Wait 30 days, then take that $5,000 and buy the same stock again. You can still take the $5,000 loss, but if (when) the stock finally appreciates, you make money there, too. :)

What about the following plan?

1. Put $10000 in bank.

2. Wait 30 days, buy $7500 of stock and $2500 for taxes.

3. ???

4. Profit

Comment: Re:GCJ vs. JIT (Score 1) 181

by leaen (#45457619) Attached to: GCC 4.9 Coming With Big New Features

P.S.: While I understand that much C/C++ syntax is driven by prior choices, much of this new syntax is UGLY. That's been a problem ever since templates started appearing, but it's gotten worse with every addition. At some point they need to do a de novo redefinition of syntax, and define an isomorphism between the two syntaxes. Then a compiler switch can alternate between syntaxes until the current version can be deprecated. I'm starting to think that APL had a better design than modern C++, and that was BAD. Now, in addition to > they've got [[ ]], and I guess next will be (( )) (unless that's already in use somewhere).

Yes, (( )) is used in attributes. Also, ({ }) is used to convert a compound statement into an expression. You are left with {{ and {(, and when those are taken we will start to use {) and (}.

Comment: Re:Nothing you can do? (Score 1) 99

by leaen (#45049231) Attached to: The Hail Mary Cloud and the Lessons Learned

This is one reason why people recommend sudo instead of su. The admin logs in as himself and gains root privilege using his personal password. There is no shared root password, so you only have to disable the old admin's account and sudo access.

They recommend it as safer in theory.
In practice sudo is a source of jokes like:
Q: H0w d0 I h4ck ubuntu?

Comment: Re:Let me be 1 of the 1st here (Score 2) 478

by leaen (#44931827) Attached to: Utility Sets IT Department On Path To Self-destruction

Not necessarily: If it is domestic high-quality outsourcing, you know the people personally and there is a long-standing connection, it can work. But off-shoring basically never works and a cultural gap ensures that. Same with off-shoring to China.

Quite the contrary, Chinese contractors are very intelligent. You can get top talent from a Chinese intelligence agency for cheap.
