
Comment: Re:It depends (Score 1) 479

by Trailer Trash (#49343623) Attached to: No, It's Not Always Quicker To Do Things In Memory

Even if you wrote this in C in the style in which they did it the program would be slow. Since there's no way to "extend" a C string, it would require determining the length of the current string (which involves scanning the string for a null byte), malloc'ing a new buffer with one more byte,

There is. It is called realloc. If you are unlucky, it will just divide the number of times the system actually performs the copy by 16, or whatever the malloc implementation uses as its alignment, but once the allocation gets big enough you get pages directly from the system, and it just maps in more pages on the end.

malloc isn't the problem, though. My point was that if you write it in the style of the code in the paper (don't keep track of the string length between character appends) then it'll still have to scan the string a million times. If you know ahead of time that you're going to append exactly one million characters to the string then you need but one malloc, right? I can make this program extremely fast in that manner but that's not what they're doing.

The Almighty Buck

Russian Official Proposes Road That Could Connect London To NYC 224

Posted by samzenpus
from the use-the-bathroom-before-you-go dept.
An anonymous reader writes: There's great news coming out of Russia for epic road trip lovers. Russian Railways president Vladimir Yakunin has proposed building a highway that would reach from London to Alaska via Russia, a 13,000-mile stretch of road. "This is an inter-state, inter-civilization, project," the Siberian Times quoted Yakunin. "The project should be turned into a world 'future zone,' and it must be based on leading, not catching, technologies."

Comment: Re:It depends (Score 2) 479

by Trailer Trash (#49337747) Attached to: No, It's Not Always Quicker To Do Things In Memory

Well, yeah, but that's not going to work consistently. Worst case is if the string is on the stack you'll smash the stack and likely have a memory access error. If it's on the heap you'll likely get the error quicker.

I wouldn't even think of writing a program in the manner in which their sample was written, but if I was trying to solve their basic "problem" there are better ways to go about it.

Comment: Re:It depends (Score 3, Insightful) 479

by Trailer Trash (#49337449) Attached to: No, It's Not Always Quicker To Do Things In Memory

The real story here, is that if you don't know how to write code properly, then string concatenation can be really slow.

Was their paper peer reviewed?

I just reviewed it, but frankly, they're not my peers.

They actually understand the problem and state it near the end of the paper. The issue is pretty simple and when I read the /. summary I knew what the problem was. They're appending single bytes to a string. In both chosen languages - Java and Python - strings are immutable so the "concatenation" is way the hell more complex than simply sticking a byte in a memory location. What it involves is creating a new string object to hold both strings together. So, there's the overhead of object creation, memory copying, etc. Yes, by the time you're done it's a lot of extra work for the CPU.
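A minimal Python sketch of the immutability point, not the paper's actual test program. The function names and the appended character "x" are made up for illustration. Note that some interpreters (CPython among them) can sometimes optimize repeated concatenation, so how slow the first version really is will vary.

```python
def build_by_concatenation(n):
    """Append one character at a time. Conceptually every step creates
    a new string object and copies the old contents into it."""
    s = ""
    for _ in range(n):
        s = s + "x"
    return s


def build_by_join(n):
    """Collect the pieces first and build the final string once."""
    return "".join("x" for _ in range(n))


# Strings are immutable: "+=" rebinds the name to a new object,
# it does not change the object that other names still refer to.
original = "abc"
alias = original
original += "d"
```

After this runs, `alias` is still "abc" while `original` is "abcd", which is the extra object creation and copying the comment is talking about.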

I'm going to state this as nicely as I can: what they proved is that a complete moron can write code so stupidly that a modern CPU and RAM access can be slowed down to the extent that even disk access is faster. That's it.

Even if you wrote this in C in the style in which they did it the program would be slow. Since there's no way to "extend" a C string, it would require determining the length of the current string (which involves scanning the string for a null byte), malloc'ing a new buffer with one more byte, copying the old string and then adding the new character and new null byte. Scanning and copying are both going to require an operation for each byte (yeah, it could be optimized to take advantage of the computer's word length) on each iteration, with that byte count growing by "1" each time.

The sum of all integers up to N is N(N+1)/2. If N is 1,000,000 the sum is 500,000,500,000. So, counting bytes (looking for null) requires half a trillion operations and copying bytes requires another half trillion operations. Note that each "operation" here stands for multiple machine instructions for purposes of this discussion.
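The arithmetic above can be checked with a small Python model. This is only a counting sketch of the badly written C approach described earlier (scan for the null byte, allocate, copy, append); the function names are hypothetical and nothing here touches real C strings.

```python
def naive_append_cost(n):
    """Count bytes scanned and bytes copied when n characters are appended
    one at a time to a NUL-terminated string whose length is not tracked."""
    scanned = 0
    copied = 0
    length = 0
    for _ in range(n):
        scanned += length + 1  # strlen-style scan: every character plus the NUL
        copied += length       # copy the old contents into the new buffer
        length += 1            # then the new character is stored
    return scanned, copied


def triangular(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2


# The loop agrees with the closed form for a small n,
# so the closed form can be trusted for the million-character case.
assert naive_append_cost(1000) == (triangular(1000), triangular(999))
total_scans_for_a_million = triangular(1_000_000)  # 500,000,500,000
```

The copy count comes out one term short of the scan count (the empty string has nothing to copy), which does not change the "about half a trillion each" conclusion.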

Yeah, modern computers are fast, but when you start throwing around a trillion operations it's going to take some time.

Writing to disk will be faster for a number of reasons, mainly because the OS is going to buffer the writes (and know the length of the buffer) and handle it much much better. It's not doing a disk operation every time they do a write. If they were to flush to disk every time they would still be waiting for it to finish.
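A rough Python sketch of the buffering point, assuming an ordinary buffered file object. The function name and the `flush_each` parameter are invented for illustration. With `flush_each=False` the single-byte writes land in an in-memory buffer and reach the OS in large chunks; with `flush_each=True` every byte is pushed out with `flush()` and `os.fsync()`, which is the "flush to disk every time" case.

```python
import os


def write_one_byte_at_a_time(path, n, flush_each=False):
    """Write n single bytes to path and return the resulting file size."""
    with open(path, "wb") as f:  # default buffering: writes are batched
        for _ in range(n):
            f.write(b"x")
            if flush_each:
                f.flush()             # hand the buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device
    return os.path.getsize(path)
```

Both modes produce the same file; timing the two on a real disk is left to the reader, since the numbers depend heavily on the OS and the hardware.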

There are a few notes, here. First, in Java and Python the string object likely holds a "length" value along with the actual character buffer. That would make it faster and not require all the operations the badly written C code that I describe above would require. But the overhead of objects, JVM, interpreter, etc. gets thrown into the mix. Second, if I were doing something like this in C I could keep the string length as part of a struct and at least make it that much faster. The point is that a good programmer wouldn't write code in this manner.
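A sketch of the "keep the length in a struct" idea, modeled in Python rather than C. `CountedBuffer` is a hypothetical stand-in for something like `struct { char *data; size_t len; size_t cap; }`. The capacity doubling is an added assumption that the comment does not mention; it is included only to show that appends stay cheap once the length is known.

```python
class CountedBuffer:
    """Growable byte buffer that remembers its own length."""

    def __init__(self, capacity=16):
        self.data = bytearray(capacity)  # preallocated storage (capacity >= 1)
        self.length = 0                  # number of bytes actually in use
        self.bytes_copied = 0            # bookkeeping for this illustration only

    def append(self, byte):
        if self.length == len(self.data):
            # Out of room: double the capacity and copy the old contents once.
            bigger = bytearray(len(self.data) * 2)
            bigger[:self.length] = self.data[:self.length]
            self.bytes_copied += self.length
            self.data = bigger
        self.data[self.length] = byte    # no scan needed: the length is known
        self.length += 1

    def to_bytes(self):
        return bytes(self.data[:self.length])


buf = CountedBuffer()
for _ in range(1000):
    buf.append(ord("x"))
```

With doubling, the total number of bytes copied stays below twice the final length, instead of growing with the square of it as in the scan-and-copy version.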

Anyway, this "paper" proves nothing except that really bad code will always suck. One would have to be an idiot to write anything close to what they've done here in a real-life scenario. I know because I've cleaned up other people's code that's on the level of this junk...

Programming

No, It's Not Always Quicker To Do Things In Memory 479

Posted by Soulskill
from the performance-that-fails-to-perform dept.
itwbennett writes: It's a commonly held belief among software developers that avoiding disk access in favor of doing as much work as possible in-memory will result in shorter runtimes. To test this assumption, researchers from the University of Calgary and the University of British Columbia compared the efficiency of alternative ways to create a 1MB string and write it to disk. The results consistently found that doing most of the work in-memory to minimize disk access was significantly slower than just writing out to disk repeatedly (PDF).

Comment: Re:Type "bush hid the facts" into Notepad. (Score 1) 119

I agree completely. There is no reason that a program cannot read UTF-8 and store as UTF-32 internally. There is a trade-off between time and memory. Note that UTF-16 is also a variable length encoding scheme so you still need to start at the start of string to find the nth character.
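A small Python sketch of that trade-off. The sample string and helper names are made up for illustration. UTF-32 spends four bytes on every character, so the nth character is at a fixed offset; UTF-8 (and UTF-16, once surrogate pairs appear) needs a walk from the start.

```python
# 'a', e-acute, euro sign, and an emoji: four code points
text = "a\u00e9\u20ac\U0001F600"

utf8 = text.encode("utf-8")        # 1 + 2 + 3 + 4 bytes
utf16 = text.encode("utf-16-le")   # 2 + 2 + 2 + 4 bytes (surrogate pair)
utf32 = text.encode("utf-32-le")   # 4 bytes per code point


def nth_char_utf32(buf, n):
    """Fixed width: jump straight to byte offset 4 * n."""
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


def nth_char_utf8(buf, n):
    """Variable width: walk from the start, counting lead bytes."""
    index = -1
    start = None
    for pos, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:  # not a continuation byte, so a new character
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return buf[start:].decode("utf-8")
```

Here the UTF-8 form takes 10 bytes and the UTF-32 form 16, which is the memory side of the trade; the indexing functions show the time side.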

Comment: Re:Type "bush hid the facts" into Notepad. (Score 2) 119

by Alain Williams (#49310765) Attached to: OS X Users: 13 Characters of Assyrian Can Crash Your Chrome Tab

Unicode and how it is represented in a file are two different things. Unicode is a good idea, it solves many problems and contains all the (to me) strange characters used by: Greeks, Chinese, etc.

How to represent it in a file is different. UTF-8 is the obvious answer today, but other encodings were tried by different organisations first. The big win of UTF-8 is that you can have characters from very different regions on the same web page (or in the same file) - something that you cannot do if you adopt a purely 8 bit code like iso-8859-1.
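To make the mixed-region point concrete, a short Python sketch. The sample string and the `fits_latin1` helper are hypothetical. A string mixing a Latin accented letter, a Greek letter, and a Chinese character round-trips through UTF-8 but cannot be encoded as iso-8859-1 at all.

```python
# "café", Greek capital omega, and a Chinese character, in one string
mixed = "caf\u00e9 \u03a9 \u4e2d"

utf8_bytes = mixed.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")


def fits_latin1(text):
    """Return True if every character exists in iso-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False
```

`fits_latin1("caf\u00e9")` is True because e-acute is in iso-8859-1, while `fits_latin1(mixed)` is False because omega and the Chinese character are not.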

We are still in transition: there are files encoded in various ways out there; however I think that UTF-8 will eventually become the encoding mechanism that everyone uses - so files encoded in other ways will become increasingly rare. So: a bit of patience please.

Comment: Re:"Bookish" vs Indoors (Score 1) 143

by Trailer Trash (#49308289) Attached to: Excess Time Indoors May Explain Rising Myopia Rates

FTFA :

They are challenging old ideas that myopia is the domain of the bookish child and are instead coalescing around a new notion: that spending too long indoors is placing children at risk.

Doesn't that amount to the same thing? Not spending much time on distance focussing?

Yeah, I laughed when I saw that. Someone's pretty clueless.

Comment: Re:I choose MS SQL Server (Score 1) 320

by Alain Williams (#49295603) Attached to: Why I Choose PostgreSQL Over MySQL/MariaDB

Those are the current limits. So do you build your business round the database that is free today and hope that: a) your business does not grow so that it needs more, and b) that MS does not reduce the limits and catch you. Either way you run the risk of ending up having to pay the license fees. Why not pick a database that will always be free - and keep that cash for something else ?

Comment: In my experience (Score 5, Informative) 320

by Trailer Trash (#49294693) Attached to: Why I Choose PostgreSQL Over MySQL/MariaDB

And I'm probably going to step on a lot of toes here, but people like me strongly prefer Postgres to MySQL. And by "people like me" I mean folks for whom their first real rdbms experience was theoretical or "commercial". I did both.

I used ingres in college to a small extent and then the Ingres commercial product for years after that. I have also used Sybase and Oracle professionally. PostgreSQL easily walks among the giants of that industry.

Every time this discussion comes up the MySQL side has to say "yeah, but..." about a thousand times. MySQL doesn't do ______ properly? "Yeah, but if you just install this other piece of software and change a couple of config files it *can* do it." Well, con-fucking-gratulations!

The point is that PostgreSQL does exactly what it should do out of the box. I don't have to change a configuration file to make it ACID compliant, fast, correct, whatever. It just works and works correctly out of the box.

Every time someone tells me how easy MySQL is to set up they've betrayed their experience level in this realm.

I know a lot of you are going to mod me down - I don't care. But why not reply instead?

Comment: Re:Just 4? (Score 4, Insightful) 85

by Trailer Trash (#49287365) Attached to: New Jersey Removes Legal Impediment To Direct Tesla Sales

I lived 35 years in Jersey and my family is still mostly there. I had a few years when all I did was drive from one dealership to another doing auto insurance claims. The place is full of car dealerships. They tend to be in clusters along old highways, though sometimes embedded in urban neighborhoods too. The last thing Jersey needs is more car dealerships and lots. So I can see the numerical limits as having some merit. It's a crowded place, and more lots competing for the same number of buyers is not really an improvement, however much Elon Musk doesn't want to use existing dealer networks. Or how much people want Tesla electric vehicles out on the road.

Yeah, if only there were a way for everybody together to decide how many auto dealerships are needed. We could call it a "market".

But, yeah, silly stuff. We should centrally plan how many dealerships there should be. It'll work out much better.

