The problem is that hyperthreaded CPUs and IA-64 EPIC are designed around floating-point performance. The idea is that if you're FPU-bound, you want to hide RAM latency by flipping between threads while you have FPU-load stalls. You add speculative execution, predicate registers, deep execution pipelines to minimize branch misses, etc. But it's all about FPU ops with 200-clock execution times (e.g. divides and transcendental ops, as with FFT).
But I'm sorry, no matter how fast you make their FPUs, they're not going to beat an FPGA or ASIC or raw-silicon GPU. Those bastards optimize memory paths and reduce critical-path latencies. The only advantage CPUs have over GPUs is that you can context-switch unrelated tasks better than with GPUs.
The vast majority of apps in the world are NOT FPU-based. They are pure integer. And moreover, these days, they are RAM-constrained. If you're writing a NoSQL DB procedure to perform zlib compression, merge-sorting, or state-machine syntax parsing, FPU-oriented architectures are of ZERO benefit. This is all RAM-plus-branch-prediction work. That is: read data, make a decision, jump to new code (which triggers new RAM loads), run two or three instructions, then repeat. While SOME of the app's state tables and code paths can get cached efficiently, the input stream is generally far larger than your L3 cache (on the order of gigabytes).
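To make that concrete, here's a toy sketch (all names are mine, purely illustrative) of the kind of loop I mean: a byte-at-a-time state-machine scanner where every iteration is a RAM load, a table lookup, and a data-dependent branch, with zero FPU work.

    /* Illustrative only: a byte-at-a-time state-machine scanner.
     * Each iteration: load from RAM, table lookup, data-dependent
     * branch.  With a multi-gigabyte input the buffer never fits in
     * L3, so throughput is gated by memory latency and branch
     * prediction, not FPU speed. */
    #include <stddef.h>
    #include <stdint.h>

    enum { NSTATES = 16 };
    static uint8_t next_state[NSTATES][256];   /* transition table */

    size_t count_accepts(const uint8_t *buf, size_t len)
    {
        uint8_t state = 0;
        size_t hits = 0;
        for (size_t i = 0; i < len; i++) {
            state = next_state[state][buf[i]]; /* RAM load + lookup */
            if (state == NSTATES - 1)          /* data-dependent branch */
                hits++;
        }
        return hits;
    }

No amount of FPU muscle helps that loop; memory prefetch and branch prediction are the only levers.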
So SOME of the memory pre-loading, branch prediction, and on-stall thread-context-switching could be leveraged. But MT apps suffer at barriers around critical regions. Namely, if you take a memory stall while holding a lock, you cripple the parallel performance.
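Here's what that anti-pattern looks like in practice (a hedged sketch; the struct and function names are made up):

    #include <pthread.h>
    #include <string.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    struct node { struct node *next; char payload[256]; };
    static struct node *shared_head;

    /* BAD: the memcpy chases cold memory while every other thread
     * piles up behind the mutex; one cache miss stalls them all. */
    void publish_bad(struct node *n, const char *src)
    {
        pthread_mutex_lock(&lock);
        memcpy(n->payload, src, sizeof n->payload); /* RAM stall under lock */
        n->next = shared_head;
        shared_head = n;
        pthread_mutex_unlock(&lock);
    }

    /* BETTER: take the stall outside; hold the lock only for the
     * two-pointer list splice. */
    void publish_good(struct node *n, const char *src)
    {
        memcpy(n->payload, src, sizeof n->payload); /* stall, no lock held */
        pthread_mutex_lock(&lock);
        n->next = shared_head;
        shared_head = n;
        pthread_mutex_unlock(&lock);
    }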
Co-processes are very efficient (e.g. apache pre-fork, postgres worker processes with specific shared-mem segments, erlang, ruby-unicorn, etc.) in that they organize very small messages to pass between processes and keep all remaining cache lines isolated to their single thread, and thus to semi-dedicated CPUs. This can leverage multiple CPUs very nicely without necessarily saturating RAM (though if the apps themselves are RAM-bound you still have problems). And if you have NUMA, the OS can segment memory spaces better with co-processes than with MT.

That being said, the SUN lightweight threads are (I believe) designed around shared memory spaces with minimal context-switch time vs. POSIX threads or normal co-processes, so they can't really take advantage of the co-process model as well as MT. So SUN light-threads are forced to endure potentially bad programming by DB, file-IO, OS, and signal-processing applications. Namely, if you can't create isolated memory regions (malloc/free locks, IO/pipe locks, 'critical-region' locks on concurrent data structures, etc.), you'll find yourself dirtying shared cache lines so often that you actually run slower than if you were just single-threaded.
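For the co-process pattern itself, a minimal sketch (assuming nothing beyond fork() and pipes; the message struct is invented for illustration):

    /* Two processes exchange tiny fixed-size messages over pipes.
     * Each side keeps its working set in its own address space, so
     * no cache lines are shared and, on NUMA, each heap stays local. */
    #include <stdio.h>
    #include <unistd.h>

    struct msg { int op; int arg; };   /* keep messages tiny */

    int main(void)
    {
        int to_child[2], to_parent[2];
        pipe(to_child);
        pipe(to_parent);

        if (fork() == 0) {             /* the worker co-process */
            close(to_child[1]);
            close(to_parent[0]);
            struct msg m;
            while (read(to_child[0], &m, sizeof m) == sizeof m) {
                m.arg *= 2;            /* all real state stays private */
                write(to_parent[1], &m, sizeof m);
            }
            _exit(0);
        }

        close(to_child[0]);
        close(to_parent[1]);
        struct msg m = { 1, 21 };
        write(to_child[1], &m, sizeof m);
        read(to_parent[0], &m, sizeof m);
        printf("result: %d\n", m.arg); /* prints 42 */
        return 0;
    }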
I know, for example, that a simple merge-sort can run significantly slower (3x) when run in parallel vs. single-threaded, predominantly because of Intel's MESI implementation. Well, not necessarily 3x slower in wall-clock time, but in consumed CPU time, with little or no visible improvement in response time.
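The MESI ping-pong is easy to reproduce (sketch below; the 64-byte line size and iteration counts are my assumptions):

    /* Two threads increment counters that sit on the SAME cache line,
     * so the line bounces between cores on every write.  Move the
     * counters onto separate lines (the padded struct) and the
     * ping-pong disappears. */
    #include <pthread.h>
    #include <stdio.h>

    static struct { long a; long b; } hot;                /* one shared line */
    static struct { long a; char pad[64]; long b; } cold; /* a line apiece */

    static void *bump_a(void *arg) {
        for (long i = 0; i < 100000000; i++) hot.a++;     /* invalidates peer */
        return NULL;
    }
    static void *bump_b(void *arg) {
        for (long i = 0; i < 100000000; i++) hot.b++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", hot.a, hot.b);
        /* swap 'hot' for 'cold' in the bump loops to get the
         * single-threaded-speed version */
        return 0;
    }

Note there's no logical race here; each thread writes its own variable. The slowdown is purely physical: both variables live on one cache line.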
As another example, MySQL InnoDB had an inverse performance curve for the longest time. Meaning, the more physical CPUs you added, the SLOWER its total throughput would get, predominantly due to excessive critical-region locks. Many of those locks have since been replaced with less-accurate atomic spin-locks (as with sequence counters). Namely, you can now 'lose' a primary key's sequential value under the right circumstances, but with the benefit of removing a major classic stall point. InnoDB is still full of complex algorithms that require critical regions, though. Lock-free code is really hard and very limiting. But that isn't to say people haven't figured out how to architect good designs. 'redis' NoSQL and erlang-based apps (like RabbitMQ) are good examples. Namely, copy-on-write on small data structures.
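The sequence-counter trade-off, roughly (a sketch of the idea, not InnoDB's actual code):

    /* A single lock-free fetch-and-add replaces a mutexed critical
     * region.  A transaction that grabs an id and then rolls back
     * never returns it, so the sequence can have holes ("lost"
     * values), but no thread ever parks holding a lock. */
    #include <stdatomic.h>

    static atomic_long next_id = 1;

    long alloc_id(void)
    {
        return atomic_fetch_add(&next_id, 1);  /* one atomic RMW, no lock */
    }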
But there are two types of apps that have lots of parallel threads. Those with MASSIVE memory requirements and those that