godrik - Slashdot User

Comment Re:Yay more cores that I won't be using much of! (Score 1) 208

by godrik on Saturday January 04, 2014 @07:41PM (#45867613) Attached to: Intel's Knights Landing — 72 Cores, 3 Teraflops

That's an HPC processor. You are unlikely to deploy that on a classical desktop/laptop for a while. Think of it as a classical coprocessor.

Comment Re:Have you tried ownCloud.org? (Score 1) 133

by godrik on Wednesday January 01, 2014 @12:29AM (#45834863) Attached to: Ask Slashdot: Life Organization With Free Software?

I have been using ownCloud for about a year now. I must say I am not as enthusiastic about it as you are. I went through two major version changes, and only exporting the data to a different format and reimporting it kept my calendar safe. The application is fairly slow overall. Still, it gets the job done for me.

Comment Re: GPU not 7 times faster than 32 CPU cores (Score 1) 241

by godrik on Friday December 27, 2013 @04:37PM (#45799691) Attached to: Why Don't Open Source Databases Use GPUs?

There is no good unit. It all depends on your application. For some applications FLOPS is what matters. For some applications bandwidth is what matters. For some applications the size of the memory is what matters. There is no simple performance metric that accounts for every single application.

I don't know how familiar you are with the computations you are mentioning. But large scale scientific (read physics/engineering) applications are essentially composed of BLAS routines. I have been working with physicists where 50% of the time of their application was spent eigensolving a large sparse matrix (and the other 50% building that sparse matrix). Or with nuclear engineers where 80% of the time of their application was spent solving dense linear systems. If BLAS routines (or BLAS-like routines) are so popular, it is precisely because they are a significant part of some scientific applications.

FLOPS is a good measure of performance for SOME applications, namely those that are compute bound. For instance, many combinatorial optimization algorithms rely heavily on solving dense linear systems. For these applications FLOPS is the meaningful performance metric. That is why the top-500 uses linpack as its benchmark: it is meaningful for many applications. Currently the top-500 benchmark is being reassessed to take sparsity more into account, and the new version should include a conjugate gradient algorithm on stencils, because it is more representative of current scientific applications (solving the heat equation on 3D objects).

I have also been working in graph analytics, and here we measure things more like edges per second (the graph500 benchmark does that, for instance). Here it is meaningful because most of these applications boil down to graph traversals. But that metric is quite controversial since the performance you get depends highly on the structure of the graph/matrix, and you need a benchmark instance that represents your target application. For instance, road graphs typically get terrible performance on the GPU because they have a high diameter and little parallelism can be exposed; CPUs typically get better performance there. On social networks, the diameter of the graph is much smaller and there is more parallelism, so the GPU can be utilized closer to its peak performance.
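To illustrate the diameter argument, here is a minimal sketch of a level-synchronous BFS that records the size of each frontier. The frontier size is a rough proxy for how much parallelism a traversal exposes at each step. The two graphs are toy stand-ins I made up (a chain for a "road-like" graph, a star for a "social-like" graph), and the function names are hypothetical, not taken from graph500 or any library.

```python
def bfs_frontiers(adj, source):
    """Level-synchronous BFS; returns the size of the frontier at each level."""
    visited = {source}
    frontier = [source]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sizes


def path_graph(n):
    # Chain of n vertices: a crude stand-in for a high-diameter road network.
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    # One hub linked to n-1 leaves: a crude stand-in for a low-diameter network.
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        adj[i] = [0]
    return adj


road_like = bfs_frontiers(path_graph(1000), 0)
social_like = bfs_frontiers(star_graph(1000), 0)

print("road-like:   levels =", len(road_like), " widest frontier =", max(road_like))
print("social-like: levels =", len(social_like), " widest frontier =", max(social_like))
```

The chain needs one level per vertex with a single vertex in flight each time, while the star finishes in two levels with almost every vertex in the second frontier. Real road and social graphs are far messier, but the same trend is what makes the first kind hard to run well on wide hardware.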

tl;dr, there is no one metric that represents "real applications". Each application has different requirements, and each application needs to be investigated separately. FLOPS is meaningful for many "real applications".

Comment Re: GPU not 7 times faster than 32 CPU cores (Score 1) 241

by godrik on Friday December 27, 2013 @12:50PM (#45797075) Attached to: Why Don't Open Source Databases Use GPUs?

As I said, it really depends on the application. Dense compute-intensive kernels (typically matmul, but any O(n^3) computation on O(n^2) data typically does it) will reach 80% of peak flops on both architectures. After that the situation is really complicated. I have been looking at sparse linear algebra kernels (spmv or graph traversal) and these kernels are quite catastrophic for both CPU and GPU. Most of the time they are memory bound, sometimes by latency. And here most conventional insight can be discarded. Depending on the shape of the matrix, the performance of the GPU can be horrible or very good.
People love to talk about "real applications", but that is mostly meaningless. Which real application? Weather forecasting can certainly exploit a GPU close to peak performance. Text compression, probably not so well. In the field of relational databases, it certainly depends heavily on the query. Though a factor of 7 seems a lot in that case.
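The compute-bound versus memory-bound distinction can be sketched with a simple roofline-style bound: attainable rate = min(peak flop rate, bandwidth times flops per byte). This is only a rough model under idealized assumptions (perfect caching for matmul, CSR storage with 4-byte values and 4-byte indices for spmv), and the machine numbers below are hypothetical, not measurements of any real CPU or GPU.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def matmul_intensity(n, bytes_per_value=4):
    # 2*n^3 flops; ideally each of the three n-by-n matrices is moved once.
    return (2 * n**3) / (3 * n**2 * bytes_per_value)


def spmv_intensity(bytes_per_value=4, bytes_per_index=4):
    # 2 flops per nonzero; at least one value and one column index read per nonzero.
    return 2 / (bytes_per_value + bytes_per_index)


# Hypothetical machine, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

dense_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, matmul_intensity(4096))
sparse_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, spmv_intensity())

print("dense matmul bound (Gflop/s):", dense_bound)
print("spmv bound (Gflop/s):        ", sparse_bound)
```

On this made-up machine the dense kernel is capped by the compute peak, while spmv is capped by bandwidth at a small fraction of it. The model ignores latency and irregular access, which is exactly where the shape of the matrix starts to matter.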

Comment Re:GPU not 7 times faster than 32 CPU cores (Score 1) 241

by godrik on Friday December 27, 2013 @01:06AM (#45793609) Attached to: Why Don't Open Source Databases Use GPUs?

I did not look at the actual numbers claimed nor at what they measure. But a factor of 7 between a GPU and a 32 core Intel system is not impossible. My BS alarm trips around a factor of 20 for a 2 processor system.

If you look at state of the art nvidia GPUs and pick a Tesla K10 ( http://www.nvidia.com/object/tesla-servers.html ), you get about 4.5Tflop/s of single precision performance and a bandwidth of 320GB/s. The flop rate is realistic for compute intensive kernels (read dense matmul), but the bandwidth is never reached. Probably 250GB/s is more reasonable.

On the CPU side, if you pick a Xeon E5 such as this one ( http://ark.intel.com/products/64595/ ), you need 4 of them to get to 32 cores. You get 32 cores * 2.6GHz * 8 floats per SIMD = 665Gflop/s, which is actually realistic for dense kernels such as matmul, and 4*50GB/s of bandwidth. But in practice you can hardly reach 30GB/s per processor, so 120GB/s aggregated.

So the GPU is about 7 times faster floating point wise and 2 times faster bandwidth wise. But here we are talking peak, and practical performance varies a lot from application to application and depends on whether you can use your architecture properly. Overall, for some well chosen kernel, a factor of 10 still seems not too unreasonable.
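For anyone who wants to redo the back-of-the-envelope numbers, here is the arithmetic from this comment as a small script. All inputs are the figures quoted above (taking the GPU peak as 4.5 Tflop/s); the variable names are mine, and nothing here is measured.

```python
# CPU side: 4 sockets giving 32 cores, figures as quoted in the comment.
cpu_cores = 32
cpu_ghz = 2.6
floats_per_simd = 8
cpu_peak_gflops = cpu_cores * cpu_ghz * floats_per_simd

cpu_bw_peak_gbs = 4 * 50        # quoted peak bandwidth, 4 processors
cpu_bw_practical_gbs = 4 * 30   # what the comment says is reachable in practice

# GPU side: figures as quoted in the comment.
gpu_peak_gflops = 4500.0
gpu_bw_peak_gbs = 320
gpu_bw_practical_gbs = 250

flop_ratio = gpu_peak_gflops / cpu_peak_gflops
bw_ratio_peak = gpu_bw_peak_gbs / cpu_bw_peak_gbs
bw_ratio_practical = gpu_bw_practical_gbs / cpu_bw_practical_gbs

print(f"CPU peak: {cpu_peak_gflops:.1f} Gflop/s")
print(f"GPU/CPU flop ratio: {flop_ratio:.2f}")
print(f"GPU/CPU bandwidth ratio (peak): {bw_ratio_peak:.2f}")
print(f"GPU/CPU bandwidth ratio (practical): {bw_ratio_practical:.2f}")
```

With these inputs the floating point ratio comes out a little under 7 and the practical bandwidth ratio a little over 2, so a claimed factor of 7 sits right at the peak-to-peak ratio for compute bound kernels.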

Comment You mean it has ever been alive? (Score 1) 400

by godrik on Tuesday December 24, 2013 @05:51PM (#45778425) Attached to: Is Ruby Dying?

Clearly I am not in "the web world" and I am seeing this question from an external viewpoint. But I never really saw anybody excited about ruby or using ruby or praising ruby, except one single PhD student who was using it to make his experiments repeatable and automatically logged. Sure, there is an occasional article on a new version of ruby or a flaw in ruby-on-rails. I hear people talk a lot about PHP, about Python, about javascript, to do pretty much anything. But quite frankly I never hear about ruby. Actually I hear more about Lua than I hear about ruby.

For that reason I never took ruby for more than a hobbyist pet project. Maybe I am wrong, but seen from my chair as a low-level programming guy, no one uses ruby.

Comment Strange definitions (Score 1) 550

by godrik on Monday November 18, 2013 @09:37AM (#45453603) Attached to: Sen. Chuck Schumer Seeks To Extend Ban On 'Undetectable' 3D-Printed Guns

I don't know much about firearms, but I feel like plastic based guns are not really new. If you can enter a "high security area" with a plastic gun, then maybe it is NOT a high security area...

Comment Re:"white collar crimes" (Score 1) 337

by godrik on Friday November 15, 2013 @04:59PM (#45437663) Attached to: Prison Is For Dangerous Criminals, Not Hacktivists

Historically, hackers have joined up with mafia or gangs for _physical_ protection, and in exchange, provide black-hat services to the groups providing them with protection.

While I agree with the sentiment, is there any actual evidence of that?

Comment Re:What's the point? (Score 4, Funny) 106

by godrik on Thursday November 14, 2013 @02:13PM (#45424280) Attached to: EU To Allow 3G and 4G Connections On Planes

Another question that baffles me, how were the people on the 9/11 flights able to use their cell phones during flight? Yeah they have the in seat phones, but i still remember hearing people say, "Yeah they used their phones!" Fun fact that everyone seems to forget.

And you saw how that flight ended?? DO YOU REALLY WANT THAT AGAIN?!

Comment Re:Why? (Score 1) 263

by godrik on Wednesday November 13, 2013 @12:55PM (#45413843) Attached to: Physicists Plan to Build a Bigger LHC

I have the same question. I am all for science and if asked I would be all for it.

But an important question should be answered if possible. What did we gain from discovering the Higgs boson? I am sure there are thousands of really cool applications that specialists can think of. I think if some could be highlighted (even if they are 50 years of engineering down the road), people would be much more receptive to it.

Comment Re:It Met My Basic Needs (Score 2) 233

by godrik on Monday November 11, 2013 @05:19PM (#45394635) Attached to: Thor: The Dark World — What Did You Think?

Pretty much the same here. You leave your brain outside the theater and then stuff happens, some funny parts are funny; some not funny parts aren't so funny. Overall I had a good time.

Comment Writing is difficult and time consuming (Score 2) 63

by godrik on Sunday November 10, 2013 @01:37PM (#45384709) Attached to: Could We "Wikify" Scholarly Canons?

(disclaimer: this being slashdot I did not RTFA.)

Speaking as a scholar, the main problem that I see is that communicating to the public is not my job. Writing for some wiki is not my job. My job is composed of 3 components:
1/ teaching: in class and mentoring students.
2/ research: conduct, manage and fund.
3/ service: for my university in committees, and for the community by taking part in conferences/journals by submitting/reviewing papers and helping with organization.

Moreover, writing is difficult. Especially that form of quite high level, all-encompassing writing. Writing a good survey paper takes months. It is a significant endeavour.

As you can see, wikifying scholarly canons is not really a part of my job and takes a lot of time. It is not unrelated, but it is a more abstract thing. As such, it is not directly useful to my advancement. (In other words, my tenure committee is not going to care.) I just cannot afford to spend that time if it is not part of a clearly identified project.

Comment Re:FTP? (Score 1) 336

by godrik on Thursday November 07, 2013 @08:31PM (#45362939) Attached to: GIMP, Citing Ad Policies, Moves to FTP Rather Than SourceForge Downloads

Though nowadays you just click on an ftp://... link and get the right file right away. So I am not sure the file listing problem matters that much.

Comment Re:does everyone REALLY have IP-connected TV? (Score 1) 419

by godrik on Wednesday November 06, 2013 @05:55PM (#45349775) Attached to: Blockbuster To Close Remaining US Locations

Maybe the TV is not IP connected, but nowadays every single gaming system comes with the ability to play videos. Most are compatible with the Netflix, Hulu, or Amazon video services. And that is ignoring all the Hulu/Netflix boxes that you find here and there. Cable boxes can do VOD too. And I am not even counting the $50 Android sticks that go into your TV's HDMI port.

So I'd say that the market for "I absolutely need a DVD or I can't play it" is probably quite slim.

Comment Re:This benchmark is pointless (Score 1) 196

by godrik on Tuesday November 05, 2013 @04:19PM (#45338959) Attached to: Speed Test: Comparing Intel C++, GNU C++, and LLVM Clang Compilers

TBB is *not* implemented using the Cilk Plus runtime, either in the Intel compiler or in the Cilk Plus branch of GCC. TBB is implemented using a completely separate runtime from Cilk Plus. You can take my word for it that I know what I'm talking about, or you can confirm it by studying the sources online, since they are both publicly available. :)

Interesting. I never looked at how it is implemented by ICC. But my understanding was that (some parts of) TBB used a work-stealing engine for execution and that it reused a significant portion of the Cilk runtime. I might have understood wrong.

Pinning threads to cores can help on some benchmarks, but it is less useful for others. In particular, for codes implemented in TBB or Cilk Plus, which use work-stealing schedulers, the performance benefits of pinning can be modest, almost negligible, or sometimes even hurt performance.

Well, the point of pinning is to increase memory locality. Work-stealing engines typically try to keep things local to avoid that problem, so you would rather have little migration. Now that I think about it more, Cilk Plus tends to create more threads than cores/hardware threads, so pinning might actually be a catastrophe in that case. I guess what I meant was that ignoring pinning in a parallel benchmark is not the way to do it.
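To make the "keep things local" point concrete, here is a toy, single-threaded simulation of a work-stealing policy. It is a sketch under my own assumptions and NOT the TBB or Cilk Plus scheduler: each simulated worker owns a deque, takes work from the back of its own deque, and only when it is idle steals from the front of the fullest other deque. The function name and the victim selection rule are made up for illustration.

```python
from collections import deque


def run_work_stealing(task_lists):
    """Simulate workers in round-robin order.

    Returns (executed, steals): executed[w] lists the tasks run by worker w,
    and steals counts how many tasks migrated between workers.
    """
    deques = [deque(tasks) for tasks in task_lists]
    executed = [[] for _ in task_lists]
    steals = 0
    while any(deques):
        for w, own in enumerate(deques):
            if own:
                # Local work: take the most recently queued task.
                executed[w].append(own.pop())
            else:
                # Idle worker: steal the oldest task of the fullest deque.
                victim = max(deques, key=len)
                if victim:
                    executed[w].append(victim.popleft())
                    steals += 1
    return executed, steals


balanced_runs, balanced_steals = run_work_stealing([[0, 1], [2, 3]])
skewed_runs, skewed_steals = run_work_stealing([list(range(8)), [], [], []])

print("balanced:", balanced_runs, "steals =", balanced_steals)
print("skewed:  ", skewed_runs, "steals =", skewed_steals)
```

When the work is already balanced, nothing migrates and every task stays with the worker that queued it. Tasks only move when a worker runs dry, which is why such a scheduler already gets part of the locality benefit that pinning is meant to provide.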

About the speed of Cilk Plus: it is my personal experience that Cilk Plus is slow. I forwarded a couple of performance related problems to Intel. I remember excluding Cilk Plus results from some charts because they were embarrassing (although I might have done something wrong). Whenever you see academic publications using that kind of technology, you realize that people emphasize things like parallelism, speedup, and load balance, but rarely performance. And when you dig into what actually happened in the performance area, you realize that the numbers are not that good. (Though I agree with you that in this case it should not matter.)
