Skowronek - Slashdot User

Comment Re:Open Source drivers? (Score 4, Informative) 240

by Skowronek on Friday June 04, 2010 @08:36AM (#32456944) Attached to: AMD's Fusion Processor Combines CPU and GPU

The documentation needed to write 3D graphics drivers has been consistently released by ATI/AMD since R5xx. In fact, yesterday I was setting up a new system with an RV730 graphics card which was both correctly detected and correctly used by the open source drivers. Ever since AMD started supporting the open source DRI project with money, specifications and access to hardware developers, things have improved vastly. I know some of the developers personally; they are smart, and I believe that given this support they will produce an excellent driver.

It's sad to see that with Poulsbo, Intel did quite an about-face and stopped supporting open source drivers altogether. The less said about nVidia the better.

In conclusion, seeing who is making this Fusion chip, I would have high hopes for open source on it.

Comment Re:people who do less useful work earn more (Score 2, Interesting) 172

by Skowronek on Sunday May 02, 2010 @10:17AM (#32063620) Attached to: Open Source vs. Wall Street Bonuses

50% management? This would imply that, on average, every manager has almost 2 underlings (for a large company the average tends to 2; the proof is left to the reader). The conclusion of this, from Dirichlet's principle, is that if there is a manager who manages 2 or more underlings, there is at least one manager who manages no more than 1 person. And that's terrifying.
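For anyone who wants the "almost 2" figure spelled out, here is one way to get it. This is a sketch under assumptions the comment does not state: the company is a single reporting tree with N people, exactly half of them are managers, and "underlings" means direct reports (which may themselves be managers).

```latex
% Assumptions (not in the original comment): one reporting tree, N employees,
% M = N/2 managers, "underlings" = direct reports.
% Everyone except the person at the top has exactly one direct manager,
% so the total number of direct reports is N - 1.
\[
  \bar{r} \;=\; \frac{N - 1}{M} \;=\; \frac{N - 1}{N/2} \;=\; 2 - \frac{2}{N}
  \;\xrightarrow[N \to \infty]{}\; 2 .
\]
% Since \bar{r} < 2 and report counts are integers, the pigeonhole
% (Dirichlet) principle gives at least one manager with r_i \le 1.
```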

Comment Re:Could be worse (Score 1) 307

by Skowronek on Monday April 19, 2010 @08:19PM (#31904562) Attached to: Cross With the Platform

I did not mention command buffers as a way to submit IM operations; I agree with you entirely that it would result in very sub-par performance.

Some machines (the ones that OpenGL came from), generally older than the R300 or in fact any of the consumer hardware, were optimized for IM and executed it through direct MMIO writes from userland. This avoids both the command buffer size limitations and the high price of ioctl-style command buffer submits.

Comment Re:Could be worse (Score 1) 307

by Skowronek on Monday April 19, 2010 @08:13PM (#31904510) Attached to: Cross With the Platform

No, but there's this argument going around that "immediate mode necessarily performs worse" which is simply not true, if your hardware is not constrained. My view comes from spending a good few years designing GPU chips - we could do whatever we wanted as long as there was enough demand for it.

There isn't enough demand for immediate mode, and for OpenGL in general. That doesn't mean it's not possible to make it perform well.

Additionally, Apple likes to make their own hardware (just look at their recent ex-ATI employee acquisitions) so they can do whatever they please with their chips. Especially in portables.

Comment Re:Could be worse (Score 1) 307

by Skowronek on Monday April 19, 2010 @01:01PM (#31898218) Attached to: Cross With the Platform

Actually, there are several operations that make the vertex fetch operation less parallel than you might think. In particular, vertex sharing by multiple primitives is usually handled with a small buffer (32-64 vertices deep). In other words, if a vertex index occurs twice in your object, but those occurrences are >128 locations apart in your index buffer, the vertex will be processed twice and you won't get peak performance. Another example is the primitive assembly stage, where processed vertices that are output from the Vertex Shader are merged into primitives according to the data in the index buffer. This is a significant performance bottleneck that makes VBO bandwidth lower than that of the fragment shaders.
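To make the vertex-sharing point concrete, here is a minimal C sketch that counts how many times the vertex shader would run for a given index buffer. It assumes a plain FIFO replacement policy and a configurable depth; real hardware may use a different policy, and the function name and the 64-entry cap are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE_DEPTH 64  /* illustrative cap, matching the 32-64 range above */

/*
 * Count vertex shader invocations for an index buffer, assuming a FIFO
 * buffer of already-processed vertices that is `depth` entries deep.
 * A hit reuses the processed vertex; a miss runs the shader again and
 * overwrites the oldest entry.
 */
size_t count_vertex_shader_runs(const unsigned *indices, size_t count,
                                size_t depth)
{
    unsigned cache[MAX_CACHE_DEPTH];
    size_t used = 0;  /* number of valid entries */
    size_t next = 0;  /* slot to fill next; the oldest entry once full */
    size_t runs = 0;

    assert(depth > 0 && depth <= MAX_CACHE_DEPTH);

    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (size_t j = 0; j < used; j++) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            runs++;                      /* vertex is processed (again) */
            cache[next] = indices[i];
            next = (next + 1) % depth;
            if (used < depth)
                used++;
        }
    }
    return runs;
}
```

With this model, an index that reappears while its vertex is still in the buffer costs nothing extra, while the same index reappearing after `depth` other distinct vertices have passed through is shaded a second time.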

The assumption that improvement of immediate mode performance would necessarily come with a large area hit is not supported. Most of the GPU area is not control units (which this would end up being, since the datapaths are already in place); it's the memory controller, raster backends and shaders. A quick look at the floorplan of a modern (>2008) GPU proves this quite readily. Even the primitive assembly / triangle setup unit usually occupies less than 2% of the GPU, and it is the current VS rate bottleneck.

Comment Re:Could be worse (Score 5, Interesting) 307

by Skowronek on Monday April 19, 2010 @08:25AM (#31894550) Attached to: Cross With the Platform

Entirely correct @ shaders.

However, I have to take exception to your description of immediate mode - the reason it performs so poorly now is that modern graphics chips are designed pretty much exclusively for DirectX (at least, this goes for ATI).

On machines where immediate mode performance was actually some kind of a priority (for instance, SGI Octane IMPACTSR and relatives), executing a glVertex command amounted to 3 memory writes into a command FIFO. The FIFO was mapped at a fixed address in userspace that was reachable with a short form of the SW opcode (remember, this is MIPS: there is a range of 64k addresses that can be accessed without loading a base register, -32768 to 32767).
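Roughly what that looks like from the application side, as a C sketch. The register layout, the names and the "write to z commits the vertex" convention are all assumptions for illustration; this is not the actual IMPACT programming interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical register window for vertex submission. The layout is
 * invented for this sketch; the real hardware layout is not described here.
 */
typedef struct {
    volatile float x;
    volatile float y;
    volatile float z;  /* assumed: the store to z commits the vertex */
} gfx_vertex_port;

/*
 * On the real machine this would be a fixed userland mapping. If that
 * address lies within -32768..32767 of zero, a MIPS `sw` can reach it
 * with $zero as the base register, so no address load is needed.
 */
static gfx_vertex_port *gfx_port;

/* Point the sketch at a mapped (or, for testing, fake) register window. */
static void im_attach(gfx_vertex_port *mapped)
{
    assert(mapped != NULL);
    gfx_port = mapped;
}

/* The whole cost of a glVertex3f-style call: three stores, no syscall. */
static inline void im_vertex3f(float x, float y, float z)
{
    gfx_port->x = x;
    gfx_port->y = y;
    gfx_port->z = z;
}
```

The `volatile` qualifiers keep the compiler from merging or reordering the three stores, which matters once the target is a device rather than ordinary memory.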

The hardware even managed the hiwater/lowater status of the FIFO, and notified the kernel to perform a context switch to a non-gfx process when the gfx process was filling up the command FIFO. Those switches were as a matter of fact "virtualized" (before it was cool) by a combination of hardware, kernel (if hardware contexts are exceeded) and userspace - not entirely unlike what DX10 ADM was supposed to be, except this was in 1995.
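The hiwater/lowater idea can be sketched in a few lines of C. This only models the bookkeeping; the struct, the event names and the thresholds are hypothetical, and on the real machine the notifications would be hardware signals to the kernel rather than return values.

```c
#include <assert.h>
#include <stddef.h>

/* Events the (simulated) hardware would report to the kernel. */
typedef enum { FIFO_OK, FIFO_THROTTLE, FIFO_RESUME } fifo_event;

typedef struct {
    size_t level;    /* commands currently queued */
    size_t hiwater;  /* at or above this, switch away from the gfx process */
    size_t lowater;  /* at or below this, the gfx process may run again */
    int throttled;   /* nonzero while the gfx process is held off */
} cmd_fifo;

/* The gfx process writes one command into the FIFO. */
fifo_event fifo_push(cmd_fifo *f)
{
    f->level++;
    if (!f->throttled && f->level >= f->hiwater) {
        f->throttled = 1;
        return FIFO_THROTTLE;  /* kernel schedules a non-gfx process */
    }
    return FIFO_OK;
}

/* The graphics hardware consumes one command from the FIFO. */
fifo_event fifo_drain(cmd_fifo *f)
{
    assert(f->level > 0);
    f->level--;
    if (f->throttled && f->level <= f->lowater) {
        f->throttled = 0;
        return FIFO_RESUME;    /* gfx process becomes runnable again */
    }
    return FIFO_OK;
}
```

Keeping two separate thresholds gives hysteresis: the process is not bounced in and out of the run queue every time the fill level crosses a single mark.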

For large static meshes (only transforms applied with Vertex Shaders), buffers are definitely going to perform better, because the meshes can be located in local memory (VRAM). However, if something is dynamically generated, immediate mode in a good implementation is no slower than a memcpy, and it does not require a kernel transition to submit a command buffer to the card's ring (as modern cards like to do).

Comment Re:Could be worse (Score 3, Insightful) 307

by Skowronek on Monday April 19, 2010 @06:55AM (#31894214) Attached to: Cross With the Platform

The problem with this "explanation" is that the application's effort to use vertex buffers is significantly higher than the effort to use immediate mode.

A hardware implementation of IM (like the one in Silicon Graphics machines) would probably bring much higher energy efficiency than carefully packing up VBOs with software. Even when there's no hardware implementation, the packing up can be equally well performed by a driver, thus just shifting the energy consumption around, not increasing it.

Thus, immediate mode is actually at worst just as efficient as VBs for small vertex counts or dynamic objects, and at best allows hardware acceleration where there is none with VBs.

Submission + - AMD demonstrates OpenCL at SIGGRAPH Asia (fireuser.com) 1

Submitted by cloude-pottier on Tuesday February 10, 2009 @09:25PM

cloude-pottier writes: At SIGGRAPH Asia, AMD demonstrated their implementation of OpenCL, an open-standards language developed by the Khronos Group targeting GPGPU and general parallel computing applications. The first demo was called PowderToy, a computational fluid dynamics simulation (a video can be seen on the linked page). The original PowderToy, which the demo is based directly on, can be downloaded as well.

Comment Re:Oh just stick a 2-axis accellerometer inside (Score 1) 191

by Skowronek on Sunday October 19, 2008 @12:08AM (#25428763) Attached to: "BlueTrack" Mouse More Advanced Than Laser, Optical

Yeah, that might be pretty cool. But then you have to calibrate it.

Submission + - SHA-1 cracking on a budget (hackaday.com)

Submitted by cloude-pottier on Friday August 31, 2007 @08:57PM

cloude-pottier writes: One thing that is always amazing is what people manage to pull off on absolutely minimal resources. One enterprising individual went on eBay and found boards with more than half a dozen Virtex II Pro FPGAs, nursed them back to life and built a SHA-1 cracker with two of the boards. This is an excellent example of recycling, as these were originally part of a Thomson Grass Valley HDTV broadcast system. As part of the project, the creator wrote tools that use JTAG to graph the relationships between components, making the organization of the FPGAs on the board easier to reverse engineer. More details can be seen on the actual project page. If an individual is able to pull this off for under 500 dollars, it almost makes one wonder what resources the government has available to do the same thing...
