Comment Truly specialized chips (Score 3, Insightful) 21

What I think many people are missing is... well, Seymour Cray.

I make this point because Cray started his supercomputing journey by building highly advanced machines connected to a single centralized memory system.

As things stand right now, probably the biggest problem we're facing in AI is cache coherence. The bigger the machines we build, the bigger this problem gets. Currently, I'm trying to troubleshoot a fairly small HPC system, about a thousand cores. In its current state, the more cores I add to a single node, the slower the machine gets, because the cost of sharing memory between the cores is just too high. HBM2 and HBM3 don't help at all because it's a design issue in how the system operates, not a bandwidth problem. Thrashing memory, which is what AI workloads do, means the number of CPU spinlocks increases. Historically, the cheapest form of shared memory has been atomic variables: they live in a single page and are always cache coherent. Right now, every access to such a variable takes a very long time because the kernels are initiating spinlocks to wait for coherence. As such, 128-core or larger processors are generally a lot slower than much smaller processors with duplicated memory in read-only regions.
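To make the shared-variable point concrete, here's a minimal sketch (my own toy, not anything from the cluster above; the constants and worker functions are made-up placeholders) contrasting a single shared, lock-protected counter, which every worker serializes on, with per-worker local accumulation merged once at the end. The same effect, scaled up, is what coherence traffic and spinlocks do to a 128-core node.

```python
import time
from multiprocessing import Process, Value

N_WORKERS = 8        # illustrative values only
ITERS = 200_000

def worker_shared(counter):
    # Every increment takes the shared lock: all workers serialize on one counter.
    for _ in range(ITERS):
        with counter.get_lock():
            counter.value += 1

def worker_local(total):
    # Accumulate privately, publish once at the end: almost no contention.
    local = 0
    for _ in range(ITERS):
        local += 1
    with total.get_lock():
        total.value += local

def run(target):
    shared = Value('q', 0)  # 64-bit shared counter with an attached lock
    procs = [Process(target=target, args=(shared,)) for _ in range(N_WORKERS)]
    t0 = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared.value, time.time() - t0

if __name__ == '__main__':
    for fn in (worker_shared, worker_local):
        total, dt = run(fn)
        print(f'{fn.__name__}: counted {total} in {dt:.2f}s')
```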

We need to see progress in high-performance multi-ported memory systems. I think specialized data lines, for example entirely separate LVDS pairs for reading and writing synchronized (coherent) memory regions, could be helpful. Any write to memory in specific regions would be multicast across a full mesh, and spinlocks on reads could be local to the core. As part of the multicast LVDS mesh, there could be a "dirty" status line where a centralized broker identifies writes to a region (as any MMU would) and, with minimal propagation delay, raises a dirty flag at the speed of electricity to all subscribers to notifications for that region.

Honestly, Cray would probably come up with something much more useful. But with optimizations like these, performance could improve dramatically enough that substantially fewer cores could achieve the same tasks.

From what I've been looking at, GPUs are trash for AI. I have racks full of NVidia's and AMD's best systems, and in a few cases I have access to several of the computers ranked in the top 10 of the Top500 list. The obscenely wasted cycles and transistors for AI processing are unforgivable. A chip specifically designed to run transformers should hold at least 100 times the capacity of a single GPU. Then there's the additional fact that by optimizing the data path for AI, in combination with smarter cache coherency, we could fit maybe thousands of times more capacity into a single chip.

Strangely, right now, I think the two most interesting players in the market are GraphCore and Huawei. They both have substantially smarter solutions to this problem than either NVidia or AMD.

Comment Re:I don't really like being used (Score 2) 23

Honestly, GM, Ford, BMW, Mercedes, Peugeot, Citroen, EVERY British car maker and, more than anyone else, VW... these are companies thoroughly lacking any skills in computer technology. I've driven cars from all of them and it's painfully obvious that full self-driving isn't even the start of the problems. These companies should be legally banned from using any modern technology in cars, and they should be banned from making cars without modern technology. In other words, they should just be banned.

Let me give an example. Following the incident that killed Cruise, GM announced they would recall all their vehicles for a software update. The year is 2024, the car is connected to a central management system, and GM managed to produce a car which had to be recalled to perform a software update? In fact, nearly every vehicle manufacturer has this problem. I cannot for the life of me understand how it's possible to ship a computerized and connected car and not be able to diagnose, troubleshoot and repair ANY software issue remotely. I think if I were CEO of Cruise (or GM), I would shut the company down out of sheer embarrassment.

Now we're at the root of the problem. If we remove the pathetic car companies from the equation and simply ban them from making cars anymore, can the remaining companies be trusted to make a much more thoroughly tested self-driving car? I'm leaning towards... probably. In fact, I would like to see a government-mandated self-driving unit/integration test suite, with every car company in the world required to share its self-driving training data, since this is not proprietary. Every car company should have to pass all tests, or perform optimally on "no win" situations, for every software update, against a database containing all known self-driving test data. This way, every time a new test case comes up that we hadn't thought of before... like a paragliding flash mob landing in the middle of the road... it gets added to the centralized test database and all companies are required to pass it within an acceptable time frame. Then all FSD cars, if they haven't already surpassed most human drivers, quickly will.

Comment Re:LNG (Score 4, Interesting) 216

American in Norway with an EV.

First of all, from personal experience, I'm guessing the EV problems in this article are entirely BMW, because if there's one thing BMW should NOT be allowed to do, it's make anything with electronics or a computer in it. They should probably avoid making anything that moves, either. They do make nice interiors for cars, though. Everything else, they should be banned from... yeah, a rolling computer that can pop the hood while you're driving because you scratched your nuts and accidentally pressed something in your pocket against the key fob. And when you ask for a software update, they won't do it because it might make it so the car won't start anymore. BMW sucks at cars. Then there's the heater that doesn't work while charging. Nothing better than -25C weather where the battery can't accept more than 8 kW but the charger can deliver 450 kW, and the car can't run the heater... for the 30 minutes you're freezing your ass off in the car.

Off the BMW rant.

Norway has some of the youngest cars anywhere. First of all, we're rich. Second, paying for car repairs in Norway is more expensive than making car payments on a new car with a warranty.

You are harshly penalized in Norway for buying fossil fuel vehicles. At one point in Oslo, a person driving from the east side of town to the west would pay about $17 in tolls each way in a diesel but $0.50 in an EV.

If you drive around in Norway, except for classics, you don't tend to see many older cars.

From what I know, you're right, there's a lower percentage of EVs as you travel north. But last year, over 80% of all new car sales in the country were BEV. And the population is concentrated in the 5 "cities" (Norway doesn't really have cities, just one large town and a few large villages), where even one car per household seems a high estimate. That means BEV sales had to be scattered throughout the country, including the north.

There are some weird stretches of road, such as between Hamar and Lillehammer, where chargers are spread too far apart. But EV ownership in Norway is quite convenient, even in -25C weather.

Comment Re:Someone please explain (Score 1) 49

Most ML is performed today using libraries of two types.
1) Training
2) Inference

If you visit the websites of companies like GraphCore, which makes chips specifically for ML, you'll find they focus almost entirely on optimizing for the Python library PyTorch.
If you visit Huawei's website and look at their Atlas machine learning compute systems, they focus on Pangu and PyTorch.

Training tends to require some big iron to run. This is because Python libraries for training models are extremely inefficient in exchange for being very usable. At this phase of AI/ML research, usable is far more valuable than performant.

Inference, on the other hand, doesn't require much to run, and yet two last-generation cards will often outperform a single current-generation card. Oddly, with the power footprint of the cards ever increasing, the cost doesn't even double.
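As a concrete illustration of that training/inference split, here's a deliberately tiny PyTorch sketch (my own toy; the model, data and hyperparameters are made-up placeholders, not anything from the vendors above). Training keeps gradients, optimizer state and activations around, which is what demands the big iron; inference under no_grad has a far smaller footprint, which is why older or smaller cards are often good enough.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Training phase: gradients and optimizer state are kept, memory use is high.
model.train()
for _ in range(100):
    x = torch.randn(64, 16)                 # placeholder batch
    y = torch.randint(0, 2, (64,))          # placeholder labels
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Inference phase: no gradients, much smaller footprint.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(torch.randn(1, 16)), dim=-1)
    print(probs)
```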

This is where the big problems come in. ML models require lots of RAM, and the faster the better. A big selling point right now is HBM3 memory. HBM3 is memory specifically designed to sit as close as possible to the processor cores. It can sometimes even be mounted on the same piece of silicon as the processors, though these days we use chiplets. There are many reasons, but HBM3 is more akin to a level-4 cache than to DRAM: it's super fast and it's more easily addressable. And two cards in one machine with 12GB each don't add up; for ML it's still 12GB, it doesn't aggregate.

You bring up a great point about not being consumer grade. And I'll reference the Atlas architecture from Huawei.

The Atlas 9000 is basically a machine learning GPU which is sold by the rack, not by the card. It's a Linux supercomputer that can scale and scale. When I asked at Huawei Connect how far it can scale, they told me those limits are arbitrary, and if you want to buy more than the current limit, they'll change the number.

In any case, the Ascend processor used in the Atlas 9000 isn't very powerful, but Huawei designed the systems with full water cooling, and sells solar, battery and wind farms to power them, plus heating systems to recycle the waste heat from the CPUs. Inside China, Huawei owns its own construction companies, and you can seriously ask them to build you a skyscraper or a replica of the British Museum with a top AI supercomputer in the basement heating the entire building and providing hot water and steam to the surrounding village, and it will be done... quickly.

The Chinese government has absolutely no problem gaining access to world-class HPC. In fact, they can get more of it, and faster, than the US can, because what makes Atlas very interesting is that almost every single piece, down to the epoxy the circuit boards are made of and the sand the chips are made of, is sourced from a single company. There are no supply chain issues. Not only that, but they're paying pennies per token compared to the US's dollars.

I'm pretty sure the biggest reason for the decline in chip revenue is simple. The US forced China to use its own chips and to adapt to Chinese tech rather than American. China adapted and accelerated its internal development to cut external dependency on western tech. It's only a matter of time before we all just start buying the Chinese tech, because you often can't even buy the American stuff. Lead times of two years mean that if you can buy Chinese and have it in 30 days, sure, it's obsolete, but it's only as obsolete as what the Americans will deliver to you in two years if you order today. And the Chinese stuff is much cheaper and gets your business running much sooner.

Comment Cat's out of the bag (Score 1) 86

If you regulate corporations making AIs, then we'll just make AIs without the help of corporations.

So, you're an evil mad scientist working for some hated government from a basement laboratory. Your country is trade-embargoed and you have to make do with whatever junk your country's military dictator can steal from wherever.

AI is not nuclear weapons. You do not need a heavy-water reactor. You don't need controlled substances. You need electricity and you need compute. In fact, if you are hard up enough, you can burn oil or coal and feed massive clusters of Commodore 64 machines connected with serial cables and train an advanced AI. Uh... you'll need maybe 64 C64s to produce a useful tensor coprocessor running at about 100 kHz relative to a modern CPU, but if you have enough C64s, great.

Better yet, raid a junkyard where mobile phones with broken screens go to die. Get enough semi-modern Android phones, load them up with Linux, connect them using USB Ethernet adapters, and put 50 or so developers on implementing drivers for the on-phone GPUs; it won't take long before you have the compute to train LLMs.

There are a lot of massive misconceptions in AI and ML these days. One of the important ones is that it takes supercomputers. It doesn't. Another is that it requires high-end GPUs. It doesn't.

The fact is, making a machine learning processor (a TPU) from scratch on an FPGA with reasonable performance is pretty simple. Performance increases dramatically if cache coherence is much faster. But right now almost all ML is being performed using a small set of popular Python libraries. They are seriously optimized for CUDA and somewhat optimized for OpenCL. On the 5 supercomputers I work with, almost every ML job running is written in PyTorch.

Here's the rub: PyTorch is written like shit. It's a common disease of academic code. The code does the job it does really well, but the code itself is highly unoptimized. Anyone making a TPU/NPU can start by running PyTorch on the CPU and then move operation by operation onto their accelerator. The problem now is that PyTorch distributes pretty poorly. I don't think it's PyTorch's fault. Training a model requires massive amounts of highly repetitive memory access in a cache coherency hell. Due to the highly unstructured data sets, it's almost impossible to predict which nodes will be accessed next, and linear memory access is completely out of the question. Add to this that large HBM3 memory is extremely poorly suited to the job compared to static memory, but fully meshed static memory of the size needed is completely out of the question (think N*(N-1)/2 point-to-point links: over 8,000 for just 128 cores).

So, let's get back to the real issue. Regulating ML.

If all it takes for me to do ML is to garbage pick and stick shit together and optimize some code, then it means anyone can do the same thing.

We need to very quickly identify how to embrace AI without limiting ourselves inappropriately. If, for example, the US were to try to limit AI development, besides not being technically possible, it would only tie its own hands and put itself at a major disadvantage, because not everyone is going to agree to do the same.

I think the main problem we have now is figuring out how to adapt the world to AI.

Comment Offices from hell (Score 0) 240

I work in a university now, in a department dedicated to science and, more specifically, supercomputing. We don't waste money on things like paint and decoration. We have the original dust bunnies, grown since 1990, there. I LOVE IT. But I moved from working in big fancy buildings with state-of-the-art coffee machines, pool tables, gourmet restaurants that cost $5 per meal, etc... by choice. I took off the knee pads and decided to do something actually meaningful with my life rather than continuing to contribute to childhood obesity.

Then there's the offices in that video.

Let's be honest. Except maybe WebMD, which could one day evolve into something important, that company is definitely a bring-your-own-knee-pads-and-fishnets kind of place (maybe chapstick for the sores). Anyone who would sell their soul and work in such drab offices is more of a streetwalker wondering if some sweaty john is going to beat them senseless tonight than a high-class escort. I suppose if you work for that company, you're the kind of person who has a very low opinion of yourself and probably thinks you can't do better. Those offices, to follow the horrible analogy I chose, look like a motel that rents by the hour and doesn't clean the sheets in between.

I'm pretty sure the people working there can find someplace better.

Comment Usability vs security (Score 3, Informative) 67

When DVD Jon published the crack for DVD CSS, many of us had been watching encrypted DVDs using open source for quite some time. It was simple: if an electronic device is capable of playing the media, the media can be extracted. I think when I ripped my first DVD, about a year before DeCSS, I spent 3-4 hours single-stepping one of the Windows DVD players (might have been a Mac one), and while I didn't reverse engineer the encryption, I did cut/paste the function and use it in Linux instead. Point being, if it can be displayed, it can be extracted.

I've been doing signal processing for about 20 years now. Sometimes as a profession, sometimes just to solve puzzles. Watermarks are far more useless today than ever before.

A file watermark, which we could instead see as simply signing a file, can be ignored when reading it. There's nothing of value there other than authenticating the file's origin. Unless the full file workflow has the private keys involved, the file can't be color corrected or anything of the sort without breaking the signature. And if the keys are transferable, the signature has no value at all.

I would spend more time on file watermarks, but they're useless for anything other than proving whether a file originated unaltered at a specific source.
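A minimal sketch of that idea, treating the "watermark" as a detached signature. This is my own illustration, not anything from the post: it assumes the third-party Python 'cryptography' package, and the file name is a placeholder. The signature proves the file left a given key holder unaltered, and nothing more.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # held by the originating source
public_key = private_key.public_key()        # published for verification

data = open('master.mov', 'rb').read()       # placeholder media file
signature = private_key.sign(data)           # detached signature stored alongside the file

# Any edit at all (color correction, recompression) breaks verification.
try:
    public_key.verify(signature, data)
    print('untouched: signature verifies')
except InvalidSignature:
    print('file was altered after signing')
```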

There are visible watermarks, such as those which TV networks place on the screen to advertise their brand and identify the source of the material. At one point we cropped these out. Now, we simply run a convnet to identify the position of a watermark and its edges, delete it, and in-paint it. For a still picture this is a little harder than for a movie, since you only have two dimensions to in-paint from. Neural networks are nice for this particular purpose, but if the watermark is big enough (more or less interfering with the photo) or complex enough, modern neural networks can lack enough training data to infer the missing region. For film it's a non-issue, because signal processing can easily reproduce missing regions from motion, as it's pretty likely that future and past frames hold the missing content. Then neural networks can infer the rest.
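Here's a hedged sketch of that detect-delete-in-paint pipeline. To keep it self-contained it uses classical OpenCV in-painting as a stand-in for a trained convnet, and the file name and mask rectangle are made-up placeholders; in the workflow described above, the mask would come from the watermark detector instead.

```python
import cv2
import numpy as np

img = cv2.imread('frame.png')                    # placeholder input frame
mask = np.zeros(img.shape[:2], dtype=np.uint8)

# Pretend the detector located the watermark in the bottom-right corner.
h, w = mask.shape
mask[int(h * 0.85):, int(w * 0.70):] = 255

# Telea in-painting fills the masked region from its surroundings; a convnet
# (or, for video, motion-compensated data from neighboring frames) can do better.
clean = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite('frame_clean.png', clean)
```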

Then there's "invisible watermarks" which I've experimented with over time. I've come up with some pretty creative techniques which are certainly circumventable which employed altering quantization factors in individual macroblocks within videos. It would really take little more than two similar streams with different watermarks and a simple delta comparison to reverse the process. But it is mathematically impossible to produce an invisible watermark which would survive any editing at all... such as recompression. And if it's known how to create or identify a watermark, then it's possible to remove it and add a new one. If the watermark is a signature, the private keys would be needed, but there's little or no chance of storing a signature with enough bits to be useful without altering the image and making the signature visible.

I wish there were a way to bet against companies making these technologies so I could make money by publishing how to circumvent all their algorithms. It would be really fun :) Wait for them to become big, then short the stock and publish that their algorithm has been irreversibly cracked?

Comment Re:Maybe Rust would be a Much Better Choice (Score 1) 139

I love your response. I even wrote a point-by-point reply, and when I really looked at the issues you described, I realized it was futile. So I erased it and started over.

You're using crap libraries. You can't have a debugged library that has memory leaks. MFC enforced horrible coding standards, mainly because it had an eventing model which was junk. And on a modern CPU, all of the cases you made for real-time systems and C++ are irrelevant, because the CPU itself will interfere more with the code you write than the language does. In fact, C and JavaScript will have the same problems.

Last I checked, C doesn't like array references with a negative index and I've also found that referencing a position past the end of the array can be problematic.

As for the indeterministic memory consumption of libraries, there are two reasons for this:
- You're using a shit library and whoever wrote it should be drawn and quartered
Or
- The library is performing actions on dynamic data, the data itself is deciding the structure of the memory use, and as a result the library is behaving appropriately.

A library capable of parsing PLC structured text is an excellent example of an indeterministic use case. Even if I were to write such a library using C, flex and/or bison, once I parsed the data into a tree, the tree would scale roughly with the structure of the tokens generated while parsing the text.
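As a toy illustration of that second case (mine, and deliberately not PLC structured text; the nested-list grammar is a stand-in), the tree a parser builds is sized by its input, so the library's memory use is data-driven rather than leaked:

```python
def parse_list(tokens, pos=0):
    """Parse whitespace-separated tokens of a nested list like '( a ( b c ) d )'."""
    node = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == '(':
            child, pos = parse_list(tokens, pos + 1)
            node.append(child)
        elif tok == ')':
            return node, pos + 1
        else:
            node.append(tok)
            pos += 1
    return node, pos

src = '( a ( b c ) d )'
tree, _ = parse_list(src.split())
print(tree)    # [['a', ['b', 'c'], 'd']]
# The deeper and longer the input, the bigger the tree -- by design, not by leak.
```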

You can't have a language which offers generic, safe, multithreaded variable-length arrays AND have real-time guarantees. As soon as you're multithreaded, you've decided that you need cache coherence, and on ANY modern processor getting that deterministically is impossible. And implementing it in C or C++ will make absolutely no difference.

C++ is not the right language for the kernel; neither is C or Rust. Consider that Rust exists because people were writing a web browser and learned there wasn't a good programming language for writing a browser (I spent years writing browsers in C++), so they made a new language suitable for writing the browser. The same should have been done for the Linux kernel a long time ago. Instead, people have spent over 20 years writing crap code in a crap language because, rather than making the right language for the job, they religiously stuck to the junk they already had and loved.

Comment Re:So which AMD CPUs contain FPGAs? (Score 1) 27

Hmm, interesting. I wonder how this operates in a Zynq environment.

Generally my Zynq projects wouldn't be able to do this, since most of my data paths from the CPU go through the FPGA component. So reprogramming while online would be problematic. It could be interesting to experiment with, though.

Comment Crap research (Score 1) 70

Drawing the conclusion that, because a general LLM can't diagnose medical cases, AI won't be able to replace GPs is beyond foolishness; it's outright idiocy.

Provide a means of inputting the correct data and train a model based on that and then see what happens.

A slightly modified airport body scanner would provide more than enough data to diagnose medical conditions far more accurately than any doctor could dream of. Walk through, take scans producing 3-5 seconds of video (it would require rapid scanning, let's say 10 frames per second), and diagnose from that video. There would be far more data than a human doctor could hope to process, but a trained model would catch things like cancerous tumors months or even years sooner than a GP could.

Why do people waste their effort on stupid research like this?

It's like asking a medical doctor to act as a virus expert on television. What the hell does a medical doctor know about viruses?

Comment Re:nonsense (Score 1) 50

China is different, though there are similarities. The USSR, for instance, spent more effort on underground railroads for transporting ICBMs than on the ICBMs themselves. That way, they could transport a missile to a silo where the US could see it, then transport it back underground, repaint it, and send it out again. This caused the US to waste incredible resources on weapons no one would ever dream of using.

China likes building cheap and crappy weapons, like its pathetic aircraft carriers. They could easily do much better, but why bother? Aircraft carriers are idiotic devices made to transport manned fighter jets, which is something only morons would do in 2023. But every time China builds a crap carrier, the US starts sucking up shipbuilding resources, while China strengthens its nation by using its shipbuilders to make more economical container ships.

China is a much more dangerous rival than the USSR could have ever been. Their ideology is focused entirely on the strength of their people. China outright begs its people to reach for the stars. The USSR never understood that a free market economy is entirely compatible with communism. And they didn't see that individualism is also compatible with communism. China's worst problem is that there is no effective way to get rid of religion or to make people see themselves as a family. Religion is the greatest enemy of peace, ethics or moral values. If China could convince people that we're all the same, and convince people to learn a common language so they could communicate and receive information without "lost in selective translation" issues, they wouldn't resort to harsh Norwegian-style methods.

China has far greater resources than the US. China is 50 years behind the US in human development. While the US is fat and lazy, China is spry and agile. The Chinese people are working hard as hell to achieve class elevation. The US peaked a long time ago. It's very easy to see, if you visit the US, that the climbers raise their kids to live off the accomplishments of their parents. Then there are the other people, who believe you should make the best of the hand you're dealt. Americans are settled, and good enough is good enough. China has four times the people, and they're ambitious and driven. Give it two generations and they'll be fat and lazy too. But the US should end up far behind China, except maybe in military strength, and China doesn't need to bother with that.
