Comment Re:This is mind boggling stupid.... (Score 2) 162
It's 16x RTX Pro 6000s.
- 1.6 TB/s GDDR7 vs 8 TB/s HBM3e on a B200
- No NVLink, just PCIe Gen5 which is roughly two orders of magnitude slower
- A gigabit switch... _gigabit_
That's not a frontier model server, that's an AI workstation. And no, I'm not exaggerating. You don't run production models over PCIe bridges.
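For scale, here's a quick back-of-envelope in Python. The payload size is an arbitrary assumption for illustration, and the figures are nominal peak rates (not measured throughput), but the gap speaks for itself:

```python
# Time to move a 100 GB payload (hypothetical size) over each link class.
# Bandwidths in GB/s; 1 gigabit Ethernet ~ 0.125 GB/s.
LINKS_GBPS = {
    "HBM3e (B200, per GPU)":  8000.0,
    "GDDR7 (RTX Pro 6000)":   1600.0,   # ~1.6 TB/s
    "PCIe Gen5 x16":            64.0,
    "1 Gb Ethernet":             0.125,
}

payload_gb = 100.0  # assumed payload, e.g. sharded weights or KV cache
for name, bw in LINKS_GBPS.items():
    print(f"{name:24s} {payload_gb / bw:10.3f} s")
```

That's ~13 minutes over the gigabit switch for what HBM3e shuffles in a fraction of a second.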
And if you continue reading the docs, you'll see that the XFRA model keeps account data, chat history, system prompts, and all the supporting infra back at the data center. So you have substantially slower inference that still has to phone home over a 1 gigabit _symmetric_ link for the data it needs.
And that's on top of the fact this "distributed" data center thing has been tried before and doesn't work for many practical reasons.