"1) power usage is more important at this juncture."
Is it? At this "juncture"? If we're talking about AI inference, how is power usage important at all?
"2) inference is memory bandwidth limited, not compute limited."
So what? How is "gigawatts" useful in that regard?
"A device with 17 bf16 TFLOPS may be able to do inference at precisely the same rate as a device with 4 bf16 TFLOPS."
So what?
"i.e., for very large model inference, TFLOPS isn't a great measure."
So what? Who's using that? The measure is gigawatts, the dumbest measure of all.
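For anyone who wants to check the quoted 17 vs. 4 bf16 TFLOPS claim, here is a rough roofline-style sketch. Everything in it is an assumption for illustration, not a figure from this thread: a hypothetical 70B-parameter model in bf16, batch size 1, roughly 2 FLOPs per parameter per generated token, and two made-up devices that share the same 1 TB/s memory bandwidth.

```python
def decode_tokens_per_second(model_bytes, mem_bw_bytes_per_s,
                             peak_flops, flops_per_token):
    """Upper bound on single-stream decode rate (simple roofline estimate).

    Assumes every generated token streams all weights from memory once,
    so the rate is capped by whichever limit is lower.
    """
    bandwidth_bound = mem_bw_bytes_per_s / model_bytes  # tokens/s from memory
    compute_bound = peak_flops / flops_per_token        # tokens/s from math
    return min(bandwidth_bound, compute_bound)


# Hypothetical numbers, chosen only to illustrate the argument.
params = 70e9                  # assumed model size (parameters)
model_bytes = params * 2       # bf16 = 2 bytes per parameter
flops_per_token = params * 2   # assumed ~2 FLOPs per parameter per token
mem_bw = 1e12                  # assumed 1 TB/s on both devices

fast = decode_tokens_per_second(model_bytes, mem_bw, 17e12, flops_per_token)
slow = decode_tokens_per_second(model_bytes, mem_bw, 4e12, flops_per_token)

print(f"17 TFLOPS device: {fast:.2f} tokens/s")
print(f" 4 TFLOPS device: {slow:.2f} tokens/s")
```

Under those assumptions both devices hit the memory limit long before the compute limit, so the estimate comes out the same for each. Larger batch sizes change the picture, since the weights are then reused across more tokens per read.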
"I suppose we could transition to aggregate GB/s as our unit of measure."
You are truly a moron. Maybe you could transition to teaspoons of sugar as well.