It's literally a big blob of floating point weights
You too can be described by a big blob of floating-point weights.
from ever part of a word to every other part of a word
Wrong. So wrong I don't even know where to start.
First off, transformers do not work on words. The transformer architecture is entirely modality-independent; its processing is not in linguistic space. The very first thing that happens in an LLM (which, BTW, are mainly LMMs these days: multimodal models, with multimodal training, with the different modalities reaching the same place in the latent space) is to throw away everything linguistic and move to a purely conceptual (latent) space.
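To make that first step concrete, here's a toy sketch (hypothetical vocab, random vectors, and a tiny latent dimension; real models use trained tokenizers and latent dimensions in the thousands): text is mapped to ids, ids to latent vectors, and everything downstream sees only those vectors, never "words":

```python
import random

random.seed(0)
D_MODEL = 8  # illustrative latent dimension; real models use thousands

# Hypothetical toy vocab; a real model gets this from a trained tokenizer.
vocab = {"What": 0, "is": 1, "the": 2, "capital": 3, "of": 4, "Texas": 5, "?": 6}

# Embedding table: one learned vector per token id. After this lookup,
# nothing downstream operates on words -- only D_MODEL-dimensional vectors.
embedding = [[random.uniform(-1, 1) for _ in range(D_MODEL)] for _ in vocab]

def embed(text):
    ids = [vocab[t] for t in text.split()]
    return [embedding[i] for i in ids]

latents = embed("What is the capital of Texas ?")
print(len(latents), len(latents[0]))  # -> 7 8: one latent vector per token
```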
Secondly, weights are not "weights between words" or even "weights between concepts". You're mixing up LLMs with Markov-chain predictors. The overwhelming majority of an LLM is neural-network weights and biases. Neural networks are fuzzy logic engines. Every neuron divides its input space with a fuzzy hyperplane, answering a superposition of "questions" about its input space with an answer ranging from no, through maybe, to yes. The weights define how to build the "questions" from the previous layer's output, while the biases shift the yes-no balance. Because each layer's "questions" are built on the previous layer's "answers", each layer answers progressively more complex questions than the one before it.
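A single neuron's "fuzzy hyperplane" fits in a few lines (toy, hand-picked weights; purely illustrative):

```python
import math

def neuron(inputs, weights, bias):
    # The weighted sum defines a hyperplane in input space; the sigmoid
    # turns signed distance from it into a soft answer between 0 ("no")
    # and 1 ("yes"), with 0.5 as "maybe".
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy "question": is input 0 noticeably larger than input 1?
w, b = [4.0, -4.0], 0.0
print(round(neuron([1.0, 0.0], w, b), 3))  # clear yes, near 1
print(round(neuron([0.0, 1.0], w, b), 3))  # clear no, near 0
print(round(neuron([0.5, 0.5], w, b), 3))  # on the hyperplane: 0.5, "maybe"
```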
The overwhelming majority of an LLM's parameters are in its FFNs, which are standard DNNs. They function as detector-generators: detecting concepts in the input latent and then encoding the logical consequences of those concepts into the output latent. This happens dozens to hundreds of times per forward pass.
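A minimal sketch of that detector-generator pattern, assuming toy 2-d latents and hand-picked weights (a real FFN learns thousands of such detectors per layer):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def ffn(latent, w_up, w_down):
    # Up-projection rows act as concept detectors: each ReLU unit fires
    # only when its pattern is present in the input latent.
    detections = relu(matvec(w_up, latent))
    # Down-projection is the "generator" half: each firing detector writes
    # its associated output pattern into the output latent.
    return matvec(w_down, detections)

# Toy numbers (illustrative, not from any real model):
w_up = [[1.0, -1.0], [-1.0, 1.0]]   # two detectors over a 2-d latent
w_down = [[0.5, 0.0], [0.0, -0.5]]
print(ffn([2.0, 0.0], w_up, w_down))  # -> [1.0, 0.0]: only detector 0 fired
```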
running them through transformers with various blocks
I can't even tell what you think you mean when you write the word "blocks". Are you trying to refer to attention masking?
Ask a model "What is the capital of Texas?" and get the answer "Austin". That is knowledge. If that is not knowledge, then the word "knowledge" has no meaning. Knowledge is encoded in the FFNs. BTW, if you're wondering how the specific case of answering about Austin works, here you go.
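As a cartoon of how a fact like that can sit in an FFN (the directions and names here are entirely hypothetical; a real model smears this across many neurons and layers), a key-value style lookup looks like:

```python
# Hypothetical directions in a toy 3-d latent space; real models learn these.
CAPITAL_OF_TEXAS = [1.0, 0.0, 0.0]   # "question" direction the FFN detects
AUSTIN = [0.0, 1.0, 0.0]             # "answer" direction it writes back

def relu(x):
    return max(0.0, x)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ffn_lookup(latent):
    # Key: how strongly does the latent express "capital of Texas"?
    strength = relu(dot(latent, CAPITAL_OF_TEXAS))
    # Value: write the "Austin" direction in proportion to that strength.
    return [strength * v for v in AUSTIN]

print(ffn_lookup([0.9, 0.0, 0.0]))  # question present -> Austin direction
print(ffn_lookup([0.0, 0.0, 1.0]))  # unrelated latent -> nothing written
```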
It's not a "bug" because there's no real code flow that can be adjusted
LLMs absolutely do implement and run self-developed algorithms. In combination with a scratchpad (LRMs, aka "thinking models"), they're Turing-complete (if you assume an infinite context, or context compaction, to meet the Turing-completeness requirements). They can implement any algorithm, given sufficient time and context. Not only "can" you implement all of your standard computing logic in NNs (conditionals, loops, etc.), but (A) the algorithms are self-learned, and (B) they can do far more than traditional computing, because NNs inherently do "fuzzy computing". An answer isn't just yes or no, it's a degree of confidence. It's not this-path-or-that, it's both paths, each to the degree of confidence in it.
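The "both paths, weighted by confidence" point can be sketched as a soft branch (toy numbers; in a real network the predicate itself is learned):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_branch(confidence, then_value, else_value):
    # Traditional code takes exactly one path; a network blends both,
    # weighted by its confidence in the condition.
    return confidence * then_value + (1.0 - confidence) * else_value

# The condition "is x positive?" as a soft predicate instead of a bool.
for x in (-3.0, 0.0, 3.0):
    c = sigmoid(4.0 * x)
    print(x, round(soft_branch(c, 10.0, -10.0), 2))
    # clearly negative -> near -10; ambiguous -> 0; clearly positive -> near 10
```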
The original human re-enforcement learning took thousands of human hours of people sitting in cubes clicking on the generation that was the least retarded.
That is not how foundation training works. Dude, it is literally called unsupervised learning. You are thinking of RLHF. That tunes the "chat" style and such, but the underlying reasoning and knowledge in the model is learned unsupervised: "Here is a giant dump of the internet**, learn to predict it." The act of learning to predict what is basically "everything meaningful humans know about" requires building a model of how the world works, of what causes what, which is exactly what the transformer learns to do.
(To be fair, today, we prefilter these "dumps" significantly, change how much different sources are weighted, etc. But for earlier models, they were pretty much just raw dumps)
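The unsupervised objective itself is tiny: the "label" at position t is simply the token at position t+1 of the raw text. A minimal sketch of the per-position cross-entropy loss (toy logits over a hypothetical 4-token vocab):

```python
import math

def next_token_loss(logits, target_id):
    # Unsupervised objective: no human labels, no cubicle clicking.
    # The target is just the next token of the training text itself.
    log_z = math.log(sum(math.exp(l) for l in logits))
    return log_z - logits[target_id]  # cross-entropy = -log p(target)

# Toy logits over a 4-token vocab; the true next token has id 2.
confident = [0.0, 0.0, 5.0, 0.0]
unsure = [1.0, 1.0, 1.0, 1.0]
print(round(next_token_loss(confident, 2), 3))  # low loss: ~0.02
print(round(next_token_loss(unsure, 2), 3))     # high loss: log(4) ~ 1.386
```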