GPUs excel at swiftly transferring data but suffer from limited locality of reference due to their relatively small caches, which makes them better suited for scenarios that involve heavy computation on small datasets rather than light computation on large ones. Consequently, the networks optimized for GPU architecture tend to run in layers sequentially to maximize the throughput of their computational pipelines (as illustrated in Figure 1 below). To accommodate larger models, given the GPUs' restricted memory capacity of only tens of gigabytes, multiple GPUs are often pooled together, leading to the distribution of models across these units and resulting in a convoluted software framework that must navigate the intricacies of communication and synchronization between different machines. In contrast, CPUs possess significantly larger and faster caches, along with access to extensive memory resources that can reach terabytes, allowing a typical CPU server to hold memory equivalent to that of dozens or even hundreds of GPUs. This makes CPUs particularly well-suited for a brain-like machine learning environment, where only specific portions of a vast network are activated as needed, offering a more flexible and efficient approach to processing. By leveraging the strengths of CPUs, machine learning systems can operate more smoothly, accommodating the demands of complex models while minimizing overhead.