Neural Magic Description
GPUs are fast at moving data in and out, but they have very little locality of reference because of their small caches. They are designed to apply a lot of compute to a little data, not a little compute to a lot of data. They run full layers of computation, one after another, to keep their computational pipeline full (see Figure 1 below). Because GPUs have small memories (tens of gigabytes), large models must be distributed across many GPUs connected together. This creates a complicated and painful software stack, and it requires synchronization and communication between multiple machines. CPUs, on the other hand, have much larger caches than GPUs and far more memory (terabytes). A typical CPU server may have memory equivalent to tens or even hundreds of GPUs. The CPU is ideal for a brain-like ML environment in which pieces of a very large network are executed as needed.