Training AI demands raw GPU compute. Inference demands something else entirely: memory. The GPUs powering today's models carry limited high-bandwidth memory (HBM) before external memory is ...