www.eetimes.com, Jul. 23, 2020 –
While inference accelerators started out primarily in the data center, they have quickly moved to edge inference with applications such as autonomous driving and medical imaging. Through this transition, customers are finding out, often the hard way, that the same accelerator that did so well processing images in the data center fails badly in edge inference. The reason for this is simple: one processes a pool of data while the other processes a stream.
Streaming means processing at batch = 1; a pool means processing at batch = many. In the data center, customers typically process pools of data, such as photos that are being tagged. The goal is to get through as many photos as possible with the least resources and power consumption, and the best latency.
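To make the batch = 1 versus batch = many distinction concrete, here is a minimal sketch in Python. It assumes a toy cost model that is not from the article: each accelerator invocation pays a fixed overhead plus a per-item compute cost. The function name `batch_stats` and both timing constants are hypothetical and chosen only for illustration; real accelerators will behave differently.

```python
def batch_stats(batch_size, overhead_ms, per_item_ms):
    """Return (latency_ms, items_per_second) for one batch.

    Toy linear cost model (an assumption, not a measurement):
    one invocation costs a fixed overhead plus a per-item cost.
    """
    latency_ms = overhead_ms + per_item_ms * batch_size
    items_per_second = batch_size * 1000.0 / latency_ms
    return latency_ms, items_per_second


# Hypothetical timings, for illustration only.
OVERHEAD_MS = 8.0   # fixed cost per invocation (e.g. loading weights)
PER_ITEM_MS = 2.0   # compute cost per image

stream = batch_stats(1, OVERHEAD_MS, PER_ITEM_MS)   # stream: batch = 1
pool = batch_stats(32, OVERHEAD_MS, PER_ITEM_MS)    # pool: batch = many

for name, (latency_ms, rate) in (("stream", stream), ("pool", pool)):
    print(f"{name}: latency {latency_ms:.1f} ms, throughput {rate:.1f} items/s")
```

Under these made-up numbers, the batch of 32 amortizes the fixed overhead and delivers roughly 4.4 times the throughput, but each result waits about 7 times longer for its batch to finish. That trade is acceptable for a pool of photos and costly for a stream where every frame needs an answer right away.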