How AI Accelerators Are Changing The Face Of Edge Computing

AI has become the key driver for the adoption of edge computing. Originally, the edge computing layer was meant to deliver local compute, storage, and processing capabilities to IoT deployments. Sensitive data that cannot be sent to the cloud for processing and analysis is handled at the edge. Processing data locally also removes the latency of the round trip to the cloud. Much of the business logic that runs in the cloud is moving to the edge to provide lower latency and faster response times. Only a subset of the data generated by sensors is sent to the cloud, after it has been aggregated and filtered at the edge. This approach significantly reduces bandwidth and cloud storage costs for enterprises.
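The aggregate-and-filter step described above can be sketched in a few lines of Python. This is a minimal illustration, not a real IoT SDK: the function name `summarize_window`, the threshold `TEMP_ALERT_C`, and the sample readings are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical alert threshold; a real deployment would tune this per sensor.
TEMP_ALERT_C = 80.0


def summarize_window(readings, alert_threshold=TEMP_ALERT_C):
    """Reduce a window of raw sensor readings to one small record for the cloud."""
    if not readings:
        return None
    # Filter: only out-of-range values are forwarded verbatim.
    alerts = [r for r in readings if r >= alert_threshold]
    # Aggregate: everything else is collapsed into summary statistics.
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        "alerts": alerts,
    }


# Sixty raw samples collected locally; only the summary leaves the edge.
window = [70.0 + (i % 5) for i in range(59)] + [85.5]
summary = summarize_window(window)
print(summary)
```

In this sketch sixty readings shrink to a single record with four fields, which is where the bandwidth and storage savings come from.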


The edge delivers three essential capabilities: local data processing, filtered data transfer to the cloud, and faster decision making.

Since most of the decision making is now taking advantage of artificial intelligence, the edge is becoming the perfect destination for deploying machine learning models trained in the cloud.

Machine learning and deep learning models exploit the power of Graphics Processing Units (GPUs) in the cloud to speed up training. Mainstream cloud providers such as AWS, Azure, Google Cloud Platform, IBM, and Alibaba offer GPUs as a service. Modern deep learning frameworks including TensorFlow, PyTorch, Apache MXNet and Microsoft CNTK are designed to take advantage of the underlying GPUs to accelerate the training process.

Compared to the enterprise data center and public cloud infrastructure, edge computing has limited resources and computing power. When deep learning models are deployed at the edge, they don’t get the same horsepower as in the public cloud, which may slow down inferencing, the process in which trained models are used for classification and prediction.

To bridge the gap between the data center and the edge, chip manufacturers are building niche, purpose-built accelerators that significantly speed up model inferencing. These modern processors assist the CPU of the edge device by taking over the complex mathematical calculations needed for running deep learning models. While these chips are not comparable to their cloud counterparts, the GPUs, they do accelerate the inferencing process. This results in faster prediction, detection and classification of data ingested at the edge layer.
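To make "the complex mathematical calculations" concrete, here is a rough sketch of inferencing in plain Python: one fully connected layer followed by a softmax. The weights, biases, and labels are made-up stand-ins for a model trained in the cloud, and the function names are hypothetical. The multiply-accumulate loop inside `dense` is the kind of work an accelerator takes over from the CPU.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a multiply-accumulate per weight."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up parameters standing in for a model trained in the cloud.
WEIGHTS = [[0.9, -0.2, 0.1], [-0.3, 0.8, 0.4]]
BIASES = [0.05, -0.05]
LABELS = ["normal", "anomaly"]


def classify(features):
    """Run the tiny model and return the most likely label with all probabilities."""
    probs = softmax(dense(features, WEIGHTS, BIASES))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = classify([1.0, 0.2, 0.1])
print(label, [round(p, 3) for p in probs])
```

A real vision model repeats this pattern across millions of weights per input, which is why dedicated hardware makes a noticeable difference on a low-power device.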

Three AI accelerators available today speed up inferencing at the edge. Let’s take a closer look at them.

NVIDIA Jetson


When it comes to GPUs, NVIDIA is an undisputed market leader. Almost every public cloud provider offers NVIDIA-based GPUs such as K80, P100 and T4 models as a part of the infrastructure service. The company also sells DGX and EGX servers that come with multiple high-end GPUs specifically meant for running deep learning, high-performance computing, and scientific workloads.

NVIDIA has built the Jetson family of GPUs specifically for the edge. In terms of programmability, they are 100% compatible with their enterprise data center counterparts. These GPUs have fewer cores and draw less power than the traditional GPUs powering desktops and servers. Jetson Nano, the most recent addition to the Jetson family, comes with a 128-core GPU and is the most affordable GPU module that NVIDIA has ever shipped. With a form factor that closely resembles the Raspberry Pi, the Jetson Nano Dev Kit empowers hobbyists, makers and professionals to build next-generation AI and IoT solutions. Jetson TX2 and Jetson Xavier are meant for industrial and robotic use cases.

NVIDIA Jetson devices are powered by a unified software stack called JetPack, which comes with the essential drivers, runtimes and libraries needed to run ML and AI models at the edge. Data scientists and developers can convert TensorFlow and PyTorch models for TensorRT, NVIDIA's inference optimizer and runtime, which tunes the model for low latency and high throughput while aiming to preserve accuracy.

NVIDIA Jetson is already a widely used edge computing platform, and it integrates with Azure IoT Edge and AWS IoT Greengrass.

Intel Movidius & Myriad Chips

In 2016, Intel acquired Movidius, a niche chipmaker that made computer vision processors used in drones and virtual reality devices. The flagship product of Movidius was Myriad, a chip that is purpose-built for processing images and video streams. It is positioned as a VPU – Vision Processing Unit – for its ability to deal with computer vision.

After acquiring Movidius, Intel packaged Myriad 2 in a USB thumb drive form factor, which is sold as the Neural Compute Stick (NCS). The best thing about NCS is that it works with both x86 and ARM devices. It can be plugged into an Intel NUC or a Raspberry Pi to run inferencing. It draws power from the host device, without the need for an external power supply.

Similar to NVIDIA JetPack, Intel has built a software platform to optimize machine learning models for Movidius and Myriad. The Intel Distribution of OpenVINO Toolkit takes computer vision models trained in the cloud and runs them at the edge. OpenVINO, which stands for Open Visual Inference and Neural Network Optimization, is an open source project that aims to bring a consistent inferencing approach to models running on x86, ARM32, and ARM64 platforms. Existing Convolutional Neural Network (CNN) models can be converted into the OpenVINO Intermediate Representation (IR), which significantly reduces the size of the model while optimizing it for inferencing.

Intel is embedding Movidius and Myriad chips into x86 developer kits designed and sold under the brand of Up Squared. They are also available as USB sticks and add-on cards that can be attached to PCs, servers, and edge devices.

Intel is betting big on Movidius as the hardware platform and OpenVINO as the software platform to capture the edge market.

Google Edge TPU

A couple of years ago, Google announced that it was adding Tensor Processing Units (TPUs) to its cloud platform to accelerate machine learning workloads. Cloud TPUs deliver the computational power needed to train complex machine learning models based on deep neural networks. Currently, in their second version (v2), Cloud TPUs can deliver 180 teraflops of performance with 64 GB of High Bandwidth Memory (HBM). Customers using Google Cloud Platform can connect to Cloud TPUs from custom VM types to balance processor speeds, memory, and high-performance storage resources for running compute-intensive workloads.

More recently, Google announced the availability of Edge TPU, a flavor of its TPU designed to run at the edge. Edge TPU complements the Cloud TPU by performing inferencing of trained models at the edge. According to Google, Edge TPU is a purpose-built ASIC designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge. These purpose-built chips can be used for emerging use-cases such as predictive maintenance, anomaly detection, machine vision, robotics, voice recognition, and many more.

Developers can optimize TensorFlow models for Edge TPU by converting them to TensorFlow Lite models compatible with the edge. Google provides a web-based tool and a command-line tool that convert existing TensorFlow models into a form optimized for Edge TPU. Unlike the NVIDIA and Intel edge platforms, Google’s Edge TPU can run only TensorFlow models at the edge. Google is building a seamless pipeline that automates and simplifies the workflow involved in training models in the cloud and deploying them on Edge TPU. Cloud AutoML Vision, an automated, no-code environment for training CNNs, supports exporting the model into a TensorFlow Lite format optimized for the Edge TPU. By far, this is the simplest workflow available to build cloud-to-edge pipelines.
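The article does not spell out what the conversion does. One step that commonly forms part of preparing a model for Edge TPU is 8-bit quantization, where 32-bit floating point weights are mapped to small integers. The sketch below illustrates that idea in plain Python under that assumption. It is not the TensorFlow Lite converter or Google's compiler, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats onto unsigned integers with an affine scale and zero point."""
    qmax = 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-lo / scale)
    quantized = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


# Illustrative weights only; a real model has millions of them.
weights = [-0.6, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, zp, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale per weight in this example.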

With AI becoming the key driver of the edge, the combination of hardware accelerators and software platforms is becoming important for running models for inferencing. NVIDIA Jetson, Intel Movidius and Google Edge TPU are the choices available today to accelerate AI inferencing at the edge.

