The concept is certainly compelling. A machine that can take in real-world visual, auditory, or other types of data and then respond in an intelligent way was the stuff of science fiction until very recently.
We are now on the verge of this new reality with little general understanding of what it is that artificial intelligence, convolutional neural networks, and deep learning can (and can’t) do, nor what it takes to make them work.
At the simplest level, much of the current effort around deep learning involves very rapid recognition and classification of objects, whether visual, audible, or some other form of digital data. Data from cameras, microphones, and other types of sensors is fed into a system that contains a multi-level set of filters that provide increasingly detailed levels of differentiation. Think of it like the animal or plant classification charts from your grammar school days: Kingdom, Phylum, Class, Order, Family, Genus, Species.
The trick with machines is to get them to learn the characteristics or properties of these different classification levels and then be able to use that learning to accurately classify a new object they haven’t previously been exposed to. That’s the gist of the “artificial intelligence” label that gets used to describe these efforts. In other words, while computers have long been able to identify things they’ve seen before, the critical capability is recognizing that a new image is not just a dog but a long-haired miniature dachshund after they’ve “seen” enough pictures of dogs. Actually, what’s really important, and really new, is the ability to do this extremely rapidly and accurately.
Like most computer-related problems, the work to enable this has to be broken down into a number of individual steps. The word “convolution” comes from a root meaning to roll or fold together. Here it describes a mathematical operation in which a small filter is slid across the input data and combined with each region it covers, with the results from one level fed forward to the next level in order to improve the accuracy of the process. The phrase “neural network” stems from early efforts to create a computing system that emulated the human brain’s individual neurons working together to solve a problem. While most computer scientists now seem to discount the comparison to the functioning of a real human brain, the idea of a number of very simple elements connected together in a network and working together to solve a complex problem has stuck, hence convolutional neural networks (CNNs).
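To make the filtering idea a little more concrete, here is a minimal sketch of a single convolution step in plain Python. It is an illustration only: the tiny image, the hand-picked edge filter, and the function name convolve2d are all assumptions made for this example, and real systems learn their filter values from data rather than having them typed in. (Like most deep learning software, it slides the filter as-is without flipping it, which mathematicians would strictly call cross-correlation.)

```python
def convolve2d(image, kernel):
    """Slide `kernel` across `image` and return the resulting feature map.

    Only positions where the kernel fits entirely inside the image are used.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each kernel value by the pixel beneath it and add them up.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x5 "image": dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked filter that responds where brightness jumps from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [0, 2, 0, 0]
```

The only non-zero values in the output sit exactly where the dark-to-bright edge is. In a CNN, a feature map like this becomes the input to the next level, which can then look for combinations of edges, and so on up the chain.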
Deep learning refers to the number, or depth, of filtering and classification levels used to recognize an object. While there seems to be debate about how many levels are necessary to justify the phrase “deep learning,” many people seem to suggest 10 or more. (Although Microsoft’s research work on visual recognition went to 152 levels!)
A key point to understanding deep learning is that there are two critical but separate steps involved in the process. The first involves doing extensive analysis of enormous data sets and automatically generating “rules” or algorithms that can accurately describe the various characteristics of different objects. The second involves using those rules to identify objects or situations based on real-time data, a process known as inferencing.
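The split between the two steps can be sketched in a few lines of Python. To keep it readable, this example swaps in a far simpler technique than deep learning (a nearest-average classifier), and the measurements, labels, and the function names train and infer are invented for illustration. What matters is the shape of the process: one function digests labeled data into “rules,” and a second, separate function applies those rules to new data.

```python
def train(examples):
    """Offline step: boil labeled examples down to one average point per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The returned dictionary plays the role of the generated "rules."
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def infer(model, features):
    """Real-time step: pick the label whose average point is closest."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: (weight in kg, height in cm) with a known label.
training_data = [
    ((4.0, 20.0), "dachshund"),
    ((5.0, 22.0), "dachshund"),
    ((30.0, 60.0), "labrador"),
    ((32.0, 58.0), "labrador"),
]

model = train(training_data)             # done once, ahead of time
prediction = infer(model, (4.5, 21.0))   # done on each new measurement
print(prediction)  # dachshund
```

Note that infer never changes the model. All of the learning happened in train, which is why the two steps can run on completely different hardware.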
The “rule” creation efforts necessary to build these classification filters are done offline in large data centers using a variety of different computing architectures. NVIDIA has had great success with their Tesla-based GPU-compute initiatives (that’s Tesla the chip, not the car). These leverage the floating point performance of graphics chips and the company’s GPU Inference Engine (GIE) software platform to help cut the time needed for the data input and analysis tasks of categorizing raw data from months to days, or in some cases hours.
We’ve also seen some companies talk about the ability of other customizable chip architectures, notably FPGAs (Field Programmable Gate Arrays), to handle some of these tasks as well. Intel recently purchased Altera to specifically bring FPGAs into their data center family of processors, in an effort to drive the creation of even more powerful servers and ones uniquely suited to performing these (and other) types of analytics workloads.
Once the basic “rules” of classification have been created in these non-real-time environments, they have to be deployed on devices that accept live data input and make real-time classifications. Though related, this is a different set of tasks and a different type of work than what’s used to create the rules in the first place.
In this inferencing area, we’re just starting to see a number of companies talking about bringing deep learning and artificial intelligence to a variety of devices. In truth, there’s little to no new “learning” going on in these implementations. They’re focused almost entirely on recognizing the objects, situations or data points they are pre-programmed to look for, based on the rules or algorithms that have been loaded onto them for a particular application. Still, this is an enormously difficult task because of the need to run the multiple layers of a convolutional neural network in real time.
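As a rough sketch of what “running the layers” means on a device, the Python below pushes an input through two small layers whose numbers are fixed in advance. Everything here is hypothetical: the weights were picked by hand for the example rather than produced by any real training run, the layers are simple fully connected ones rather than convolutional, and the names LAYERS, LABELS, and classify are made up. The point is that the device only reads the loaded values and never changes them.

```python
def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: a weighted sum of the inputs, plus a bias, for each output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical pre-loaded "rules"; on a real device these would come from
# the offline training step, not be typed in by hand.
LAYERS = [
    {"weights": [[1.0, -1.0], [-1.0, 1.0]], "biases": [0.0, 0.0]},
    {"weights": [[2.0, -1.0], [-1.0, 2.0]], "biases": [0.0, 0.0]},
]
LABELS = ["left brighter", "right brighter"]


def classify(inputs):
    """Feed the inputs forward through every layer and return the best label."""
    activations = list(inputs)
    for index, layer in enumerate(LAYERS):
        activations = dense(activations, layer["weights"], layer["biases"])
        if index < len(LAYERS) - 1:
            activations = relu(activations)  # applied between layers only
    best = max(range(len(activations)), key=lambda i: activations[i])
    return LABELS[best]


print(classify([0.9, 0.2]))  # left brighter
print(classify([0.1, 0.8]))  # right brighter
```

A production network repeats this same feed-forward pattern across many more layers and millions of values for every camera frame or audio sample, which is where the real-time performance challenge comes from.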
Qualcomm, for example, just announced that their Snapdragon 820 chip, known primarily as the compute engine inside many of today’s high-end smartphones, can be used for deep learning and neural network applications. The new ingredient required to make this work is the Snapdragon Neural Processing Engine, an SDK powered by the company’s Zeroth Machine Intelligence Platform. The combination can be used on the 820 to speed the performance of CNNs and deep learning on devices ranging from connected video cameras to cars and much more. The 820 incorporates a CPU, GPU and DSP, all of which could potentially be used to run deep learning algorithms for different applications.
In the case of autonomous cars, which are expected to be one of the key beneficiaries of deep learning and neural networks, NVIDIA’s liquid-cooled Drive PX2 platform can also accelerate neural network performance. Announced at this year’s CES, the Drive PX2 includes two next-generation SoCs (System on Chip: essentially a CPU, GPU and other computing elements all connected together on a single chip). It is specifically designed to monitor the camera, LIDAR and other sensor inputs from a car, then recognize objects or situations and react accordingly.
Future iterations of AI and deep learning accelerators will likely be able to bring some of the offline “rule creating” mechanisms onboard so that objects equipped with these components will be able to get smarter over time. Of course, it’s also possible to update the algorithms on existing devices in order to achieve a similar result.
Regardless of how the technology evolves, it’s going to be a critical element in the devices around us for some time to come, so it’s important to understand at least a little bit about how the magic works.