Adventures in Machine Intelligence

on January 27, 2017

(Tech.Pinions: Today’s Daily piece, “Adventures in Machine Intelligence,” was an Insider post we originally published on December 12th, 2016. We post it today as an example of the daily content for our Insider subscribers. You can subscribe, yearly or monthly, at the page found here.)

While I tend to stay away from high-performance computing and data center analysis, I’ve taken up the effort to better understand the soup-to-nuts solutions being developed for machine learning: everything from chipset architectures to software to network modeling. Luckily, a large number of our clients have assets in these areas, so engaging in discussions to help me better understand the dynamics has been straightforward. I’m not going to claim to be an expert, but my technical background, along with staying current on semiconductor industry analysis, is proving quite helpful. I’d like to share a few basics I find quite interesting.

A great deal of the work up to this point has been around data collection. Large amounts of data on a specific subject, or within specific categories, are the key to machine learning. Simply put, data is the building block of machine learning and intelligence. Interestingly, and somewhat contrary to some opinions, it is not the person or company with the most data who is best positioned but the one with the right data. Any analysis of where we go in machine intelligence, and how that translates to smart computers (AI), needs to be grounded in collecting the right data. So, fundamentally, the starting point is data, and specifically the right data.

Lots of companies have been gathering data. Google has been gathering data from searches, world mapping, and more. Microsoft has been gathering enterprise data, and Facebook gathers social data. There are a lot of companies gathering data, but many are still in the early stages of turning their backend data collection efforts into smart machines. In fact, very little of the technology we use is smart. By smart I mean something that is truly predictive and can anticipate human needs. We have a tremendously long way to go in making our machines truly smart. In a recent conversation with some semiconductor architects of machine learning silicon, I asked them whether they could point to a moment in the history of the personal computer and liken it to where we are today in machine learning. Their answer? No later than the early IBM PCs. This was from folks who have been in the silicon industry for a very long time. The context for the discussion was how much silicon still needs to advance for machine intelligence and AI to truly start to mature. So it is worth noting their comparison to the early IBM days comes with the knowledge that the first IBM PCs ran the Intel 8088, a chip with roughly 29,000 transistors. Today, we have architectures with more than 10 billion transistors.

Once convinced we still need a tremendous amount of semiconductor innovation to get where we need to be in machine learning and AI, I started looking into what is happening today. The next step is to understand how to train a network, or how to teach a computer to be smart(er). I stated above it all starts with data: good data, and the right data. Some of the most common examples of network training today are around computer vision. We are teaching computers to identify all kinds of things by throwing terabytes of labeled data at them and teaching them a dog is a dog, a cat is a cat, a car is a car, etc. Training a network is not arbitrary; it is calculated and intentional. The reason is network models have to be built/programmed before they can be trained. Leaning on decades of machine learning work, many programs now exist to train networks in the more common fields that deal with large data sets. Medicine, autonomous vehicles, agriculture, astrophysics, oil and gas, and several others are areas where people have focused on creating these network models. Many hours of hard work and hard science go into these network models so data can be collected and fed to the machine for it to learn. Companies playing in this field today are picking their battles in areas where big money is to be made with these training models.
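To make the “build a network model, then train it” sequence concrete, here is a minimal sketch in PyTorch (my choice of framework; the article names none). Everything in it is illustrative: the random tensors stand in for a real labeled photo set, and the three classes echo the dog/cat/car example above.

```python
import torch
import torch.nn as nn

# Illustrative stand-in data: random tensors in place of real labeled
# photos. In practice these would be images tagged dog, cat, car, etc.
NUM_CLASSES = 3
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (256,))

# Step 1: build the network model. Nothing can be trained until this
# architecture is programmed.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, NUM_CLASSES),
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 2: train it. Training is repeatedly showing the network labeled
# examples and nudging its weights toward fewer mistakes.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The point of the sketch is the order of operations: the model is designed first, and only then does the data do its work.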

What is fascinating is how long it takes to train a network. With a modern CPU/GPU and machine learning software, a network can be trained in as little as a few hours, depending on the data set. Training a network to recognize a dog, with roughly two terabytes of data, could take three to four hours. However, there are many cases where the data sets are so large it could take several weeks to a month to train a computer on one single thing. This again underscores how far we still have to go in silicon. I’m reminded of early demonstrations of Microsoft Office running on Pentium chipsets, where the demo shone because Excel could process a massive spreadsheet in 30 minutes or less. Today, it is nearly instantaneous. Someday, training a network will be nearly instant, as will its ability to query that data and yield an insight or a response. Instant and in real time is the holy grail, but we are many years away.
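The arithmetic behind those figures is simple and worth seeing once. Every number below is an assumption I have picked to roughly match the examples above, not a benchmark:

```python
# Back-of-envelope estimate of training time from data volume.
# Both inputs are assumptions, not measured figures.
dataset_tb = 2.0               # ~2 TB of labeled images, per the dog example
throughput_gb_per_hour = 600   # assumed rate one training node consumes data

hours = dataset_tb * 1024 / throughput_gb_per_hour
print(f"~{hours:.1f} hours per pass over the data")  # ~3.4 hours

# The same math at 200 TB lands in the "weeks" territory the article
# describes: 200 * 1024 / 600 ≈ 341 hours, roughly two weeks per pass.
```

The takeaway is that training time scales with data volume, so the only real levers are smaller (better-chosen) data or faster silicon.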

Knowing how early a stage we are at makes it hard to count any company out at this point. But it does emphasize that collecting the right data is key. Companies are right now setting the stage by getting the right data they need to carve out value in the future with AI. What is fascinating is how deep learning algorithms are helping networks learn faster with less data. The experts I’ve spoken with agree that having the largest data sets is not necessarily a guarantee of winning in the future. Because specific network models have to be built, the emphasis falls on the “right data” philosophy.
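The article does not name a specific technique, but transfer learning is one well-known way networks learn faster from less data: start from a model pretrained on a huge generic data set and retrain only its final layer on a smaller, targeted one. A sketch, assuming PyTorch and torchvision are available (the three-class task is hypothetical):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network already trained on ImageNet (downloads weights on
# first use). Its early layers have learned generic visual features.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained features so training will not disturb them.
for param in model.parameters():
    param.requires_grad = False

# Swap in a new final layer for our own (hypothetical) 3-class task.
model.fc = nn.Linear(model.fc.in_features, 3)

# Only the new layer's weights get updated, so far less data (and far
# less compute) is needed than training the whole network from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```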

What this means is a company or service can get by with months or years of the right kind of specific data and still train a network model. Even companies starting today, just beginning to gather data, have a chance in this future to leverage machine intelligence for themselves and their customers, if the data is good.

With some context of where we need to go, silicon architectures — CPU, GPU, FPGA, custom ASICs, as well as memory — are all key to advancing the data center toward more efficient and capable backend systems for machine intelligence. But all are still governed by physics, and we have a relatively good idea of what is possible and when. That is why we know it will still be many, many years, and hopefully a few new breakthroughs, before we get even close to where we need to be for our intelligent computer future.