Privacy Invasion in a World of Immature AI

Recent reports have come to light that Facebook, Apple, Amazon, Google, and Microsoft all have teams dedicated to listening to and transcribing users' voice recordings from interactions with smart assistants. I thought this information created an opportunity to discuss and critique the immaturity of AI.

All the aforementioned companies have made adjustments to this process, either ending it entirely or letting people opt out, but the fear of privacy invasion, even by extremely privacy-minded companies, still lingers in the public mind. This information coming to light raises a few interesting points about the immature state of AI and how companies can be even more privacy-minded during this immature stage.

Immature AI
This entire field of technology and artificial intelligence is incredibly immature. Long-time readers of my writing know that I have articulated how we are still in the stage of training machines, which is the precursor to actually having something I would consider artificial intelligence. Because we are still training computers, and doing so largely by hand, we get the kinds of situations where big tech companies need humans involved to help train the intelligence they are trying to build into their platforms.

In many machine learning contexts, computers still need to learn from labeled data. For example, to train a computer to recognize a dog, humans had to find thousands of pictures of dogs and label them “dog” before they could feed that training data to the machine. People in ML/AI research have long articulated this world of machines still needing labeled and structured data in order to learn. In some interesting cases, like autonomous cars, it is possible to create graphics simulations of roads and cars, which by their nature are labeled, and use those to train machines. This, however, does not work in every context. There are still many situations where labeled still-image data is needed to train machines.
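To make that concrete, here is a minimal sketch in PyTorch of what learning from labeled data looks like in practice. The folder layout and label names are hypothetical, and the model is deliberately trivial; the point is that every training example is paired with a label a human supplied.

```python
import torch
from torch import nn
from torchvision import datasets, transforms

# Humans have already sorted images into folders named by label,
# e.g. photos/dog/... and photos/not_dog/... (hypothetical paths).
data = datasets.ImageFolder(
    "photos",
    transform=transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ]),
)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# A deliberately tiny classifier; the labels matter here, not the model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:  # labels exist because humans did the sorting
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Simulation-generated data sidesteps exactly this step: when the simulator draws the car, it already knows where the car is, so the label comes for free.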

In the case of voice, which is probably the most relevant place humans are still heavily needed today, it is understandable why big tech companies can't fully automate training their machines for voice-first systems. A computer simply needs help understanding whether it transcribed a human's voice correctly or processed a request correctly. Humans are needed to tell the machine whether it heard the user right or translated voice to text correctly. Whether they are also training based on sentiment, I'm not sure, but voice is the area where humans are still heavily needed.
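A minimal sketch of the kind of feedback loop this implies, with entirely hypothetical structures and names; the essential idea is that a human judgment turns an unverified machine transcript into a labeled training example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ReviewItem:
    audio_id: str            # anonymized clip identifier
    machine_transcript: str  # what the assistant thought it heard

@dataclass
class ReviewResult:
    audio_id: str
    correct: bool
    corrected_transcript: Optional[str] = None  # filled in when wrong

def to_training_pair(item: ReviewItem, result: ReviewResult) -> Tuple[str, str]:
    """Turn a human judgment into a labeled (audio, text) training example."""
    text = item.machine_transcript if result.correct else result.corrected_transcript
    return (item.audio_id, text)
```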

We are still a long way off from taking humans out of the machine learning process, but the broader question of how to do this with even more privacy-conscious approaches remains.

More Privacy-Conscious Approaches
The big question here is whether there is a better way. Yes, these companies anonymize the data and voice recordings so they can never be tied back to personal information, and I trust that they take that process seriously. But even the most privacy-conscious companies can do better at coming at privacy from every angle.

I found this article in Slate quite interesting. It raises some good questions, frames the human need in machine learning well, and asks whether better ways to protect user privacy could exist.

The suggestion I thought was most interesting was to mask the voice of the user. There are many automated ways these companies could have put a distortion filter on the recordings, so the person listening didn't hear the user's actual voice. This is a clever idea and a great compromise for this problem.
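As an illustration, a distortion pass can be as simple as an automated pitch shift applied before a clip ever reaches a reviewer. This is a minimal sketch using the librosa library; the filenames and the four-semitone shift are arbitrary choices, and a real deployment would likely need more robust masking than pitch alone.

```python
import librosa
import soundfile as sf

# Load the (already anonymized) recording; 16 kHz is typical for speech.
audio, sr = librosa.load("clip.wav", sr=16000)

# Shift the pitch down four semitones so the reviewer can still make out
# the words without hearing the speaker's natural voice.
masked = librosa.effects.pitch_shift(audio, sr=sr, n_steps=-4)

sf.write("clip_masked.wav", masked, sr)
```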

This is one of those “why didn't the companies in question think of this first?” scenarios. This is where the industry-wide mindset on privacy still needs to change. In order to deliver better services, we the consumers will need to be OK with companies like Apple, Microsoft, Google, and Amazon (I'm intentionally leaving Facebook off this list) using our behavioral data to provide us with better services; if you are not interested in that, opt out, plain and simple. But history tells us consumers are more than willing to trade off some level of privacy for better services.

In this blog post in Apple's Machine Learning Journal, they set the bar on this subject with the concept of machine learning at scale with privacy in mind. Had Apple introduced this concept of distorted voices, they could have again set the bar for the industry on the challenging problem of voice + AI training. Rather than Apple ending the program, I'd rather see them use solutions like this, along with a more distinct user opt-in process, so that Siri can continue to get better.
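The core idea in that Apple post is local differential privacy: noise is added on the device before anything is shared, so the server can learn population trends without trusting any individual report. Here is a minimal sketch of the classic randomized-response mechanism behind that idea; this is an illustration of the concept, not Apple's actual implementation.

```python
import math
import random

def randomized_response(value: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.

    Each device randomizes its own answer before sending it, so no single
    report reveals what that user actually did.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return value if random.random() < p_truth else not value

# Example: estimate how often a feature is used without knowing who used it.
epsilon = math.log(3)  # gives p_truth = 0.75
true_rate = 0.3
reports = [randomized_response(random.random() < true_rate, epsilon)
           for _ in range(100_000)]

# Unbias the noisy aggregate to recover an estimate of the true rate.
p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
observed = sum(reports) / len(reports)
print((observed + p_truth - 1) / (2 * p_truth - 1))  # ~0.3
```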

Voice assistants will play a critical role in our future interactions with technology. Knowing humans still need to be involved in the process, a key challenge is to compete in voice assistants while still respecting user privacy holistically. I'm optimistic the industry can keep moving forward here, but it goes back to protecting user privacy from every angle and having that mindset as a part of every process where user data is used to train machines.

Published by

Ben Bajarin

Ben Bajarin is a Principal Analyst and the head of primary research at Creative Strategies, Inc., an industry analysis, market intelligence, and research firm located in Silicon Valley. His primary focus is consumer technology and market trend research, and he is responsible for studying over 30 countries.
