AI is no Knight in Shining Armor fighting to save Humanity

Last week, during Mark Zuckerberg’s congressional hearing, we heard Artificial Intelligence (AI) mentioned time and time again as the one-size-fits-all solution to Facebook’s problems of hate speech, harassment, fake news… Sadly, many agree with me that we are a long way from AI being able to eradicate all that is bad on the internet.

Abusive language and behavior are very hard to detect, monitor, and predict. As Zuckerberg himself pointed out, many different factors make this particular job hard: language, culture, and context all play a role in determining whether what we hear, read, or see should be deemed offensive.

The problem that we have today with most platforms, not just Facebook, is that humans are determining what is offensive. They might be using a set of parameters to do so, but they ultimately use their judgment. Hence consistency is an issue. Employing humans also makes it much harder to scale. Zuckerberg’s 20,000 people number sure is impressive, but when you think about the content that 2 billion active users can post in an hour, you can see how futile even that effort seems.

I don’t want to get into a discussion of how Zuckerberg might have used the promise of AI as a red herring to take some pressure off his back. But I do want to look at why, while AI can solve scalability, its consistency and accuracy in detecting hate speech in the first place are highly questionable today.

The “Feed It Enough Data” Argument

Before we can talk about AI and its potential benefits, we need to talk about Machine Learning (ML). For machines to be able to reason like a human, or hopefully better, they need to be able to learn. We teach the machines by using algorithms that discover patterns and generate insights from the massive amounts of data they are exposed to, so that they can make decisions on their own in the future. If we input enough pictures and descriptions of dogs, and hand-code the software with what could look like a dog or be described as a dog, the machine will eventually be able to recognize the next engineered “doodle” as a dog.
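
As a rough illustration of that idea, here is a toy sketch (with invented descriptions and labels, not any platform’s actual pipeline) that trains a tiny classifier on labelled examples and then asks it to label one it has never seen:

```python
# Toy supervised-learning sketch: learn "dog" vs. "not dog" from labelled
# text descriptions, then classify an unseen one. All data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "four-legged furry animal that barks and fetches",          # dog
    "curly-coated friendly pet that wags its tail and barks",   # dog
    "small whiskered animal that purrs and ignores you",        # not a dog
    "feathered creature that sings at dawn",                    # not a dog
]
labels = ["dog", "dog", "not_dog", "not_dog"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(descriptions, labels)

# An unseen "doodle": the model generalizes from the word patterns it learned.
print(model.predict(["curly furry crossbreed that barks and fetches"]))
```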

So one would think that if you feed a machine enough swear words and racial, religious, or sexual slurs, it would be able to not only detect but also predict toxic content going forward. The problem is that there is a lot of hate speech out there that uses very polite words, just as there is harmless content that is loaded with swear words. Innocuous words such as “animals” or “parasites” can be charged with hate when directed at a specific group of people. Users engaging in hate speech might also misspell words or use symbols instead of letters, all aimed at preventing keyword-based filters from catching them.
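
To make that limitation concrete, here is a minimal sketch of a naive keyword filter; the word list and example comments are hypothetical, chosen only to mirror the failure modes described above:

```python
# Minimal sketch of why keyword blacklists fail. The blacklist and the
# example comments are hypothetical and exist only to illustrate the point.
BLACKLIST = {"idiot", "scum"}

def keyword_filter(comment: str) -> bool:
    """Return True if the comment contains a blacklisted word."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return bool(words & BLACKLIST)

# Polite wording carrying hate slips straight through the filter.
print(keyword_filter("Those people are animals and should go back home."))  # False

# Profanity with no target gets flagged even though it harms no one.
print(keyword_filter("I'm such an idiot, I locked my keys in the car!"))  # True

# Deliberate symbol substitution evades the keyword match entirely.
print(keyword_filter("You are sc*m."))  # False
```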

Furthermore, training the machine is still a process that involves humans, and consistency on what is offensive is hard to achieve. According to a study published by Kwok and Wang in 2013, there is a mere 33% agreement between coders of different races when tasked with identifying racist tweets.

In 2017, Jigsaw, a company operated by Alphabet, released an API called Perspective that uses machine learning to spot abuse and harassment online and is available to developers. Perspective assigned comments a “toxicity score” based on keywords and phrases, and then rated new content against that score. The results were not very encouraging. According to New Scientist:

“you’re pretty smart for a girl” was deemed 18% similar to comments people had deemed toxic, whereas “I love Fuhrer” was 2% similar.
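
For developers who want to see such scores first-hand, the sketch below queries the Perspective API for a toxicity score; the endpoint and payload follow the publicly documented v1alpha1 interface, but the API key is a placeholder and response fields may change over time:

```python
# Hedged sketch of querying Jigsaw's Perspective API for a toxicity score.
# YOUR_API_KEY is a placeholder; a real key comes from the Perspective project.
import requests

API_KEY = "YOUR_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text: str) -> float:
    """Send a comment to Perspective and return its TOXICITY summary score (0-1)."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

for comment in ["you're pretty smart for a girl", "I love Fuhrer"]:
    print(comment, "->", toxicity_score(comment))
```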

The “Feed It the Right Data” Argument

So it seems that it is not about the amount of data but rather the right kind of data. But how do we get it? Haji Mohammad Saleem and his team at McGill University, in Montreal, tried a different approach.

They focused on content from Reddit, which they defined as “a major online home for both hateful speech communities and supporters for their target groups.” Access to a large amount of data from groups that are now banned on Reddit allowed the McGill team to analyze the linguistic practices that hate groups share, thus avoiding having to compile word lists while providing ample data to train and test the classifiers. Their method resulted in fewer false positives, but it is still not perfect.

Some researchers believe that AI will never be totally effective at catching toxic language, because toxicity is subjective and requires human judgment.

Minimizing Human Bias

Whether humans will be involved in coding or will remain mostly responsible for policing hate speech, it is really human bias that I am concerned about. This is different from the question of approach consistency, which considers cultural, language, and context nuances. This is about humans’ personal beliefs creeping into their decisions when they are coding the machines or monitoring content. Try searching for “bad hair” and see how many images of beautifully crafted hair designs for Black women show up in your results. That, right there, is human bias creeping into an algorithm.

This is precisely why I have been very vocal about the importance of representation across tech overall, but in particular when talking about AI. If we have fair representation of gender, race, religious and political beliefs, and sexual orientation among the people trusted to teach the machines we will entrust with different kinds of tasks, we will have a better chance at minimizing bias.

Even when we eliminate bias to the best of our ability, we would be deluded to believe Zuckerberg’s rosy picture of the future. Hate speech, fake news, and toxic behavior change all the time, making the job of training machines a never-ending one. Ultimately, accountability rests with platform owners and with us as users. Humanity needs to save itself, not wait for AI.

Why Apple Had To Release Siri Half-Baked

Siri has been having a bad week. Gizmodo’s Mat Honan called Apple’s voice-response service “a lie.” Daring Fireball’s John Gruber, who rarely has bad things to say about Apple efforts, said it “isn’t up to Apple’s usual level of fit and finish, not by a long shot.” And my colleague Patrick Moorhead tweeted that inconsistency was leading him to reduce his use of the service.

Hang in there, Pat, Siri needs you. I share the frustrations and annoyances of Siri users, but continued use is the only way she’s going to get better.

Here’s what I think is going on, with the usual caveat that Apple only shares its thinking with people it can legally keep from talking about it, leaving the rest of us free to speculate. Apple doesn’t much like public beta testing. Before a major release, Microsoft will typically make a new version of Windows or Office available to tens of thousands of users for months, allowing developers to find and fix most of the bugs. Apple limits beta testing mostly to internal users and selected developers. It can get away with this because the real-world combinations of Mac or iOS hardware and software are orders of magnitude simpler than in the Windows world.

Siri is very different. The artificial intelligence engine behind the service lacks any inherent understanding of language. It has to be trained to make connections, to extract meaning from a semantic jumble. To even get to the databases and search tools Siri uses to answer questions, it first must construct a query from the free-form natural language that humans begin mastering long before they can talk, but which machines find daunting. (See Danny Sullivan’s Search Engine Land post for an excellent analysis of Siri’s struggles with queries about abortion clinics.)

The secret to machine learning is feedback. I expect that Siri carefully logs every failed query, along with what the user does next. And algorithmic analysis of those logs, combined perhaps with some human intervention, means that every mistake contributes to the process of correction. In other words, Siri learns from its errors and the more people use it, the faster it will get better. Benoit Maison has a good explanation of how this works on his blog.
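
Since Apple says almost nothing about its internals, the following is a purely speculative sketch of what such a feedback log might look like; the queries and follow-up actions are invented and do not reflect Apple’s actual pipeline:

```python
# Speculative sketch of a failure-logging feedback loop (not Apple's pipeline).
# Failed queries are stored with what the user did next, and the most frequent
# misses are surfaced as candidates for the next round of training.
from collections import Counter

class FailureLog:
    def __init__(self):
        self.entries = []  # (failed_query, user_followup_action) pairs

    def record(self, query: str, followup_action: str) -> None:
        """Store a query the assistant could not answer and the user's next step."""
        self.entries.append((query.lower().strip(), followup_action))

    def top_failures(self, n: int = 3):
        """Most common failed queries: prime candidates for retraining."""
        return Counter(q for q, _ in self.entries).most_common(n)

log = FailureLog()
log.record("Find me an abortion clinic", "opened web search")
log.record("Find me an abortion clinic", "opened web search")
log.record("Play something relaxing", "opened music app manually")

print(log.top_failures())  # recurring misses get algorithmic/human review first
```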

The server-based learning creates a very different situation from the troubled handwriting recognition that helped doom Apple’s Newton 15 years ago (and to which some critics have compared Siri’s troubles). Newtons were products of a preconnected age, so there was no way for the community of MessagePads to learn from each other’s mistakes. And the extremely limited processing power and memory on the Newton itself made the claim that it would learn from its errors an empty promise. The Newton could never get past “egg freckles.”

Now, all of this said, Apple’s approach to Siri is a distinct departure from its usual practice of under-promising and over-delivering. It properly labeled Siri as a “beta” product. But, at the same time, it is using the half-baked feature as a major selling point for the iPhone 4S, hitting it hard in commercials. This is a disservice to customers, who have learned to expect a high polish on Apple products, and has saddled Siri with unreasonably high expectations that are now inspiring a backlash. Apple had to release Siri prematurely to let the learning process go forward. Let’s hope that Apple did not do the service permanent damage with its hype.