Oren Etzioni smiles a lot. He is a computer scientist who runs the Allen Institute for Artificial Intelligence in Seattle. He greets me in his bright office and leads me past a whiteboard scrawled with thoughts on machine intelligence. (“Define success,” “What is the task?”) Outside, young artificial intelligence researchers wear headphones and tap away at keyboards.
Etzioni and his team are working on the problem of common sense. He defines it in terms of two legendary AI moments: IBM’s Deep Blue beating chess grandmaster Garry Kasparov in 1997, and DeepMind’s AlphaGo beating the world’s top Go player, Lee Sedol, last year. (Google acquired DeepMind in 2014.)
“With Deep Blue, we had a program that could make superhuman chess moves while the room was on fire,” Etzioni joked. “Right? A complete lack of context. Fast forward 20 years, and we have a computer that can make superhuman Go moves while the room is on fire.” Humans, of course, have no such limitation. If there is a fire, people will sound the alarm and run for the exit. In other words, humans possess basic knowledge about the world (fire burns things) as well as the ability to reason about it (you should try to stay away from an out-of-control fire).
For AI to truly think like a human, we need to teach it the things that everyone knows, like physics (balls thrown into the air fall) or the relative sizes of things (an elephant can’t fit in a bathtub). Until AI has these basic concepts, Etzioni argues, it will not be able to reason.
With hundreds of millions of dollars invested by Paul Allen, Etzioni and his team are working to develop a layer of common-sense reasoning that works with existing neural networks. (The Allen Institute is a nonprofit, so everything they find will be published for anyone to use.) The first problem they face is a question: what is common sense?
Etzioni describes it as all the knowledge about the world that we take for granted but rarely say out loud. He and his colleagues created a series of benchmark questions that an AI capable of real reasoning should be able to answer: If I put my socks in a drawer, will they be there tomorrow? Will people be angry if I step on their toes? One way to get this knowledge is to extract it from humans. Etzioni’s lab is paying crowdsourced workers on Amazon Mechanical Turk to help craft common-sense statements.
The team then uses a variety of machine learning techniques, some old-fashioned statistical analysis and some deep-learning neural networks, to learn from those statements. If they do it right, Etzioni believes they can produce reusable “Lego bricks” of computer reasoning: one set that understands words, another that grasps physics, and so on.
Yejin Choi, one of the scientists on Etzioni’s team who studies common sense, has led several crowdsourcing efforts. In one project, she wanted to develop an artificial intelligence that can understand the intentions or emotions underlying a person’s actions or statements. She started by combing through thousands of online stories, blogs, and Wiktionary idiom entries to extract “phrasal events” such as “Jeff knocks Roger out.” She would then anonymize each phrase (“X knocks Y out”) and ask crowdsourced workers on Mechanical Turk to describe X’s intentions: why did they do it?
Once she had collected 25,000 of these annotated sentences, she used them to train a machine learning system to analyze sentences it had never seen before and infer the subject’s emotion or intent. At best, the new system works only half the time. But when it does work, it shows some very human perception: give it a line like “Oren made Thanksgiving dinner,” and it predicts that Oren is trying to impress his family.
“We can also reason about other people’s reactions, even if they are not mentioned,” Choi said. “So X’s family might feel impressed and loved.” Another system her team built relied on Mechanical Turk workers to mark up the psychological states of people in stories; given a new situation, the resulting system can also draw some sharp inferences.
For example, it was told about a music instructor who got angry at his band’s bad performance, and that “the instructor was furious and threw his chair.” The AI predicted that the band members would feel “fear,” although the story does not say so explicitly. Choi, Etzioni, and their colleagues have not given up on deep learning. In fact, they think it is a very useful tool, but they don’t think there is any shortcut around the laborious work of coaxing people to state explicitly the weird, invisible, implicit knowledge that we all have.
Deep learning is garbage in, garbage out. It is not enough to feed a neural network lots of news articles, because it will not absorb unstated knowledge: the obvious things that writers don’t bother to mention. As Choi said, “People don’t say, ‘My house is bigger than me.’” To help solve this problem, she had Mechanical Turk workers analyze the physical relationships implied by 1,100 common verbs, such as “X throws Y.” This in turn made possible a simple statistical model that can take the sentence “Oren threw a ball” and infer that the ball must be smaller than Oren.
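The idea of inferring relative size from a verb can be sketched in a few lines. The following is only an illustration: the verbs, the vote counts, and the name `infer_relative_size` are invented here, and Choi’s actual model and data are not described in this article.

```python
from collections import Counter

# Hypothetical crowd votes: for "X <verb> Y", is X bigger or smaller than Y?
# These counts are made up for illustration only.
CROWD_VOTES = {
    "threw": Counter({"bigger": 18, "smaller": 2}),
    "entered": Counter({"bigger": 1, "smaller": 19}),
}


def infer_relative_size(subject: str, verb: str, obj: str):
    """Return (statement, confidence), or None for a verb with no votes."""
    votes = CROWD_VOTES.get(verb)
    if not votes:
        return None
    # The majority label is the inferred relation; its vote share is the confidence.
    label, count = votes.most_common(1)[0]
    confidence = count / sum(votes.values())
    return f"{subject} is {label} than {obj}", confidence


print(infer_relative_size("Oren", "threw", "the ball"))
```

With these invented counts, “Oren threw the ball” yields “Oren is bigger than the ball” with a confidence of 0.9, while a verb that nobody has annotated yields nothing at all.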
Another challenge is visual reasoning. Aniruddha Kembhavi, another AI scientist on Etzioni’s team, showed me a virtual robot roaming around a house on the screen. Other scientists at the Allen Institute built the Sims-like house and filled it with everyday objects that obey real-world physics: kitchen cupboards full of dishes, couches that can be moved around.
Then they designed the robot, which looks like a dark gray garbage can with arms, and told it to search for certain items. After completing thousands of tasks, the neural network gained a basic grounding in real-life facts.
“When you ask it, ‘Do I have tomatoes?’ it doesn’t open all the cabinets. It’s more inclined to open the refrigerator,” Kembhavi said. “Or, if you say, ‘Find me my keys,’ it doesn’t try to pick up the TV. It looks behind the TV. It has learned that TVs are not usually picked up.” Etzioni and his colleagues hope that these different components (Choi’s linguistic reasoning, the visual thinking, and other work they are doing to enable AI to master textbook scientific information) will eventually come together.
But how long will it take, and what will the final product look like? They don’t know. The common-sense systems they are building still get things wrong, sometimes more than half the time. Choi estimates that she will need about a million crowdsourced human statements to train her various language-parsing systems. Building common sense, it seems, is uncommonly hard.
There are other ways to make machines that reason, but they are even more labor-intensive. For example, you could sit down and write out by hand all the rules that tell the machine how the world works. That’s how Doug Lenat’s Cyc project works.
For 34 years, Lenat has employed a team of engineers and philosophers to write 25 million common-sense rules, such as “water is wet” or “most people know their friends by name.” This allows Cyc to make deductions such as “Your shirt is wet, so you were probably in the rain.” The advantage is that Lenat has precise control over what goes into the Cyc database; that is not true of crowdsourced knowledge.
This kind of brute-force, handcrafted artificial intelligence has become unfashionable in the world of deep learning. This is partly because it can be “brittle”: without the right rules about the world, the AI can get stuck. This is why scripted chatbots are so frustrating; if they have not been explicitly told how to answer a question, they have no way to reason it out.
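A toy example shows how hand-written rules can chain into new conclusions, and how they fail silently when no rule covers a situation. This is a minimal sketch of forward chaining under invented assumptions: the rules and the name `forward_chain` are hypothetical, and it does not reflect how Cyc actually represents or applies its knowledge.

```python
# Hypothetical hand-written rules: (set of premises, conclusion).
RULES = [
    ({"it is raining", "person is outside"}, "person's shirt is wet"),
    ({"person's shirt is wet"}, "person wants a dry shirt"),
]


def forward_chain(facts):
    """Apply the rules repeatedly until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            # A rule fires only when every one of its premises is already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Covered by the rules: two new conclusions are derived.
print(forward_chain({"it is raining", "person is outside"}))
# Not covered by any rule: nothing is derived, even though a person would
# immediately conclude that the shirt is wet.
print(forward_chain({"person fell in a lake"}))
```

The second call returns only the fact it was given, which is the kind of gap that forces rule writers to keep adding entries by hand.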
Cyc is far more capable than a chatbot and has been licensed for use in healthcare systems, financial services, and military projects. But the work has been slow and costly. Lenat says Cyc cost about $200 million to develop. Still, a little hand-coding may be the way to replicate some of the built-in knowledge that, according to the Chomskyite view, the human brain possesses.
That’s what Dileep George and the researchers at Vicarious did with Breakout. To create an AI that doesn’t get stumped by changes in the game layout, they abandoned deep learning and built a system with hard-coded basic assumptions. George told me that their AI learned without much trouble that “objects exist, that objects interact, and that there is a causal relationship between one object’s motion and its collisions with other objects.”
As it played Breakout, the system developed the ability to weigh different courses of action and their likely outcomes. This also worked in reverse. If the AI wants to break a brick in the far-left corner of the screen, it reasons that it should put the paddle in the far-right corner. This means that when Vicarious changes the layout of the game, adding new bricks or raising the paddle, the system compensates. It seems to have captured some general understanding of Breakout itself.
Admittedly, there are engineering trade-offs to this kind of AI. It is arguably harder to build, and it takes careful planning to figure out exactly what pre-programmed logic to put into the system. When designing a new system, it is also difficult to strike the right balance between speed and accuracy. George says he looks for the smallest set of data “to put into the model so it can learn quickly.” The fewer assumptions you need, the more efficient the machine will be at making decisions.
Once you’ve trained a deep learning model to recognize cats, you can show it a Russian blue cat it’s never seen before, and it will immediately conclude — this is a cat. After processing millions of photos, it not only knows what makes a cat a cat, it also knows the fastest way to identify a cat.
Vicarious’s AI is slower by comparison, because it actively makes logical inferences as it goes. But when Vicarious’s AI works well, it can learn from much less data. George’s team built an artificial intelligence to break captchas, the online “I’m not a robot” barriers, by recognizing characters despite their distorted appearance.
As with the Breakout system, they endowed the AI with some capabilities up front, such as knowledge that helps it recognize characters. With that guidance in place, they only had to train the AI on 260 images before it learned to crack captchas with 90.4 percent accuracy. A neural network, by contrast, needs to be trained on more than 2.3 million images to crack a captcha.
Others are building common-sense structure into neural networks in different ways. For example, two DeepMind researchers recently created a hybrid system that is part deep learning and part more traditional technique, an approach known as inductive logic programming. The goal is to create something that can reason mathematically.
They trained it on the children’s game “Fizz-Buzz,” in which you count up from 1 and say “fizz” if a number is divisible by 3, and “buzz” if it is divisible by 5. A normal neural network can only handle numbers it has seen before; if you train it up to 100, it will know to say “fizz” at 99 and “buzz” at 100.
But it doesn’t know what to do with 105. DeepMind’s hybrid system, by contrast, seemed to understand the rule and had no problems with numbers over 100. Edward Grefenstette, one of the DeepMind programmers who developed the hybrid system, said, “You can train systems to reason in a way that the deep learning network can’t do alone.”
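The gap between memorizing answers and knowing the rule can be made concrete with a toy contrast. This is neither DeepMind’s hybrid system nor a neural network: the lookup table `MEMORIZED` is a hypothetical stand-in for a model that only handles inputs it has already seen, and `fizz_buzz` simply encodes the rule as described above.

```python
from typing import Optional


def fizz_buzz(n: int) -> str:
    """Apply the rule itself, so any positive integer works."""
    word = ""
    if n % 3 == 0:
        word += "fizz"
    if n % 5 == 0:
        word += "buzz"
    # Numbers divisible by neither 3 nor 5 are simply counted aloud.
    return word or str(n)


# Stand-in for a model that has only memorized the answers for 1 to 100.
MEMORIZED = {n: fizz_buzz(n) for n in range(1, 101)}


def memorizer(n: int) -> Optional[str]:
    """Return the stored answer, or None for a number never seen."""
    return MEMORIZED.get(n)


for n in (99, 100, 105):
    print(n, memorizer(n), fizz_buzz(n))
```

Both versions agree on 99 and 100. At 105, which is divisible by both 3 and 5, the rule produces “fizzbuzz” while the memorizer has no answer.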
Yann LeCun, a deep learning pioneer and now head of artificial intelligence research at Facebook, agrees with many of the criticisms of the field. It requires too much training data, cannot reason, and lacks common sense, he admits. “I’ve basically been repeating that over and over again for the last four years,” he reminded me. But he still believes that deep learning, properly designed, can provide the answers. He disagrees with the Chomskyite view of human intelligence. He argues that the human brain develops reasoning through interaction rather than through built-in rules.
“If you think about how animals and babies learn, in the first few minutes, hours, days of life, a lot of things are learned so fast that they seem to be hard-coded,” he points out. “But in fact, they don’t need to be hard-coded, because they can be learned very quickly.” From this perspective, in order to understand the physics of the world, a baby simply needs to move its head around, process the incoming images, and conclude that depth of field is a thing.
Still, LeCun admits it’s not clear which paths will help deep learning get past its limitations. One candidate is “adversarial” neural networks, a relatively new technique in which one network tries to trick the other with fake data, forcing the second network to develop extremely subtle internal representations of images, sounds, and other inputs.
The advantage is that there is no “data hunger” problem. You don’t need to collect millions of pieces of data to train the neural networks, because they learn from each other. (Author’s note: A similar method is being used to make those deeply disturbing “deepfake” videos in which someone seems to be saying or doing something they’re not.)
I met LeCun in his office at Facebook’s artificial intelligence lab in New York. The horse