Why do doctors recommend that people receive a flu vaccine every year?
(A) Vaccines increase in strength over time.
(B) Viruses tend to mutate from year to year.
(C) Viruses tend to replicate more rapidly over time.
(D) Vaccines are absorbed completely by the body after a year.
Got the right answer? This is the kind of question that researchers plan to test on artificial intelligence in a new contest based on eighth-grade science knowledge.
The Seattle-based Allen Institute for Artificial Intelligence (AI2) is launching the competition on Wednesday. The rules are simple: the system that answers the most questions correctly wins.
“We think using science exam questions is a really interesting way to approach the AI problem, because there’s all kind of interesting common sense problems that come up… It’s not just knowing science facts and applying them,” explained Carissa Schoenick, program manager for AI2’s “Project Aristo” AI system.
“It’s actually understanding questions using machine reading to correctly parse the linguistic structures behind these questions and get at what the core of the question is about, and then using reasoning to access different facts or different bits of logic to arrive at the right answer.”
The contest is hosted on Kaggle, where participants will have access to lots of sample questions to test their AI. They’ll be given a baseline score to beat, and their performance will ultimately be tested on a set of previously unseen questions. The winner will receive $50,000, with $20,000 and $10,000 offered for second and third place.
Schoenick is keen to see anyone get involved, though she’s particularly interested in taking on IBM’s Watson. The only real stipulation is that participants’ models must be open source; others can still take part, but won’t qualify for the prizes.
AI2's own Aristo was built to answer science questions like this, and the institute hopes the contest will encourage other researchers to pursue the same goal. But it’ll be a challenge: At the moment, Aristo only works with fourth-grade exam questions, and it scores only around 45 percent on those, far from a passing mark. That goes up to 75 percent, however, when the researchers limit the test to multiple-choice questions and remove those that involve diagrams or written answers.
“We’ve been in the fourth grade world for a while, and we’re excited to start picking into all the challenges of eighth grade,” said Schoenick, who warned that eighth-grade questions not only cover more material but also require better reading comprehension and reasoning. The contest will include only multiple-choice questions. Here’s another example:
Some types of fish live most of their adult lives in salt water but lay their eggs in freshwater. The ability of these fish to survive in these different environments is an example of:
(A) selective breeding
(B) learning a new habit
(D) developmental stages
The open contest won’t include questions based on diagrams, but as you can see it’s still pretty tricky. The Aristo team’s approach is to use natural language parsing and processing to “read knowledge,” essentially taking information from textbooks and the web and storing it in a computable form for use later. They then need to turn the questions and answer options into an input the machine can “understand,” and even use computer vision techniques to try to work on the diagrams.
“Finally, we have an array of solvers that use different statistical and inference or reasoning techniques to select the right answer,” explained Schoenick. “Sometimes these groups of solvers kind of vote together and support each other in the confidence that a given answer is right.”
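To make the idea of solvers voting together concrete, here is a minimal sketch in Python. It is not Aristo’s actual code: the function name `combine_votes`, the confidence numbers, and the rule of simply summing confidences are all assumptions chosen for illustration.

```python
from collections import defaultdict


def combine_votes(solver_outputs):
    """Sum each option's confidence across solvers and return the top option.

    solver_outputs: list of dicts mapping an option label to a confidence
    value. Summing is an assumed combination rule, not AI2's.
    """
    totals = defaultdict(float)
    for scores in solver_outputs:
        for option, confidence in scores.items():
            totals[option] += confidence
    # Highest summed confidence wins; ties fall back to alphabetical order.
    best = min(totals, key=lambda option: (-totals[option], option))
    return best, dict(totals)


# Made-up confidences from three hypothetical solvers for the flu question.
solver_outputs = [
    {"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.1},
    {"A": 0.2, "B": 0.3, "C": 0.4, "D": 0.1},
    {"A": 0.1, "B": 0.7, "C": 0.1, "D": 0.1},
]
answer, totals = combine_votes(solver_outputs)
print(answer, totals)
```

In this toy example the second solver prefers a different option on its own, but the other two outweigh it once the confidences are added together.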
The science question challenge isn’t intended as a Turing Test; the machines don’t have to act human so long as they get the right answers. I asked if there could nevertheless be a way to “cheat” the test and perform well through spurious means, a common criticism of other AI tests. Could a system win without really knowing what it was doing?
“I think really understanding the question would help you tremendously in getting a great score,” said Schoenick, but conceded that, “You could achieve kind of a low baseline score with standard text look-up techniques and basic algorithms used over large corpora of texts.”
“That’s not necessarily a bad thing. All of these techniques can be combined to eventually produce a system that can both reason and have confidence in answers based on several techniques like that.”
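For a sense of what a “standard text look-up” baseline might look like, here is a rough Python sketch. It is an illustration only, not the contest’s baseline: the two-sentence corpus is invented, and the scoring rule (word overlap with the question multiplied by word overlap with the answer option) is an assumption.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lookup_answer(question, options, corpus):
    """Pick the option whose words best co-occur with the question's words.

    Each corpus sentence is scored by (words shared with the question) times
    (words shared with the option); an option keeps its best sentence score.
    """
    question_words = tokens(question)
    sentences = [tokens(sentence) for sentence in corpus]
    scores = {}
    for label, option in options.items():
        option_words = tokens(option)
        scores[label] = max(
            (len(s & question_words) * len(s & option_words) for s in sentences),
            default=0,
        )
    # Ties are broken alphabetically by label.
    best = max(sorted(scores), key=scores.get)
    return best, scores


question = "Why do doctors recommend that people receive a flu vaccine every year?"
options = {
    "A": "Vaccines increase in strength over time.",
    "B": "Viruses tend to mutate from year to year.",
    "C": "Viruses tend to replicate more rapidly over time.",
    "D": "Vaccines are absorbed completely by the body after a year.",
}
# Hypothetical stand-in for "large corpora of texts".
corpus = [
    "Flu viruses mutate from year to year, so a new flu vaccine is produced every year.",
    "Plants use sunlight to make food through photosynthesis.",
]
best, scores = lookup_answer(question, options, corpus)
print(best, scores)
```

A system like this matches words without any grasp of why the answer is right, which is the sort of shallow technique the criticism has in mind.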
The contest welcomes all approaches, even those that aren’t “true AI.” After all, as Schoenick pointed out, even students will use a certain amount of educated guessing to get results.