Why IBM’s dazzling Watson supercomputer was a lousy tutor

In the annals of artificial intelligence, Feb. 16, 2011, was a watershed moment.

That day, IBM’s Watson supercomputer finished off a three-game shellacking of Jeopardy! champions Ken Jennings and Brad Rutter. Trailing by over $30,000, Jennings, now the show’s host, wrote out his Final Jeopardy answer in mock resignation: “I, for one, welcome our computer overlords.”

A lark to some, the experience galvanized Satya Nitta, a longtime computer researcher at IBM’s Watson Research Center in Yorktown Heights, New York. Tasked with figuring out how to apply the supercomputer’s powers to education, he soon envisioned tackling ed tech’s most sought-after challenge: the world’s first tutoring system driven by artificial intelligence. It would offer truly personalized instruction to any child with a laptop — no human required.

“I felt that they’re ready to do something very grand in the space,” he said in an interview.

Nitta persuaded his bosses to throw more than $100 million at the effort, bringing together 130 technologists, including 30 to 40 Ph.D.s, across research labs on four continents.

But by 2017, the tutoring moonshot was essentially dead, and Nitta had concluded that effective, long-term, one-on-one tutoring is “a terrible use of AI — and that remains today.”

For all its jaw-dropping power, Watson the computer overlord was a weak teacher. It couldn’t engage or motivate kids, inspire them to reach new heights or even keep them focused on the material — all qualities of the best mentors.

It’s a finding with some resonance for our current moment of AI-inspired doomscrolling about the future of humanity in a world of ascendant machines. “There are some things AI is actually very good for,” Nitta said, “but it’s not great as a replacement for humans.”

His five-year journey to what was essentially a dead end could also prove instructive as ChatGPT and other programs like it fuel a renewed, multimillion-dollar experiment to, in essence, prove him wrong.

Some of the leading lights of ed tech, from Google to Microsoft, are trying to pick up where Watson left off, offering AI tools that promise to help teach students. Sal Khan, founder of Khan Academy, last year said AI has the potential to bring “probably the biggest positive transformation” that education has ever seen. He wants to give “every student on the planet an artificially intelligent but amazing personal tutor.”

A 25-year journey

To be sure, research on high-dosage, one-on-one, in-person tutoring is unequivocal: It’s one of the most powerful interventions available, offering significant improvement in students’ academic performance, particularly in subjects like math, reading and writing.

But traditional tutoring is also “breathtakingly expensive and hard to scale,” said Paige Johnson, a vice president of education at Microsoft. One school district in West Texas, for example, recently spent more than $5.6 million in federal pandemic relief funds to tutor 6,000 students. The expense, Johnson said, puts it out of reach for most parents and school districts.

For IBM, the opportunity to rebalance the equation in kids’ favor was hard to resist.

The Watson lab is legendary in the computer science field, with six Nobel laureates and six Turing Award winners among its ranks. It’s where modern speech recognition was invented, and home to countless other innovations such as barcodes and the magnetic stripes on credit cards that make ATMs possible. It’s also where, in 1997, Deep Blue beat world chess champion Garry Kasparov, essentially inventing the notion that AI could “think” like a person.

The heady atmosphere, Nitta recalled, inspired “a very deep responsibility to do something significant and not something trivial.”

Within a few years of Watson’s victory, Nitta, who had arrived in 2000 as a chip technologist, rose to become IBM Research’s global head of AI solutions for learning. For the Watson project, he said, “I was just given a very open-ended responsibility: Take Watson and do something with it in education.”

Nitta spent a year simply reading up on how learning works. He studied cognitive science, neuroscience and the decades-long history of “intelligent tutoring systems” in academia. Foremost on his reading list was the research of Stanford neuroscientist Vinod Menon, who’d put elementary schoolers through a 12-week math tutoring program, collecting before-and-after scans of their brains using an MRI. Tutoring, he found, produced nothing less than an increase in neural connectivity.

Nitta returned to his bosses with the idea of an AI-powered cognitive tutor. “There’s something I can do here that’s very compelling,” he recalled saying, “that can broadly transform learning itself. But it’s a 25-year journey. It’s not a two-, three-, four-year journey.”

IBM drafted two of the highest-profile partners possible in education: the children’s media powerhouse Sesame Workshop and Pearson, the international publisher.

One product Sesame envisioned was a voice-activated Elmo doll that would serve as a kind of digital tutoring companion, interacting fully with children. Through brief conversations, it would assess their skills and provide spoken responses to help kids advance.

Meanwhile, Pearson promised that it could soon allow college students to “dialogue with Watson in real time.”

Nitta’s team began designing lessons and putting them in front of students, both in classrooms and in the lab. To nurture a back-and-forth between student and machine, they didn’t simply present kids with multiple-choice questions; instead, they asked them to write responses in their own words.

It didn’t go well.

Some students engaged with the chatbot, Nitta said. “Other students were just saying, ‘IDK’ [I don’t know]. So they simply weren’t responding.” Even those who did began giving shorter and shorter answers.

Nitta and his team concluded that a cold reality lay at the heart of the problem: For all its power, Watson was not very engaging. Perhaps as a result, it also showed “little to no discernible impact” on learning. It wasn’t just dull; it was ineffective.

**Satya Nitta** (left) and part of his team at IBM’s Watson Research Center, which spent five years trying to create an AI-powered interactive tutor using the Watson supercomputer. [Photo: The 74]

“Human conversation is very rich,” he said. “In the back and forth between two people, I’m watching the evolution of your own worldview.” The tutor influences the student — and vice versa. “There’s this very shared understanding of the evolution of discourse that’s very profound, actually. I just don’t know how you can do that with a soulless bot. And I’m a guy who works in AI.”

When students’ usage time dropped, “we had to be very honest about that,” Nitta said. “And so we basically started saying, ‘OK, I don’t think this is actually correct. I don’t think this idea, that an intelligent tutoring system will tutor all kids, everywhere, all the time, is correct.’”

‘We missed something important’

IBM soon switched gears, debuting another crowd-pleasing Watson variation, this time a touching throwback: it engaged in Oxford-style debates. In a televised demonstration in 2019, it went up against debate champ Harish Natarajan on the topic “Should we subsidize preschools?” Among its arguments for funding, the supercomputer offered, without a whiff of irony, that good preschools can prevent “future crime.” Its current iteration, Watsonx, focuses on helping businesses build AI applications like “intelligent customer care.”

Nitta left IBM, eventually taking several colleagues with him to create a startup called Merlyn Mind. It uses voice-activated AI to safely help teachers do workaday tasks such as updating digital gradebooks, opening PowerPoint presentations and emailing students and parents.

Thirteen years after Watson’s stratospheric Jeopardy! victory and more than one year into the Age of ChatGPT, Nitta’s expectations about AI couldn’t be more down-to-earth: His AI powers what’s basically “a carefully designed assistant” to fit into the flow of a teacher’s day.

To be sure, AI can do sophisticated things such as generating quizzes from a class reading and editing student writing. But the idea that a machine or a chatbot can actually teach as a human can, he said, represents “a profound misunderstanding of what AI is actually capable of.”

Nitta, who still holds deep respect for the Watson lab, admits, “We missed something important. At the heart of education, at the heart of any learning, is engagement. And that’s kind of the Holy Grail.”

These notions aren’t news to those who do tutoring for a living. Varsity Tutors, which offers live and online tutoring in 500 school districts, relies on AI to power a lesson plan creator that helps personalize instruction. But when it comes to the actual tutoring, humans deliver it, said Anthony Salcito, chief institution officer at Nerdy, which operates Varsity.

“The AI isn’t far enough along yet to do things like facial recognition and understanding of student focus,” said Salcito, who spent 15 years at Microsoft, most of them as vice president of worldwide education. “One of the things that we hear from teachers is that the students love their tutors. I’m not sure we’re at a point where students are going to love an AI agent.”

The No. 1 factor in a student’s tutoring success is simply showing up consistently, research suggests. As smart and efficient as an AI chatbot might be, it’s an open question whether most students, especially struggling ones, would show up for an inanimate agent or develop a sense of respect for its time.

When Salcito thinks about what AI bots now do in education, he’s not impressed. Most, he said, “aren’t going far enough to really rethink how learning can take place.” They end up simply as fast, spiffed-up search engines.

The power of one-on-one, in-person tutoring, he said, often emerges as students begin to be more honest about their abilities, advocate for themselves and, in a word, demand more of school. “In the classroom, a student may say they understand a problem. But they come clean to the tutor, where they expose, ‘Hey, I need help.’”

Cognitive science suggests that for students who aren’t motivated or who are uncertain about a topic, only one-on-one attention will help. That requires a focused, caring human, watching carefully, asking tons of questions and reading students’ cues.

Jeremy Roschelle, a learning scientist and executive director of Digital Promise, a federally funded research center, said usage with most ed tech products tends to drop off. “Kids get a little bored with it. It’s not unique to tutors. There’s a newness factor for students. They want the next new thing.”

Even now, Nitta points out, research shows that big commercial AI applications don’t seem to hold users’ attention as well as top entertainment and social media sites like YouTube, Instagram and TikTok. One recent analysis dubbed the user engagement of sites like ChatGPT “lackluster,” finding that the proportion of monthly active users who engage with them in a single day was only about 14%, suggesting that such sites aren’t very “sticky” for most users.

For social media sites, by contrast, it’s between 60% and 65%.

One notable AI exception: Character.ai, an app that lets users create their own companions, drawn from figures in history and fiction, and chat with the likes of Socrates and Bart Simpson. It has a stickiness score of 41%.

As startups like Synthesis offer “your child’s superhuman tutor,” starting at $29 per month, and Khan Academy publicly tests its popular Khanmigo AI tool, Nitta maintains that there’s little evidence from learning science that, absent a strong outside motivation, people will spend enough time with a chatbot to master a topic.

“We are a very deeply social species,” said Nitta, “and we learn from each other.”

IBM declined to comment on its work in AI and education, as did Sesame Workshop. A Pearson spokesman said that since last fall it has been beta-testing AI study tools keyed to its e-textbooks, among other efforts, with plans this spring to expand the number of titles covered.

Getting ‘unstuck’

IBM’s experiences notwithstanding, the search for an AI tutor has continued apace, this time with more players than just a legacy research lab in suburban New York. Using the latest affordances of so-called large language models, or LLMs, technologists at Khan Academy believe they are finally making the first halting steps in the direction of an effective AI tutor.

Kristen DiCerbo remembers the moment her mind began to change about AI.

It was September 2022, and she’d been at Khan Academy for only a year and a half when she and founder Khan got access to a beta version of ChatGPT. OpenAI, ChatGPT’s creator, had asked Microsoft co-founder Bill Gates for more funding, but he told them not to come back until the chatbot could pass an Advanced Placement biology exam.

So OpenAI asked Khan for sample AP biology questions. He and DiCerbo said they’d help in exchange for a peek at the bot and a chance to work with the startup. They were among the first people outside of OpenAI to get their hands on GPT-4, the LLM that powers the upgraded version of ChatGPT. They were able to test out the AI and, in the process, become amateur AI prompt engineers before anyone had even heard of the term.

Like many users typing in queries in those first heady days, the pair initially just marveled at the sophistication of the tool and its ability to return what felt, for all the world, like personalized answers. With DiCerbo working from her home in Phoenix and Khan from the nonprofit’s Silicon Valley office, they traded messages via Slack.

**Kristen DiCerbo** introduces users to Khanmigo in a Khan Academy promotional video. [Screenshot: YouTube]

“We spent a couple of days just going back and forth, Sal and I, going, ‘Oh my gosh, look what we did! Oh my gosh, look what it’s saying — this is crazy!’” she told an audience during a recent appearance at the University of Notre Dame.

She recounted asking the AI to help write a mystery story in which shoes go missing in an apartment complex. In the back of her mind, DiCerbo said, she planned to make a dog the shoe thief, but didn’t reveal that to ChatGPT. “I started writing it, and it did the reveal,” she recalled. “It knew that I was thinking it was going to be a dog that did this, from just the little clues I was planting along the way.”

More tellingly, it seemed to do something Watson never could: have engaging conversations with students.

DiCerbo recounted talking to a high school student they were working with …

Created April 26, 2024, 09:50:05
