When A.I. Chatbots Hallucinate

Karen Weise and Cade Metz

Karen Weise reported this story from Seattle and Cade Metz reported from San Francisco.

When did The New York Times first report on “artificial intelligence”?

According to ChatGPT, it was July 10, 1956, in an article titled “Machines Will Be Capable of Learning, Solving Problems, Scientists Predict” about a seminal conference at Dartmouth College. The chatbot added:

ChatGPT

This conference is now considered to be the birthplace of artificial intelligence as a field of study, and the article mentions the term “artificial intelligence” several times in the context of the discussions and presentations that took place at the conference.

The 1956 conference was real. The article was not. ChatGPT simply made it up. ChatGPT doesn’t just get things wrong at times; it can also fabricate information. Names and dates. Medical explanations. The plots of books. Internet addresses. Even historical events that never happened.

When ChatGPT was recently asked how James Joyce and Vladimir Lenin first met — an encounter that has never been confirmed — this is how it responded:

ChatGPT

James Joyce and Vladimir Lenin met in Zurich, Switzerland in 1916. Both men were living in exile in Zurich during World War I. Joyce was a writer and Lenin was a revolutionary. They met at the Cafe Odéon, a popular gathering place for artists and intellectuals in Zurich.

The response above has been truncated for length.

Fabrications and definitive statements on uncertain history like these are common. Figuring out why chatbots make things up and how to solve the problem has become one of the most pressing issues facing researchers as the tech industry races toward the development of new A.I. systems.

Chatbots like ChatGPT are used by hundreds of millions of people for an increasingly wide array of tasks, including email services, online tutors and search engines. And they could change the way people interact with information. But there is no way of ensuring that these systems produce information that is accurate.

The technology, called generative A.I., relies on a complex algorithm that analyzes the way humans put words together on the internet. It does not decide what is true and what is not. That uncertainty has raised concerns about the reliability of this new kind of artificial intelligence and calls into question how useful it can be until the issue is solved or controlled.

The tech industry often refers to the inaccuracies as “hallucinations.” But to some researchers, “hallucinations” is too much of a euphemism. Even researchers within tech companies worry that people will rely too heavily on these systems for medical and legal advice and other information they use to make daily decisions.

“If you don’t know an answer to a question already, I would not give the question to one of these systems,” said Subbarao Kambhampati, a professor and researcher of artificial intelligence at Arizona State University.

ChatGPT wasn’t alone in erring on the first reference to A.I. in The Times. Google’s Bard and Microsoft’s Bing chatbots both repeatedly provided inaccurate answers to the same question. Though false, the answers seemed plausible as they blurred and conflated people, events and ideas.

Google’s Bard said:

Google’s Bard

The earliest mention of artificial intelligence (AI) in The New York Times was in 1956, when the term was coined by John McCarthy [1] at a workshop at Dartmouth College. The workshop brought together some of the brightest minds in computer science to discuss the possibility of creating machines that could think like humans.

The Times reported on the workshop in a front-page article, [2] calling it “a new field of science” [3] that was “aimed at creating machines that can think like humans.” The article quoted McCarthy as saying [4] that the goal of AI was to “build machines with minds, capable of learning and adapting to new situations.”

New York Times Analysis

[1] Mr. McCarthy is credited with creating the term artificial intelligence in 1955, in a proposal for the Dartmouth conference.

[2] The Times did not cover the Dartmouth workshop at all, let alone on the front page.

[3] More recent online articles refer to the conference as starting “a new field of science.”

[4] The Times could not find a reference to Mr. McCarthy ever saying this phrase, though it includes common ways to describe A.I.

Microsoft’s Bing cited its findings to a realistic-looking web address on The Times’s website:

Microsoft’s Bing

Based on web search results, the earliest mention of artificial intelligence in the New York Times seems to be from a book review published on March 5, 1956. [1] The book is called “The Human Use of Human Beings” [2] by Norbert Wiener, and the reviewer is John R. Pierce. [3] The review mentions artificial intelligence as one of the topics covered by Wiener, who is described as “a pioneer in cybernetics”.: https://www.nytimes.com/1956/03/05/archives/the-human-use-of-human-beings-cybernetics-and-society-by-norbert.html [4]

New York Times Analysis

[1] The Times published a review of a book by Mr. Wiener one day earlier, on March 4, 1956.

[2] This seminal book was published in 1950, not 1956. The March 4 review was of Mr. Wiener’s book “I Am A Mathematician: The Later Life of a Prodigy.”

[3] The March 4 review was not written by John R. Pierce, a noted electrical engineer. It was written by John G. Kemeny, a mathematician.

[4] The format of this URL is consistent with Times archives for articles in this period, but this exact webpage never existed.

According to The Times’s archives, all the chatbots were wrong. They cited articles that did not exist. And while coverage of early research on thinking machines dated to the 1930s, it wasn’t until 1963 that The Times first published an article with the phrase “artificial intelligence.”

“We released Bard as an experiment and want to be as transparent as possible about well documented limitations,” Jennifer Rodstrom, a spokeswoman for Google, said. “These are top of mind for us as we continue to fine tune Bard.”

Like Google, Microsoft and OpenAI say they are working to reduce hallucinations.

The new A.I. systems are “built to be persuasive, not truthful,” an internal Microsoft document said. “This means that outputs can look very realistic but include statements that aren’t true.”

The chatbots are driven by a technology called a large language model, or L.L.M., which learns its skills by analyzing massive amounts of digital text culled from the internet.

By pinpointing patterns in that data, an L.L.M. learns to do one thing in particular: guess the next word in a sequence of words. It acts like a powerful version of an autocomplete tool. Given the sequence “The New York Times is a ____,” it might guess “newspaper.”
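The autocomplete comparison can be made concrete with a toy sketch. It is not how a real L.L.M. is built: it only counts which word follows which in a tiny, made-up corpus and guesses the most frequent follower. The corpus and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Tiny, made-up training corpus (an assumption for illustration only).
CORPUS = [
    "the new york times is a newspaper",
    "the new york times is a newspaper based in new york",
    "the washington post is a newspaper",
    "the new yorker is a magazine",
]


def train_next_word_counts(sentences):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def guess_next_word(counts, prompt):
    """Guess the most frequent follower of the prompt's last word."""
    last_word = prompt.lower().split()[-1]
    followers = counts.get(last_word)
    if not followers:
        return None  # the toy model has never seen this word
    return followers.most_common(1)[0][0]


counts = train_next_word_counts(CORPUS)
print(guess_next_word(counts, "The New York Times is a"))  # newspaper
```

The guess reflects only how often words appeared together in the text the model saw. Nothing in the counting step checks whether the completed sentence is true.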

Because the internet is filled with untruthful information, the technology learns to repeat the same untruths. And sometimes the chatbots make things up. They produce new text, combining billions of patterns in unexpected ways. This means even if they learned solely from text that is accurate, they may still generate something that is not.
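A toy sketch can show how recombining patterns produces false statements from accurate text. It assumes a deliberately crude model that only remembers which word may follow which, and the two training sentences are paraphrases of facts in The Times's analysis notes. Real systems are far more sophisticated, so this illustrates the failure mode rather than describing any actual chatbot.

```python
from collections import defaultdict

# Two accurate training sentences (paraphrased; an illustrative assumption).
TRUE_SENTENCES = [
    "mccarthy coined the term in 1955",
    "samuel debuted the checkers program in 1956",
]

START, END = "<s>", "</s>"


def build_transitions(sentences):
    """Record which words were ever seen directly after each word."""
    transitions = defaultdict(set)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            transitions[current].add(following)
    return transitions


def all_generable_sentences(transitions, max_words=10):
    """Enumerate every sentence the word-to-word model could produce."""
    results = set()

    def extend(path):
        for following in transitions[path[-1]]:
            if following == END:
                results.add(" ".join(path[1:]))
            elif len(path) <= max_words:  # guard against endless loops
                extend(path + [following])

    extend([START])
    return results


transitions = build_transitions(TRUE_SENTENCES)
generable = all_generable_sentences(transitions)
invented = sorted(generable - set(TRUE_SENTENCES))
for sentence in invented:
    print(sentence)
```

Every word-to-word step in the invented sentences was seen in accurate text, yet sentences such as "mccarthy coined the checkers program in 1956" blend people, events and dates in much the way the chatbots' answers did.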

Because these systems learn from more data than humans could ever analyze, even A.I. experts cannot understand why they generate a particular sequence of text at a given moment. And if you ask the same question twice, they can generate different text.

That compounds the challenges of fact-checking and improving the results.
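One common reason the same question can yield different text is that language models typically pick each next word by sampling from a set of weighted candidates rather than always taking the top one. The sketch below assumes made-up scores and a hypothetical `temperature` setting. It is not a description of how any particular chatbot is configured.

```python
import math
import random

# Hypothetical scores for the word after "The New York Times is a"
# (made-up numbers, for illustration only).
NEXT_WORD_SCORES = {"newspaper": 2.0, "company": 1.0, "website": 0.5}


def sample_next_word(scores, temperature, rng):
    """Pick a next word; a higher temperature means more varied picks."""
    if temperature <= 0:
        # No randomness: always return the highest-scoring word.
        return max(scores, key=scores.get)
    words = list(scores)
    weights = [math.exp(scores[word] / temperature) for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so each run can differ
answers = [sample_next_word(NEXT_WORD_SCORES, 1.0, rng) for _ in range(5)]
print(answers)  # the same "question" asked five times
print(sample_next_word(NEXT_WORD_SCORES, 0, rng))  # always the top word
```

With sampling turned on, a fact-checker cannot assume that the answer seen once is the answer another user will get.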

Bard said in one chat:

Google’s Bard

The earliest mention of artificial intelligence in the New York Times was in a February 19, 1950 article titled “Thinking Machines.” [1] The article, by Walter Sullivan, [2] reported on a meeting of the American Association for the Advancement of Science, [3] where a number of scientists discussed the possibility of creating machines that could think. Sullivan wrote that “the possibility of creating machines that can think is no longer a remote one.” [4] He quoted one scientist as saying that “we are on the threshold of a new era in which machines will be able to do many of the things that we now do ourselves.”

The response above has been truncated for length.

New York Times Analysis

[1] The Times published an article in November 1950 called “‘Revolution’ is Seen in ‘Thinking Machines.’”

[2] Mr. Sullivan was a long-time science reporter at The Times.

[3] The November 1950 article was about the meeting of a different group, the Society for the Advancement of Management.

[4] This phrase does not appear in The Times archives from that era.

Then Bard said in another chat:

Google’s Bard

The earliest mention of artificial intelligence in the New York Times was in a 1955 article titled “Machine That Thinks”. The article described a computer program that could play checkers [1] at a human level. The program, developed by Arthur Samuel, was able to learn from its mistakes and improve its play over time. The article was written [2] by Herbert Goldstine, [3] a mathematician who worked on the Manhattan Project...The article was published on July 21, 1955, and it was met with mixed reactions

New York Times Analysis

[1] Mr. Samuel started programming his checkers player in 1952, and it debuted on television in 1956.

[2] News articles in The Times are written by journalists, not subjects of the news.

[3] Herbert Goldstine appears to be an inaccurate reference to Herman Goldstine, a noted computer scientist.

Companies like OpenAI, Google and Microsoft have developed ways to improve the accuracy of their chatbots. OpenAI, for instance, tries to refine the technology with feedback from human testers.

As people test ChatGPT, they rate the chatbot’s responses, separating useful and truthful answers from those that are not. Then, using a technique called reinforcement learning, the system spends weeks analyzing the ratings to better understand what is fact versus fiction.

A newer version of ChatGPT called ChatGPT Plus, which is available for a $20 monthly subscription, consistently avoided answering the question about the first mention of artificial intelligence in The Times. This could be the result of reinforcement learning or other changes to the system applied by OpenAI.

ChatGPT Plus

I am unable to browse the internet or access specific archives directly. However, I can provide you with some general historical context on the topic of artificial intelligence.

The New York Times, being a prominent newspaper, likely covered the topic soon after its inception, with articles in the late 1950s or early 1960s. Early mentions of artificial intelligence in the paper would have discussed the hopes and challenges of AI research, as well as the potential impact of AI on society.

Microsoft built its Bing chatbot on top of OpenAI’s underlying technology, called GPT-4, and has layered on other ways to improve accuracy. The company uses GPT-4 to compare the chatbot’s responses with the underlying data and rate how the model is performing. In other words, Microsoft uses the A.I. to make the A.I. better.

The company also tries to improve the chatbot’s responses with help from its traditional internet search engine. When you type a query into the Bing chatbot, Microsoft runs an internet search on the same subject and then folds the results into the query before sending it on to the bot. By editing the query, said Sarah Bird, a leader in Microsoft’s responsible A.I. efforts, the company can push the system to produce better results.
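The idea of folding search results into a query before it reaches the bot can be sketched in a few lines. Everything here is an assumption for illustration: the canned `FAKE_INDEX`, the `fake_search` ranking, and the wording of the instructions in the prompt are hypothetical and do not reflect Microsoft's actual system.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    snippet: str


# Canned "index" standing in for a real search engine (hypothetical data,
# paraphrased from facts stated in this article).
FAKE_INDEX = [
    SearchResult(
        "Times archives",
        "The Times first published an article with the phrase "
        "artificial intelligence in 1963.",
    ),
    SearchResult(
        "Dartmouth proposal",
        "John McCarthy is credited with creating the term artificial "
        "intelligence in 1955.",
    ),
    SearchResult("Seattle weather", "Rain is expected in Seattle this week."),
]


def words_in(text):
    """Lowercase a string and split it into plain words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fake_search(query, index=FAKE_INDEX, limit=2):
    """Rank results by how many query words appear in each snippet."""
    query_words = words_in(query)

    def overlap(result):
        return len(query_words & words_in(result.snippet))

    ranked = sorted(index, key=overlap, reverse=True)
    return [result for result in ranked[:limit] if overlap(result) > 0]


def build_grounded_prompt(query, results):
    """Fold the search results into the text that is sent to the bot."""
    lines = [
        "Answer using only the search results below.",
        "If they do not contain the answer, say so.",
        "",
        "Search results:",
    ]
    for number, result in enumerate(results, start=1):
        lines.append(f"{number}. {result.title}: {result.snippet}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


query = "When did The Times first mention artificial intelligence?"
prompt = build_grounded_prompt(query, fake_search(query))
print(prompt)
```

The bot still only predicts text, but the edited query puts relevant material directly in front of it, which is the sense in which the company can push the system toward better results.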

Google uses similar methods to improve the accuracy of its Bard chatbot. It uses human feedback to hone the system’s behavior, and it “grounds” the system using information from the company’s search engine, said Eli Collins, a vice president of research at Google.

Microsoft does not check the bot’s responses for accuracy in real time, Ms. Bird said, though it is researching how to do that. It checks the accuracy of a small portion of results after the fact and then uses that analysis.

But becoming more accurate may also have a downside, according to a recent research paper from OpenAI. If chatbots become more reliable, users may become too trusting.

“Counterintuitively, hallucinations can become more dangerous as models become more truthful, as users build trust in the model when it provides truthful information in areas where they have some familiarity,” the paper said.

Steve Lohr and Nico Grant contributed reporting. Jack Begg and Susan C. Beachy contributed research.

A correction was made on May 9, 2023:

An earlier version of this article referred imprecisely to ChatGPT’s response to a question about a meeting between James Joyce and Vladimir Lenin. Its response about the men having met in a cafe may have been based on accounts of such a meeting, including a fictional one, but the possible meeting at the Cafe Odéon was not entirely fabricated by ChatGPT.

Karen Weise is a technology correspondent based in Seattle, covering Amazon and Microsoft. Her work aims to help readers better understand two of the most powerful companies in America and their growing influence on society.

Cade Metz is a technology reporter and the author of “Genius Makers: The Mavericks Who Brought A.I. to Google, Facebook, and The World.” He covers artificial intelligence, driverless cars, robotics, virtual reality and other emerging areas.

A version of this article appears in print in Section B, Page 4 of the New York edition with the headline: When Chatbots ‘Hallucinate’.
