You Can’t Trust Everything Generative AI Tells You


Marietta is a small city of around 13,000 residents situated along the Ohio River, with charming boutique stores, intriguing local history museums, and stunning views of the nearby Appalachian foothills. It has been about a year since my last visit, so I should probably think about going back. I asked ChatGPT to recommend the best Thai restaurant in the city, and it suggested Thai Taste. The issue? That restaurant is in Marietta, Georgia. Marietta, Ohio, has no Thai restaurant.

The question about the “best Thai restaurant in this small town” came up as a casual example in a conversation I had with Katy Pearce, an associate professor at the University of Washington who is also part of the UW Center for an Informed Public. As examples go, this one is rather minor: there are other excellent places to eat in Marietta, Ohio, and a Thai restaurant nearby in Parkersburg, West Virginia. But it points to a bigger problem: when you use an AI chatbot as a search tool, you can get a confident yet profoundly unhelpful answer. According to Pearce, much like golden retrievers, chatbots “really want to please you.”

Large language models (LLMs) like OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini are increasingly becoming primary sources of information about the world. For many people, they are replacing traditional search engines as the first, and often only, stop when they have a question. Technology firms are swiftly incorporating generative AI into search results, news feed summaries, and other information sources. All of that makes it increasingly important to learn how to tell when AI is giving you good information and when it is not, and to approach everything it tells you with an appropriate level of skepticism.

In some instances, using AI instead of a search engine can be beneficial. Pearce said she frequently turns to an LLM for low-stakes questions, where there is likely enough information on the internet and in the model’s training data to produce a satisfactory answer. Restaurant recommendations or simple home improvement advice, for instance, can generally be trusted.

Other scenarios, however, are riskier. Relying on a chatbot for information about your health, finances, news, or politics can have serious repercussions if its answers turn out to be wrong.

“It gives you this authoritative, confident answer, and that is the danger in these things,” stated Alex Mahadevan, the director of the MediaWise media literacy program at the Poynter Institute. “It’s not always correct.”

Here are some things to be aware of and tips on how to verify what you come across when you query generative AI.

Why you can’t always rely on AI

To grasp why chatbots occasionally provide inaccurate information or fabricate answers, it helps to understand a bit about how they work. LLMs are trained on vast quantities of data and, given your input, try to predict what is most likely to come next. They are “prediction machines” that generate outputs based on the probability that a given response fits your question, according to Mahadevan. They aim to produce the answer with the greatest statistical chance of being correct, rather than verifying what the actual correct answer is.
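To make the “prediction machine” idea concrete, here is a deliberately simplified sketch in Python. The restaurant names and probabilities are invented for illustration, and real models score enormous vocabularies of tokens at every step, but the key point is the same: the output is chosen because it sounds statistically likely, not because anything checked it against reality.

```python
import random

# Toy illustration only -- not how any real model is built. The invented
# "probabilities" below stand in for the scores a model might assign to
# possible answers about a Thai restaurant in Marietta; nothing here
# checks those answers against the real world.
answer_probs = {
    "Thai Taste": 0.45,    # sounds plausible, so it scores highest
    "Golden Thai": 0.30,   # also plausible-sounding
    "none exists": 0.25,   # the factually safer answer ranks lowest
}

# Pick an answer in proportion to its score: the most likely-sounding
# option wins, whether or not it is true.
answer = random.choices(
    list(answer_probs), weights=list(answer_probs.values()), k=1
)[0]
print(answer)
```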

When I inquired about a Thai restaurant in a town named Marietta, the LLM could not pinpoint one that satisfied my specific criteria—yet it located one in another town with the same name, nearly 600 miles to the south. In the LLM’s internal computations, the likelihood of that being the answer I sought may be higher than the probability that I asked for something non-existent.

This reflects Pearce’s “golden retriever problem.” Similar to the beloved dog breed, the AI tool is adhering to its training by striving to make you satisfied.

When an LLM cannot identify the precise best answer to your query, it may give a response that sounds correct but is not, or it may fabricate an answer that merely seems plausible. These errors, known as hallucinations, are sometimes obvious: you can probably tell that applying glue to your pizza to make the cheese stick is a bad idea. Others are subtler, such as when a model invents fictitious citations and attributes them to real authors.

“If the tool lacks the information, it fabricates something new,” Pearce stated. “That new information could potentially be entirely inaccurate.”

The models’ training data also matters for accuracy. Many of these large systems were trained on nearly the entire internet, which contains both credible, fact-based content and offhand claims made in online forums.

Humans produce false information too. But a person can generally point to a reliable source for a claim or explain how they arrived at an answer. An LLM, even when it attempts to cite its sources, may not be able to show an accurate trail of evidence.

So, how can you determine what is trustworthy?

Understand the consequences

Sometimes the accuracy of what you get from a chatbot or other generative AI tool barely matters; sometimes it matters enormously. Recognizing the consequences of acting on unreliable information is essential for making informed choices, Pearce noted.

“When individuals use generative AI tools to acquire information that they assume is based on factual reality, the first consideration should be: What are the consequences?” she remarked.

Seeking recommendations for a music playlist? Low stakes. If a chatbot invents an Elton John song, it’s not worth losing sleep over.

However, the stakes become greater concerning your healthcare, financial choices, or the quest for reliable news and information about the world.

Bear in mind the types of data used to train the models. For health-related inquiries, remember that while the training data may have included medical journals, it might also contain unscientific social media posts and threads on message boards.

Always verify any information that feeds into significant decisions, such as anything that could affect your finances or well-being. The data driving those choices deserves closer scrutiny, and generative AI’s tendency to muddle facts or fabricate information should give you pause before you act.

“If it’s something that requires factual accuracy, you should definitely verify it multiple times,” Pearce emphasized.

Generative AI alters how you validate information online

Poynter’s MediaWise program was promoting media literacy long before ChatGPT arrived at the end of 2022. Before generative AI, the main advice was to evaluate the source, Mahadevan said. If you encounter something on Facebook or in a Google search claiming that a celebrity or politician has died, you can look into the credibility of the source spreading that information. If a major news outlet reports it, it’s probably trustworthy. If the only source is your cousin’s neighbor’s ex-husband? Perhaps not.

Although this advice is straightforward, many individuals disregard or misunderstand it. “People have consistently struggled to assess information online,” Mahadevan remarked.

For AI, this guidance is not as effective as it once was. The responses from a chatbot may often lack context. You pose a question to it, and it provides an answer, leading you to instinctively trust the AI model as the authority. This differs from a traditional Google search or social media post, where the source is at least somewhat visible. Some chatbots do provide references, but frequently, you receive an answer without any citations.

“With [chatbots], we don’t really know the origin of the information,” Mahadevan noted.

As a result, you may need to research elsewhere to uncover the original sources of information. Generative AI, similar to social media, serves as a medium for sharing that information rather than being a direct source. Just as the origin of a social media post is significant, so too is the foundational source of any information you obtain from a chatbot.

How to get the truth (or something closer to it) from AI

The primary method for ensuring that you receive reliable information online remains unchanged: verify with multiple trustworthy sources.

However, when engaging with generative AI, here are several ways to enhance the quality of the information you receive:

Use a tool that offers citations

A number of chatbots and generative AI models today will include citations in their responses, although you may need to request them or enable a specific setting. This capability is also present in other AI applications, such as Google’s AI Overviews within search results.

The existence of citations is a positive sign — that response could be more dependable than one lacking citations — but the model may fabricate sources or misinterpret what the source conveys. Especially when the stakes are high, you likely want to click the links and verify that the summary you were given is correct.

Ask the AI how confident it is

You can receive a better response from a chatbot by thoughtfully crafting your prompt. One effective approach is to ask for confidence levels. For example: “Tell me when the first iPhone was released. Also, share how confident you are in your answer.”

You will still need to apply skepticism when evaluating the response. If the chatbot states that the first iPhone was launched in 1066, you will probably have doubts, even if it expresses complete confidence.

Mahadevan suggests recognizing the gap between your chat window and the source of the information: “You need to assume it’s being communicated to you secondhand,” he remarked.

Don’t just ask a simple question

Phrases such as “indicate your confidence level,” “supply links to your sources,” “present alternative viewpoints,” “only use information from authoritative sources,” and “carefully assess your information” add up as you build a prompt. Pearce said that effective prompts are lengthy, often a paragraph or more just to pose a question.

You can also ask it to take on a specific role so that its answers come back in the tone or from the perspective you want.

“When crafting your prompts, incorporating all these specifications is crucial,” she noted.
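To see how those pieces might fit together, here is one hypothetical prompt (the specifics are invented for illustration): “I’m visiting Marietta, Ohio, next weekend. Recommend three restaurants within walking distance of downtown. Only use information from authoritative sources such as the restaurants’ own websites, provide links to your sources, tell me how confident you are in each recommendation, and say so plainly if you can’t find anything that fits.” A prompt like that leaves the model far less room to guess than “best restaurant in Marietta.”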

Provide your own data

Large language models tend to struggle most when they are pulling information from their training data or from searches of the open web. They do better when you supply the documents yourself.

Both Mahadevan and Pearce mentioned that they have experienced success with generative AI tools when summarizing or extracting insights from extensive documents or datasets.

Pearce explained that while she was shopping for a car, she gave ChatGPT all the relevant information she wanted it to consider, such as PDFs of listings and Carfax reports, and asked it to focus solely on those specific vehicles. It produced a comprehensive, detailed analysis of the cars. What it did not do was suggest random other vehicles it found online, which is exactly what she wanted to avoid.

“I had to supply all that data to it,” she explained. “If I had merely said to search on any used car dealership’s website, the response wouldn’t have been as thorough.”

Use AI as a starting point

Mahadevan likened today’s generative AI to how people regarded the reliability of Wikipedia years ago. Many people doubted Wikipedia’s accuracy because anyone could edit it. But one advantage of the open encyclopedia is that its sources are easy to find. You might start with a Wikipedia entry and end up reading the articles or documents the entry cites, coming away with a more complete understanding of whatever you were investigating. Mahadevan calls this approach “reading upstream.”
