AI, data manipulation and the future of tech
Editor’s note: Mayank Kejriwal is a research assistant professor and principal scientist in the University of Southern California's Viterbi School of Engineering.
Large language models (LLMs) have brought about advances in artificial intelligence that few researchers thought possible before, at least not so soon. ChatGPT set the record for the fastest-growing user base in internet history a mere two months after its public release. Now there’s frequent talk about the inevitability of artificial general intelligence, in which machines would have the same kinds of abilities as humans. Before we get too excited, though, we must be mindful both of what LLMs can’t do and of what they do that is undesirable.
A lot has been written about both issues, and I won’t attempt to summarize all the research here. Instead, I focus on one problem that LLMs are now widely documented to have – making up information that is not true – and that some pioneers in the field have suggested may be inevitable in the current generation of LLMs, barring significant changes in the technology itself that could be on the horizon but haven’t arrived yet.
Fittingly, researchers like to say, colorfully, that LLMs hallucinate when they make up information. The choice of language is not accidental. Hallucinations can include simple factual mistakes that even a human might make – e.g., that Vladimir Lenin and James Joyce met in Zurich (unconfirmed, and unlikely) – but in many cases they are disturbing, even bizarre, fabrications that can have severe, unintended consequences. A recent instance befell Michael Cohen, Donald Trump’s former "fixer," who cited fake cases in court that an LLM – Google’s Bard (now Gemini) – allegedly produced. According to reporting by the New York Times, Cohen was forced to ask the judge for “discretion and mercy” and claimed ignorance of the fact that LLMs could produce artifacts that “looked real but actually were not.”
Understanding what LLMs can (and can’t) do
The problems above raise the disturbing question of how much artificial intelligence can really understand nuance and context, and what other issues AI may have that we are not even aware of yet. For example, do LLMs have common sense – something we humans take for granted – and are they interpreting our words the way we mean them? Did Gemini even know that it was producing fake cases? However we choose to slice it, the answer is not good.
Researchers like me strongly believe that we have only a limited understanding of what these models can and can’t do, and while the same could be said of our own brains, we have centuries of social norms and psychology (not to mention everyday experience) to rely on when we deal with human behavior.
Fortunately for Gemini, Michael Cohen was at the center of his own political news, and the model itself ended up playing a secondary, if not innocuous, role in the whole saga. Unfortunately, the controversy over Gemini is far from over. In February 2024, Google had to halt Gemini’s ability to generate images of people after it depicted individuals ranging from the U.S. founding fathers to Nazi-era soldiers as people of color. One reason this news caught fire is that it shows how models can be made to overcorrect for the racial bias issues in AI that we have known about for a while now. To my knowledge, this is the first time such overcorrection has occurred in a major model from the likes of Google, and it sparked enormous debate and outrage. Google CEO Sundar Pichai was soon forced to issue a memo calling Gemini’s responses “completely unacceptable,” and the conservative media had a field day.
A look at algorithms and human manipulation of data
An even more disturbing issue is human manipulation of the data being fed into LLMs. Gemini did not learn to produce such skewed images on its own. There was a deliberate human hand, so to speak. We should always keep in mind that this hand is unelected and unaccountable, because no one person is really to blame here. Following Pichai’s memo and a handful of blog posts from other Google executives, the company’s main response was to “temporarily” pause image generation of people while seeking to improve it. Details remain sparse on why the problem occurred in the first place and what the human involvement was. One could argue that this is the core of the problem: we don’t know how humans contributed to it, or where the motivation for this overcorrection arose.
More recently, Google tried to safeguard the model by further restricting Gemini from answering certain types of election-related questions. The policy has already been implemented in India, which will be holding general elections in April. Not all questions about elections or political parties are off-limits, however, as the BBC found in its own tests.
Google’s Gemini AI problem ultimately shows that ordinary people are not irrational in their suspicions of big tech and its algorithms. The recent House vote to ban TikTok is just another plotline in that saga. Gemini’s training was skewed by humans, and the question we should be asking is: where else is big tech overplaying its influence? Is regulation – such as the EU’s recent AI Act – really the answer here, or just a red herring? What else is changing behind the curtain, not just in the models themselves but in the processes for training and vetting them? Are fears of viewpoint discrimination and censorship as real as members of both the progressive left and the right frequently claim, depending on the issue at hand?
Compliance, regulation and the future of tech
The solution to Google’s Gemini AI problem, if we may label it that, is not technological but social. If we want the public to trust these systems, we need greater transparency from the entities building them, but we also have to be careful about how we force such transparency through regulation. We don’t want to stifle innovation through excessive compliance and regulatory burdens, and we have to be mindful of history itself: previous attempts to control the proverbial technology ‘genie’ once it’s out of the bottle have largely proven futile.
Any stifling actions we take must necessarily be evaluated in a geopolitical context. Unlike nuclear weapons, which we can at least try to prevent adversaries from acquiring, the core technology behind LLMs is now well understood by most experts in the area and requires only GPUs, data and computational know-how, none of which can be stopped from proliferating in the way nuclear technology can. Therefore, I submit, we can’t just regulate away Google’s Gemini AI problem. In the long run, the public and the free market itself serve as a bulwark against big tech over-intrusion. If Google, and others like it, want to survive in an age of rapid and seismic technological disruption, they simply can’t afford to have such problems.