
Stefano Soatto demystifies large language models as ChatGPT advances

By ANNIE HUANG | October 9, 2023

Image: an artificial neural network with a chip (Mike Mackenzie / CC BY 2.0)

Soatto’s Sept. 26 talk addressed public anxiety regarding large language models.

In the Sept. 26 Department of Computer Science (CS) Distinguished Lecture Series, Stefano Soatto, a CS professor from the University of California, Los Angeles, and Vice President of Applied Science for Amazon Web Services AI, spoke about the learning and controllability of large language models (LLMs) and computer vision. His talk, titled "Foundational Issues in AI: Views from the Real and Ideal Worlds," used analytic methods to address several concerns about the controllability of LLMs.

For text to be processed by computers, large passages are “chopped up” into smaller segments called “tokens,” each of which is represented by a high-dimensional vector. Taken together, these vectors represent the entire sentence.
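To make that concrete, here is a minimal Python sketch of tokenization and embedding. The whitespace tokenizer, tiny vocabulary, eight-dimensional vectors and random embedding table are illustrative assumptions only; real LLMs use learned subword tokenizers and embedding tables with far more dimensions.

```python
# Toy illustration of tokenization and embedding (not a production tokenizer).
import numpy as np

sentence = "Large language models predict the next token"
tokens = sentence.lower().split()          # toy "tokenizer": split on whitespace

vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}   # toy vocabulary
embedding_dim = 8                                               # tiny, for readability
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))  # one vector per token

vectors = np.stack([embedding_table[vocab[tok]] for tok in tokens])
print(tokens)          # the "chopped up" segments
print(vectors.shape)   # (number of tokens, embedding dimension)
```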

LLMs, as epitomized by OpenAI’s ChatGPT, are powerful models that seem to resemble human intelligence, being capable of engaging in “conversation” that mirrors natural speech. However, the logic behind these LLMs is a simple framework called autoregressive generation.

In simple terms, autoregressive generation is the process of repeatedly predicting what the next word should be based on all the words that come before it. During training, incorrect predictions incur a “loss” and are penalized, which helps steer the model toward better performance. Each newly generated token is fed back into the model, and the process continues iteratively until the sentence is complete.
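The toy Python sketch below shows the shape of that loop: a stand-in “model” returns a probability distribution over a small vocabulary, the most likely token is appended to the context, and the process repeats. The vocabulary, the random stand-in model and the greedy selection rule are illustrative assumptions, not how any production LLM is actually implemented.

```python
# Sketch of autoregressive generation: repeatedly predict the next token from
# everything generated so far. The "model" is a random stand-in for a trained network.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]
rng = np.random.default_rng(1)

def next_token_probs(context):
    """Stand-in for a trained LLM: returns a probability distribution over the vocabulary."""
    logits = rng.normal(size=len(vocab))   # a real model computes these from the context
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # softmax

context = ["the"]
while context[-1] != "." and len(context) < 10:
    probs = next_token_probs(context)
    context.append(vocab[int(np.argmax(probs))])  # append the most likely next token

print(" ".join(context))

# During training, a wrong prediction is penalized via a loss such as cross-entropy:
target = vocab.index("cat")
loss = -np.log(next_token_probs(["the"])[target])
print(f"cross-entropy loss if the correct next token were 'cat': {loss:.3f}")
```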

As these models advance and become increasingly powerful, theory has fallen behind empirical results, and research falls short of explaining why these models work so well. This is due to the nature of a neural network, which takes in data and produces an output but has a “black box” in between that is not easily explained. Meanwhile, people have begun to observe unexpected behaviors, known as “emergent behaviors,” in these models.

An example of this emergent behavior, according to Soatto, is that when ChatGPT is asked to “think step by step,” its accuracy can jump from 25% to 90%. Even more striking, when the model is told to “take a deep breath,” its performance improves by another 10%. These system prompts make it seem as if AI bots have emotions and reasoning abilities like humans.
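For readers curious what such a prompt looks like in practice, the snippet below simply appends a “think step by step” instruction to a question. The example question and the exact wording are illustrative, and the accuracy figures Soatto cited depend on the task and model being evaluated.

```python
# Illustrative only: the mechanics of adding a chain-of-thought style instruction.
question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"

plain_prompt = question
cot_prompt = f"{question}\nLet's think step by step."   # "think step by step" prompt

print(plain_prompt)
print("---")
print(cot_prompt)
```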

These observations have raised public concern and led to speculation that AI bots are becoming “emotional” and even developing human-like intelligence.

So, are these LLMs on the brink of getting out of control? According to Soatto, the answer is a definite no. 

“An AI bot is controllable when restricted to the space of meanings,” he said after giving a series of proofs.

Soatto defines the space of meanings as the set of possible responses that are intelligible to humans, which can be constructed as a high-dimensional vector space. He presented a proof that, owing to the invertible nature of this space, LLMs are controllable. This means that AI bots are predictable to some extent. However, it also means that a sophisticated user can steer them toward a specific result, such as using an LLM to create a particular fake news story.

Another important question is, can LLMs actually understand human reasoning, or are they just machines good at imitation? 

Soatto explained that LLMs can understand human reasoning; however, an outside observer cannot tell when a model has learned something. It is possible, through experiments, to tell when the model has not learned something.

Soatto extended these arguments to the field of computer vision, another application of machine learning. To him, there is no fundamental distinction between LLMs and visual perception models, although computer vision models have their limitations.

One of those limitations, as emphasized during the talk, is the inability to distinguish between scene and image. 

“Models like NeRFs [Neural Radiance Fields] can represent the images on which they are trained, but not the physical scene,” Soatto shared. 

However, Soatto noted, composing a NeRF with a latent diffusion model or another inductively trained generative model yields a viable representation of the physical scene.

Tom Wang, a freshman in CS who attended this seminar, expressed his excitement about natural language processing models like ChatGPT. 

“Professor Soatto’s talk is illuminating and has allowed me to see what is really going on in the field of CS and how rigorous mathematical proofs can be used to prove the controllability of AI, which is seemingly uncontrollable,” he said. “I would love to take some upper-level NLP courses next year!”

