Image generated by [**Leonardo.ai**](https://leonardo.ai)

ChatGPT lied to me! - a tale of how not to judge Language Models.

Introduction

Lately, the imagination of many people has been stirred by ChatGPT, a remarkable creation of OpenAI that demonstrates incredible abilities in natural language understanding and information processing. News about surprising, brilliant responses and coding skills matching those of experienced programmers frequently circulates on the Internet. At the same time, there are reports of mistakes made by ChatGPT, such as false accusations against various individuals (for example, the case of the Australian mayor), as well as absurdities and inconsistencies in its statements. I have been surprised to observe at least three attitudes towards ChatGPT:

  1. Treating it as a “smarter Google search engine” - some people representing this view believe that ChatGPT is simply a “linguistic wrapper” for a content search engine;
  2. Hype and conviction that we are dealing with “AI” that will revolutionize every aspect of life - this approach prevails among technology enthusiasts, corporate managers, and some AI enthusiasts;
  3. Strong skepticism, or even fear of losing one’s job and being replaced by the new invention - such voices come, among others, from programmers observing the capabilities of ChatGPT, the CODEX model, or GitHub Copilot.

Many people have yet to learn what ChatGPT and related solutions really are. Consequently, they wonder why these systems work the way they do and why they make mistakes. Let’s try to answer this question.

Machine learning

ChatGPT belongs to the vast group of large language models (LLMs), which are machine learning (ML) algorithms - a field that has been studied for a long time and constitutes one of the subdomains of research into artificial intelligence. Many definitions of machine learning have been proposed, but we can mention two here.

Learning means adaptive changes in a system that enable it to do the same tasks or tasks of the same category better in the future.

Simon, H. A. (1983). Why should machines learn?. In Machine learning (pp. 25-37). Morgan Kaufmann.

The second definition was formulated by the outstanding Polish scientist and author of the excellent book “Systemy Uczące się” (“Learning Systems”), Paweł Cichosz:

Learning of the system is any autonomous change based on experiences, leading to an improvement in the quality of its operation.

Cichosz, P. (2007). Systemy uczące się. Wyd. 2. Warszawa: Wydawnictwa Naukowo-Techniczne.

Both of these definitions share essential points:

  1. Gradual acquisition of experience.
  2. Application of this experience to improve the system’s performance gradually.
  3. Changes are autonomous - meaning the system regulates how it uses accumulated knowledge and does not require human reprogramming.

In a nutshell, this is how most machine-learning systems work. Many of them, including language models, are taught in a way very similar to how humans learn.

The following list shows a typical process of supervised learning of machine learning models:

  1. We show the model some data (questions).
  2. We allow the algorithm to present its own answers.
  3. We verify the result with the “key” (correct answers), giving feedback: “right / wrong”. This feedback can be used by the system to correct its behavior.
```mermaid
graph LR
    questions[question or material] --sent to--> algorithm
    correct[correct answers] --> verify
    subgraph learning
        algorithm
        answers[proposed answers]
        verify
        algorithm --sends--> answers --> verify
        verify --correction--> algorithm
    end
```

The process shown above is called “supervised learning” and is one of many ways to train ML models.
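
To make this feedback loop concrete, here is a minimal sketch in Python, assuming scikit-learn and a handful of made-up example sentences (the data, labels, and model choice are illustrative assumptions, not a description of how any particular LLM is trained):

```python
# A minimal sketch of the supervised-learning loop described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# 1. "Questions" (learning material) and the "key" (correct answers).
texts = ["share prices are falling on the stock market",
         "the president gave a speech before the elections",
         "the stock market rallied after good economic data",
         "parliament voted on the new election law"]
labels = ["economy", "politics", "economy", "politics"]

# Words must first be converted to numbers (more on this below).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# 2.-3. fit() repeats the propose-verify-correct cycle internally:
# the model proposes answers, they are checked against the key,
# and the feedback is used to adjust the model's parameters.
model = LogisticRegression().fit(X, labels)

# The trained system answers a new "question" on its own.
new = vectorizer.transform(["falling share prices worry the stock market"])
print(model.predict(new))  # expected: ['economy']
```

The whole propose-verify-correct cycle is hidden inside `fit()`; the point is that feedback from the “key” adjusts the model autonomously, with no human reprogramming it.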

Language models

Language models, including ChatGPT, Bard, and others, fit into the described schema. In their case, learning involves showing natural language sentences along with their context and, for example, the thematic category to which they belong. The algorithm learns which sequences of words, sentences, paragraphs, and sections describe specific issues. The division into questions/learning material and proposed answers can look as follows:

| Task | Question | Answer |
| --- | --- | --- |
| What is the next word? | The cat sat on | mat |
| | The Battle of Grunwald took place in | 1410 |
| What category does the sentence belong to? | The stock market was gloomy, share prices are falling. | Economy |
| | The president gave a wonderful speech, which will strengthen his position in the upcoming elections. | Politics |
| Answer the question with a full sentence. | How many wheels does a typical car have? | A typical car has 4 wheels. |

As you can see, many questions/tasks have more than one correct answer; usually, an answer can be formulated in several ways in a given language. It also happens that multiple sentences or questions are mutually contradictory (“The cat sat on the mat”, “The cat sat on the rug”, “The cat sat on the bed”). What is important: ChatGPT and other models of its kind learn to operate in natural language by analyzing many sentences and statements - statements that can be false or contradictory, as they express people’s opinions or emotions (especially when discussing non-technical topics). Their task is to formulate such statements in writing, as a person does when typing on a keyboard; they are not encyclopedias or symbolic knowledge bases.
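
To see why contradictory learning material leads to probabilistic answers rather than a single “truth”, here is a toy next-word model in Python; the three-sentence corpus is a made-up assumption for illustration only:

```python
from collections import Counter, defaultdict

# Three mutually contradictory "observations" about the world.
corpus = ["the cat sat on the mat",
          "the cat sat on the rug",
          "the cat sat on the bed"]

# Count which word follows each pair of words (a toy trigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        follows[(a, b)][c] += 1

# Ask: what comes after "on the"? The corpus "contradicts itself",
# so the only honest answer is a probability distribution.
context = ("on", "the")
total = sum(follows[context].values())
for word, count in follows[context].items():
    print(f"P({word!r} | 'on the') = {count / total:.2f}")
# -> mat, rug and bed, each with probability 0.33
```

A real LLM works on a vastly larger scale, but the principle stands: when its sources disagree, the model holds a distribution over answers, not a single fact.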

When you look into an encyclopedia, you usually search by index (e.g., letter “m” → machine learning) or in a particular category (e.g., science category → letter “m” → machine learning).

Language models are indeed trained on sentence categories (e.g., sentences related to politics or economics, as above). Still, their task is to formulate fluent statements on a given topic, not to search encyclopedically (with a few exceptions, see below 🙂).

How can these models understand written language? There are many mechanisms involved, but one of the most important is the so-called attention mechanism, which forms the foundation of the architecture with the well-sounding name “Transformer.”

Attention mechanism and transformers

When familiarizing themselves with a statement, a person using any language builds a “dependency tree” in their head: what the sentence describes, what the subject is, and what the predicate is. Each language has its own rules - English has an advantage in some respects (less word inflection than Polish, or Slavic languages in general). Still, the principle of operation is very similar.

Look at the following sentence:

The woman sat on the bench and looked around the park.

  • Who sat? The woman.
  • What did she sit on? On the bench.
  • Who looked around - the woman or the bench? The woman.

For a person who knows English, such dependencies are obvious. Foreigners, as well as language systems like ChatGPT, have to learn them. People learn grammar, tenses, voices, and moods by taking courses, exercising, and talking to tutors.

ChatGPT and other language models learn by analyzing many examples of sentences and their context. They learn within a feedback loop that checks whether the model:

  1. Correctly recognized the context;
  2. Generated sentences that are similar to those used in a given language.

The attention mechanism allows algorithms to take into account the preceding context of a sentence and the influence of its individual parts on what should come next (the next word or sentence), or on the analysis of the topic/issue (classification).

Attention mechanisms in artificial neural networks play a similar role to receptive fields and “competitive attention” in biological organisms - they focus the system on a selected piece of the environment, allowing it to analyze the context of the events taking place.

The diagram below shows a very simplified attention mechanism for an example sentence. The asterisks indicate the subject of the sentence: it is the woman, and undoubtedly, further sentence elements will refer to her as the main “heroine”.

```mermaid
graph
    subgraph sentence
        w1[word 1: *woman*] --> w2[word 2: sat] --> w3[word 3: on] --> w4[word 4: the] --???--> w5[word 5: ?]
    end
    subgraph representation
        w1 --> sym1[symbol1]
        w2 --> sym2[symbol2]
        w3 --> sym3[symbol3]
        w4 --> sym4[symbol4]
        sym1 --> sym2 --> sym3 --> sym4
    end
    subgraph " "
        sym1 --*weight1*--> ca
        sym2 --weight2--> ca
        sym3 --weight3--> ca
        sym4 --weight4--> ca
        ca[context analysis] --prediction--> w5
    end
```

Many attention mechanisms are used in the Transformer architecture, which is the basis for ChatGPT and other language models.
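
For the curious, here is a minimal sketch of scaled dot-product attention, the basic building block of the Transformer, written in plain numpy; the four “word” vectors are toy values chosen purely for illustration:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every position looks at all others
    and mixes their values according to relevance weights."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights                      # weighted mix of values

# Four "words", each as a 3-dimensional toy vector (real models use
# learned embeddings with hundreds of dimensions).
X = np.array([[1.0, 0.0, 0.0],   # woman
              [0.0, 1.0, 0.0],   # sat
              [0.0, 0.0, 1.0],   # on
              [0.5, 0.5, 0.0]])  # the

# Self-attention: the sentence attends to itself.
output, weights = attention(X, X, X)
print(np.round(weights, 2))      # how strongly each word attends to the others
```

In a real Transformer, the queries, keys, and values are produced by learned projections, and many such attention “heads” run in parallel; the toy above only shows the weighting idea.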

It is also worth remembering that machine learning algorithms do not process language directly: words are converted to a numerical representation in a mechanism called embedding, which allows words to be represented as vectors of numbers. During the learning process, probabilistic and mathematical operations shape these vectors so that they reflect the learned content. If everything goes according to plan, synonyms end up close to each other in the resulting vector space. In addition, arithmetic operations on the vectors are possible, leading to surprisingly accurate transformations, such as:

king - man + woman ~ queen
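
A toy sketch of this arithmetic in Python (the two-dimensional vectors below are hand-made assumptions; real embeddings have hundreds of dimensions and are learned from data):

```python
import numpy as np

# Hand-made 2-dimensional "embeddings": dimension 0 ~ royalty, 1 ~ gender.
emb = {
    "king":  np.array([0.9,  0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1,  0.9]),
    "woman": np.array([0.1, -0.9]),
}

target = emb["king"] - emb["man"] + emb["woman"]

def nearest(vec, exclude=()):
    """Return the word whose vector is most similar (cosine) to vec."""
    cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude),
               key=lambda w: cosine(vec, emb[w]))

print(nearest(target, exclude=("king", "man", "woman")))  # -> queen
```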

An example of embedding words as numerical representations is shown below:

*(figure: word_transforms)*

Some time ago, I had the pleasure of speaking at the Data Science Summit conference about the exact mechanism of using embeddings in classification and recommendation tasks. The materials from the lecture can be found here.

ChatGPT was trained on vast collections of texts from the web (e.g., BooksCorpus, WebText2) and the so-called Common Crawl - a dataset consisting of billions of web pages gathered in regular crawls of the Internet. OpenAI, the organization responsible for creating ChatGPT, used the 2019 version of this collection, which consisted of over 40 terabytes of text data. The collection included many text types: articles, reviews, blog posts, and even internet forums.

What is important: many texts and statements circulating online contradict each other. People give false information, answer off-topic, and create conspiracy theories… All this, together with each statement’s context, became learning material for LLMs.

Are they walking encyclopedias?

The description given above, though greatly simplified, should give us an idea of what ChatGPT-related algorithms are and how they work. In the simplest terms, they are machine learning models trained on many examples to generate new statements that answer specific questions in a free-form manner.

Each time, they construct their texts anew. They are guided by context and attention mechanisms that allow them to recognize the topic of the “discussion,” the issues raised so far, and so on. In this process, they try to make their output similar to the examples presented to them during the learning process.
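
As a rough illustration of this “constructing anew”, the sketch below samples continuations from the toy word statistics built earlier; every run may produce a different sentence (the corpus is, again, a made-up assumption):

```python
import random
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat",
          "the cat sat on the rug",
          "the cat sat on the bed"]
follows = defaultdict(Counter)
for sentence in corpus:
    w = sentence.split()
    for a, b, c in zip(w, w[1:], w[2:]):
        follows[(a, b)][c] += 1

def generate(start=("the", "cat"), length=4):
    """Sample a continuation word by word, guided only by learned
    statistics of the context - not by a lookup in a knowledge base."""
    words = list(start)
    for _ in range(length):
        options = follows.get(tuple(words[-2:]))
        if not options:
            break
        words.append(random.choices(list(options),
                                    weights=list(options.values()))[0])
    return " ".join(words)

print(generate())  # e.g. "the cat sat on the rug"; another run may say "mat"
```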

In this context, ChatGPT and other LLMs cannot be considered “walking encyclopedias” or content search engines. They do not possess structured, comprehensive knowledge and are not conditioned to provide “correct answers” – they generate responses consistent with the language used in the conversation, the context of the conversation, its topic, etc. Therefore, they may be inaccurate.

As I write this text, plugins and extensions are already being created that allow LLMs to connect to knowledge bases and to use content search engines or mathematical tools such as Wolfram Alpha, which perform calculations that are correct by design.

However, at their core, LLMs are language models, not walking encyclopedias. Therefore, “accusing” them of making a mistake or generating incorrect content is a complete misunderstanding of the purpose of these tools.

Artificial languages

Given the above description, why is ChatGPT so efficient at programming and writing code?

The use of the word “language” for programming dialects is justified. Programming languages are symbolic, artificial languages designed for a practical purpose: creating working applications and algorithms by imperatively and unambiguously indicating the operations to be performed. Unlike human languages, programming languages:

  1. are much more unambiguous, devoid of unnecessary “ornaments”;
  2. have a strictly defined syntax;
  3. and, in addition, publicly available code on the Internet is often written correctly and works.

If we add up the above three points:

“Unambiguity + transparent, formalized syntax + many correct and accurate examples”

we get an excellent set of learning statements for an LLM. Unlike forums, blogs, and comments, where everyone expresses their own opinions, “statements” written in programming languages are much more concise, correct, and unambiguous.

What’s next?

ChatGPT and related LLM models constitute a revolution in practical AI applications. They will allow for the automation of many tasks, generating content previously reserved for human creators, including source code for applications.

LLMs pose many challenges, among others, for the education system: how do we recognize works created independently by students? How do we protect people from laziness and from relying on LLMs to perform many tasks instead of stimulating their own thinking…

Furthermore, the increasing use of LLMs raises ethical questions, such as the potential for misuse and the need for transparency in their development and application. It is essential to ensure that LLMs are used ethically and for the benefit of people rather than causing harm or exacerbating existing inequalities.

Overall, LLMs represent a significant advancement in AI technology. Still, their full potential can only be realized through responsible development and application, as well as proper education and awareness among users.