ChatGPT: What? Why? And How?
ChatGPT: the word everyone has heard of. But why has it become such a revolution and earned such huge praise and recognition in such a short time? What does it even do? Why is it spreading so fast? And how exactly do I use it?
All these questions will be discussed and explained in this blog. ChatGPT has garnered significant attention and praise in the AI community due to its innovative capabilities as a language model. This blog aims to provide an in-depth understanding of ChatGPT, including its purpose, function, and utilization. By exploring the technology behind ChatGPT, the reader will gain a deeper appreciation for its potential applications and the impact it has had on the field of artificial intelligence. Join us as we delve into the world of ChatGPT and uncover its significance in the development of advanced AI systems.
Outline:
- In-depth architecture and working of ChatGPT (part 1)
- ChatGPT and InstructGPT (part 1)
- How ChatGPT is used in the industry (part 2)
- Microsoft and ChatGPT (part 2)
- Limitations of ChatGPT (part 2)
- Prompt Engineering and its Importance for ChatGPT (part 2)
Architecture and working of ChatGPT:
ChatGPT is a language model designed to provide human-like responses to what users ask and say to it. It can perform a range of human-like tasks, such as answering questions, generating text, writing poems, explaining the questions it receives, and more.
The increase in the size of parameters for the models
The image above illustrates how the number of parameters has grown from GPT-1 to GPT-3, resulting in better performance and capability: broadly, the higher the number of trained parameters, the more knowledge the model can capture. Parameters are essentially a synonym for weights, the term most people use for the values learned by a neural network. Training a model with such a high number of parameters is challenging: it requires access to a supercomputer with ~10,000 GPUs and ~285,000 CPU cores, and around 12-15 months of training time even with such high resources.
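To make the scale concrete, here is a rough back-of-the-envelope calculation. This is a sketch for intuition only: the 175-billion-parameter count comes from the GPT-3 paper, while the bytes-per-parameter figures are common rules of thumb, not official OpenAI numbers.

```python
# Rough illustration of why a 175B-parameter model needs specialized hardware.
# These are approximations for intuition only, not official figures.

params_gpt3 = 175e9          # GPT-3 parameter count (from the GPT-3 paper)
bytes_per_param_fp16 = 2     # half-precision storage per weight

weights_gb = params_gpt3 * bytes_per_param_fp16 / 1e9
print(f"Weights alone (fp16): ~{weights_gb:.0f} GB")   # ~350 GB

# Training also needs gradients and optimizer state; with Adam in mixed
# precision this is commonly estimated at ~16 bytes per parameter.
training_gb = params_gpt3 * 16 / 1e9
print(f"Training footprint estimate: ~{training_gb:,.0f} GB")  # ~2,800 GB
```

A single GPU holds only a few tens of gigabytes of memory, which is why the model has to be sharded across thousands of accelerators during training.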
ChatGPT is based on an advanced machine learning architecture called the Transformer, a type of neural network. The training process involved a large corpus of text data, on the order of billions of words; the resulting model is able to predict the next word in a sentence given the preceding context. This training was done by OpenAI, a leading AI research organization. During training, the model was exposed to a massive amount of diverse text data from various sources, including books, articles, websites, and more. This allowed the model to learn about a wide range of topics and styles of writing, as well as the relationships between words and phrases.
The training process also involved fine-tuning the model on specific tasks in the conversational domain, such as language translation or question answering, to further improve its performance. This fine-tuning involved adjusting the model's parameters to better fit the task-specific data and was done using supervised learning, where a general-purpose model (GPT-3.5) was trained on labeled examples of the specific task. Overall, the training process combined unsupervised and supervised learning, as well as transfer learning, where the model leveraged the knowledge gained from the large corpus of text data to perform specific tasks better.
ChatGPT uses the Transformer architecture, which is a deep learning model designed for natural language processing (NLP) tasks.
The Transformer model is made up of two main components: the encoder and the decoder.
The encoder processes the input text, while the decoder generates the output text. A multi-head attention mechanism, a key component of the Transformer architecture, allows the model to attend to different parts of the input sequence simultaneously, which helps it generate more accurate responses. The architecture also includes position-wise feed-forward networks, in which a small neural network is applied to each position of the sequence separately; this allows the model to learn more complex representations of the input text.
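To make this concrete, here is a minimal PyTorch sketch of one Transformer encoder block, combining multi-head self-attention with a position-wise feed-forward network. The dimensions and layer sizes are illustrative defaults, not ChatGPT's actual configuration.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block: multi-head self-attention followed by a
    position-wise feed-forward network, each wrapped in residual + layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        # Position-wise feed-forward: the same two-layer MLP applied at every position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention: every position attends to every other position at once.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward applied independently to each position.
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x

# Example: a batch of 2 sequences, 10 tokens each, embedding size 512.
tokens = torch.randn(2, 10, 512)
print(EncoderBlock()(tokens).shape)  # torch.Size([2, 10, 512])
```

A full GPT-style model stacks many such blocks (decoder-only, with causal masking) on top of a token embedding layer.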
ChatGPT is a pre-trained language model, which means it has already been trained on a large corpus of text. Adapting it to a new task is done using a technique called transfer learning: a pre-trained model is fine-tuned on a smaller, task-specific dataset, which allows it to leverage the knowledge gained from pre-training to improve its performance on the new task.
Transformer Architecture (Source Paper: Attention is all you need)
ChatGPT can be further fine-tuned for specific NLP tasks, such as text classification or language translation. It is also possible to fine-tune the model for specific domains and use cases, such as medical or legal language. This helps the model better understand the specific language used in those domains, which can improve its performance on tasks related to them.
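As an illustration of this kind of transfer learning, the sketch below fine-tunes a publicly available GPT-style model with the Hugging Face transformers library. GPT-2 is used as a stand-in because ChatGPT's own weights are not public, and domain_corpus.txt is a hypothetical file of domain-specific text.

```python
# Sketch of transfer learning: take a pre-trained GPT-style model and fine-tune it
# on a small domain-specific corpus. GPT-2 stands in for ChatGPT here, and
# "domain_corpus.txt" is a hypothetical file.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          TextDataset, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # pre-trained weights

train_data = TextDataset(tokenizer=tokenizer, file_path="domain_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-domain", num_train_epochs=1,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=train_data,
)
trainer.train()   # only the small domain dataset is needed; pre-trained knowledge carries over
```

The key point is that the domain dataset can be tiny compared with the pre-training corpus, because the model already knows general language.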
ChatGPT was created by fine-tuning GPT-3.5, a general-purpose model whose most common use is text completion. The language abilities of ChatGPT were enhanced through the collection of data from various sources and the use of a reward model. Reinforcement learning techniques such as Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO) were utilized to further improve its language abilities. The methods used for training InstructGPT and ChatGPT are similar, with minor differences in the data collection process. These techniques enabled ChatGPT to better understand human language and communicate more effectively.
Reinforcement learning from human feedback is a subfield of machine learning that involves learning from a human expert's feedback to improve an agent's decision-making skills. In this approach, the expert provides feedback to the agent in the form of rewards or penalties, which are used to update the agent's policy. There are several methods for implementing reinforcement learning from human feedback which include imitation learning, reward shaping, and interactive learning.
Imitation Learning: In imitation learning, the agent learns to mimic the behavior of a human expert. For example, in the game of chess, an expert can provide the agent with a series of moves to make in a given position, and the agent can learn to imitate those moves. This approach can be useful in situations where there is a clear right or wrong action to take, and the expert's actions can be easily observed.
Reward Shaping: In reward shaping, the expert provides additional rewards or penalties for the agent's behavior to guide its decision-making. For example, in a driving simulation, the expert can provide a reward for staying within the lanes or a penalty for colliding with other vehicles (a short code sketch of this idea follows these examples). This approach can be useful in situations where the optimal behavior is not easily defined and the expert can provide additional guidance to the agent.
Interactive Learning: In interactive learning, the agent and the expert work together to improve the agent's decision-making. For example, in a robotics task, the expert can guide the agent's actions in real-time, providing feedback as the agent explores the environment. This approach can be useful in situations where the agent's actions have a significant impact on the environment and the expert can provide real-time feedback to prevent mistakes.
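As a toy illustration of reward shaping, here is how the driving example above might look in code. The bonus and penalty values are made up purely for illustration.

```python
def shaped_reward(base_reward, in_lane, collided):
    """Add expert-designed bonuses/penalties on top of the environment's own reward."""
    reward = base_reward
    if in_lane:
        reward += 0.1     # small bonus for staying within the lanes
    if collided:
        reward -= 10.0    # large penalty for hitting another vehicle
    return reward

# The learning algorithm then optimizes the shaped reward instead of the raw one,
# nudging the agent toward behavior the expert considers safe.
print(shaped_reward(base_reward=1.0, in_lane=True, collided=False))   # 1.1
```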
RLHF explained for ChatGPT (source: OpenAI website)
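In the RLHF pipeline shown above, the human feedback is distilled into a reward model trained on preference comparisons between responses. The sketch below shows the pairwise ranking idea, with a toy network standing in for the real reward model; the embedding size and architecture are illustrative, not what OpenAI actually uses.

```python
import torch
import torch.nn as nn

# Toy reward model: maps a response embedding to a scalar score.
# In the real pipeline the reward model is itself a large language model;
# this tiny MLP is only a stand-in to show the training signal.
reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))

def preference_loss(chosen_emb, rejected_emb):
    """Pairwise ranking loss: the human-preferred response should score higher."""
    r_chosen = reward_model(chosen_emb)
    r_rejected = reward_model(rejected_emb)
    return -torch.log(torch.sigmoid(r_chosen - r_rejected)).mean()

# Fake embeddings for one human preference pair (illustrative only).
chosen, rejected = torch.randn(1, 768), torch.randn(1, 768)
loss = preference_loss(chosen, rejected)
loss.backward()   # gradients push the preferred response's score upward
```

Once trained, this reward model scores the chatbot's candidate responses, and those scores become the reward signal that the reinforcement learning step optimizes.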
The other algorithm introduced by OpenAI and used in the modeling and training process is Proximal Policy Optimization (PPO), a reinforcement learning algorithm that falls mostly under the reward-shaping style of learning described above. PPO is commonly used for training large language models like ChatGPT and InstructGPT. It works by adjusting the parameters of the model so that the reward is maximized, while making only small, incremental updates to the policy. A policy is the model's decision-making function: the internal parameters and logic that map an input state to an action.
The PPO algorithm has two key components: the policy network and the value network. The policy network generates actions based on the current state, while the value network estimates the expected reward for that state. PPO uses an iterative procedure that updates the policy and value networks based on the actions taken and the rewards received.
To optimize the network, PPO uses a surrogate objective function that approximates the expected reward of the current policy. Stochastic gradient descent (SGD) is used for optimization, along with techniques like clipping and regularization.
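Here is a minimal sketch of PPO's clipped surrogate objective, the part that keeps each policy update small. The clipping value and batch are illustrative; this is not OpenAI's implementation.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio keeps each policy
    update small, which is what makes training stable."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (minimum) of the two, then negate: SGD minimizes.
    return -torch.min(unclipped, clipped).mean()

# Illustrative batch: log-probs from the current and previous policy, plus advantages.
new_lp = torch.randn(32, requires_grad=True)
old_lp, adv = torch.randn(32), torch.randn(32)
loss = ppo_clipped_loss(new_lp, old_lp, adv)
loss.backward()   # one small policy update; repeat over many batches
```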
PPO is an efficient algorithm that works well with large language models, providing stable and efficient network updates. It also lets the model learn interactively from the feedback signals its environment provides. For more information on PPO, you can visit this link: https://openai.com/blog/openai-baselines-ppo/#ppo.
Working of Language Model
The figure above shows how a language model responds to different inputs and produces output accordingly. A single language model performs many different tasks as the input changes, and with appropriate prompts we can obtain different, desired outputs. Prompts are the sentences a user provides as input to describe the output they want from the model, like “Please answer the following question” or “Answer the following question by reasoning step-by-step”. For the same prompt we might get different outputs, since large language models like ChatGPT are not deterministic. Prompts therefore play a major role in how a model processes its input and finally produces an output; there is a dedicated field called prompt engineering for this purpose, which teaches how and when to use which prompts, how to craft prompts for specific needs, and more.
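As a simple illustration of prompting, here is a sketch using the OpenAI Python client. The model name, prompt, and temperature are illustrative, and it assumes an API key is configured in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = ("Answer the following question by reasoning step-by-step: "
          "If a train travels at 60 km/h for 2 hours, how far does it go?")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",          # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,                # >0 means sampling, so answers can vary run to run
)
print(response.choices[0].message.content)
```

Changing only the prompt (for example, dropping the "step-by-step" instruction) typically changes the style and detail of the answer, which is exactly what prompt engineering exploits.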
ChatGPT and InstructGPT:
ChatGPT and InstructGPT are both variants of the GPT (Generative Pre-trained Transformer) model, a type of deep learning model for natural language processing tasks. While they share many similarities, there are also a few differences between ChatGPT and InstructGPT. ChatGPT is trained to generate conversational responses to a wide range of inputs, while InstructGPT is trained to generate instructional text, such as how-to guides or manuals. Another difference is in the training data setup:
- InstructGPT is trained with interactive learning from human instructors who provide corrective feedback on the generated text;
- ChatGPT is trained using supervised fine-tuning, with a training dataset that combines the InstructGPT dataset (transformed into a dialogue dataset) and a new dialogue dataset made up of conversations in which the trainers played both sides, the user and an AI assistant.
Despite their differences, both ChatGPT and InstructGPT demonstrate the power of GPT-based models in generating high-quality, context-aware text for a variety of applications.