AI Magazine November 2024 | Page 95

DEEP LEARNING

ChatGPT's Massive Model

GPT-4's exact parameter count is not public, but is estimated to be in the trillions
and Decoder-only groups of models," says Pramod Beligere, Vice President and Generative AI Practice Head at Hexaware.
This set the stage for the introduction of the Transformer architecture, which underpinned key releases such as the language model BERT in 2018, which introduced bidirectional training, and GPT-2 in 2019, which demonstrated impressive text generation capabilities.
All this paved the way for the Gen AI revolution we now find ourselves in, with the release of models such as GPT-3, which boasts an impressive 175 billion parameters, significantly enhancing context comprehension and text generation capabilities.
But it is these advancements, working in tandem with the deep learning that occurs inside LLMs, that brought about these new capabilities.
"By leveraging neural networks, especially transformer architectures, deep learning enables LLMs to process and generate human-like text. Techniques such as attention mechanisms allow models to focus on relevant sections of the input data, enabling better understanding of the context," explains Pramod.
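The attention mechanism Pramod describes can be sketched in a few lines. The following is a minimal, illustrative scaled dot-product self-attention in NumPy with toy dimensions, not a faithful reproduction of any production model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value vector by how well its key matches each query."""
    d_k = Q.shape[-1]
    # Similarity score between every query and every key, scaled for stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into weights that sum to 1 for each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output mixes the values according to those weights: the model's "focus"
    return weights @ V, weights

# Toy example: 3 tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
```

Each row of `w` shows how strongly one token attends to every other token, which is why inspecting attention weights is a common (if partial) window into what a transformer is "looking at".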
The black box problem

Despite their impressive capabilities, it is difficult to see inside LLMs and understand their decision-making processes. In a model like GPT-3, with 175 billion parameters, the sheer number of parameters creates an intricate web of interconnections, making it nearly impossible to trace the exact path from input to output.
This complexity is further compounded by the model's use of attention mechanisms, which allow it to focus on different parts of the input when generating each word of the output. The mapping from input to output is not linear or easily interpretable, but rather the result of countless subtle interactions between these parameters.
Understanding how specific inputs lead to specific outputs is therefore comparable to finding a needle in a haystack .
From an enterprise perspective, not knowing how a faulty assumption was made means it can be unclear where a problematic output originated, making errors difficult to rectify.
Equally, this lack of transparency in LLMs has led to significant issues and misunderstandings.
"The 'black box' nature of LLMs raises concerns about bias and accountability," explains Fred Werner, Co-founder of the UN's AI for Good.