AI Magazine September 2024 | Page 114

MACHINE LEARNING
Because these systems are so data-hungry, the quality of data is sometimes overlooked in order to meet the quantity these systems need.
Although pre-processing of such data can occur, Brad argues it would be better to be more mindful of the source: “The quality of the written or spoken words on Reddit or social media streams isn’t always the best quality and therefore a good use of data.”
Poor, incomplete or biased training data can lead to skewed results, perpetuating existing inaccuracies, flaws and biases, or creating new ones.
Halting hallucinations
As NLP technology continues to evolve, new approaches are emerging to address these challenges.
One promising solution is known as retrieval-augmented generation (RAG). This technique aims to reduce hallucinations by grounding language models in verified information sources.
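To make the idea concrete, here is a minimal sketch of the retrieval step in Python. It is an illustration under stated assumptions, not how any particular product works: the documents, the function names (`tokenize`, `score`, `retrieve`, `build_prompt`) and the simple word-overlap ranking are all hypothetical, and real systems typically use more sophisticated search. The call to a language model itself is left out; the sketch only shows how retrieved text is placed in front of the question.

```python
import re
from collections import Counter

# Hypothetical "verified" documents standing in for an enterprise knowledge base.
DOCUMENTS = [
    "The refund window for online orders is 30 days from delivery.",
    "Support is available by phone on weekdays from 9am to 5pm.",
    "Gift cards cannot be exchanged for cash.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count the word occurrences shared by the query and the document."""
    shared = Counter(tokenize(query)) & Counter(tokenize(document))
    return sum(shared.values())


def retrieve(query, documents, k=1):
    """Return the k documents with the highest word overlap with the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Place the retrieved text ahead of the question so the model can ground its answer."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Calling `build_prompt("What is the refund window for orders?", DOCUMENTS)` would produce a prompt containing the refund policy sentence, which would then be passed to a language model of the reader's choice.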
“RAG is a practical way to overcome the limitations of general LLMs by making enterprise data and information available for LLM processing,” explains Bern. “It is essentially a way to allow targeted information to be retrieved (often via