Large Language Models are a fascinating technology capable of many classic and advanced NLP tasks, from text classification and sentiment analysis to reading comprehension and logical inference. Over their evolution, starting with Gen1 models like GPT and BERT in 2018 and reaching Gen4 models like GPT-4 and LLaMA 2 in 2024, they have gained significant skills and capabilities.
LLMs are ubiquitous tools for processing and producing natural language text. Since their inception in 2018, several generations of LLMs have continuously pushed the frontier of LLM capabilities. Today's LLMs such as LLaMA 2 and GPT-4 are universally applicable to all classical NLP tasks, but this was not the case for the early models of 2018. These Gen1 LLMs are around 150M parameters strong. They were typically trained on the Toronto Book Corpus and Wikipedia text, with the goal of optimizing the prediction of a word given its context of previous and following words. While the model architectures differ, e.g. in the number of attention heads and hidden dimensions, the resulting models need to be fine-tuned for any downstream NLP task.
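To make the training objective concrete, here is a deliberately toy illustration (pure Python counting, not a real language model): given the words before and after a blank, predict the most likely word in between.

```python
from collections import Counter, defaultdict

# Toy corpus; a real Gen1 LLM trains on billions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word appears between each (previous, following) word pair.
context_counts = defaultdict(Counter)
for prev, word, nxt in zip(corpus, corpus[1:], corpus[2:]):
    context_counts[(prev, nxt)][word] += 1

def predict(prev: str, nxt: str):
    """Return the most likely word given its left and right neighbors."""
    candidates = context_counts.get((prev, nxt))
    return candidates.most_common(1)[0][0] if candidates else None

print(predict("sat", "the"))  # → "on"
```

Where this toy model merely counts exact context pairs, an LLM learns dense vector representations that generalize to contexts it has never seen.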
Not long ago, question answering systems were built as complex information storage and retrieval systems. The first component processes text sources, extracting their verbatim meaning as well as specific information. Another component then extracts knowledge from these sources and represents the facts in a database or a graph data structure. Finally, the retriever parses a user query, determines the relevant parts of the processed texts and knowledge databases, and composes a natural language answer.
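The three-component architecture can be sketched in a few lines of Python. This is a hypothetical minimal example (the fact pattern, sources, and `answer` function are made up for illustration), not a production design:

```python
import re

# Text sources to be processed.
SOURCES = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]

# Components 1+2: process sources and extract "X is the capital of Y"
# facts into a simple knowledge base (a dict standing in for a database).
knowledge_base = {}
for sentence in SOURCES:
    match = re.match(r"(\w+) is the capital of (\w+)\.", sentence)
    if match:
        city, country = match.groups()
        knowledge_base[country.lower()] = city

# Component 3: parse the user query, look up the fact, compose an answer.
def answer(question: str) -> str:
    match = re.search(r"capital of (\w+)", question, re.IGNORECASE)
    if match:
        country = match.group(1).lower()
        if country in knowledge_base:
            return f"The capital of {country.title()} is {knowledge_base[country]}."
    return "I don't know."

print(answer("What is the capital of France?"))
# → "The capital of France is Paris."
```

Every stage here is hand-written and brittle; the promise of LLM-based systems is that parsing, knowledge representation, and answer composition emerge from a single trained model.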
For a long time, I have been reading articles, papers, and many blog posts about artificial intelligence and machine learning. The recent advances in neural networks are especially fascinating, such as the GPT-3.5 model, which produces human-level texts. To understand the state of the art in natural language processing with neural networks, I want to design a question answering system that parses, understands, and answers questions about a specific topic, for example the content of a book. With this far-reaching goal in mind, I started this blog series to cover all relevant knowledge and engineering areas: machine learning, natural language processing, expert systems, and neural networks.
In the Python ecosystem, several NLP libraries have emerged that incorporate or work with LLMs. [Haystack](https://docs.haystack.deepset.ai/docs) is a feature-rich library whose core use case is building NLP applications, including effective information storage and retrieval as well as using LLMs for classical and advanced NLP tasks.
With the advent of ChatGPT and a free-to-use chat client on its website, OpenAI pushed the frontier of easy-to-use personal assistants. Running a powerful LLM that supports knowledge-worker tasks like text summarization, text generation, natural language queries about any domain, or even producing source code is astonishing and helpful. However, these assistants require paid accounts and can only be used through vendor-specific clients or APIs.
Large Language Models (LLMs) are neural networks trained on terabytes of input data that exhibit several emergent behaviors: advanced semantic NLP tasks like translation, question answering, and text generation evolve from the model itself without any further fine-tuning. This remarkable feature has earned them the name Foundation Models, language models capable of pushing the frontier of NLP to almost human-level language competence.
Natural language processing with Large Language Models is the current state of the art. How classical NLP tasks came to be solved by LLMs can be observed using the HuggingFace [transformers](https://huggingface.co/docs/transformers/index) library.
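As a brief illustration, the library's high-level `pipeline` API solves a classical NLP task like sentiment analysis in a few lines. This assumes `transformers` (plus a backend like PyTorch) is installed; the pipeline downloads a default model on first use, so the exact model and scores may vary:

```python
from transformers import pipeline

# Load a default sentiment-analysis model (downloaded on first use).
classifier = pipeline("sentiment-analysis")

# Classify a sentence; the result is a list of {"label", "score"} dicts.
result = classifier("Large Language Models are a fascinating technology.")
print(result[0]["label"], round(result[0]["score"], 3))
```

The same one-liner pattern works for other classical tasks such as `"question-answering"` or `"summarization"`, simply by changing the pipeline name.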
Large Language Models are a ubiquitous technology that is revolutionizing the way we work with computers. LLMs of around 7B parameters provide good capabilities and include up-to-date knowledge. With a combination of a specialized data format and quantization, these models can be executed on modest consumer hardware with a six-core CPU and 16 GB RAM.
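Back-of-the-envelope arithmetic shows why quantization makes this possible (illustrative numbers for the weights only, ignoring activation and runtime overhead):

```python
def model_size_gb(params: int, bits_per_weight: int) -> float:
    """Approximate weight-storage size in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 7_000_000_000  # a 7B-parameter model

print(f"fp16 : {model_size_gb(PARAMS, 16):.1f} GB")  # → 14.0 GB
print(f"4-bit: {model_size_gb(PARAMS, 4):.1f} GB")   # → 3.5 GB
```

At 16 bits per weight, the model alone nearly fills 16 GB of RAM; quantized to 4 bits it fits comfortably, leaving room for the operating system and inference buffers.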