The [LangChain](https://python.langchain.com) library spearheaded agent development with LLMs. By running an LLM in a continuous loop and giving it the capability to browse external data stores and a chat history, context-aware agents can be created. These agents repeatedly question their own output until a solution to a given task is found. This opened the door for creative applications, like automatically accessing web pages to make reservations or order products and services, and iteratively fact-checking information.
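The loop described above can be sketched in a few lines of plain Python. This is a minimal, framework-free illustration, not LangChain's actual API: `call_llm` is a hypothetical stand-in for any chat-completion endpoint.

```python
# Minimal sketch of an agent loop: the LLM repeatedly critiques and
# refines its own answer until a stopping condition is met.
# `call_llm` is a hypothetical placeholder, not a real API call.

def call_llm(prompt: str) -> str:
    # Placeholder: a real agent would invoke an LLM endpoint here.
    return "FINAL: booked a table for two at 7 pm"

def run_agent(task: str, max_steps: int = 5) -> str:
    answer = call_llm(f"Task: {task}\nPropose a solution.")
    for _ in range(max_steps):
        critique = call_llm(
            f"Task: {task}\nDraft: {answer}\n"
            "Criticize the draft, or reply FINAL if it solves the task."
        )
        if critique.startswith("FINAL"):
            # The model judged its own output as a valid solution.
            return answer
        # Otherwise, revise the draft with the critique as feedback.
        answer = call_llm(
            f"Task: {task}\nDraft: {answer}\nCritique: {critique}\nRevise."
        )
    return answer

print(run_agent("Reserve a restaurant table"))
```

A real agent framework adds tool calling, memory, and error handling around this core loop, but the self-questioning cycle stays the same.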
Large Language Models (LLMs) are complex neural networks based on the transformer architecture, with millions or billions of parameters. Trained on terabytes of multi-domain and often multilingual text, these models generate astonishingly fluent output. With a correctly formatted prompt, they can solve tasks defined in natural language. For example, classical NLP tasks like sentiment analysis, classification, translation, or question answering can be solved with LLMs, often with state-of-the-art performance compared to other NLP algorithms. On top of that, LLMs show advanced emergent behavior that enables them to generalize to unforeseen tasks.
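To make the "correctly formatted prompt" idea concrete, here is a sketch of how a classical NLP task such as sentiment analysis can be framed as a prompt. The template wording is an assumption for illustration; effective prompts are model-specific.

```python
# Sketch: framing sentiment analysis as a natural-language prompt.
# The exact template is illustrative, not tied to a specific model.

def sentiment_prompt(review: str) -> str:
    return (
        "Classify the sentiment of the following review as "
        "positive, negative, or neutral.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

prompt = sentiment_prompt("The battery lasts two full days - impressive.")
print(prompt)
```

Sending such a prompt to an LLM typically yields a single-word label, turning a free-form text generator into a classifier without any task-specific training.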
With a Retrieval Augmented Generation (RAG) framework, documents relevant to a given user query can be extracted from a database and used to enrich the prompt for an LLM. This enables LLM invocation with both up-to-date and private data, greatly improving answer quality.
Large Language Models need accurate and up-to-date information when generating text for specific domains or with content from private data sources. Retrieval Augmented Generation pipelines are an effective solution to this challenge: relevant content is identified in a vector database and added to the LLM prompt, providing the necessary context for an ongoing chat.
Large Language Models have one crucial limitation: they can only generate text determined by the training material they consumed. To produce accurate and correct facts, and to access recent or additional information, a Retrieval Augmented Generation (RAG) framework is added to the LLM invocation. The basic idea is to fetch relevant content from a (vector) database, optionally transform or summarize the findings, and then insert them into the prompt for the LLM. This provides a specific context for the LLM's text generation.
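The three steps just named — fetch, condense, insert — can be sketched as follows. All function bodies are placeholders standing in for a real vector-database lookup and an optional summarization step.

```python
# Sketch of a RAG pipeline: fetch relevant content, condense it,
# and insert it into the LLM prompt as context.

def fetch_documents(query: str) -> list[str]:
    # Placeholder for a vector-database similarity lookup.
    return ["The warranty covers battery defects for 24 months."]

def condense(docs: list[str]) -> str:
    # Placeholder: a real pipeline might summarize via an LLM call.
    return " ".join(docs)

def build_prompt(query: str) -> str:
    context = condense(fetch_documents(query))
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How long is the battery covered?"))
```

The final prompt grounds the LLM's answer in the retrieved context instead of relying solely on its training data.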
Large Language Models are vast neural networks trained on billions of text tokens. They can work with natural language in a never-before-seen way, reflecting on a given context to produce precise answers.
Large Language Models have fascinating abilities to understand and produce natural language text. From knowledge databases to assistants and live chatbots, many applications can be built with an LLM as a component. The capability of an LLM to follow instructions is essential for these use cases. While closed-source LLMs handle instructions very well, pretrained open-source models may not follow instructions rigorously. To alleviate this, instruction fine-tuning can be applied.
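Instruction fine-tuning starts from a dataset of instruction–response pairs rendered into a fixed prompt template. The sketch below shows one plausible data-preparation step; the template wording is modeled loosely on the Alpaca format and is an assumption, not a prescribed standard.

```python
# Sketch: rendering instruction/response pairs into training samples
# for instruction fine-tuning. Template wording is illustrative.

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_sample(instruction: str, response: str) -> str:
    return TEMPLATE.format(instruction=instruction, response=response)

sample = format_sample(
    "Translate 'good morning' to French.",
    "Bonjour.",
)
print(sample)
```

Fine-tuning a pretrained model on thousands of such samples teaches it to recognize the instruction section and answer in the response section, rather than merely continuing the text.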
Starting in 2023, Large Language Models evolved to form, or be a component of, information retrieval systems. In such a system, domain knowledge is encoded in a special format. Then, given a user query, the most relevant chunks from the knowledge base are determined and an answer is formulated. In an LLM, the knowledge base is all of its learned training material. However, given the learned vector representations of words, other content can be embedded in the same vector space. In this vector space, similarity search between user queries and stored knowledge identifies the context from which the LLM answers. This is an LLM retrieval system in a nutshell.
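The similarity search at the heart of such a system can be illustrated with cosine similarity over embedding vectors. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
from math import sqrt

# Toy knowledge base: chunk labels mapped to (made-up) embedding vectors.
KNOWLEDGE = {
    "warranty terms": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "return policy":  [0.2, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query_vec: list[float]) -> str:
    # Return the stored chunk whose vector is closest to the query.
    return max(KNOWLEDGE, key=lambda k: cosine(query_vec, KNOWLEDGE[k]))

# A query vector near the warranty embedding retrieves the warranty chunk.
print(most_similar([0.8, 0.2, 0.1]))
```

In production, this exhaustive comparison is replaced by an approximate nearest-neighbor index, but the principle of ranking stored vectors by similarity to the query is the same.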
Large Language Models are a fascinating technology that is becoming embedded in several applications and products. In my blog series about LLMs, I stated the goal to design a closed-book question answering system with LLMs as the core or sole components. Following my overview of [question answer system architectures](https://admantium.com/blog/llm13_question_answer_system_architectures/), this is the second article, and its focus is to add question-answering skills to a Gen1 LLM.