Large Language Models are a unique emergent technology of 2024. Their fascinating capability to produce coherent text influences many areas and use cases, especially pushing the boundaries of classic natural language tasks.
In my blog series, I covered several aspects of LLMs: understanding their evolution, investigating libraries, and exploring how LLMs can be fine-tuned. Ultimately, I want to use LLMs for a personal assistant that has access to documents and databases, providing a natural language interface to books and sensor data. The explosion of open-source LLMs beginning in 2023 led to exponential growth that now, in early 2024, culminates in one question: Which model to choose for which application type?
Fine-tuning and evaluating LLMs require significant hardware resources, mostly GPUs. Building an on-premise machine learning computer is always an option. But unless you are running this machine 24/7, renting infrastructure for a short period of time may be the better option. Additionally, you get access to hardware that scales with the workload: Why stop at a single 24GB GPU when you can have 10?
Fine-tuning LLMs with 7B or more parameters requires substantial hardware resources. One option is to build an on-premise computer with powerful and costly GPUs. The other option is to use cloud environments, including free services like Colab and Kaggle, and paid services like Replicate and Paperspace. These environments offer Jupyter notebooks in which you can run your LLM fine-tuning code. However, they come with constraints and limitations that need to be considered, such as the maximum amount of time a notebook can run.
When using Large Language Models (LLMs) via an API or locally, a quasi-standard for representing the chat history is recognizable: a list of messages, where each message names the speaker and the actual content. This format is provided by any OpenAI-API-compatible LLM engine, and it is also used internally by tools that provide a CLI-like invocation, for example AutoGen.
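As a minimal sketch, the message-list format described above looks like this in Python (the example conversation and the `render` helper are my own illustrations; only the `role`/`content` keys follow the OpenAI-style convention):

```python
# Quasi-standard chat history: a list of messages, each naming the
# speaker ("role") and carrying the actual text ("content").
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

def render(history):
    """Flatten the message list into a plain-text transcript."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)

print(render(chat_history))
```

The same list is what you would pass as the `messages` parameter to an OpenAI-compatible chat completion endpoint; appending each new user and assistant turn to it is how the conversation state is carried forward.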
Agent frameworks powered by LLMs promise to catapult autonomous task solving to unprecedented levels. Instead of rigid programming, LLMs reflect on tasks, utilize tools, and check each other's outputs to solve tasks creatively.
An agent is a Large Language Model customized with a system prompt so that it behaves in a specific way. The prompt typically details task types, expected task-solving behavior, and constraints. Typically, an agent is invoked by a human user, and every interaction needs to be moderated. But what happens if an agent LLM interacts with other agents? And how does an agent behave when it has access to additional tools, e.g. to read additional data sources or to execute program code?
Large Language Models used as agents promise autonomous task solving and take LLM usage to the next level. Effectively, an agent is created with a specific and refined prompt, detailing task types, expected task-solving behavior, constraints, and even linguistic tone. Tools are the necessary ingredient to make the agent effective for its tasks. But what are these tools? And how can they be added to an agent?
In my ongoing quest to design a question-answering system, agents are the final available design. An LLM agent is an instance of an LLM with a specifically crafted prompt so that it incorporates a defined behavior and mode of talking. A further enhancement is tools, essentially functions that provide access to additional sources of information or enable the execution of program code.
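To make the idea of a tool concrete, here is a minimal sketch: a plain Python function plus a JSON-schema description in the common OpenAI-style function-calling layout. The `read_sensor` function and its tiny in-memory database are hypothetical stand-ins for the sensor data mentioned above, not part of any specific framework:

```python
def read_sensor(sensor_id: str) -> float:
    """Hypothetical tool: return the latest reading of a named sensor."""
    fake_db = {"living_room_temp": 21.5, "garden_humidity": 0.63}
    return fake_db[sensor_id]

# Schema the agent framework hands to the LLM so the model knows the
# tool's name, purpose, and parameters (OpenAI-style convention).
read_sensor_schema = {
    "type": "function",
    "function": {
        "name": "read_sensor",
        "description": "Return the latest reading of a named sensor.",
        "parameters": {
            "type": "object",
            "properties": {
                "sensor_id": {
                    "type": "string",
                    "description": "Name of the sensor to read.",
                },
            },
            "required": ["sensor_id"],
        },
    },
}

# When the model decides to call the tool, the framework executes the
# Python function and feeds the result back into the conversation.
print(read_sensor("living_room_temp"))
```

The division of labor is the key design point: the LLM only emits a structured call (function name plus arguments), while the surrounding framework performs the actual execution and returns the result as a new message.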