QA System Design with LLMs: Prompt Engineering

By Sebastian Günther

—

29th July, 2024

Large Language Models are vast neural networks trained on billions of text tokens. They can work with natural language in a never-before-seen way, reflecting a context to give precise answers.

In my ongoing project to build a question-answer system with the help of LLMs, different approaches were discussed and practically shown. The main idea is to systematically improve an LLM to understand the exact context of a given question and answer correctly based on what the LLM knows.

In this article, the focus is prompt engineering. As several research papers and blog posts have shown, a carefully engineered prompt greatly improves an LLM's answers. You will learn about the different concepts of prompt creation, from fixed texts to chains of templates, and about modules like chain-of-thought or conversation contexts. These concepts are then represented as a morphological table, which is used to investigate and compare different prompt engineering frameworks.

The technical context of this article is Python v3.11 and several version-pinned libraries from 2024-05. All code examples should work with newer library versions too, but may require code updates if the API of a library has changed.

Prompt Engineering Concepts

Prompt engineering is an art that tinkers with science. It encompasses several methods for creating effective prompts that lead an LLM to a desired behavior. After reading blog posts and scientific articles, and after using different frameworks, I came to distinguish the following concepts in prompt engineering.

Note: The main sources for the following section are the articles Eight Prompt Engineering Implementations, 12 Prompt Engineering Techniques, and the paper Prompting Frameworks for Large Language Models: A Survey.

Technique

The prompt structure determines how the prompt is created.

Static: The prompt is a simple text.
Templates: The prompt contains fixed texts and placeholders that can be filled.
Function: The prompt can contain functions that are executed in a runtime to dynamically create output.
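
As a rough illustration of the three techniques, here is a minimal sketch in plain Python. The prompt wording and the names `STATIC_PROMPT`, `QA_TEMPLATE`, `template_prompt` and `function_prompt` are assumptions made for this example and do not come from any framework.

```python
from datetime import date
from string import Template

# Static: the prompt is a fixed text.
STATIC_PROMPT = "Answer the following question in one sentence: Who wrote Faust?"

# Template: fixed text with placeholders that are filled later.
QA_TEMPLATE = Template("Answer the following question in $style: $question")


def template_prompt(question: str, style: str = "one sentence") -> str:
    """Fill the placeholders of the template."""
    return QA_TEMPLATE.substitute(question=question, style=style)


# Function: part of the prompt is computed at runtime.
def function_prompt(question: str, today: date) -> str:
    """Prepend a value that is only known when the prompt is created."""
    return f"Today is {today.isoformat()}. {template_prompt(question)}"


print(template_prompt("Who wrote Faust?"))
print(function_prompt("How many days are left in this year?", date(2024, 7, 29)))
```

The template and the function variant produce ordinary strings, so each of them can be sent to any LLM in the same way as the static text.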

Components

Components control the intention or role of a prompt to achieve a desired outcome.

Chat History: This component includes a verbatim or summarized copy of past interactions between a user and an LLM, which is essential to ground the LLM's answer in ongoing conversations.
Chain of Thought: A prominent technique with which the model decomposes a complex query into different parts and solves them systematically.
Information Context: The prompt contains a context reflecting relevant information for a given task.
Self-Ask: The LLM is repeatedly asked questions that narrow its answer systematically.
Meta-Prompting: This prompt contains instructions to reflect on and criticize an answer, which leads an LLM to self-correct and self-improve its output.
Few-Shot Examples: With this well-known technique, one-shot or few-shot examples are included in a prompt to steer the LLM's answers in process, style, or format.
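
Several of these components can be combined in one prompt. The following sketch shows one possible way to assemble them in plain Python; the function `build_prompt`, its parameters, and the section labels are hypothetical, and the phrase "Let's think step by step." is just one commonly used chain-of-thought cue.

```python
def build_prompt(question, history=(), context="", examples=(), chain_of_thought=False):
    """Assemble a prompt from optional components (all names are illustrative)."""
    parts = []
    if context:
        # Information Context: facts the answer should be grounded in
        parts.append(f"Context:\n{context}")
    if examples:
        # Few-Shot Examples: question/answer pairs that steer style and format
        shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
        parts.append(f"Examples:\n{shots}")
    if history:
        # Chat History: earlier turns as (role, text) pairs
        turns = "\n".join(f"{role}: {text}" for role, text in history)
        parts.append(f"Conversation so far:\n{turns}")
    # Chain of Thought: a cue that asks the model to reason in steps
    answer_cue = "A: Let's think step by step." if chain_of_thought else "A:"
    parts.append(f"Q: {question}\n{answer_cue}")
    return "\n\n".join(parts)


print(build_prompt(
    "What is 2 + 2?",
    examples=[("What is 1 + 1?", "2")],
    chain_of_thought=True,
))
```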

Composition

Composition describes how prompt techniques and prompt components are combined into the final prompt.

Single Component: The prompt is a static text with any number of components.
Multi Component: The prompt contains different components that can be static or templated.
Dynamic Chains: A broker coordinates the invocation of LLMs, dynamically creating multi-component prompts as inputs, and parsing their output to construct new prompts until a determined condition is achieved.
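
To make the idea of a dynamic chain more tangible, here is a minimal sketch of a broker loop. It assumes the model is available as a plain callable that maps a prompt string to an output string; `run_chain`, the `FINAL:` marker, and the stand-in `fake_llm` are hypothetical names chosen for this example.

```python
from typing import Callable


def run_chain(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Broker loop: build a prompt, call the model, parse the output, repeat."""
    notes: list[str] = []
    for _ in range(max_steps):
        prompt = "\n".join([f"Question: {question}", *notes, "Next step:"])
        output = llm(prompt).strip()
        if output.startswith("FINAL:"):
            # Stop condition: the model signals a final answer
            return output.removeprefix("FINAL:").strip()
        # Otherwise feed the parsed output into the next prompt
        notes.append(f"Note: {output}")
    return "No final answer within the step limit."


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    if "Note: Goethe wrote Faust." in prompt:
        return "FINAL: Johann Wolfgang von Goethe"
    return "Goethe wrote Faust."


print(run_chain("Who wrote Faust?", fake_llm))
```

The step limit is a design choice of this sketch: it keeps the loop from running forever when the stop condition is never met.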

Coupling

Frameworks are coupled with an LLM along the following levels:

Preprocessor: The prompt creation of the framework results in a static text that is used as-is as input to the LLM.
Integration: The framework invokes an LLM automatically and processes the results.
Fusion: Framework features and LLM invocation are tightly integrated and extend each other.
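
The difference between the first two levels can be sketched in a few lines of Python. The functions `preprocess` and `integrate` are hypothetical, and the model is again assumed to be a plain callable. Fusion is not shown because it depends on how a specific framework controls the model during generation.

```python
from typing import Callable


def preprocess(question: str) -> str:
    """Preprocessor level: only produce the prompt text; the caller sends it to a model."""
    return f"Answer briefly.\nQ: {question}\nA:"


def integrate(question: str, llm: Callable[[str], str]) -> dict:
    """Integration level: create the prompt, invoke the model, and process the result."""
    raw = llm(preprocess(question))
    return {"question": question, "answer": raw.strip()}


# The lambda is a stand-in for a real model call
print(integrate("Who wrote Faust?", lambda prompt: " Goethe \n"))
```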

Prompt Engineering Frameworks

This section extends the excellent work of the Prompting-Framework-Survey and categorizes the framework capabilities based on their support for prompt concepts. Only projects that were updated on or after 01.01.2024 are included in this section.

Framework Features
Technique	Static	Templates	Function
Components	Chat History	Chain of Thought	Information Context	Self-Ask	Meta-Prompting
Composition	Single	Multi	Dynamic Chain
Coupling	Preprocessor	Integration	Fusion
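
The feature table can also be kept as plain data, which makes it easy to render or compare framework profiles. The following is a minimal sketch under that assumption; `FEATURES`, `render_row`, and `render_table` are hypothetical names, and the example profile repeats the check marks of the Guidance table in this article.

```python
FEATURES = {
    "Technique": ["Static", "Templates", "Function"],
    "Components": ["Chat History", "Chain of Thought", "Information Context",
                   "Self-Ask", "Meta-Prompting"],
    "Composition": ["Single", "Multi", "Dynamic Chain"],
    "Coupling": ["Preprocessor", "Integration", "Fusion"],
}


def render_row(dimension: str, supported: set[str]) -> str:
    """Render one tab-separated row, marking supported options with a check mark."""
    cells = [("✅" if option in supported else "") + option
             for option in FEATURES[dimension]]
    return "\t".join([dimension, *cells])


def render_table(profile: dict[str, set[str]]) -> str:
    """Render all four rows for one framework profile."""
    return "\n".join(render_row(dim, profile.get(dim, set())) for dim in FEATURES)


# Profile taken from the Guidance table in this article
guidance_profile = {
    "Technique": {"Templates", "Function"},
    "Composition": {"Multi"},
    "Coupling": {"Fusion"},
}
print(render_table(guidance_profile))
```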

Guidance

guidance
Technique	Static	✅Templates	✅Function
Components	Chat History	Chain of Thought	Information Context	Self-Ask	Meta-Prompting
Composition	Single	✅Multi	Dynamic Chain
Coupling	Preprocessor	Integration	✅Fusion

Guidance is a Python library that creates a DSL for stateful and extensible LLM invocation. It provides functions to generate constrained LLM output and interleaves it with Python code.

For example, the following code shows how to contextualize the creation of mathematical statements and invoke Python functions:

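import guidance
from guidance import gen

# `model` is assumed to be an already loaded guidance model object

@guidance
def add(lm, input1, input2):
    # Append the computed sum to the model state
    lm += f' = {int(input1) + int(input2)}'
    return lm

lm = model + "1 + 1 = add(1, 1) = 2" + gen(max_tokens=15, tools=[add])

# 1 + 1 = add(1, 1) = 2
# 1 + 41 = add(1,41) = 42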

TypeChat

TypeChat
Technique	Static	Templates	✅Function
Components	Chat History	Chain of Thought	Information Context	Self-Ask	Meta-Prompting
Composition	Single	✅Multi	✅Dynamic Chain
Coupling	Preprocessor	Integration	✅Fusion

TypeChat imposes type constraints on LLMs with flexible abstraction. Users define the desired objects of the domain in which the LLM is utilized, and then add these objects to a schema that is used to resolve a given user query. The coupling type of this framework is hard to pinpoint, but tends to be fusion because the LLM invocation cannot be externalized.

Here is an example of how to define objects corresponding to calendar events:

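# Source: https://github.com/microsoft/TypeChat/tree/main/python/examples/calendar
from typing import Literal, TypedDict

# Event and EventReference are TypedDicts defined in the same schema file (omitted here)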
class RemoveEventAction(TypedDict):
    actionType: Literal["remove event"]
    eventReference: EventReference

class AddEventAction(TypedDict):
    actionType: Literal["add event"]
    event: Event

Actions = (
    AddEventAction
    | RemoveEventAction
)

class CalendarActions(TypedDict):
    actions: list[Actions]

And here are the results of an LLM invocation:

# Source: https://github.com/microsoft/TypeChat/tree/main/python/examples/calendar

📅> I need to get my tires changed from 12:00 to 2:00 pm on Friday March 15, 2024

{
  "actions": [
    {
      "actionType": "add event",
      "event": {
        "day": "Friday March 15, 2024",
        "timeRange": {
          "startTime": "12:00 pm",
          "endTime": "2:00 pm"
        },
        "description": "get my tires changed"
      }
    }
  ]
}
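
Because the result is plain JSON that follows the declared types, downstream code can dispatch on `actionType`. The following sketch is not part of TypeChat; it only uses the standard library, and `apply_actions` is a hypothetical helper that handles the "add event" action and rejects everything else.

```python
import json

# The JSON result from above, compacted for brevity
llm_output = """
{
  "actions": [
    {
      "actionType": "add event",
      "event": {
        "day": "Friday March 15, 2024",
        "timeRange": {"startTime": "12:00 pm", "endTime": "2:00 pm"},
        "description": "get my tires changed"
      }
    }
  ]
}
"""


def apply_actions(calendar: list[dict], payload: str) -> list[dict]:
    """Apply 'add event' actions; other action types are rejected in this sketch."""
    for action in json.loads(payload)["actions"]:
        if action["actionType"] == "add event":
            calendar.append(action["event"])
        else:
            raise ValueError(f"Unsupported action type: {action['actionType']}")
    return calendar


calendar = apply_actions([], llm_output)
print(calendar[0]["description"])
```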

Guardrails

guardrails
Technique	Static	Templates	✅Function
Components	Chat History	Chain of Thought	Information Context	Self-Ask	Meta-Prompting
Composition	Single	Multi	✅Dynamic Chain
Coupling	Preprocessor	✅Integration	Fusion

Guardrails provides a programmable rule set to check and modify LLM output. Custom rules can be added with declarative and typed Python classes or a special configuration format called RAIL. The framework then invokes the LLM, checks the rules, and provides the result. With this framework, the LLM is integrated, and multiple calls can be chained.

Here is a pure Python-based example for checking language toxicity. The output of the LLM is checked for violations, and offending sentences are removed from the final answer.

# Source: https://www.guardrailsai.com/docs/examples/toxic_language

import guardrails as gd
from guardrails.hub import ToxicLanguage

guard = gd.Guard.from_string(
    validators=[ToxicLanguage(on_fail="fix")],
    description="testmeout",
)

# Parse the raw response
raw_response = """
What a lovely day it has been to roam around the city. I love it here!
I hate how pathetic this city can be.
"""
raw_llm_output, validated_output, *rest = guard.parse(
    llm_output=raw_response,
)

print(validated_output)
# What a lovely day it has been to roam around the city. I love it here!
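
Conceptually, this kind of fix can be pictured as a sentence-level filter. The sketch below is not the library's implementation: `drop_flagged_sentences` is a hypothetical helper, and `keyword_check` is a crude keyword match that stands in for a real toxicity classifier.

```python
import re
from typing import Callable


def drop_flagged_sentences(text: str, is_flagged: Callable[[str], bool]) -> str:
    """Split text into sentences and keep only those the checker does not flag."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if s and not is_flagged(s))


def keyword_check(sentence: str) -> bool:
    """Hypothetical stand-in for a toxicity classifier."""
    return any(word in sentence.lower() for word in ("hate", "pathetic"))


raw_response = """
What a lovely day it has been to roam around the city. I love it here!
I hate how pathetic this city can be.
"""
print(drop_flagged_sentences(raw_response, keyword_check))
```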

NeMo Guardrails

nemo-guardrails
Technique	Static	Templates	✅Function
Components	Chat History	Chain of Thought	Information Context	Self-Ask	Meta-Prompting
Composition	Single	Multi	✅Dynamic Chain
Coupling	Preprocessor	✅Integration	Fusion

Closely related to guardrails, this framework extends the idea by defining rules in a configuration file. It ships with a large number of prebuilt rules that can be activated with a short YAML declaration, and it can be extended with custom code.

Here is a configuration example:

# Source: https://github.com/NVIDIA/NeMo-Guardrails
rails:
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input

  output:
    flows:
      - self check facts
      - self check hallucination
      - activefence moderation

  config:
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS

PromptLang

promptlang
Technique	✅Static	Templates	Function
Components	Chat History	Chain of Thought	✅Information Context	✅Self-Ask	Meta-Prompting
Composition	Single	Multi	✅Dynamic Chain
Coupling	✅Preprocessor	Integration	Fusion

PromptLang is a single system prompt that extends an LLM with features to parse and execute simple programs. It defines essential programming language features and functions, and from there on enables the composition of higher-order concepts. Its coupling is a preprocessor because the default system prompt is just a static text.

Here is the default system prompt:

# Source: https://github.com/ruvnet/promptlang

You are a custom programming language called PromptLang v0.0.1, specifically designed for use in prompts and AI interactions. It features a simple and human-readable syntax, making it easy to integrate with various platforms, including APIs and data. Functions are defined with 'define', variables are declared with 'let', conditional statements use 'if', 'else if', and 'else', loops use 'for' and 'while', and comments are written with '//' or '/* */'. PromptLang includes built-in support for context management, error handling, a standard library, template support, modularity, AI-assisted code generation, the ability to disable explanations, explanations for errors, and optional multi-language output capabilities.

Given the following PromptLang v0.0.1 code snippet:
define add(x, y) {
    return x + y;
}

define subtract(x, y) {
    return x - y;
}

define multiply(x, y) {
    return x * y;
}

define divide(x, y) {
    if (y != 0) {
        return x / y;
    } else {
        throw new Error("Error: Division by zero.");
    }
}

Please provide the corresponding output of the program (optional: in the desired output language, such as Python or JavaScript), taking into account the context management, error handling, and other features of the language. Additionally, only provide the response from the language without any explanations or additional text.

Always act like a code intepreter and execute any code given to you with the appropriate ```output```.  Don't explain code unless asked.

Respond with “ PromptLang v0.0.1  initialized” to begin using this language.

And here is an example of how to use it:

define multiply(x, y) {
  return x * y;
}

multiply(1,a)
# Error: Variable 'a' is not defined.

multiply(1, "a")
# Error: Type mismatch: expected integer, got string.

Sudolang

Note: Although this project is not actively developed, I include it here because it provides phenomenal results.

sudolang
Technique	✅Static	Templates	Function
Components	✅Chat History	Chain of Thought	✅Information Context	Self-Ask	✅Meta-Prompting
Composition	Single	✅Multi	Dynamic Chain
Coupling	✅Preprocessor	Integration	Fusion

With sudolang, an LLM can be turned into a command-driven interface. From to-do lists to calendars, from prompt generation to role-playing character generation, a simple system prompt is all that’s required.

Here is the system prompt to create a task list program:

# Source: https://github.com/paralleldrive/sudolang-llm-support/blob/main/examples/todo.sudo
# Todo App

Roleplay as a personal assistant helping people keep track of their work.

list() {
  statusEmojis = [📝 , ✅]

  Display tasks as:
    "$statusEmoji $description"

  Format as numbered markdown list, omit property names
}

load() {
  Sort tasks into Incomplete Tasks and Complete Tasks in state.
}

Task {
  State {
    description: ""
    completed: false
  }

  Constraints {
    Require description is not empty.
  }

  setDescription()

  toggleComplete()
}

TodoList {
  State {
    Incomplete Tasks
    Complete Tasks
  }

  Constraints {
    Ensure tasks are unique.
  }

  /h | help - List available commands.
  /a | add task - Add task to tasks.
  /d | delete task - Remove task from tasks.
  /b | brainstorm [topic?] - Generate task ideas based on user interests and goals.
  /r | rank - Score tasks by priority and dependency order using AI inference.
  /l | list - List tasks
  /s | save - Save state in YAML. Log the output.
  /load - Load state from YAML

  match input (taskNumber) => toggle $taskNumber
}

welcome()

/load tasks:
  - learn SudoLang

And here is an example interaction:

/a checkout SudoLang
/1
/a checkout lmql
/2

You've toggled the completion status of the second task: "checkout SudoLang" You now have the following tasks:

    ✅ learn SudoLang
    ✅ checkout SudoLang
    📝 checkout lmql

What would you like to do? Type /h for help, /a to add a new task, /d to delete a task, /b to brainstorm task ideas, /r to rank tasks, /l to list tasks, /s to save your state, or /load to load your state from YAML. (Note: You can also toggle the completion status of a task by typing the task number, e.g. /3 to toggle the completion status of the third task.)

LMQL

lmql
Technique	Static	Templates	✅Function
Components	✅Chat History	✅Chain of Thought	✅Information Context	✅Self-Ask	✅Meta-Prompting
Composition	Single	Multi	✅Dynamic Chain
Coupling	✅Preprocessor	Integration	✅Fusion

LMQL is an outstanding project. It creates a superset of the Python language with SQL-like control flow interleaved with function calls. This flexibility enables the creation of any prompt type. This project also offers a free-to-use playground to explore its many examples.

Here is an example of meta-prompting that helps an LLM to provide better answers:
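
A minimal, framework-independent sketch of the pattern in plain Python (not LMQL syntax) could look as follows. The function `meta_prompt` and the stand-in `fake_llm` are hypothetical, and the sequence of draft, critique, and revision is one common reading of meta-prompting rather than LMQL's own example.

```python
from typing import Callable


def meta_prompt(question: str, llm: Callable[[str], str]) -> str:
    """Draft an answer, ask the model to criticize it, then ask for a revision."""
    draft = llm(f"Question: {question}\nAnswer:")
    critique = llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        "List any errors or omissions in the draft answer."
    )
    return llm(
        f"Question: {question}\nDraft answer: {draft}\nCritique: {critique}\n"
        "Write an improved final answer."
    )


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; records every prompt it receives."""
    calls.append(prompt)
    if "improved final answer" in prompt:
        return "Goethe"
    if "errors or omissions" in prompt:
        return "The draft names the wrong author."
    return "Schiller"


print(meta_prompt("Who wrote Faust?", fake_llm))
```

Each call receives the output of the previous one, so the critique and the revision are grounded in the model's own draft.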

Conclusion

Prompt engineering is an art that tinkers with science. This article drew on existing scientific articles, blog posts, and concrete frameworks to distinguish four essential concepts of prompt engineering: a) technique describes whether the prompt is a static text or the result of a function, b) components identify whether a chat history, chain-of-thought, or few-shot examples can be used, c) composition is the property to invoke an LLM with a single constructed prompt or consecutively as a chain, and d) coupling is the degree to which the prompt is just created and used as input to an LLM, or the LLM invocation is deeply integrated. With this structure, several frameworks were presented, among them guidance, guardrails, sudolang, and lmql. I encourage you to try these frameworks and see if you can apply them in your LLM project.