TL;DR
In just a few months, transformer-based Large Language Models (LLMs) have demonstrated a stunning ability to engage in human-like conversations, boosting business productivity and challenging traditional search engines.
However, this ability is limited to the information on which they were trained, or to publicly available information reached through internet searches; corporate data remains inaccessible.
Retrieval Augmented Generation (RAG) [1] addresses this limitation by allowing a model to answer from a document database, thus providing sourced answers and significantly reducing hallucinations. However, RAG systems are still restricted to answering questions: they do not autonomously use the different tools at their disposal (searching the internet, querying a database, rephrasing or clarifying user questions...) to accomplish complex tasks.
This learning-by-doing process is a revolution in the making for LLMs: it is called developing “agentic capabilities.”
1. What is an Agent?
To build an effective agent, we can tap into the intrinsic capabilities of LLMs, which extend beyond mere Natural Language Processing (NLP) tasks like text understanding and generation to include genuine reasoning abilities. Recent advancements in Generative AI focus on enhancing LLMs with agentic capabilities, enabling them to plan and execute tasks.
Agentic capabilities involve using a range of tools to achieve specific goals. To accomplish complex tasks, an agent must select the right tools for the job and use them effectively. This requires the agent to understand the task at hand, identify the necessary tools, and apply them in the correct order.
For example, a user might submit this query: “Respond to this customer complaint after reviewing our contract and researching relevant civil code articles.”
The agent's task list might look like this:
- Clarify the user's question to ensure understanding
- Analyze the attached customer complaint document
- Retrieve the client's contract from the company's document database
- Research relevant civil code articles
- Generate a response to the customer complaint
By breaking down a complex task into manageable steps and using the right tool for each step, the agent can respond to the customer complaint effectively and produce a high-quality answer, as sketched below.
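To make this concrete, here is a minimal sketch of such a task list expressed as an ordered plan of tool functions executed over a shared state. Every name and stub body here (clarify_question, retrieve_contract, and so on) is a hypothetical illustration, not an actual product implementation.

```python
# Minimal sketch of an agent executing a predefined plan.
# Every function below is a hypothetical stub standing in for a real tool.

def clarify_question(state):
    state["question"] = state["raw_query"].strip()
    return state

def analyze_complaint(state):
    state["complaint_summary"] = f"summary of {state['complaint_doc']}"
    return state

def retrieve_contract(state):
    state["contract"] = "contract fetched from the document database"
    return state

def search_civil_code(state):
    state["articles"] = ["Art. 1103", "Art. 1217"]  # placeholder results
    return state

def draft_response(state):
    state["answer"] = (f"Dear customer, based on the {state['contract']} "
                       f"and {', '.join(state['articles'])}: ...")
    return state

# The plan mirrors the task list above: one tool per step, in order.
PLAN = [clarify_question, analyze_complaint, retrieve_contract,
        search_civil_code, draft_response]

def run_agent(raw_query, complaint_doc):
    state = {"raw_query": raw_query, "complaint_doc": complaint_doc}
    for tool in PLAN:  # execute each step in order
        state = tool(state)
    return state["answer"]

print(run_agent("Respond to this customer complaint...", "complaint.pdf"))
```

In a real agent, the plan itself would be generated by the LLM from the user's query rather than hard-coded.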
2. What is a tool?
A tool is a predefined workflow designed to accomplish a specific task, which can be accessed and utilized by an agent to achieve a particular goal or respond to a user request.
In this sense, a tool is a self-contained module or component that performs a specific function or set of functions, and can be combined with other tools to accomplish more complex tasks. Tools can be thought of as building blocks that an agent can use to construct a solution to a user's request.
Examples of tools might include:
- An NLP module that can analyze text and extract relevant information
- A database query tool that can retrieve specific data from a database
- A machine learning model that can generate text or make predictions based on input data
- A workflow automation tool that can execute a series of tasks in a specific order
By accessing and combining these tools, an agent can perform complex tasks and respond to user requests more efficiently and effectively, as in the sketch below.
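To illustrate this building-block view, here is a minimal sketch of a tool interface: a name, a description the LLM can read when deciding which tool fits, and the function that implements the workflow. The Tool dataclass and registry below are assumptions for illustration, not the API of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the agent refers to
    description: str            # text the LLM reads when choosing a tool
    run: Callable[[str], str]   # the workflow implementing the tool

# A small registry of tools; the bodies are stubs for illustration.
TOOLS = {
    "db_query": Tool("db_query",
                     "Retrieve specific data from the company database",
                     lambda q: f"rows matching '{q}'"),
    "summarize": Tool("summarize",
                      "Extract the relevant information from a text",
                      lambda text: text[:80] + "..."),
}

def call_tool(name: str, argument: str) -> str:
    """Dispatch a tool call chosen by the agent."""
    return TOOLS[name].run(argument)

print(call_tool("db_query", "contracts of client #42"))
```

Exposing the description to the model is what lets it choose among tools, which is the selection problem discussed in the sections below.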
3. The benefit of developing an Agent Capability: a better “grounding” of the LLM in reality
This agentic augmentation of the LLM has a triple benefit:
- Extend the use of the LLM to action planning, making better use of its reasoning capabilities. This first benefit accrues to the user of the LLM.
- Gain augmented access to the outside world through dedicated tools, thus extending (and potentially updating, as in RAG) the information the LLM can access beyond the data seen during its training phase.
- Leverage predefined models to execute specific tasks and achieve goals with ease.
Moreover, through its actions as an agent, the LLM receives new feedback and thus sharpens its reasoning capabilities by learning from its confrontation with the outside world.
Agentic capabilities can thus be considered a first step toward grounding the LLM in the real world.
4. The example of TRICE, a two-step approach to Agent Capability Building
Several research papers reflecting the growing interest in Agents are currently being published. The “Toolformer” paper [2] explains how LLMs can “teach themselves to use external tools” via simple APIs.
The CRAFT methodology (Customizing LLMs by Creating and Retrieving from Specialized Toolsets) [3] creates toolsets curated for given tasks and equips LLMs with a component that retrieves tools from these sets, enhancing their ability to solve complex tasks.
A recent research experiment [4] introduces a new methodology, TRICE (Tool leaRning wIth exeCution fEedback), which leverages tool use to ground the LLM in the real world. The LLM is trained to use tools in two steps:
- a behavioral training step, in which the LLM learns how to use a tool by imitating demonstrations;
- a reinforcement learning step, in which the learning is reinforced through execution feedback (RLEF, Reinforcement Learning with Execution Feedback), helping the LLM select the appropriate tool among a set of available APIs to perform a given task (a toy sketch of this two-step structure follows).
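Reduced to its simplest form, the two-step structure can be sketched as follows: the “policy” is just a preference score per tool, behavioral training initializes it from demonstrations, and execution feedback then rewards or penalizes the chosen tool. This is a deliberately toy reduction under our own assumptions; the actual TRICE method fine-tunes the LLM itself rather than a score table.

```python
from collections import defaultdict

# Step 1: behavioral training -- imitate (task type, tool) demonstrations.
demonstrations = [("math", "calculator"),
                  ("lookup", "search"),
                  ("math", "calculator")]

preference = defaultdict(lambda: defaultdict(float))
for task_type, tool in demonstrations:
    preference[task_type][tool] += 1.0  # imitation: count demonstrated choices

# Step 2: reinforcement with execution feedback (toy RLEF).
def execute(tool: str) -> bool:
    """Hypothetical executor: only the calculator solves math tasks."""
    return tool == "calculator"

for _ in range(50):                      # 50 simulated math tasks
    tools = ["calculator", "search"]
    chosen = max(tools, key=lambda t: preference["math"][t])
    reward = 1.0 if execute(chosen) else -1.0   # execution feedback as reward
    preference["math"][chosen] += 0.1 * reward

print(dict(preference["math"]))  # the calculator ends up strongly preferred
```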
The experiment shows that this methodology outperforms alternatives such as “Toolformer,” and hints at how an agentic LLM could evolve toward general complex task-solving capabilities when properly leveraged through the grounding dimension of its role as an agent. In particular, for mathematical reasoning, one of the most challenging use cases for a plain LLM, the TRICE approach uses the same Calculator tool as the “Toolformer” experiment, yet outperforms “Toolformer” on this exercise.
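To show what such a calculator tool call can look like at runtime, here is a minimal sketch in which the model's output is assumed to embed a call like Calculator(2+3*4) (the exact call syntax varies between papers) that the runtime parses and evaluates safely before returning the text to the user.

```python
import ast
import operator
import re

# Safe arithmetic evaluator: only numbers and + - * / are allowed.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

def fill_tool_calls(model_output: str) -> str:
    """Replace each embedded Calculator(...) call with its computed result."""
    return re.sub(r"Calculator\(([^)]*)\)",
                  lambda m: str(safe_eval(m.group(1))),
                  model_output)

# Hypothetical model output containing an embedded tool call.
print(fill_tool_calls("The answer is Calculator(2+3*4)."))
# -> The answer is 14.
```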
5. The Agent Capability: overcoming the traditional limitations of LLMs in order to build automatic workflows
The performance gap between “Toolformer” and TRICE is acknowledged in both papers, pointing to intrinsic limitations of the LLM and to how each methodology manages to overcome them. In the “Toolformer” methodology, the main limitations stem from two factors:
- The LLM is very sensitive to prompt engineering: here, the behavioral training step of the TRICE methodology improves the accuracy with which the LLM identifies the adequate tool for a given use case, overcoming the ambiguity of the prompting phase.
- The LLM is limited to its pre-training knowledge and is not grounded in reality: the TRICE methodology grounds tool selection in execution feedback, so the model learns from the actual outcome of its tool calls rather than from its training data alone.
For business use, the agentic capabilities of the LLM enable the creation of business workflows that better leverage the LLM's reasoning capabilities while overcoming its traditional limitations (limited context window, lack of proper memory, complex prompt engineering, limited zero-shot capabilities, hallucinations...). A sketch of such an automatic workflow is given below.
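Beyond a fixed plan, an automatic workflow can let the model choose the next tool at each step. Below is a minimal sketch of such a loop; choose_next_action stands in for an LLM call and, like the tools, is a hypothetical stub, as is the step cap that guards against infinite loops.

```python
# Sketch of an automatic workflow: the model picks the next tool at each
# step instead of following a fixed plan.

def choose_next_action(state):
    """Hypothetical stand-in for the LLM deciding what to do next."""
    if "data" not in state:
        return "db_query", state["request"]
    if "summary" not in state:
        return "summarize", state["data"]
    return "finish", state["summary"]

TOOLS = {
    "db_query": lambda arg: f"rows matching '{arg}'",
    "summarize": lambda arg: f"summary of {arg}",
}

RESULT_KEY = {"db_query": "data", "summarize": "summary"}

def run(request: str, max_steps: int = 5) -> str:
    state = {"request": request}
    for _ in range(max_steps):       # step cap guards against infinite loops
        action, arg = choose_next_action(state)
        if action == "finish":
            return arg
        state[RESULT_KEY[action]] = TOOLS[action](arg)
    return "step budget exhausted"

print(run("contracts signed in 2023"))
```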
LightOn has implemented its own version of agentic capabilities in Paradigm, as an augmented version of the “task-builder” and chat-with-docs (RAG) capabilities already available in the previous version of Paradigm (see the next blog post).
[1] Retrieval Augmented Generation: the LLM's instructions are augmented with information retrieved from a private database.
[2] “Toolformer: Language Models Can Teach Themselves to Use Tools”, arXiv:2302.04761 [cs.CL], 9 Feb 2023
[3] “CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets”, arXiv:2309.17428 [cs.CL], 29 Sep 2023
[4] “Making Language Models Better Tool Learners with Execution Feedback”, arXiv:2305.13068 [cs.CL], 22 May 2023