How to Build a General-Purpose LLM Agent
1. What is an LLM agent
2. Steps to build a general-purpose LLM agent
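2.1 Choose the right LLM
- Massive Multitask Language Understanding (MMLU), for reasoning
- Berkeley Function Calling Leaderboard, for tool selection and tool calling
- HumanEval and BigCodeBench, for coding
- The model's context window is another key factor. Agentic workflows can consume a lot of tokens, sometimes more than 100,000, so a larger context window is very helpful.
- Models to consider:
Frontier models (GPT-4o, Claude 3.5).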
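2.2 Define the agent's control logic (i.e., communication structure)
- Tool use: the agent decides when to route a query to the appropriate tool and when to rely on its own knowledge.
- Reflection: the agent reviews and corrects its answer before responding to the user. A reflection step can also be added to most LLM systems.
- Reason-then-act (ReAct): the agent iteratively reasons about how to solve the query, performs an action, observes the result, and decides whether to take another action or return a response.
- Plan-then-execute: the agent plans up front, breaking the task into sub-steps if needed, and then executes each step.
The last two patterns (ReAct and plan-then-execute) are usually the best starting point for building a general-purpose single agent.
Example: below is the prompt for a ReAct-style agent from the Bee Agent framework (https://github.com/i-am-bee/bee-agent-framework/blob/main/src/agents/bee/prompts.ts).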
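As a rough illustration of the plan-then-execute pattern, the sketch below separates one planning call from a per-step execution loop. The `plan` and `execute_step` callables are hypothetical stand-ins for LLM calls and do not come from any framework. A ReAct agent would instead interleave a reasoning call with every action.

```python
from typing import Callable, List


def plan_and_execute(
    task: str,
    plan: Callable[[str], List[str]],
    execute_step: Callable[[str, List[str]], str],
) -> List[str]:
    """Plan all sub-steps up front, then execute them in order.

    `plan` and `execute_step` are hypothetical stand-ins for LLM calls.
    Each step receives the results of the earlier steps as context.
    """
    steps = plan(task)
    results: List[str] = []
    for step in steps:
        results.append(execute_step(step, results))
    return results


# Stub implementations so the sketch runs without a model.
def fake_plan(task: str) -> List[str]:
    return [f"research: {task}", f"summarize: {task}"]


def fake_execute(step: str, previous: List[str]) -> str:
    return f"done({step}) after {len(previous)} earlier step(s)"


outputs = plan_and_execute("compare two papers", fake_plan, fake_execute)
```

Because the plan is fixed before execution starts, this pattern is easier to inspect, but it cannot react to a surprising tool result the way a ReAct loop can.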
# Communication structure
You communicate only in instruction lines. The format is: "Instruction: expected output\n". You must only use these instruction lines and must not enter empty lines between them. Each instruction must start on a new line.
You must skip the instruction lines Function Name, Function Input and Function Output if no function calling is required.
Message: User's message. You never use this instruction line.
Thought: A single-line plan of how to answer the user's message, including an explanation of the reasoning behind it. It must be immediately followed by Final Answer.
Thought: A single-line step-by-step plan of how to answer the user's message, including an explanation of the reasoning behind it. You can use the available functions defined above. This instruction line must be immediately followed by Function Name if one of the available functions defined above needs to be called, or by Final Answer. Do not provide the answer here.
Function Name: Name of the function. This instruction line must be immediately followed by Function Input.
Function Input: Function parameters. Empty object is a valid parameter.
Function Output: Output of the function in JSON format.
Thought: Continue your thinking process.
Final Answer: Answer the user or ask for more information or clarification. It must always be preceded by Thought.
## Examples
Message: Can you translate "How are you" into French?
Thought: The user wants to translate a text into French. I can do that.
Final Answer: Comment vas-tu?
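2.3 Define the agent's core instructions
- Agent name and role: what the agent is called and what it is for.
- Tone and conciseness: how formal or casual it should sound, and how brief its answers should be.
- When to use tools: when to rely on external tools rather than the model's own knowledge.
- Error handling: what the agent should do when a tool or process fails.
Example: below is a snippet from the instructions section of the Bee Agent framework.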
# Instructions
User can only see the Final Answer, all answers must be provided there.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by Final Answer.
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by either Function Name or Final Answer.
You must use Functions to retrieve factual or historical information to answer the message.
If the user suggests using a function that is not available, answer that the function is not available. You can suggest alternatives if appropriate.
When the message is unclear or you need more information from the user, ask in Final Answer.
# Your capabilities
Prefer to use these capabilities over functions.
- You understand these languages: English, Spanish, French.
- You can translate, analyze and summarize, even long documents.
# Notes
- If you don't know the answer, say that you don't know.
- The current time and date in ISO format can be found in the last message.
- When answering the user, use friendly formats for time and date.
- Use markdown syntax for formatting code snippets, links, JSON, tables, images, files.
- Sometimes, things don't go as planned. Functions may not provide useful information on the first few tries. You should always try a few different approaches before declaring the problem unsolvable.
- When the function doesn't give you what you were asking for, you must either use another function or a different function input.
- When using search engines, you try different formulations of the query, possibly even in a different language.
- You cannot do complex calculations, computations, or data manipulations without using functions.
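The four components listed at the start of this section can be kept as separate fields and rendered into one system prompt. The sketch below is one minimal way to do that. The `AgentInstructions` class, its field names, and the example values are illustrative assumptions, not the Bee Agent framework's format.

```python
from dataclasses import dataclass


@dataclass
class AgentInstructions:
    """Hypothetical container for the core instruction components."""

    name: str
    role: str
    tone: str
    tool_policy: str
    error_policy: str

    def render(self) -> str:
        """Render the components as a plain-text system prompt."""
        return "\n".join(
            [
                f"You are {self.name}, {self.role}.",
                "# Tone",
                self.tone,
                "# When to use tools",
                self.tool_policy,
                "# Handling errors",
                self.error_policy,
            ]
        )


instructions = AgentInstructions(
    name="Helper",
    role="a general-purpose assistant",
    tone="Be friendly and keep answers short.",
    tool_policy="Use tools for factual lookups; otherwise answer directly.",
    error_policy="If a tool fails, retry with different input before giving up.",
)
system_prompt = instructions.render()
```

Keeping the components separate makes it easier to change one of them, such as the error policy, without touching the rest of the prompt.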
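2.4 Define and optimize the core tools
- Tool name: a unique, descriptive name for the capability.
- Tool description: a clear explanation of what the tool does and when to use it. This helps the agent pick the right tool.
- Tool input schema: a schema that outlines required and optional parameters, their types, and any constraints. The agent uses it to fill in the inputs based on the user's query.
- A pointer to where and how the tool is run.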
Example: below is an excerpt of the Arxiv tool implementation from the Langchain community package (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/tools/arxiv/tool.py). It requires an ArxivAPIWrapper implementation (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/arxiv.py).
"""Tool for the Arxiv API."""

from typing import Optional, Type

from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field

from langchain_community.utilities.arxiv import ArxivAPIWrapper


class ArxivInput(BaseModel):
    """Input for the Arxiv tool."""

    query: str = Field(description="search query to look up")


class ArxivQueryRun(BaseTool):  # type: ignore[override, override]
    """Tool that searches the Arxiv API."""

    name: str = "arxiv"
    description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )
    api_wrapper: ArxivAPIWrapper = Field(default_factory=ArxivAPIWrapper)  # type: ignore[arg-type]
    args_schema: Type[BaseModel] = ArxivInput

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the Arxiv tool."""
        return self.api_wrapper.run(query)
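Independent of any framework, the four parts of a tool definition (name, description, input schema, and a pointer to the code that runs it) can be sketched with the standard library alone. Everything below, including the `Tool` class, the simplified schema format, and the `word_count` example, is an illustrative assumption rather than the Langchain API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    """Hypothetical minimal tool record."""

    name: str
    description: str
    # Simplified schema: parameter name -> (type, required flag).
    input_schema: Dict[str, tuple]
    run: Callable[..., Any]

    def call(self, **params: Any) -> Any:
        """Validate params against the schema, then run the tool."""
        for key, (expected_type, required) in self.input_schema.items():
            if key not in params:
                if required:
                    raise ValueError(f"missing required parameter: {key}")
                continue
            if not isinstance(params[key], expected_type):
                raise TypeError(f"{key} must be {expected_type.__name__}")
        unknown = set(params) - set(self.input_schema)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return self.run(**params)


word_count = Tool(
    name="word_count",
    description="Counts the words in a text. Use it when the user asks for a length.",
    input_schema={"text": (str, True)},
    run=lambda text: len(text.split()),
)

result = word_count.call(text="agents call tools")
```

Validating inputs before running the tool turns a malformed model output into a clear error message that can be sent back to the agent, instead of a failure deep inside the tool.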
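2.5 Decide on a memory handling strategy
LLMs are limited by their context window, that is, the number of tokens they can "remember" at one time. That memory can fill up quickly with past interactions in a multi-turn conversation, lengthy tool outputs, or additional context the agent is grounded on. This is why a solid memory handling strategy matters.
- Sliding memory: keep the last k conversation turns in memory and drop the older ones.
- Token memory: keep the last n tokens and drop the rest.
- Summarized memory: use the LLM to summarize the conversation at each turn and drop the individual messages.
In addition, you can have the LLM detect key moments and store them in long-term memory. This lets the agent "remember" important facts about the user, making the experience more personalized.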
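The first two strategies are simple enough to sketch directly. The code below is a minimal illustration. It assumes each message is a plain string that counts as one turn, and it approximates token counts by whitespace splitting, which a real system would replace with the model's tokenizer. The token variant here drops whole messages rather than cutting one in the middle, which is a design choice of this sketch.

```python
from typing import List


def sliding_memory(messages: List[str], k: int) -> List[str]:
    """Keep only the last k messages."""
    return messages[-k:] if k > 0 else []


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption: one token per word)."""
    return len(text.split())


def token_memory(messages: List[str], n: int) -> List[str]:
    """Keep the most recent messages whose combined token count fits in n."""
    kept: List[str] = []
    budget = n
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))


history = ["hello there", "what is an agent", "an agent uses tools", "thanks"]
recent_turns = sliding_memory(history, 2)
recent_tokens = token_memory(history, 9)
```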
User Message: Extract key insights from this dataset
Files: bill-of-materials.csv
Thought: First, I need to inspect the columns of the dataset and provide basic data statistics.
Function Name: Python
Function Input: {"language":"python","code":"import pandas as pd\n\ndataset = pd.read_csv('bill-of-materials.csv')\n\nprint(dataset.columns)\nprint(dataset.describe())","inputFiles":["bill-of-materials.csv"]}
Function Output:
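2.6 Parse the agent's raw output
For the agent we are building, the parser needs to recognize the communication structure defined in 2.2 and return structured output, such as JSON. This makes it easier for the application to process and execute the agent's next step.
Note: some model providers (such as OpenAI) can return parseable output by default. For other models, especially open-source ones, this needs to be configured.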
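A minimal sketch of such a parser follows, assuming the instruction-line format from the Bee Agent prompt in 2.2 (`Thought`, `Function Name`, `Function Input`, `Final Answer`). The output keys (`action`, `tool_name`, `tool_params`, `answer`) are chosen to match the orchestrator example in 2.7. This is not the Bee Agent framework's own parser, and it only handles single-line values.

```python
import json
from typing import Any, Dict

KNOWN_LABELS = ("Thought", "Function Name", "Function Input", "Final Answer")


def parse_agent_output(raw: str) -> Dict[str, Any]:
    """Turn 'Label: value' instruction lines into a structured dict."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in KNOWN_LABELS:
            fields[label.strip()] = value.strip()

    if "Function Name" in fields:
        # Empty or missing input is treated as an empty JSON object.
        params = json.loads(fields.get("Function Input") or "{}")
        return {
            "action": "tool_call",
            "tool_name": fields["Function Name"],
            "tool_params": params,
        }
    if "Final Answer" in fields:
        return {"action": "return_answer", "answer": fields["Final Answer"]}
    return {"action": "unknown", "raw": raw}


raw_output = (
    "Thought: I need to inspect the dataset first.\n"
    "Function Name: Python\n"
    'Function Input: {"language": "python", "code": "print(1)"}\n'
)
parsed = parse_agent_output(raw_output)
```

Returning an explicit `unknown` action, rather than raising, lets the calling code decide whether to retry the model or report an error.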
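2.7 Orchestrate the agent's next step
Depending on the LLM output, the orchestration logic will either:
- Execute a tool call, or
- Return an answer: either the final response to the user's query or a follow-up request for more information.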
def orchestrator(llm_agent, llm_output, tools, user_query):
    """
    Orchestrates the response based on LLM output and iterates if necessary.

    Parameters:
    - llm_agent (callable): The LLM agent function for processing tool outputs.
    - llm_output (dict): Initial output from the LLM, specifying the next action.
    - tools (dict): Dictionary of available tools with their execution methods.
    - user_query (str): The original user query.

    Returns:
    - str: The final response to the user.
    """
    while True:
        action = llm_output.get("action")

        if action == "tool_call":
            # Extract tool name and parameters
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})

            if tool_name in tools:
                try:
                    # Execute the tool
                    tool_result = tools[tool_name](**tool_params)
                    # Send tool output back to the LLM agent for further processing
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."

        elif action == "return_answer":
            # Return the final answer to the user
            return llm_output.get("answer", "No answer provided.")

        else:
            return "Error: Unrecognized action type from LLM output."
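2.8 When do multi-agent systems make sense
- Understand which parts of the task truly benefit from an agentic approach.
- Identify components that can be split out as standalone processes within a larger workflow.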
3. Getting started
- If you plan to use an open-source model such as Llama 3, try the starter template (https://github.com/i-am-bee/bee-agent-framework-starter) for the Bee Agent Framework (https://github.com/i-am-bee/bee-agent-framework).
- If you plan to use a frontier model such as those from OpenAI, try the LangGraph tutorial (https://langchain-ai.github.io/langgraph/how-tos/react-agent-from-scratch/#define-nodes-and-edges).
Reposted from 鸿煊的学习笔记. Author: 鸿煊