Let's Talk About How to Build a General-Purpose LLM Agent

Published 2025-1-2 12:20

1. What is an LLM agent

An LLM agent is a program whose execution logic is controlled by its underlying model.

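What sets an LLM agent apart from approaches such as few-shot prompting or fixed workflows is its ability to define and adapt the steps needed to carry out a user's query. Given a set of tools (such as code execution or web search), the agent can decide which tool to use, how to use it, and how to iterate on the result based on the output. This adaptability lets the system handle a wide range of use cases with minimal configuration.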

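Agent architectures sit on a spectrum, from the reliability of fixed workflows to the flexibility of autonomous agents. For example, a fixed flow such as retrieval-augmented generation (RAG) can be enhanced with a self-reflection loop so that the program can iterate when the initial response falls short. Alternatively, a ReAct (reasoning and acting) agent can be equipped with fixed flows as tools, which gives an approach that is flexible yet structured. The choice of architecture ultimately depends on the use case and on the desired trade-off between reliability and flexibility.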
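
As a rough illustration of the first case, the sketch below wraps a fixed retrieve-then-generate flow in a bounded self-reflection loop. All four callables (`retrieve`, `generate`, `is_sufficient`, `rewrite_query`) are hypothetical placeholders for whatever retriever and LLM calls you use; none of them comes from a specific library.

```python
from typing import Callable, List


def rag_with_reflection(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_sufficient: Callable[[str, str], bool],
    rewrite_query: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Run a fixed RAG flow, retrying with a rewritten search query if needed."""
    search_query = query
    answer = ""
    for _ in range(max_rounds):
        # Fixed part of the workflow: retrieve, then generate.
        documents = retrieve(search_query)
        answer = generate(query, documents)
        # Reflection part: judge the draft and stop as soon as it is good enough.
        if is_sufficient(query, answer):
            break
        # Otherwise reformulate the search and go around again.
        search_query = rewrite_query(query, answer)
    return answer
```

The `max_rounds` cap keeps the flow predictable: the loop adds some flexibility, but the program still terminates after a known number of model calls.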

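2. Steps for building a general-purpose LLM agent

2.1 Choose the right LLM

Choosing the right model is critical to achieving the performance you want. There are several factors to weigh, such as licensing, cost, and language support. When building an LLM agent, the most important consideration is the model's performance on key tasks such as coding, tool calling, and reasoning. Benchmarks to evaluate include:

Massive Multitask Language Understanding (MMLU) (reasoning)
Berkeley Function-Calling Leaderboard (tool selection and tool calling)
HumanEval and BigCodeBench (coding)

The model's context window is another key factor. Agentic workflows can consume a lot of tokens, sometimes more than 100,000, so a larger context window helps considerably.

Models to consider:

Frontier models (GPT-4o, Claude 3.5);

Open-source models (Llama 3.2, Qwen2.5).

In general, larger models tend to perform better, but smaller models that can run locally are still a solid option. With a smaller model you will be limited to simpler use cases, and you may only be able to connect the agent to one or two basic tools.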

2.2 Define the agent's control logic (i.e., its communication structure)

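The main difference between a plain LLM and an agent comes down to the system prompt. In the context of an LLM, the system prompt is a set of instructions and contextual information given to the model before it processes the user's query. The agentic behavior expected of the LLM can be encoded in the system prompt. Here are some common agent patterns, which can be customized as needed:

Tool use: the agent decides when to route a query to the appropriate tool and when to rely on its own knowledge.
Reflection: the agent reviews and corrects its answer before responding to the user. A reflection step can also be added to most LLM systems.
Reason-then-act (ReAct): the agent iteratively reasons about how to solve the query, performs an action, observes the result, and decides whether to take another action or provide a response.
Plan-then-execute: the agent plans up front, breaking the task into sub-steps if needed, and then executes each step.

The last two patterns (ReAct and plan-then-execute) are usually the best starting point for building a general-purpose single agent.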

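To implement these behaviors effectively, you will need to do some prompt engineering. You may also want to use a structured generation technique, which essentially means shaping the LLM's output to match a specific format or schema so that the agent's responses stay consistent with the communication style you are aiming for.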
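
One lightweight way to approximate structured generation is to validate each reply against a small schema and re-prompt when it does not conform. The sketch below does that with the standard library only. It is validate-and-retry, not constrained decoding, and the JSON shape (`thought`, `action`) and the `llm` callable are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict

# Assumed output schema for this sketch: a JSON object with these keys.
REQUIRED_KEYS = {"thought", "action"}
ALLOWED_ACTIONS = {"tool_call", "return_answer"}


def validate_agent_json(raw: str) -> Dict[str, Any]:
    """Parse raw model text as JSON and check it against the tiny schema above."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    return data


def generate_structured(
    llm: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Dict[str, Any]:
    """Call the model until its reply validates, feeding the error back each time."""
    last_error = None
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            return validate_agent_json(raw)
        except ValueError as error:
            last_error = error
            # Tell the model what was wrong so the next attempt can fix it.
            prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({error}). "
                "Reply with valid JSON only."
            )
    raise ValueError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Hosted APIs and inference servers that support schema-constrained output can make the retry loop unnecessary; the validation step is still a cheap safety net.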

Example: below is a prompt for a ReAct-style agent from the Bee Agent Framework (https://github.com/i-am-bee/bee-agent-framework/blob/main/src/agents/bee/prompts.ts).

# Communication structure
You communicate only in instruction lines. The format is: "Instruction: expected output\n". You must only use these instruction lines and must not enter empty lines between them. Each instruction must start on a new line.
{{#tools.length}}
You must skip the instruction lines Function Name, Function Input and Function Output if no function calling is required.
{{/tools.length}}


Message: User's message. You never use this instruction line.
{{^tools.length}}
Thought: A single-line plan of how to answer the user's message, including an explanation of the reasoning behind it. It must be immediately followed by Final Answer.
{{/tools.length}}
{{#tools.length}}
Thought: A single-line step-by-step plan of how to answer the user's message, including an explanation of the reasoning behind it. You can use the available functions defined above. This instruction line must be immediately followed by Function Name if one of the available functions defined above needs to be called, or by Final Answer. Do not provide the answer here.
Function Name: Name of the function. This instruction line must be immediately followed by Function Input.
Function Input: Function parameters. Empty object is a valid parameter.
Function Output: Output of the function in JSON format.
Thought: Continue your thinking process.
{{/tools.length}}
Final Answer: Answer the user or ask for more information or clarification. It must always be preceded by Thought.


## Examples
Message: Can you translate "How are you" into French?
Thought: The user wants to translate a text into French. I can do that.
Final Answer: Comment vas-tu?

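2.3 Define the agent's core instructions

We tend to assume that LLMs come with a lot of capabilities out of the box. Some of these are great, but others may not be exactly what you need. To get the performance you are after, it is important to spell out in the system prompt every capability you want and every one you do not want. This may include instructions on:

Agent name and role: what the agent is called and what it is for.
Tone and conciseness: how formal or casual it should sound, and how brief it should be.
When to use tools: when to rely on external tools rather than the model's own knowledge.
Handling errors: what the agent should do when something goes wrong with a tool or a process.

Example: below is a snippet from the instructions section of the Bee Agent Framework.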

# Instructions
User can only see the Final Answer, all answers must be provided there.
{{^tools.length}}
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by Final Answer.
{{/tools.length}}
{{#tools.length}}
You must always use the communication structure and instructions defined above. Do not forget that Thought must be a single-line immediately followed by either Function Name or Final Answer.
You must use Functions to retrieve factual or historical information to answer the message.
{{/tools.length}}
If the user suggests using a function that is not available, answer that the function is not available. You can suggest alternatives if appropriate.
When the message is unclear or you need more information from the user, ask in Final Answer.


# Your capabilities
Prefer to use these capabilities over functions.
- You understand these languages: English, Spanish, French.
- You can translate, analyze and summarize, even long documents.


# Notes
- If you don't know the answer, say that you don't know.
- The current time and date in ISO format can be found in the last message.
- When answering the user, use friendly formats for time and date.
- Use markdown syntax for formatting code snippets, links, JSON, tables, images, files.
- Sometimes, things don't go as planned. Functions may not provide useful information on the first few tries. You should always try a few different approaches before declaring the problem unsolvable.
- When the function doesn't give you what you were asking for, you must either use another function or a different function input.
  - When using search engines, you try different formulations of the query, possibly even in a different language.
- You cannot do complex calculations, computations, or data manipulations without using functions.
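
If you are writing your own prompt rather than reusing a framework's, it can help to keep the four kinds of instruction from this section as separate fields and render them into the system prompt. The following is a minimal sketch of that idea; the `AgentInstructions` class, its field names, and the example values are hypothetical and are not part of the Bee Agent Framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentInstructions:
    """Hypothetical container for the kinds of instruction listed in this section."""

    name: str
    role: str
    tone: str
    tool_policy: str
    error_policy: str
    notes: List[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the instructions as a plain-text section of the system prompt."""
        lines = [
            f"You are {self.name}, {self.role}.",
            "",
            "# Instructions",
            f"- Tone and conciseness: {self.tone}",
            f"- When to use tools: {self.tool_policy}",
            f"- Handling errors: {self.error_policy}",
        ]
        if self.notes:
            lines += ["", "# Notes"]
            lines += [f"- {note}" for note in self.notes]
        return "\n".join(lines)


instructions = AgentInstructions(
    name="Helper",
    role="an assistant that answers questions about uploaded datasets",
    tone="Friendly and brief; at most three sentences unless asked for more.",
    tool_policy="Use functions for any calculation or data lookup.",
    error_policy="If a function fails, try a different input before giving up.",
    notes=["If you don't know the answer, say that you don't know."],
)
print(instructions.to_system_prompt())
```

Keeping each instruction in its own field makes it easier to change one behavior (say, the error policy) and compare agent runs without touching the rest of the prompt.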

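2.4 Define and optimize the core tools

Tools are what give an agent its superpowers. With a small set of well-defined tools you can achieve broad functionality. Key tools include code execution, web search, file reading, and data analysis. For each tool, you need to define the following and include it as part of the system prompt:

Tool name: a unique, descriptive name for the capability.
Tool description: a clear explanation of what the tool does and when to use it. This helps the agent decide when to pick the right tool.
Tool input schema: a schema that outlines the required and optional parameters, their types, and any constraints. The agent uses it to fill in the inputs it needs based on the user's query.
A pointer to where and how to run the tool.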
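
Without any framework, these four pieces can be captured in a small record. The sketch below is one possible shape, assuming a JSON-Schema-like dictionary for the input schema; `ToolSpec`, `word_count`, `render_tools`, and `missing_required` are illustrative names, not an existing API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolSpec:
    name: str                     # unique, descriptive name
    description: str              # what the tool does and when to use it
    input_schema: Dict[str, Any]  # JSON-Schema-like description of the parameters
    run: Callable[..., str]       # pointer to the code that executes the tool


def word_count(text: str) -> str:
    """Toy tool: count whitespace-separated words."""
    return str(len(text.split()))


WORD_COUNT = ToolSpec(
    name="word_count",
    description="Counts the words in a text. Use it when asked how long a text is.",
    input_schema={
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    },
    run=word_count,
)


def render_tools(tools: List[ToolSpec]) -> str:
    """Render the tool definitions as text to embed in the system prompt."""
    blocks = []
    for tool in tools:
        blocks.append(
            f"Function Name: {tool.name}\n"
            f"Description: {tool.description}\n"
            f"Parameters: {json.dumps(tool.input_schema)}"
        )
    return "\n\n".join(blocks)


def missing_required(tool: ToolSpec, params: Dict[str, Any]) -> List[str]:
    """List required parameters that the model failed to supply."""
    return [key for key in tool.input_schema.get("required", []) if key not in params]
```

Checking `missing_required` before calling `run` lets the application send a precise error back to the model instead of crashing on a bad call.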

Example: below is an excerpt of the Arxiv tool implementation from LangChain Community (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/tools/arxiv/tool.py). This implementation requires an ArxivAPIWrapper implementation (https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/arxiv.py).

"""Tool for the Arxiv API."""


from typing import Optional, Type


from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field


from langchain_community.utilities.arxiv import ArxivAPIWrapper




class ArxivInput(BaseModel):
    """Input for the Arxiv tool."""


    query: str = Field(description="search query to look up")




class ArxivQueryRun(BaseTool):  # type: ignore[override, override]
    """Tool that searches the Arxiv API."""


    name: str = "arxiv"
    description: str = (
        "A wrapper around Arxiv.org "
        "Useful for when you need to answer questions about Physics, Mathematics, "
        "Computer Science, Quantitative Biology, Quantitative Finance, Statistics, "
        "Electrical Engineering, and Economics "
        "from scientific articles on arxiv.org. "
        "Input should be a search query."
    )
    api_wrapper: ArxivAPIWrapper = Field(default_factory=ArxivAPIWrapper)  # type: ignore[arg-type]
    args_schema: Type[BaseModel] = ArxivInput


    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the Arxiv tool."""
        return self.api_wrapper.run(query)

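In some cases you will still need to optimize a tool to get the performance you want. This may involve tweaking the tool name or description with some prompt engineering, setting up advanced configuration to handle common errors, or filtering the tool's output.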
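
Two of these optimizations, error handling and output filtering, can be added without touching the tool itself. The wrapper below is a minimal sketch under the assumption that tools are plain callables that take keyword arguments and return a string; `harden_tool` and its `max_chars` default are hypothetical.

```python
from typing import Callable


def harden_tool(tool: Callable[..., str], max_chars: int = 2000) -> Callable[..., str]:
    """Wrap a tool so that failures and oversized outputs come back as short text."""

    def wrapped(**params) -> str:
        try:
            output = tool(**params)
        except Exception as error:
            # Report the failure as text so the model can retry with another
            # input or pick a different tool.
            return f"Tool error: {type(error).__name__}: {error}"
        if len(output) > max_chars:
            # Trim long outputs so one call cannot flood the context window.
            omitted = len(output) - max_chars
            return output[:max_chars] + f"\n[truncated {omitted} characters]"
        return output

    return wrapped
```

Truncation by character count is crude; for search results or tables, a tool-specific filter (top-k rows, selected fields) usually preserves more useful information.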

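2.5 Decide on a memory-handling strategy

LLMs are limited by their context window, that is, the number of tokens they can "remember" at a time. This memory can fill up quickly with past interactions in multi-turn conversations, lengthy tool outputs, or extra context the agent is grounded on. That is why having a solid memory-handling strategy is crucial.

In the context of an agent, memory refers to the system's ability to store, recall, and make use of information from past interactions. This lets the agent maintain context over time, improve its responses based on earlier exchanges, and provide a more personalized experience.

Common memory-handling strategies:

Sliding memory: keep the last k conversation turns in memory and drop the older ones.
Token memory: keep the last n tokens and drop the rest.
Summarized memory: use the LLM to summarize the conversation at each turn and drop the individual messages.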
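
The three strategies can be sketched in a few lines each. The code below makes several simplifying assumptions: a message is a dict with `role` and `content`, each message counts as one turn, tokens are approximated by whitespace-separated words (a real agent would use the model's tokenizer), token memory keeps whole messages rather than cutting mid-message, and `summarize` is a hypothetical callable that wraps an LLM call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def sliding_memory(messages: List[Message], k: int) -> List[Message]:
    """Keep the last k messages and drop everything older."""
    return messages[-k:] if k > 0 else []


def token_memory(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[Message]:
    """Keep the most recent messages whose combined size fits in max_tokens."""
    kept: List[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def summarized_memory(
    summary: str,
    new_messages: List[Message],
    summarize: Callable[[str, str], str],
) -> str:
    """Fold the newest messages into a running summary; the messages are then dropped."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in new_messages)
    return summarize(summary, transcript)
```

Sliding and token memory are cheap but forget abruptly; summarized memory keeps older context at the price of an extra model call per turn.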

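In addition, you can have the LLM detect key moments and store them in long-term memory. This allows the agent to "remember" important facts about the user, which makes the experience even more personalized.

The five steps covered so far lay the foundation for setting up an agent. But what happens if we run a user query through the LLM at this stage?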

Here is an example of what that might look like:

User Message: Extract key insights from this dataset
Files: bill-of-materials.csv
Thought: First, I need to inspect the columns of the dataset and provide basic data statistics.
Function Name: Python
Function Input: {"language":"python","code":"import pandas as pd\n\ndataset = pd.read_csv('bill-of-materials.csv')\n\nprint(dataset.columns)\nprint(dataset.describe())","inputFiles":["bill-of-materials.csv"]}
Function Output:

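At this point, the agent produces raw text output. So how do we get it to actually carry out the next step? This is where parsing and orchestration come in.

2.6 Parse the agent's raw output

A parser is a function that converts raw data into a format the application can understand and work with, such as an object with properties.

For the agent we are building, the parser needs to recognize the communication structure defined in 2.2 and return structured output such as JSON. This makes it easier for the application to process and execute the agent's next step.

Note: some model providers (such as OpenAI) can return parsable output by default. For other models, especially open-source ones, this needs to be configured.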
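
For the instruction-line format from 2.2, a parser can be as small as the sketch below. It is not the Bee Agent Framework's parser: it assumes one instruction per line (with continuation lines appended to the previous instruction) and returns a dictionary whose keys (`action`, `tool_name`, `tool_params`, `answer`) are chosen to match what the orchestration example in 2.7 reads.

```python
import json
import re
from typing import Any, Dict

# Instruction names taken from the communication structure in 2.2.
INSTRUCTION_LINE = re.compile(
    r"^(Thought|Function Name|Function Input|Function Output|Final Answer):[ \t]*(.*)$"
)


def parse_agent_output(raw: str) -> Dict[str, Any]:
    """Turn raw instruction lines into a structured action dictionary."""
    fields: Dict[str, str] = {}
    current = None
    for line in raw.splitlines():
        match = INSTRUCTION_LINE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Continuation of the previous instruction, e.g. a multi-line answer.
            fields[current] += "\n" + line

    # A tool call takes priority: anything the model wrote after it (such as a
    # made-up Function Output) has not actually happened yet.
    if "Function Name" in fields:
        raw_input = fields.get("Function Input", "").strip() or "{}"
        return {
            "action": "tool_call",
            "thought": fields.get("Thought", "").strip(),
            "tool_name": fields["Function Name"].strip(),
            "tool_params": json.loads(raw_input),
        }
    if "Final Answer" in fields:
        return {
            "action": "return_answer",
            "thought": fields.get("Thought", "").strip(),
            "answer": fields["Final Answer"].strip(),
        }
    raise ValueError("output does not follow the communication structure")
```

Raising on malformed output, rather than guessing, gives the application a clear point at which to re-prompt the model.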

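2.7 Orchestrate the agent's next step

The final step is to set up the orchestration logic, which determines what happens after the LLM outputs a result. Depending on the output, you will either:

Execute a tool call, or
Return an answer, which is either the final response to the user's query or a follow-up request for more information.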

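If a tool call is triggered, the tool's output is sent back to the LLM (as part of its working memory). The LLM then decides what to do with this new information: either make another tool call or return an answer to the user.

Here is an example of how this orchestration logic might look in code: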

def orchestrator(llm_agent, llm_output, tools, user_query):
    """
    Orchestrates the response based on LLM output and iterates if necessary.


    Parameters:
    - llm_agent (callable): The LLM agent function for processing tool outputs.
    - llm_output (dict): Initial output from the LLM, specifying the next action.
    - tools (dict): Dictionary of available tools with their execution methods.
    - user_query (str): The original user query.


    Returns:
    - str: The final response to the user.
    """
    while True:
        action = llm_output.get("action")


        if action == "tool_call":
            # Extract tool name and parameters
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})


            if tool_name in tools:
                try:
                    # Execute the tool
                    tool_result = tools[tool_name](**tool_params)
                    # Send tool output back to the LLM agent for further processing
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."


        elif action == "return_answer":
            # Return the final answer to the user
            return llm_output.get("answer", "No answer provided.")


        else:
            return "Error: Unrecognized action type from LLM output."
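
The loop above runs until the model returns an answer and stops at the first tool error. A possible variation, sketched below, caps the number of steps and feeds tool errors back to the model so it can try another approach. `run_agent`, `max_steps`, and the scripted stand-in for the LLM are assumptions for illustration; the action dictionary uses the same keys as the orchestrator above.

```python
from typing import Any, Callable, Dict, List

AgentFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_agent(
    llm_agent: AgentFn,
    tools: Dict[str, Callable[..., str]],
    user_query: str,
    max_steps: int = 5,
) -> str:
    """Bounded orchestration loop that reports tool errors back to the model."""
    llm_output = llm_agent({"user_query": user_query})
    for _ in range(max_steps):
        action = llm_output.get("action")
        if action == "return_answer":
            return llm_output.get("answer", "No answer provided.")
        if action != "tool_call":
            return "Error: Unrecognized action type from LLM output."

        tool_name = llm_output.get("tool_name")
        tool_params = llm_output.get("tool_params", {})
        if tool_name not in tools:
            observation = f"Error: Tool '{tool_name}' not found."
        else:
            try:
                observation = tools[tool_name](**tool_params)
            except Exception as error:
                observation = f"Error executing tool '{tool_name}': {error}"
        # Successful output or error text, the model sees it and decides what is next.
        llm_output = llm_agent({"tool_output": observation})
    return "Error: step limit reached without a final answer."


def make_scripted_agent(script: List[Dict[str, Any]]) -> AgentFn:
    """Stand-in for a real LLM: replays a fixed list of parsed outputs."""
    steps = iter(script)
    return lambda message: next(steps)


agent = make_scripted_agent([
    {"action": "tool_call", "tool_name": "add", "tool_params": {"a": 2, "b": 3}},
    {"action": "return_answer", "answer": "2 + 3 = 5"},
])
print(run_agent(agent, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?"))
```

The scripted agent makes the control flow testable without a model; in a real system, `llm_agent` would call the LLM and pass its raw text through the parser from 2.6.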

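You now have a system that can handle a wide variety of scenarios, from competitive analysis and advanced research to automating complex workflows.

2.8 When multi-agent systems make sense

Although this generation of LLMs is remarkably powerful, it has one key limitation: the models struggle with information overload. Too much context or too many tools can overwhelm the model and lead to performance problems. A general-purpose single agent will eventually hit this ceiling, especially since agents typically consume a large number of tokens.

For some use cases, a multi-agent setup may make more sense. By dividing responsibilities across multiple agents, you avoid overloading the context of a single LLM agent and improve overall efficiency.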
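
One simple way to divide responsibilities is a router that sends each query to a specialist agent with its own tools and context. This is only one of several possible multi-agent patterns, and everything in the sketch (`make_router`, the `classify` callable, the specialist labels) is hypothetical.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]  # assumed interface: user query in, final answer out


def make_router(
    classify: Callable[[str], str],
    specialists: Dict[str, Agent],
    fallback: Agent,
) -> Agent:
    """Build an agent that forwards each query to the matching specialist."""

    def route(query: str) -> str:
        label = classify(query)  # could be a small LLM call or a keyword rule
        # Each specialist only ever sees the queries (and tools) for its own area.
        return specialists.get(label, fallback)(query)

    return route
```

Because each specialist carries only its own tool descriptions and history, no single context window has to hold everything at once.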

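That said, a general-purpose single-agent setup is a great starting point for prototyping. It lets you test your use case quickly and identify where things start to break down. Through this process you can:

Understand which parts of the task truly benefit from an agentic approach.
Identify components that can be split out as standalone processes in a larger workflow.

Starting with a single agent gives you valuable insights for refining your approach as you scale to more complex systems.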

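3. Getting started

Using a framework is a great way to quickly test and iterate on your agent configuration.

If you plan to use an open-source model such as Llama 3, try the starter template (https://github.com/i-am-bee/bee-agent-framework-starter) for the Bee Agent Framework (https://github.com/i-am-bee/bee-agent-framework).
If you plan to use a frontier model such as those from OpenAI, try the LangGraph tutorial (https://langchain-ai.github.io/langgraph/how-tos/react-agent-from-scratch/#define-nodes-and-edges).

Original article: https://towardsdatascience.com/build-a-general-purpose-ai-agent-c40be49e74

This article is reposted from 鸿煊的学习笔记. Author: 鸿煊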
