基于Gemini 2.0和LangGraph实现自主多工具AI代理-51CTO.COM

译者 | 朱先忠

审校 | 重楼

简介

当今，LLM应用非常出色——它们可以记住大量信息、回答一般知识问题、编写代码、生成故事，甚至还可以纠正语法错误，等等。然而，它们并非没有局限性。它们会产生幻觉，知识断点可能从几个月到几年不等，并且仅限于生成文本，无法与现实世界互动。显然，这限制了它们在需要实时数据、来源引用或文本生成以外功能的任务中的实用性。这正是代理和工具试图解决的主要问题：它们通过为LLM增加附加功能来弥补这一差距。这些改进使LLM能够访问最新的信息、与API交互、搜索，甚至能够影响例如调节智能家居的温度参数等的物理世界。

在本文中，我们将构建一个简单的LLM代理，该代理配备了四个工具，可用于回答用户的问题。具体地讲，此代理将实现以下功能：

可以使用最新的可验证信息回答一般知识问题。
可以使用四种类型的工具：DuckDuckGo搜索、获取网页内容、维基百科搜索、获取维基百科页面内容。
允许LLM推理用户的输入、消息历史记录、上一个工具调用及其结果，以决定是否接下来使用某个工具——如果使用的话，要使用哪个参数。
代理允许在每个时间步骤中同时使用多个工具。

基于Gemini 2.0和LangGraph实现自主多工具AI代理-AI.x社区

示例程序代理组件整体架构

实现过程中，我们将使用LangGraph作为代理实现，并使用Gemini 2.0作为LLM。但是，你将能够以最少的代码修改把它切换到大多数你的其他类型的LLM应用程序。本示例项目的完整代码位于地址：https://github.com/CVxTz/document_ai_agents/blob/master/document_ai_agents/document_multi_tool_agent.py

工具

首先，让我们构建其中一个工具，例如Web搜索工具：

from duckduckgo_search import DDGS
from pydantic import BaseModel

class PageSummary(BaseModel):
page_title: str
page_summary: str
page_url: str

class SearchResponse(BaseModel):
page_summaries: list[PageSummary]

def search_duck_duck_go(search_query: str) -> SearchResponse:
"""
通过 duckduckgo 页面进行搜索。
:参数search_query: 发送到 DuckDuckGo 搜索的查询。
每次只搜索一个项目，即使这意味着多次调用该工具。
:不返回任何值:
"""
max_results = 10

with DDGS() as dd:
results_generator = dd.text(
search_query,
max_results=max_results,
backend="api",
)

return SearchResponse(
page_summaries=[
PageSummary(
page_title=x["title"], page_summary=x["body"], page_url=x["href"]
)
for x in results_generator
]
)

在这里，我们的“工具”是一个简单的Python函数，它使用duckduckgo_search库来获取与搜索查询相关的搜索结果。LLM将根据用户的消息决定选择什么查询。此函数的输出如下所示：

{'page_summaries': [{'page_summary': 'Stevia is a plant-based sweetener that '
'is 200 to 400 times sweeter than sugar '
'and has no calories or carbohydrates. '
'Learn about its health benefits, side '
'effects, and how to use it in cooking '
'and baking.',
'page_title': 'Stevia: Health Benefits and Risks - WebMD',
'page_url': 'https://www.webmd.com/food-recipes/what-is-stevia'},
{'page_summary': 'Stevia is a herb that can be used as a '
'zero-calorie, zero-carb sugar '
'substitute. Learn about its history, '
'safety, potential health benefits and '
'drawbacks from Cleveland Clinic experts.',
'page_title': 'Stevia: What Is It and Is It Healthy? - '
'Cleveland Clinic Health Essentials',
'page_url': 'https://health.clevelandclinic.org/what-is-stevia'},
{'page_summary': 'Stevia is a sugar substitute extracted '
'from the leaves of Stevia rebaudiana, a '
'plant native to Paraguay and Brazil. '

...

接下来，让我们继续看一下get_wikipedia_page函数：

def get_wikipedia_page(page_title: str, max_text_size: int = 16_000):
"""
获取维基百科页面的全部内容
:参数page_title: 首先调用工具“search_wikipedia”以确保该页面存在。
:参数max_text_size: 默认值为16000
:return:
"""

page = wikipedia.page(title=page_title, auto_suggest=False)
full_content = strip_tags(page.html())
full_page = FullPage(
page_title=page.title,
page_url=page.url,
content=full_content[:max_text_size],
)

return full_page

此函数获取页面的完整html，并将其删除的html标签返回给调用者。输出如下所示：

{'content': 'Sweetener and sugar substitute\n'
'This article is about the sweetener. For other uses, see Stevia '
'(disambiguation).\n'
'\n'
'Stevia (/ˈstiːviə, ˈstɛviə/)[1][2] is a sweet sugar substitute '
'that is about 50 to 300\xa0times sweeter than sugar.[3] It is '
'extracted from the leaves of Stevia rebaudiana, a plant native to 
...
...
'^ Samuel, Priscilla; Ayoob, Keith T; Magnuson, Bernadene A; '
'Wölwer-Rieck, Ursula; Jeppesen, Per Bendix; Rogers, Peter J; '
'Rowland, Ian; Mathews, Rebecca (1 July 2018). "Stevia Leaf to '
'Stevia Sweetener: Exploring Its Science, Benefits, and Future '
'Potential". The Journal of Nutrition. 148 (7): 1186S – 1205S. '
'doi:10.1093/jn/nxy102. '
'hdl:1983/618532e2-6caa-4fbe-bd6e-6eb0cb608981. ISSN\xa00022-3166. '
'PMID\xa029982648.\n'
'\n'
'^ a b c d "Has stevia been approved by FDA to be used as a '
'sweetener?". US Foo',
'page_title': 'Stevia',
'page_url': 'https://en.wikipedia.org/wiki/Stevia'}

总的来说，我们定义了四个这样的函数：

search_wikipedia(search_query: str)
get_wikipedia_page(page_title: str, max_text_size: int = 16_000):
search_duck_duck_go(search_query: str)
get_page_content(page_title: str, page_url: str)

所有这些函数都传递给Gemini客户端，如下所示：

model = genai.GenerativeModel(
"gemini-2.0-flash-exp",
tools=[
get_wikipedia_page,
search_wikipedia,
search_duck_duck_go,
get_page_content,
]
)

客户端将从函数定义中推断出调用参数及其类型。它还将传递此生成的模型中的函数的文档字符串，因此我们需要向LLM解释该工具如何在此文档字符串中工作以获得最佳结果。

LLM

我们将使用Gemini客户端与LLM（特别是Gemini 2.0）进行交互。首先，你需要设置一个API密钥——可以从Google AI Studio获取。使用此客户端，我们将提示LLM生成响应或创建函数调用（或两者兼而有之）。然后，这些函数调用将应用于我们上面定义的工具。

当使用提示调用时，客户端可能会使用常规文本或包含函数调用的内容进行响应，例如：

{'function_call': 
{'name': 'search_wikipedia', 'args': {'search_query': 'Trey Parker'}}
}

这种类型的响应包含工具的名称及其参数。这允许LLM选择要访问哪些外部资源以及如何访问。

代理

以下是我们代理的完整实现（仅70行代码）：

class ToolCallAgent:
def __init__(self, tools: list[Callable], model_name="gemini-2.0-flash-exp"):
self.model_name = model_name
self.model = genai.GenerativeModel(
self.model_name,
tools=tools,
system_instruction="You are a helpful agent that has access to different tools. Use them to answer the "
"user's query if needed. Only use information from external sources that you can cite. "
"You can use multiple tools before giving the final answer. "
"If the tool response does not give an adequate response you can use the tools again with different inputs."
"Only respond when you can cite the source from one of your tools."
"Only answer I don't know after you have exhausted all ways to use the tools to search for that information.",
)
self.tools = tools
self.tool_mapping = {tool.__name__: tool for tool in self.tools}
self.graph = None
self.build_agent()
def call_llm(self, state: AgentState):
response = self.model.generate_content(
state.messages,
request_options=RequestOptions(
retry=retry.Retry(initial=10, multiplier=2, maximum=60, timeout=300)
),
)
return {
"messages": [
type(response.candidates[0].content).to_dict(
response.candidates[0].content
)
]
}
def use_tool(self, state: AgentState):
assert any("function_call" in part for part in state.messages[-1]["parts"])

tool_result_parts = []

for part in state.messages[-1]["parts"]:
if "function_call" in part:
name = part["function_call"]["name"]
func = self.tool_mapping[name]
result = func(**part["function_call"]["args"])
tool_result_parts.append(
{
"function_response": {
"name": name,
"response": result.model_dump(mode="json"),
}
}
)
return {"messages": [{"role": "tool", "parts": tool_result_parts}]}
@staticmethod
def should_we_stop(state: AgentState) -> str:
logger.debug(
f"Entering should_we_stop function. Current message: {state.messages[-1]}"
)  # Added log
if any("function_call" in part for part in state.messages[-1]["parts"]):
logger.debug(f"Calling tools: {state.messages[-1]['parts']}")
return "use_tool"
else:
logger.debug("Ending agent invocation")
return END
def build_agent(self):
builder = StateGraph(AgentState)
builder.add_node("call_llm", self.call_llm)
builder.add_node("use_tool", self.use_tool)

builder.add_edge(START, "call_llm")
builder.add_conditional_edges("call_llm", self.should_we_stop)
builder.add_edge("use_tool", "call_llm")
self.graph = builder.compile()

首先，在上述代码中，我们定义LLM客户端，其中包含一组工具和一个简单的提示。

其次，我们定义第一个节点“call_llm”，使用消息历史记录调用LLM API。

第三，我们定义一个工具调用节点，执行LLM请求的所有函数调用。

最后，我们定义代理的状态转换图：

基于Gemini 2.0和LangGraph实现自主多工具AI代理-AI.x社区

示例项目所定义代理的状态图

代理将继续运行，直到LLM的新响应不包含任何工具调用。这会在状态图中创建一个循环，并允许代理的执行优化搜索查询和工具使用，直到找到用户询问的内容。

示例运行

现在，让我们对这个代理进行测试运行。我们将从一个简单的查询开始。

英文提问：

Is puffer fish poisonous ? if so, explain why and list some other poisonous (not venomous) fish. Don’t cite Wikipedia only.

译文：

河豚有毒吗？如果是，请解释原因并列出一些其他有毒（无毒）的鱼。不要只引用维基百科。

以下是与代理交流的过程（我为了简洁而删减了大部分文本）：

User（用户）：

英文提问：

Is pufferfish poisonous? If so, explain why and list some other poisonous (not venomous) fish. Don’t cite Wikipedia only.

译文：

河豚有毒吗？如果是，请解释原因并列出一些其他有毒（无毒）的鱼。不要只引用维基百科。

模型（Model）：

function_call：
name：search_duck_duck_go
args：
search_query：are puffer fish poisonous

工具（Tool）：

function_response：
name：search_duck_duck_go
response：
page_summaries：
page_title: How The Poisonous Pufferfish Became A Delicacy (And The Deadly Mishaps …
page_summary: Pufferfish, or fugu, is a popular and expe….
page_url: https://www.tastingtable.com/1616127/poisonous-pufferfish-explained/
page_title: Are Puffer Fish Poisonous To Touch? Here’s What You Need To Know
page_summary: Puffer fish contain a deadly toxin called tetrodotoxin that can cause paralysis and death if ingested or touched. …
page_url: https://safeharborfishing.com/are-puffer-fish-poisonous-to-touch-heres-what-you-need-to-know/
….

部分译文：

page_title：有毒的河豚如何成为美味佳肴（以及致命的事故……
page_summary：河豚，或称河豚，是一种流行且昂贵的……
page_url：https://www.tastingtable.com/1616127/poisonous-pufferfish-explained/
page_title：触摸河豚有毒吗？以下是你需要了解的内容
page_summary：河豚含有一种致命的毒素，称为河豚毒素，如果摄入或接触会导致瘫痪和死亡。……
page_url： https://safeharborfishing.com/are-puffer-fish-poisonous-to-touch-heres-what-you-need-to-know/
….

模型（Model）响应：

Yes, pufferfish are poisonous. They contain a deadly neurotoxin called tetrodotoxin, which can cause paralysis and death if ingested or touched. This toxin is not destroyed by cooking or freezing. The pufferfish’s diet, which includes invertebrates and algae like mussels and shellfish, contributes to the presence of tetrodotoxin.

Here are some other poisonous (not venomous) fish:

Northern Stargazer: These fish have poisonous spines behind their eyes.
Toadfish: Certain species of toadfish have poisonous spines.
Rabbitfish: These fish have spines containing toxins.
…

译文：

是的，河豚有毒。它们含有一种致命的神经毒素，称为河豚毒素，如果摄入或接触会导致瘫痪和死亡。这种毒素不会被烹饪或冷冻破坏。河豚的食物包括无脊椎动物和藻类，如贻贝和贝类，这导致了河豚毒素的存在。
以下是其他一些有毒（无毒）的鱼：
北方观星鱼：这些鱼的眼睛后面有毒刺。
蟾鱼：某些种类的蟾鱼有有毒的刺。
兔鱼：这些鱼的刺含有毒素。
…

这次交流涉及一轮工具调用，包括使用DuckDuckGo进行网络搜索，这足以让LLM获得所需的所有信息。

现在，让我们尝试一个更复杂的场景，其中代理需要进行多次工具调用：

英文提问：

What is the number and season of the south park episode where they get time traveling immigrants? Who was the director of that episode? Where and when was he born ? Give me his wikipedia page link.

译文：

南方公园第几集和第几季有时间旅行移民？那一集的导演是谁？他出生在哪里和什么时候？给我有关它的维基百科页面链接。

如果没有工具支持，模型会回答：

英文提问：

The South Park episode you\’re referring to is **Season 18, Episode 7 titled “Grounded Vindaloop”**. It does not feature time-traveling immigrants, but rather focuses…

译文：

你提到的南方公园剧集是**第18季，第7集，标题为“Grounded Vindaloop”**。它没有时间旅行移民，而是关注……

显然，这个答案是错误的。

下面，借助于工具支持，有如下结果：

用户（User）：

What is the number and season of the South Park episode where they get time-traveling immigrants? Who was the director of that episode? Where and when was he born? Give me his Wikipedia page link.

译文：

南方公园第几季出现了穿越时空的移民？那一集的导演是谁？他何时何地出生？给我它的维基百科页面链接。

模型（Model）响应：

function_call:
name: search_wikipedia
args:
search_query: south park time traveling immigrants episode

工具（Tool）：

function_response:
name: search_wikipedia
response:
page_summaries:

page_title: Goobacks
page_summary: “Goobacks” is the seventh episode of the eighth season of the animated television series South Park, and the 118th overall episode of the series. In production order, …
page_url: https://en.wikipedia.org/wiki/Goobacks
page_title: List of time travel works of fiction
page_summary: Time travel is a common plot element in fiction. Works where it plays a prominent role are listed below. For stories of time travel in antiquity, see the history of the time travel concept.
page_url: https://en.wikipedia.org/wiki/List_of_time_travel_works_of_fiction
…

模型（Model）响应：

The South Park episode you’re referring to is called “Goobacks.” It is the seventh episode of the eighth season.

译文：

你提到的《南方公园》剧集名为“Goobacks”。它是第八季的第七集。

模型（Model）响应：

function_call:
name: get_wikipedia_page
args:
page_title: Goobacks

工具（Tool）：

function_response:
name: get_wikipedia_page
response:
page_title: Goobacks
page_url: https://en.wikipedia.org/wiki/Goobacks
content: 7th episode of the 8th season of South Park. “Goobacks” is the seventh episode of the eighth season of the animated television series South Park, and the 118th overall episode of the series. In production order, it is the season’s sixth episode. It originally aired on Comedy Central in the United States on April 28, 2004. The episode was directed by Trey Parker.

部分译文：

内容：《南方公园》第八季第七集。《Goobacks》是动画电视连续剧《南方公园》第八季第七集，也是该系列的第118集。按制作顺序，它是该季的第六集。它最初于2004年4月28日在美国喜剧中心播出。该集由Trey Parker执导。

模型（Model）响应：

The episode “Goobacks” was directed by Trey Parker.

译文：

“Goobacks”一集的导演是特雷·帕克 (Trey Parker)。

模型（Model）响应：

function_call:
name: search_wikipedia
args:
search_query: Trey Parker

工具（Tool）：

function_response:
name: search_wikipedia
response:
page_summaries:

page_title: Trey Parker
page_summary: Randolph Severn “Trey” Parker III (born October 19, 1969) is an American actor, animator, writer, producer, director, and musician…
page_url: https://en.wikipedia.org/wiki/Trey_Parker
…

模型（Model）响应：

Trey Parker was born on October 19, 1969.

译文：

崔·帕克出生于1969年10月19日。

模型（Model）响应：

function_call:
name: get_wikipedia_page
args:
page_title: Trey Parker

工具（Tool）：

function_response:
name: get_wikipedia_page
response:
page_title: Trey Parker
page_url: https://en.wikipedia.org/wiki/Trey_Parker
content: Randolph Severn “Trey” Parker III (born October 19, 1969) is an American actor, animator, writer, producer, director, and musician. He was born in Conifer, Colorado, U.S.

部分译文：

内容：兰道夫·塞文·“特雷”·帕克三世（1969年10月19日出生）是美国演员、动画师、作家、制片人、导演和音乐家。他出生于美国科罗拉多州康尼弗。

模型（Model）响应：

Trey Parker was born in Conifer, Colorado, U.S. on October 19, 1969.
Here is his Wikipedia page link: https://en.wikipedia.org/wiki/Trey_Parker

译文：

特雷·帕克于1969年10月19日出生于美国科罗拉多州康尼弗。
这是他的维基百科页面链接：https://en.wikipedia.org/wiki/Trey_Parker

该代理能够迭代地进行工具调用（总共四次调用），以回答我们这个复杂的问题。但是，其中明确的答案不在第一个结果中，而是需要多个推理步骤和搜索查询。

结论

我们在本文中开发了一个自主型的LLM代理，它能够搜索网络、浏览维基百科，并使用Gemini2.0作为其核心LLM推理检索到的信息。该代理不仅能够获取相关数据，还实现了根据初始结果优化其搜索查询，以找到用户请求的确切信息。

上述实现为创建你自己的自主代理奠定了良好的基础。通过将自定义工具定义为Python函数并将其集成到代理中，你可以轻松地根据你的特定需求进行调整。另外，开发本示例代理过程中得益于LangGraph框架，其实现的简单性使得整个定制过程变得更为简单。

现在，你可以开始构建适用于你自己的使用场景和应用程序的强大、合适的代理应用了。

本文完整源代码下载地址：https://github.com/CVxTz/document_ai_agents/blob/master/document_ai_agents/document_multi_tool_agent.py

译者介绍

朱先忠，51CTO社区编辑，51CTO专家博客、讲师，潍坊一所高校计算机教师，自由编程界老兵一枚。

原文标题：Building Autonomous Multi-Tool Agents with Gemini 2.0 and LangGraph，作者：Youness Mansar