
Chonkie: a practical chunking library for RAG. It is lightweight and fast, and can chunk text whenever you need it.
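Chonkie provides several chunkers that split text efficiently for RAG applications. The original post gave a brief overview of the available chunkers in a figure:
[Figure: overview of the available chunkers]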
pip install chonkie
# First import the chunker you want from Chonkie
from chonkie import TokenChunker
# Import your favorite tokenizer library
# Also supports AutoTokenizers, TikToken and AutoTikTokenizer
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_pretrained("gpt2")
# Initialize the chunker
chunker = TokenChunker(tokenizer)
# Chunk some text
chunks = chunker("Woah! Chonkie, the chunking library is so cool! I love the tiny hippo hehe.")
# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
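
To make the idea behind token-based chunking concrete, here is a minimal sketch that runs without installing anything. It is not Chonkie's implementation: whitespace splitting stands in for a real tokenizer, and the names `SimpleChunk`, `chunk_by_tokens`, `chunk_size`, and `chunk_overlap` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for a chunk object; not Chonkie's own class."""
    text: str
    token_count: int


def chunk_by_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> List[SimpleChunk]:
    """Split text into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Assumption: whitespace splitting stands in for a real subword tokenizer.
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[SimpleChunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(SimpleChunk(" ".join(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = "Woah! Chonkie, the chunking library is so cool! I love the tiny hippo hehe."
for chunk in chunk_by_tokens(sample, chunk_size=8, chunk_overlap=2):
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```

The overlap means neighbouring chunks repeat a few tokens, so a sentence cut at a chunk boundary still appears with some context in the next chunk. A real tokenizer would count subword tokens rather than whitespace-separated words, so the counts here will differ from those reported by the snippet above.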
https://github.com/bhavnicksm/chonkie
https://pypi.org/project/chonkie/
This article is reposted from PaperAgent.