References:
https://github.com/TommyTang930/LangChain_LLM_ChatBot
https://python.langchain.com/docs/integrations/vectorstores/faiss

1. Text splitting with RecursiveCharacterTextSplitter

The class is rewritten here so that it splits Chinese text more sensibly.
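As a rough illustration of the idea behind the rewrite, the sketch below splits a Chinese sentence with `re.split` and attaches each punctuation mark to the end of the piece before it, so no chunk starts with a stray "。". The function name `split_keep_delimiter_at_end`, the sample text, and the separator pattern are hypothetical and chosen only for this example. How it handles the trailing piece is this sketch's own choice, not necessarily what the rewritten class does.

```python
import re
from typing import List


def split_keep_delimiter_at_end(text: str, separator: str) -> List[str]:
    """Split text on a regex separator, keeping each delimiter at the end of its piece."""
    # A capturing group makes re.split return the delimiters as separate items.
    parts = re.split(f"({separator})", text)
    # Pair every text piece with the delimiter that follows it.
    pieces = ["".join(pair) for pair in zip(parts[0::2], parts[1::2])]
    # An odd number of parts means there is trailing text with no closing delimiter.
    if len(parts) % 2 == 1:
        pieces.append(parts[-1])
    # Drop empty strings produced when the text ends with a delimiter.
    return [p for p in pieces if p]


# Hypothetical sample input: two punctuated sentences plus an unterminated tail.
sample = "今天天气很好。我们去公园吧!好的"
print(split_keep_delimiter_at_end(sample, "[。!?]"))
```

Keeping the delimiter at the end of each piece matters for Chinese because sentence-final punctuation belongs to the sentence before it. A chunk that begins with "。" or "!" reads badly and can hurt retrieval quality.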

import re
from typing import List, Optional, Any
from langchain.text_splitter import RecursiveCharacterTextSplitter
import logging

logger = logging.getLogger(__name__)


def _split_text_with_regex_from_end(text: str, separator: str, keep_separator: bool) -> List[str]:
    # Now that we have the separator, split the text
    if separator:
        if keep_separator:
            # The parentheses in the pattern keep the delimiters in the result.
            _splits = re.split(f"({separator})", text)
            splits = ["".join(i) for i in zip(_splits[0::2], _splits[1::2])]
            if len(_splits) % 2 == 1:
                splits += _splits[-