VDB: Chroma. A detailed guide to Chroma/chromadb (an excellent vector database): introduction, installation, and usage

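Contents

Related articles

DB: VDB. A detailed guide to vector databases (Vector Database): introduction, common libraries, and usage

Introduction to chroma

Installing chroma

Using chroma

1. Basic usage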


Related articles

DB: VDB. A detailed guide to vector databases (Vector Database): introduction, common libraries, and usage

https://yunyaniu.blog.csdn.net/article/details/129106195

Introduction to chroma

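In April 2023, Chroma raised an $18 million seed round. Besides institutional investors, it was backed by founders or executives of companies including MongoDB, Scale, Hugging Face, and Jasper, and it was well received across the generative AI ecosystem.
Chroma is a lightweight vector database built on top of a vector search library. It ships with everything needed to get started and exposes a simple API. It currently supports CPU computation only. The index dependency visible in the install log below is chroma-hnswlib, an HNSW graph index. Vector search libraries can also use product quantization, which cuts a vector's dimensions into several segments and runs k-means on each segment separately, to reduce storage and speed up retrieval. Chroma also integrates with LangChain for building applications on top of language models.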
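To make the product quantization idea concrete, here is a minimal pure-Python sketch of the technique as described above: split each vector into segments, run k-means per segment, and store one centroid index per segment. It is an illustration only, not Chroma's implementation; all function names, the random toy data, and the parameter values (4 segments, 16 centroids) are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def split(vector, num_segments):
    """Cut a vector into num_segments contiguous, equal-length pieces."""
    size = len(vector) // num_segments
    return [vector[i * size:(i + 1) * size] for i in range(num_segments)]


def train_codebooks(vectors, num_segments, k):
    """Run k-means independently on each segment position."""
    segmented = [split(v, num_segments) for v in vectors]
    return [kmeans([s[j] for s in segmented], k, seed=j) for j in range(num_segments)]


def encode(vector, codebooks):
    """Replace each segment with the index of its nearest centroid."""
    segments = split(vector, len(codebooks))
    return [
        min(range(len(book)), key=lambda c: squared_distance(seg, book[c]))
        for seg, book in zip(segments, codebooks)
    ]


def decode(codes, codebooks):
    """Approximate the original vector by concatenating the chosen centroids."""
    return [x for code, book in zip(codes, codebooks) for x in book[code]]


# Toy data: 200 random 8-dimensional vectors (made up for the example).
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]

codebooks = train_codebooks(vectors, num_segments=4, k=16)
codes = encode(vectors[0], codebooks)
approx = decode(codes, codebooks)

print(codes)  # 4 small integers stored instead of 8 floats
print(squared_distance(vectors[0], approx))  # reconstruction error
```

The storage saving comes from keeping only the short code per vector plus the shared codebooks; the price is the reconstruction error printed on the last line.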

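Chroma was officially released in June 2023. It is an AI-native, open-source embedding database. By making knowledge, facts, and skills pluggable for LLMs (language models), it makes building LLM applications simple.
Chroma gives you tools to store embeddings and their metadata, embed documents and queries, and search embeddings. Its priorities are simplicity and developer productivity, analysis on top of search, and speed. Chroma includes a Python client SDK, a JavaScript/TypeScript client SDK, and a server application.
In Python, Chroma can run in memory or in client/server mode (still in alpha). In JavaScript, Chroma runs in client/server mode and communicates with the Python backend.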
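The three tools named above (storing embeddings with metadata, searching them, and filtering) can be pictured with a small conceptual sketch. This is a toy brute-force store written for illustration, not Chroma's code or API: the class name ToyEmbeddingStore, its methods, and the 2-dimensional vectors are all hypothetical, and the embedding step itself is skipped by supplying vectors directly.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyEmbeddingStore:
    """Hypothetical in-memory store mapping id -> (embedding, metadata)."""

    def __init__(self):
        self._records = {}

    def add(self, record_id, embedding, metadata=None):
        self._records[record_id] = (list(embedding), dict(metadata or {}))

    def query(self, embedding, n_results=1, where=None):
        """Return the ids of the n_results most similar records.

        `where` is an optional dict; a record is considered only if every
        key/value pair in it equals the record's metadata.
        """
        where = where or {}
        candidates = [
            (cosine_similarity(embedding, emb), record_id)
            for record_id, (emb, meta) in self._records.items()
            if all(meta.get(key) == value for key, value in where.items())
        ]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        return [record_id for _, record_id in candidates[:n_results]]


store = ToyEmbeddingStore()
store.add("doc1", [1.0, 0.0], {"source": "notion"})
store.add("doc2", [0.0, 1.0], {"source": "google-docs"})
store.add("doc3", [0.7, 0.7], {"source": "notion"})

top_two = store.query([0.9, 0.1], n_results=2)
notion_only = store.query([0.1, 0.9], n_results=1, where={"source": "notion"})
print(top_two)      # ['doc1', 'doc3']
print(notion_only)  # ['doc3']
```

A real vector database replaces the linear scan with an approximate nearest-neighbour index, but the inputs and outputs have the same shape: vectors and metadata go in, ranked ids come out.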
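Its features are:
>> Easy embedding generation: Chroma has all the tools you need to work with embeddings
>> Simple: a pip install away, usable in a notebook within 5 seconds
>> Feature-rich: search, filtering, and more
>> Integrations: plugs directly into LangChain, LlamaIndex, OpenAI, and others
>> JavaScript client
>> Free: Apache 2.0 open-source license
Chroma's strengths are that it is easy to use, lightweight, and smart; its weaknesses are a relatively simple feature set and no GPU acceleration. Chroma also plans to launch a hosted (serverless-style) product that provides serverless storage and retrieval and scales up and down, so developers can use it out of the box without building their own infrastructure.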

Official website: Chroma

Documentation: Home | Chroma

GitHub: https://github.com/chroma-core/chroma

Installing chroma

pip install chromadb
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple chromadb
npm install --save chromadb # or: yarn add chromadb

C:\Windows\system32>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple chromadb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chromadb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/3c/ff/ac74735884031a3b9ddf7b1abecee0885ec61660588b1e7c6862bccf5116/chromadb-0.4.14-py3-none-any.whl (448 kB)
Requirement already satisfied: typing-extensions>=4.5.0 in d:\programdata\anaconda3\lib\site-packages (from chromadb) (4.8.0)
Collecting tqdm>=4.65.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl (78 kB)
Collecting grpcio>=1.58.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ed/b2/f37fa2dc8b9942c5d444adee073d683ff23a31a418214cc7d80f53f3285c/grpcio-1.59.0-cp39-cp39-win_amd64.whl (3.7 MB)
Collecting pulsar-client>=3.1.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6f/13/b4b3f9282d274bacacf6268b946d00986ab14c35fe9f4113080bc9629ff8/pulsar_client-3.3.0-cp39-cp39-win_amd64.whl (3.4 MB)
Requirement already satisfied: onnxruntime>=1.14.1 in d:\programdata\anaconda3\lib\site-packages (from chromadb) (1.14.1)
Collecting chroma-hnswlib==0.7.3
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f0/f0/e197039fa81a122544fceccda4e6a2d08bdd4a70638f0a88b6b16dbc4adc/chroma_hnswlib-0.7.3-cp39-cp39-win_amd64.whl (150 kB)
Requirement already satisfied: tokenizers>=0.13.2 in d:\programdata\anaconda3\lib\site-packages (from chromadb) (0.13.3)
Collecting pypika>=0.48.9
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/2c/94ed7b91db81d61d7096ac8f2d325ec562fc75e35f3baea8749c85b28784/PyPika-0.48.9.tar.gz (67 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing wheel metadata ... done
Collecting fastapi>=0.95.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/db/30/b8d323119c37e15b7fa639e65e0eb7d81eb675ba166ac83e695aad3bd321/fastapi-0.104.0-py3-none-any.whl (92 kB)
Collecting bcrypt>=4.0.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/46/81/d8c22cd7e5e1c6a7d48e41a1d1d46c92f17dae70a54d9814f746e6027dec/bcrypt-4.0.1-cp36-abi3-win_amd64.whl (152 kB)
Collecting typer>=0.9.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bf/0e/c68adf10adda05f28a6ed7b9f4cd7b8e07f641b44af88ba72d9c89e4de7a/typer-0.9.0-py3-none-any.whl (45 kB)
Collecting overrides>=7.3.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/da/28/3fa6ef8297302fc7b3844980b6c5dbc71cdbd4b61e9b2591234214d5ab39/overrides-7.4.0-py3-none-any.whl (17 kB)
Collecting numpy>=1.22.5
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2d/ed/022fc4106f6d97e41e156201274138e0369b27dbfc8c206034f24ebd97d9/numpy-1.26.1-cp39-cp39-win_amd64.whl (15.8 MB)
Requirement already satisfied: pydantic>=1.9 in d:\programdata\anaconda3\lib\site-packages (from chromadb) (2.4.2)
Collecting requests>=2.28
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/70/8e/0e2d847013cb52cd35b38c009bb167a1a26b2ce6cd6965bf26b47bc0bf44/requests-2.31.0-py3-none-any.whl (62 kB)
Collecting uvicorn[standard]>=0.18.3
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/79/96/b0882a1c3f7ef3dd86879e041212ae5b62b4bd352320889231cc735a8e8f/uvicorn-0.23.2-py3-none-any.whl (59 kB)
Collecting posthog>=2.4.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a7/73/35758818228c70348be4c3c66a76653c62e894e0e3c3461453c5341ca926/posthog-3.0.2-py2.py3-none-any.whl (37 kB)
Collecting importlib-resources
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/65/6e/09d8816b5cb7a4006ef8ad1717a2703ad9f331dae9717d9f22488a2d6469/importlib_resources-6.1.0-py3-none-any.whl (33 kB)
Collecting starlette=0.27.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/58/f8/e2cca22387965584a409795913b774235752be4176d276714e15e1a58884/starlette-0.27.0-py3-none-any.whl (66 kB)
Collecting anyio=3.7.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/19/24/44299477fe7dcc9cb58d0a57d5a7588d6af2ff403fdd2d47a246c91a3246/anyio-3.7.1-py3-none-any.whl (80 kB)
Requirement already satisfied: idna>=2.8 in d:\programdata\anaconda3\lib\site-packages (from anyio=3.7.1->fastapi>=0.95.2->chromadb) (3.3)
Collecting exceptiongroup
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ad/83/b71e58666f156a39fb29417e4c8ca4bc7400c0dd4ed9e8842ab54dc8c344/exceptiongroup-1.1.3-py3-none-any.whl (14 kB)
Requirement already satisfied: sniffio>=1.1 in d:\programdata\anaconda3\lib\site-packages (from anyio=3.7.1->fastapi>=0.95.2->chromadb) (1.2.0)
Requirement already satisfied: protobuf in d:\programdata\anaconda3\lib\site-packages (from onnxruntime>=1.14.1->chromadb) (3.19.6)
Requirement already satisfied: flatbuffers in d:\programdata\anaconda3\lib\site-packages (from onnxruntime>=1.14.1->chromadb) (2.0.7)
Requirement already satisfied: sympy in d:\programdata\anaconda3\lib\site-packages (from onnxruntime>=1.14.1->chromadb) (1.10.1)
Requirement already satisfied: packaging in d:\programdata\anaconda3\lib\site-packages (from onnxruntime>=1.14.1->chromadb) (21.3)
Requirement already satisfied: coloredlogs in d:\programdata\anaconda3\lib\site-packages (from onnxruntime>=1.14.1->chromadb) (15.0.1)
Requirement already satisfied: six>=1.5 in d:\programdata\anaconda3\lib\site-packages (from posthog>=2.4.0->chromadb) (1.16.0)
Collecting backoff>=1.10.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl (15 kB)
Requirement already satisfied: monotonic>=1.5 in d:\programdata\anaconda3\lib\site-packages (from posthog>=2.4.0->chromadb) (1.5)
Requirement already satisfied: python-dateutil>2.1 in d:\programdata\anaconda3\lib\site-packages (from posthog>=2.4.0->chromadb) (2.8.2)
Requirement already satisfied: certifi in d:\programdata\anaconda3\lib\site-packages (from pulsar-client>=3.1.0->chromadb) (2021.10.8)
Requirement already satisfied: annotated-types>=0.4.0 in d:\programdata\anaconda3\lib\site-packages (from pydantic>=1.9->chromadb) (0.6.0)
Requirement already satisfied: pydantic-core==2.10.1 in d:\programdata\anaconda3\lib\site-packages (from pydantic>=1.9->chromadb) (2.10.1)
Requirement already satisfied: charset-normalizer=2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.28->chromadb) (2.0.12)
Requirement already satisfied: urllib3=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.28->chromadb) (1.26.9)
Requirement already satisfied: colorama in d:\programdata\anaconda3\lib\site-packages (from tqdm>=4.65.0->chromadb) (0.4.4)
Requirement already satisfied: click=7.1.1 in d:\programdata\anaconda3\lib\site-packages (from typer>=0.9.0->chromadb) (8.0.4)
Collecting h11>=0.8
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/95/04/ff642e65ad6b90db43e668d70ffb6736436c7ce41fcc549f4e9472234127/h11-0.14.0-py3-none-any.whl (58 kB)
Collecting websockets>=10.4
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f4/3f/65dfa50084a06ab0a05f3ca74195c2c17a1c075b8361327d831ccce0a483/websockets-11.0.3-cp39-cp39-win_amd64.whl (124 kB)
Collecting httptools>=0.5.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0a/0d/ca545a8a2831fc3e326fffecab268a2e7775e5ec4d57afc8f5ddc578cbd7/httptools-0.6.1-cp39-cp39-win_amd64.whl (60 kB)
Collecting python-dotenv>=0.13
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/44/2f/62ea1c8b593f4e093cc1a7768f0d46112107e790c3e478532329e434f00b/python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Requirement already satisfied: pyyaml>=5.1 in d:\programdata\anaconda3\lib\site-packages (from uvicorn[standard]>=0.18.3->chromadb) (6.0)
Collecting watchfiles>=0.13
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ef/c0/737ddb4c97efd7e4c98852973ca80d7fab65811d6c4d3a64182333d455b0/watchfiles-0.21.0-cp39-none-win_amd64.whl (280 kB)
Requirement already satisfied: humanfriendly>=9.1 in d:\programdata\anaconda3\lib\site-packages (from coloredlogs->onnxruntime>=1.14.1->chromadb) (10.0)
Requirement already satisfied: pyreadline3 in d:\programdata\anaconda3\lib\site-packages (from humanfriendly>=9.1->coloredlogs->onnxruntime>=1.14.1->chromadb) (3.4.1)
Requirement already satisfied: zipp>=3.1.0 in d:\programdata\anaconda3\lib\site-packages (from importlib-resources->chromadb) (3.7.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in d:\programdata\anaconda3\lib\site-packages (from packaging->onnxruntime>=1.14.1->chromadb) (3.0.4)
Requirement already satisfied: mpmath>=0.19 in d:\programdata\anaconda3\lib\site-packages (from sympy->onnxruntime>=1.14.1->chromadb) (1.2.1)
Building wheels for collected packages: pypika
  Building wheel for pypika (PEP 517) ... done
  Created wheel for pypika: filename=PyPika-0.48.9-py2.py3-none-any.whl size=53835 sha256=211a200d50e727e0f22f68bd12cee283f29544900120b047b61d3826d1424f27
  Stored in directory: c:\users\99386\appdata\local\pip\cache\wheels\3b\17\34\0b716a7c87f148d258492c55fe890d051f5ca7bcf9e045e582
Successfully built pypika
Installing collected packages: exceptiongroup, h11, anyio, websockets, watchfiles, uvicorn, starlette, requests, python-dotenv, numpy, httptools, backoff, typer, tqdm, pypika, pulsar-client, posthog, overrides, importlib-resources, grpcio, fastapi, chroma-hnswlib, bcrypt, chromadb
  Attempting uninstall: anyio
    Found existing installation: anyio 3.5.0
    Uninstalling anyio-3.5.0:
      Successfully uninstalled anyio-3.5.0
  Attempting uninstall: requests
    Found existing installation: requests 2.27.1
    Uninstalling requests-2.27.1:
      Successfully uninstalled requests-2.27.1
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.6
    Uninstalling numpy-1.21.6:
      Successfully uninstalled numpy-1.21.6
  Attempting uninstall: typer
    Found existing installation: typer 0.4.2
    Uninstalling typer-0.4.2:
      Successfully uninstalled typer-0.4.2
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.63.2
    Uninstalling tqdm-4.63.2:
      Successfully uninstalled tqdm-4.63.2
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.42.0
    Uninstalling grpcio-1.42.0:
      Successfully uninstalled grpcio-1.42.0
  Attempting uninstall: bcrypt
    Found existing installation: bcrypt 3.2.0
    Uninstalling bcrypt-3.2.0:
      Successfully uninstalled bcrypt-3.2.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
daal4py 2021.5.0 requires daal==2021.4.0, which is not installed.
conda-repo-cli 1.0.4 requires pathlib, which is not installed.
anaconda-project 0.10.2 requires ruamel-yaml, which is not installed.
xarray 2023.4.2 requires pandas>=1.4, but you have pandas 1.3.5 which is incompatible.
thinc 8.1.10 requires pydantic!=1.8,!=1.8.1,=1.7.4, but you have pydantic 2.4.2 which is incompatible.
streamlit 1.24.0 requires protobuf=3.20, but you have protobuf 3.19.6 which is incompatible.
spacy 3.5.3 requires pydantic!=1.8,!=1.8.1,=1.7.4, but you have pydantic 2.4.2 which is incompatible.
spacy 3.5.3 requires typer=0.3.0, but you have typer 0.9.0 which is incompatible.
scipy 1.8.0 requires numpy=1.17.3, but you have numpy 1.26.1 which is incompatible.
onnx 1.14.0 requires protobuf>=3.20.2, but you have protobuf 3.19.6 which is incompatible.
numba 0.55.1 requires numpy=1.18, but you have numpy 1.26.1 which is incompatible.
ludwig 0.7.4 requires transformers=4.10.1, but you have transformers 4.28.1 which is incompatible.
jupyter-server 1.13.5 requires pywinpty<2; os_name == "nt", but you have pywinpty 2.0.2 which is incompatible.
en-core-web-sm 3.0.0 requires spacy=3.0.0, but you have spacy 3.5.3 which is incompatible.
Successfully installed anyio-3.7.1 backoff-2.2.1 bcrypt-4.0.1 chroma-hnswlib-0.7.3 chromadb-0.4.14 exceptiongroup-1.1.3 fastapi-0.104.0 grpcio-1.59.0 h11-0.14.0 httptools-0.6.1 importlib-resources-6.1.0 numpy-1.26.1 overrides-7.4.0 posthog-3.0.2 pulsar-client-3.3.0 pypika-0.48.9 python-dotenv-1.0.0 requests-2.31.0 starlette-0.27.0 tqdm-4.66.1 typer-0.9.0 uvicorn-0.23.2 watchfiles-0.21.0 websockets-11.0.3
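
After installing, a quick way to confirm which version ended up in the active environment is to ask the standard library's importlib.metadata (Python 3.8+). The sketch below is a small convenience check, not part of Chroma; the helper name installed_version is made up for the example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


chromadb_version = installed_version("chromadb")
if chromadb_version is None:
    print("chromadb is not installed in this environment")
else:
    print(f"chromadb {chromadb_version} is installed")
```

The dependency-conflict errors in the log above come from installing into an existing Anaconda environment that already holds many packages; installing into a fresh virtual environment is one common way to avoid them.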

Using chroma

1. Basic usage

import chromadb

# Set up Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"],  # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],  # filter on these!
    ids=["doc1", "doc2"],  # unique for each doc
)

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"},  # optional filter
    # where_document={"$contains": "search_string"}  # optional filter
)