Quick and Dirty Vector DB

https://github.com/lummm/babyvec

BabyVec

'BabyVec' is intended to provide a simple API around common operations over embeddings of text.

It supports an optional HTTP interface via uvicorn and FastAPI.

Some examples of usage:

  1. Building an embedding 'store' for later use in an index.

Code: ./examples/embed_python_documentation.py

Because it is often useful to compute embeddings on a beefy machine and use them later on a smaller one, computing embeddings is treated as a separate step from building an index over those embeddings.
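The split between computing and using embeddings can be sketched as follows. This is a minimal illustration, not BabyVec's actual API: `toy_embed`, `build_store`, `load_store`, and the JSON file layout are all hypothetical names chosen for this example, and the hashed bag-of-words function only stands in for a real embedding model.

```python
import hashlib
import json
import math
from pathlib import Path

DIM = 64  # toy dimensionality; an assumption made for this sketch


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for a real model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_store(texts: list[str], path: Path) -> None:
    """Expensive step: embed every text once and persist the results."""
    records = [{"text": t, "embedding": toy_embed(t)} for t in texts]
    path.write_text(json.dumps(records), encoding="utf-8")


def load_store(path: Path) -> list[dict]:
    """Cheap step: read precomputed embeddings back; no model is needed."""
    return json.loads(path.read_text(encoding="utf-8"))
```

With this shape, `build_store` could run once on the larger machine and the resulting file could be copied to the smaller one, where only `load_store` is called.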

  2. Building a searchable embedding 'index'.

Code: ./examples/search_python_documentation.py

Assuming the Python documentation embeddings were created via example 1, this example loads them into a searchable index. The script then runs a 'prompt / response' loop in which each user input is searched against the Python documentation.
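A rough sketch of such a search flow is shown below. It is not taken from the BabyVec source: `BruteForceIndex`, `prompt_loop`, and `toy_embed` are hypothetical names, the embedding function is a toy hashed bag-of-words rather than a real model, and the index is a plain linear scan using cosine similarity. Each record is assumed to be a dict with `text` and `embedding` keys.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a real model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class BruteForceIndex:
    """Linear-scan index over precomputed, unit-length embeddings."""

    def __init__(self, records):
        # Each record: {"text": str, "embedding": list[float]}
        self.records = records

    def search(self, query, k=3):
        """Return the k best (score, text) pairs, highest score first."""
        q = toy_embed(query)
        scored = [
            # Dot product equals cosine similarity for unit-length vectors.
            (sum(a * b for a, b in zip(q, r["embedding"])), r["text"])
            for r in self.records
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def prompt_loop(index):
    """Read queries until an empty line or end of input, printing matches."""
    while True:
        try:
            query = input("search> ").strip()
        except EOFError:
            break
        if not query:
            break
        for score, text in index.search(query):
            print(f"{score:.3f}  {text}")
```

A linear scan is only practical for small collections; it is used here because it keeps the example short and dependency-free.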

Next steps

  • testing at larger volumes of data, particularly when the computed embeddings do not all fit in RAM
  • testing on GPU
{
"by": "ltengelis",
"descendants": 1,
"id": 40249564,
"kids": [
40249565
],
"score": 1,
"time": 1714754256,
"title": "Quick and Dirty Vector DB",
"type": "story",
"url": "https://github.com/lummm/babyvec"
}
{
"author": "Black-Tusk-Data",
"date": null,
"description": "Tiny vector embedding interface for natural language processing - Black-Tusk-Data/babyvec",
"image": "https://opengraph.githubassets.com/33d9b7e7df4a07e16ab3fb755ac9699449031eb1e3a337ba989298a39759ab46/Black-Tusk-Data/babyvec",
"logo": "https://logo.clearbit.com/github.com",
"publisher": "GitHub",
"title": "GitHub - Black-Tusk-Data/babyvec: Tiny vector embedding interface for natural language processing",
"url": "https://github.com/Black-Tusk-Data/babyvec"
}