Introduction to Matryoshka Embedding Models

In this blog post, we will introduce you to the concept of Matryoshka Embeddings and explain why they are useful. We will discuss how these models are theoretically trained and how you can train them using Sentence Transformers.

Additionally, we will provide practical guidance on how to use Matryoshka Embedding models and share a comparison between a Matryoshka embedding model and a regular embedding model. Finally, we invite you to check out our interactive demo that showcases the power of these models.

Table of Contents

  • Understanding Embeddings
  • πŸͺ† Matryoshka Embeddings
  • πŸͺ† Matryoshka Dolls
  • Why would you use πŸͺ† Matryoshka Embedding models?
  • How are πŸͺ† Matryoshka Embedding models trained?
  • How do I use πŸͺ† Matryoshka Embedding models?
  • Results
  • Demo
  • References

Understanding Embeddings

Embeddings are one of the most versatile tools in natural language processing, enabling practitioners to solve a large variety of tasks. In essence, an embedding is a numerical representation of a more complex object, like text, images, audio, etc.

[Figure: embedding model]

The embedding model will always produce embeddings of the same fixed size. You can then compute the similarity of complex objects by computing the similarity of the respective embeddings!
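For example, a minimal sketch (using the small all-MiniLM-L6-v2 model purely as an illustrative choice) that embeds two sentences and compares them could look like this:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Any embedding model works here; all-MiniLM-L6-v2 is just a small, common example
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([
    "A cat sits on the mat.",
    "A kitten rests on a rug.",
])
print(embeddings.shape)  # e.g. (2, 384): two texts, one fixed-size embedding each
print(cos_sim(embeddings[0], embeddings[1]))  # high similarity for similar sentences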

[Figure: embedding similarity]

This has an enormous amount of use cases, and serves as the backbone for recommendation systems, retrieval, one-shot or few-shot learning, outlier detection, similarity search, paraphrase detection, clustering, classification, and much more!

πŸͺ† Matryoshka Embeddings

As research progressed, new state-of-the-art (text) embedding models started producing embeddings with increasingly higher output dimensions, i.e., every input text is represented using more values. Although this improves performance, it comes at the cost of efficiency of downstream tasks such as search or classification.

Consequently, Kusupati et al. (2022) were inspired to create embedding models whose embeddings could reasonably be shrunk without sacrificing too much performance.

[Figure: matryoshka model]

Matryoshka embedding models are trained such that these smaller, truncated embeddings remain useful. In short, Matryoshka embedding models can produce useful embeddings of various dimensions.

πŸͺ† Matryoshka Dolls

For those unfamiliar, "Matryoshka dolls", also known as "Russian nesting dolls", are a set of wooden dolls of decreasing size that are placed inside one another. In a similar way, Matryoshka embedding models aim to store more important information in earlier dimensions, and less important information in later dimensions. This characteristic of Matryoshka embedding models allows us to truncate the original (large) embedding produced by the model, while still retaining enough of the information to perform well on downstream tasks.

[Figure: matryoshka models]

Why would you use πŸͺ† Matryoshka Embedding models?

Such variable-size embedding models can be quite valuable to practitioners, for example:

  1. Shortlisting and reranking: Rather than performing your downstream task (e.g., nearest neighbor search) on the full embeddings, you can shrink the embeddings to a smaller size and very efficiently "shortlist" your embeddings. Afterwards, you can process the remaining embeddings using their full dimensionality; a sketch of this funnel follows this list.
  2. Trade-offs: Matryoshka models will allow you to scale your embedding solutions to your desired storage cost, processing speed, and performance.
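As an illustration of the first point, here is a hedged sketch of the shortlist-then-rerank funnel using plain NumPy; the corpus size, dimensions, and variable names are illustrative, not taken from the blog post:

import numpy as np

rng = np.random.default_rng(0)
corpus_emb = rng.normal(size=(10_000, 768)).astype(np.float32)  # stand-in for full Matryoshka embeddings
query_emb = rng.normal(size=(768,)).astype(np.float32)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# 1) Shortlist cheaply using only the first 64 dimensions (re-normalized after truncation)
small_corpus = normalize(corpus_emb[:, :64])
small_query = normalize(query_emb[:64])
shortlist = np.argsort(small_corpus @ small_query)[::-1][:100]  # top-100 candidate indices

# 2) Rerank only the shortlisted candidates with the full 768 dimensions
full_corpus = normalize(corpus_emb[shortlist])
full_query = normalize(query_emb)
top10 = shortlist[np.argsort(full_corpus @ full_query)[::-1][:10]]  # final top-10 indices
print(top10)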

How are πŸͺ† Matryoshka Embedding models trained?

Theoretically

The Matryoshka Representation Learning (MRL) approach can be adopted for almost all embedding model training frameworks. Normally, a training step for an embedding model involves producing embeddings for your training batch (of texts, for example) and then using some loss function to create a loss value that represents the quality of the produced embeddings. The optimizer will adjust the model weights throughout training to reduce the loss value.

For Matryoshka Embedding models, a training step also involves producing embeddings for your training batch, but then you use some loss function to determine not just the quality of your full-size embeddings, but also the quality of your embeddings at various smaller dimensionalities; for example, at output dimensionalities of 768, 512, 256, 128, and 64. The loss values for each dimensionality are added together, resulting in a final loss value. The optimizer will then try to adjust the model weights to lower this loss value.

In practice, this incentivizes the model to frontload the most important information at the start of an embedding, such that it will be retained if the embedding is truncated.
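As a minimal sketch of this idea (not the actual MRL implementation), the base loss below is a simple cosine-similarity regression against gold scores, standing in for e.g. CoSENTLoss; the same loss is computed on every truncated prefix of the embeddings and the values are summed:

import torch
import torch.nn.functional as F

def base_loss(emb_a, emb_b, gold_scores):
    # Stand-in base loss: regress the cosine similarity towards the gold score
    cos = F.cosine_similarity(emb_a, emb_b, dim=-1)
    return F.mse_loss(cos, gold_scores)

def matryoshka_loss(emb_a, emb_b, gold_scores, dims=(768, 512, 256, 128, 64)):
    # Sum the base loss over every truncation size, so the optimizer is pushed
    # to make even the shortest prefix of the embedding informative
    return sum(base_loss(emb_a[:, :d], emb_b[:, :d], gold_scores) for d in dims)

# Toy batch: 8 sentence pairs with 768-dimensional embeddings and gold scores in [0, 1]
emb_a, emb_b = torch.randn(8, 768), torch.randn(8, 768)
gold_scores = torch.rand(8)
print(matryoshka_loss(emb_a, emb_b, gold_scores))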

In Sentence Transformers

Sentence Transformers is a commonly used framework to train embedding models, and it recently implemented support for Matryoshka models. Training a Matryoshka embedding model using Sentence Transformers is quite straightforward: rather than applying some loss function on only the full-size embeddings, we also apply that same loss function on truncated portions of the embeddings.

For example, if a model has an original embedding dimension of 768, it can now be trained on 768, 512, 256, 128 and 64. Each of these losses will be added together, optionally with some weight:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss
model = SentenceTransformer("microsoft/mpnet-base")
base_loss = CoSENTLoss(model=model)
loss = MatryoshkaLoss(
    model=model,
    loss=base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
model.fit(
    train_objectives=[(train_dataset, loss)],
    ...,
)

Training with MatryoshkaLoss does not incur a notable overhead in training time.

References:

  • MatryoshkaLoss: https://sbert.net/docs/package_reference/losses.html#matryoshkaloss
  • CoSENTLoss: https://sbert.net/docs/package_reference/losses.html#cosentloss
  • SentenceTransformer: https://sbert.net/docs/package_reference/SentenceTransformer.html
  • SentenceTransformer.fit: https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.fit
  • Matryoshka Embeddings - Training: https://sbert.net/examples/training/matryoshka/README.html#training

See the following complete scripts as examples of how to apply the MatryoshkaLoss in practice:

  • matryoshka_nli.py: This example uses the MultipleNegativesRankingLoss with MatryoshkaLoss to train a strong embedding model using Natural Language Inference (NLI) data. It is an adaptation of the NLI documentation.
  • matryoshka_nli_reduced_dim.py: This example uses the MultipleNegativesRankingLoss with MatryoshkaLoss to train a strong embedding model with a small maximum output dimension of 256. It trains using Natural Language Inference (NLI) data, and is an adaptation of the NLI documentation.
  • matryoshka_sts.py: This example uses the CoSENTLoss with MatryoshkaLoss to train an embedding model on the training set of the STSBenchmark dataset. It is an adaptation of the STS documentation.

How do I use πŸͺ† Matryoshka Embedding models?

Theoretically

In practice, getting embeddings from a Matryoshka embedding model works the same way as with a normal embedding model. The only difference is that, after receiving the embeddings, we can optionally truncate them to a smaller dimensionality. Do note that if the embeddings were normalized, then after truncating they will no longer be, so you may want to re-normalize.
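A small sketch of manual truncation followed by re-normalization, assuming embeddings is a NumPy array of already-normalized full-size embeddings (random data is used here only as a placeholder):

import numpy as np

embeddings = np.random.randn(3, 768).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # normalized, as many models output

truncated = embeddings[:, :64]  # keep only the first 64 dimensions
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)  # re-normalize after truncation
print(np.linalg.norm(truncated, axis=1))  # => [1. 1. 1.]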

After truncating, you can either directly apply them for your use cases, or store them such that they can be used later. After all, smaller embeddings in your vector database should result in considerable speedups!

Keep in mind that although processing smaller embeddings for downstream tasks (retrieval, clustering, etc.) will be faster, getting the smaller embeddings from the model is just as fast as getting the larger ones.

In Sentence Transformers

In Sentence Transformers, you can load a Matryoshka Embedding model just like any other model, and you can specify the desired embedding size using the truncate_dim argument. After that, you can perform inference using the SentenceTransformer.encode function, and the embeddings will be automatically truncated to the specified size.

Let's try to use a model that I trained using matryoshka_nli.py with microsoft/mpnet-base:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
matryoshka_dim = 64
model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka", truncate_dim=matryoshka_dim)
embeddings = model.encode(
    [
        "The weather is so nice!",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]
)
print(embeddings.shape)
# => (3, 64)
# Similarity of the first sentence to the other two:
similarities = cos_sim(embeddings[0], embeddings[1:])
print(similarities)
# => tensor([[0.8910, 0.1337]])

Feel free to experiment with using different values for matryoshka_dim and observe how that affects the similarities. You can do so either by running this code locally, on the cloud such as with Google Colab, or by checking out the demo.
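For instance, a hedged sketch that encodes once at the full dimensionality and then truncates to several sizes (re-normalizing each time) could look like this; the exact similarity values will vary:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
import torch.nn.functional as F

model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka")
embeddings = model.encode(
    [
        "The weather is so nice!",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ],
    convert_to_tensor=True,
)
for dim in [768, 512, 256, 128, 64]:
    truncated = F.normalize(embeddings[:, :dim], dim=-1)
    print(dim, cos_sim(truncated[0], truncated[1:]))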

References:

  • SentenceTransformer: https://sbert.net/docs/package_reference/SentenceTransformer.html
  • SentenceTransformer.encode: https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode
  • util.cos_sim: https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.cos_sim
  • Matryoshka Embeddings - Inference: https://sbert.net/examples/training/matryoshka/README.html#inference

Using the Nomic v1.5 Matryoshka model

Note: Nomic specifically requires an F.layer_norm before the embedding truncation. As a result, the following snippet uses manual truncation to the desired dimension. For all other models, you can use the truncate_dim option in the constructor, as shown in the previous example.

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
import torch.nn.functional as F
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
matryoshka_dim = 64
embeddings = model.encode(
    [
        "search_query: What is TSNE?",
        "search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
        "search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
    ],
    convert_to_tensor=True,
)
# The Nomic team uses a custom architecture and recommends applying layer normalization before truncation
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[..., :matryoshka_dim]  # Shrink the embedding dimensions
similarities = cos_sim(embeddings[0], embeddings[1:])
# => tensor([[0.7154, 0.4468]])

Results

Now that Matryoshka models have been introduced, let's look at the actual performance that we may be able to expect from a Matryoshka embedding model versus a regular embedding model. For this experiment, I have trained two models:

  • tomaarsen/mpnet-base-nli-matryoshka: trained by running matryoshka_nli.py with microsoft/mpnet-base.
  • tomaarsen/mpnet-base-nli: trained by running a modified version of matryoshka_nli.py where the training loss is only MultipleNegativesRankingLoss rather than MatryoshkaLoss on top of MultipleNegativesRankingLoss, also with microsoft/mpnet-base as the base model.

Both of these models were trained on the AllNLI dataset, which is a concatenation of the SNLI and MultiNLI datasets. I have evaluated these models on the STSBenchmark test set using multiple different embedding dimensions. The results are plotted in the following figure:

[Figure: results]

In the top figure, you can see that the Matryoshka model reaches a higher Spearman correlation than the standard model at all dimensionalities, indicating that the Matryoshka model is superior for this task.

Furthermore, the performance of the Matryoshka model falls off much less quickly than that of the standard model. This is shown clearly in the second figure, which plots the performance at each embedding dimension relative to the maximum performance. Even at 8.3% of the embedding size, the Matryoshka model preserves 98.37% of its performance, much higher than the 96.46% preserved by the standard model.

These findings indicate that truncating embeddings produced by a Matryoshka model can: 1) significantly speed up downstream tasks such as retrieval and 2) significantly reduce storage costs, all without a notable hit in performance.
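If you would like to run a comparison like this yourself, here is a hedged sketch of the evaluation loop. It assumes the mteb/stsbenchmark-sts dataset with sentence1, sentence2, and score columns, and computes the Spearman correlation between cosine similarities and gold scores at several truncation sizes:

from datasets import load_dataset
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer
import torch.nn.functional as F

test = load_dataset("mteb/stsbenchmark-sts", split="test")
model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka")
emb1 = model.encode(test["sentence1"], convert_to_tensor=True)
emb2 = model.encode(test["sentence2"], convert_to_tensor=True)

for dim in [768, 512, 256, 128, 64]:
    a = F.normalize(emb1[:, :dim], dim=-1)
    b = F.normalize(emb2[:, :dim], dim=-1)
    cosine = (a * b).sum(dim=-1).cpu().numpy()  # pairwise cosine similarity per sentence pair
    corr, _ = spearmanr(cosine, test["score"])
    print(dim, corr)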

Demo

In this demo, you can dynamically shrink the output dimensions of the nomic-ai/nomic-embed-text-v1.5 Matryoshka embedding model and observe how it affects the retrieval performance. All of the embeddings are computed in the browser using πŸ€— Transformers.js.

References

  • Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., ... & Farhadi, A. (2022). Matryoshka Representation Learning. Advances in Neural Information Processing Systems, 35, 30233-30249. https://arxiv.org/abs/2205.13147
  • Matryoshka Embeddings β€” Sentence Transformers documentation. https://sbert.net/examples/training/matryoshka/README.html
  • UKPLab. sentence-transformers. GitHub. https://github.com/UKPLab/sentence-transformers
  • Unboxing Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning. Nomic AI blog. https://blog.nomic.ai/posts/nomic-embed-matryoshka