Jina AI’s jina-embeddings-v2: an open source text embedding model that matches OpenAI’s ada-002


Jina AI released jina-embeddings-v2-base-en, a text embedding model that matches OpenAI’s ada-002 model in both benchmark performance and context window length. Deploy jina-embeddings-v2 with Baseten to use this text embedding model for search, recommendation, clustering, and retrieval-augmented generation.

Text embedding models in under a minute

A text embedding model takes chunks of text and turns them into vectors. More literally: you give it a string, it gives you back a list of floating-point numbers. These vectors encode the semantic meaning of the string and are always the same length regardless of how long the input text is.

On its own, a single text embedding is pretty useless. But when you create embeddings of a corpus of text, you can do search, recommendation, clustering, and more by comparing the embeddings for similarity. Embeddings are especially useful when paired with LLMs for uses like retrieval-augmented generation.
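For instance, "comparing embeddings for similarity" usually means cosine similarity. Here's a minimal sketch using toy 3-dimensional vectors as stand-ins (real embeddings from jina-embeddings-v2-base-en are 768-dimensional):

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings: closer to 1.0 means more similar in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of these phrases
pasta = [0.9, 0.1, 0.2]
pizza = [0.8, 0.2, 0.3]
weather = [0.1, 0.9, 0.1]

# Semantically similar strings should score higher than unrelated ones
print(cosine_similarity(pasta, pizza))    # high
print(cosine_similarity(pasta, weather))  # low
```

A search or recommendation system is essentially this comparison run at scale: embed a query, then rank the corpus by similarity to it.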

How jina-embeddings-v2 compares to OpenAI ada-002

The jina-embeddings-v2 model matches up well against OpenAI’s ada-002 based on two key factors:

  1. Benchmark performance, where jina-embeddings-v2 scores an average of 60.38 on the MTEB benchmark compared to ada-002’s 60.99, and slightly outperforms ada-002 on certain subsets of the benchmark datasets.

  2. Context window, where both models allow up to 8,192 tokens to be processed into a single embedding.

The context window is especially important. Compared to other popular open-source text embedding models like all-MiniLM-L6-v2 (256 tokens) and all-mpnet-base-v2 (384 tokens), jina-embeddings-v2’s context window opens up new use cases. Where before you’d need to make an embedding for each page of a book, now you can make an embedding for each chapter.
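To make that concrete, here's a rough chunking sketch: greedily pack paragraphs into chunks that fit the context window before embedding each one. It approximates token counts with word counts, so in practice you'd leave headroom or use the model's actual tokenizer:

```python
def chunk_text(paragraphs: list[str], max_tokens: int = 8192) -> list[str]:
    """Greedily pack paragraphs into chunks that fit an embedding model's
    context window. Word count is a crude stand-in for real token counts."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        n = len(para.split())
        if current and current_len + n > max_tokens:
            # Current chunk is full; start a new one
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With a 256-token budget you'd get page-sized chunks; with 8,192 you can pass whole chapters through unchanged.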

One limitation of jina-embeddings-v2 is that it only works on English text, though its makers say that models for more languages are coming soon, starting with Spanish and German.

Deploying jina-embeddings-v2

The fastest way to get started is to deploy jina-embeddings-v2 from our model library. You’ll be up and running in just a couple of clicks. You can also deploy the model using Truss.

By default, the model is configured to run on a CPU-only instance with 4 cores and 16 GiB of RAM. This inexpensive instance is great for experimentation and making a small number of embeddings, but if you need to process a large amount of text quickly, you’ll need a larger instance type, which you can easily switch to in your Baseten dashboard.

Running inference on jina-embeddings-v2

The model takes a dictionary as input with two keys:

  • text: A list of strings. Each string will be encoded into a text embedding and returned.

  • max_length (optional): The number of tokens per string to encode. Default is 8,192, which is also the maximum number of tokens the model can process per string.

You can test the model with:

truss predict -d '{"text": ["I want to eat pasta", "I want to eat pizza"], "max_length": 8192}'

Inference on a single string should take less than a second. On the 4x16 CPU-only instance type, the model encoded all 154 of Shakespeare’s sonnets (about 100 KB of text) in 40 to 45 seconds.
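If you'd rather call the deployed model from Python than from the Truss CLI, a request sketch looks like the following. The endpoint URL and auth header format here are placeholders, not real values; check your Baseten dashboard for the actual invocation details for your deployment:

```python
import json
from urllib import request

def build_payload(texts: list[str], max_length: int = 8192) -> dict:
    """Build the input dictionary the model expects: a list of strings
    plus an optional per-string token limit."""
    return {"text": texts, "max_length": max_length}

def embed(texts: list[str], endpoint: str, api_key: str) -> list[list[float]]:
    """POST the payload to a deployed model. The URL and 'Api-Key' header
    format are placeholder assumptions; use your deployment's real values."""
    body = json.dumps(build_payload(texts)).encode()
    req = request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Each returned embedding is a fixed-length list of floats, one per input string, ready to store in a vector database for similarity search.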

Text embedding models let you do really cool stuff with LLMs, and I’m excited to try jina-embeddings-v2 for some of my upcoming projects, like building a retrieval-augmented generation application with entirely open source models.