Models We Love: July 2023


Another month, another round of foundation models we love! Of course we’ll be talking about Llama and FreeWilly, but we’ll also get into some models that may have slipped under your radar: LayoutLM Document QA and NSQL 350M. Let’s go!

Llama, llama, llama

Remember LLaMA, the 65-billion parameter large language model from Meta that was released less than six months ago? Well now we’ve got Llama 2! And no, that’s not just a name rebranding, but a very real next generation of the Llama model. According to Meta and Microsoft, Llama 2 has been trained on 40% more data than the original LLaMA model and has double the context length. This means that Llama 2 has more information available to it, making it better at understanding meaning as well as generating relevant and meaningful responses.

Llama 2, the next generation of Llama, is here!

A high-performing camelid

Llama 2 is available in three sizes, 7B, 13B, and 70B, both as a base model and in fine-tuned variants. Trained on over 1 million human annotations, the fine-tuned variants are ideal for chat and dialogue-based use cases. Llama 2 truly is state-of-the-art, and is outperforming multiple existing models on a variety of external benchmarks, making it an exciting addition to the growing repository of available foundation models.

A selection of model benchmarks where higher numbers indicate better performance. These benchmarks test the model’s ability to perform multi-task language understanding and question answering, among other tasks.

Two Llama 2 projects that caught our attention

Sample Llama-2-chat-7B results

The Llama-2-chat models have been fine-tuned on instruction datasets and over 1 million human annotations, making them ideal for chat-based applications!


What’s the meaning of life?


import baseten

# You can retrieve your deployed model version ID from the Baseten UI
model = baseten.deployed_model_version_id('YOUR_MODEL_ID')

request = {
    "prompt": "What's the meaning of life?",
    "temperature": 0.1,
    "top_p": 0.75,
    "top_k": 40,
    "num_beams": 4,
}

response = model.predict(request)


"Thank you for asking! The meaning of life is a complex and philosophical question that has been debated by scholars, philosophers, and theologians for centuries. There is no one definitive answer to this question, as it can vary depending on one's beliefs, values, and experiences. Ultimately, the meaning of life is a deeply personal and subjective question, and one that each individual must answer for themselves based on their own beliefs and values"

We’ve edited Llama’s response for length, but it did go on to list the following five examples, along with explanations, and made sure to clarify that there were additional perspectives beyond these:

  1. Religious or spiritual perspective

  2. Personal growth and fulfillment

  3. Social or cultural perspective

  4. Existentialist perspective

  5. Hedonistic perspective

Deploying Llama 2 on Baseten

We’ve made each of the Llama 2 model versions available with Truss, our open source model serving and deployment framework.

And we also have the Llama-2-Chat-7B model in the Baseten model library for a quick and easy deployment. However! It’s important to note that you need to get Llama 2 access and provide a HuggingFace access token before deploying the model on Baseten: 

  1. Go to and request access using the email associated with your HuggingFace account (it’s important that the email addresses match!)

  2. Go to and request access.

  3. Once you have Llama access, create a HuggingFace access token.

  4. Set your HuggingFace access token as a secret in your Baseten account with the name hf_access_token.

Once you’ve completed steps 1–4 you’re ready to deploy Llama 2 7B chat to Baseten and start building!

Come on, Willy!

Fast-following the release of Llama 2, Stability AI and its CarperAI lab announced the release of two foundation models for research: FreeWilly1 and FreeWilly2. FreeWilly1 is a fine-tuned version of the original LLaMA 65B foundation model, whereas FreeWilly2 leverages the Llama 2 70B parameter model.

Lisa Frank's got nothing on Stability AI

Orca methods, synthetic data

Curious as to how models derived from Llama ended up being named after the 1993 American family classic, Free Willy? Look no further than the methods used to train Orca, a 13B parameter model developed by Microsoft. When researchers noticed that their attempts to enhance the capabilities of smaller models were resulting in models that could imitate the style of a large foundation model, but not the reasoning, they did what any concerned parent would do and hired a tutor.

In this case the tutor was ChatGPT. With ChatGPT acting as a teacher, the smaller model is able to learn progressively from AI-generated step-by-step explanations.

You ever see him jump that high?

Both FreeWilly1 and FreeWilly2 show exceptional results across a variety of benchmarks, and are placing competitively on the Open LLM Leaderboard. The model creators state that the “...models excel in many areas, including intricate reasoning, understanding linguistic subtleties, and answering complex questions related to specialized domains, e.g. Law and mathematical problem-solving.”

The Open LLM Leaderboard tracks, ranks, and evaluates large language models on a variety of tasks, including reasoning (ARC), sentence completion (HellaSwag), multi-task language understanding (MMLU), and mimicry of human falsehoods (TruthfulQA).

The FreeWilly models were released just days before this writing, so we haven’t seen too many examples out in the world, although our Chief Scientist had it up and running in no time:

Phil Howe's tweet about FreeWilly 2

Got a FreeWilly sighting? Tag us on Twitter!

SDXL 1.0 is live!

Stability AI is on fire this month, releasing not just FreeWilly1 and FreeWilly2, but SDXL 1.0 as well. If you're anything like us, then you fell in love with Stable Diffusion from the get-go, but with Stable Diffusion XL 1.0, we can't help but be awed by the richness, vibrancy, and depth of the colors, along with the improvements in contrast, lighting, and shadows. With a 3.5B parameter base model and a 6.6B parameter refiner, SDXL 1.0 represents one of the largest open access image models available today.

We love everything about this!

“The latest SDXL model represents the next step in Stability AI’s innovation heritage and ability to bring the most cutting-edge open access models to market for the AI community,” said Emad Mostaque, Chief Executive Officer of Stability AI.

Sample SDXL 1.0 results

Improvements to Stable Diffusion mean that SDXL is even easier to use, and generates high quality images out of the box, without the need for qualifying terms like "masterpiece." And according to Stability, "SDXL can understand the differences between concepts like “The Red Square” (a famous place) vs a “red square” (a shape)."

We're just getting started working with SDXL, and are already blown away with the results. Take a look for yourself!


A tree in a field under the night sky


import baseten

# You can retrieve your deployed model version ID from the UI
model = baseten.deployed_model_version_id('MODEL_VERSION_ID')

request = {
    "prompt": "A tree in a field under the night sky",
    "use_refiner": True
}

response = model.predict(request)


Hey, it's our first attempt!

Deploy SDXL 1.0 on Baseten today

SDXL is ready to deploy right now on the Baseten model library--what are you waiting for?

Ask your documents questions

If you’ve ever had to go searching through a stack of documents to find the answer to a question, then you’ll immediately understand the appeal of this model. LayoutLM Document QA from Impira is a version of LayoutLM fine-tuned on both the Stanford Question Answering Dataset (SQuAD2.0) and the Document Visual Question Answering (DocVQA) dataset. As you might imagine, this makes LayoutLM Document QA exceptionally good at question answering on documents. Imagine what it can accomplish when given legal briefings, or a decade’s worth of invoices!

No GPU needed

One thing we love about this model is that there’s no GPU needed to get great results. In fact we found that LayoutLM Document QA runs reasonably fast with 4 vCPUs and 16 GiB of RAM! The first prompt on a document image takes just under 10 seconds, most of which is spent downloading the image, and additional queries complete in under three seconds.

Sample LayoutLM Document QA results


What is the invoice number?


import baseten
model = baseten.deployed_model_id('YOUR MODEL ID')
model.predict({'url': '', 'prompt': 'What is the invoice number?'})


{'answer': 'us-001', 'end': 16, 'score': 0.42514994740486145, 'start': 16}
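The result carries a score alongside the answer, so one reasonable pattern is to accept an answer only when that score clears a threshold. Here is a minimal sketch of that idea. It assumes the response is a dictionary shaped like the sample above; the helper name `accept_answer` and the default threshold of 0.3 are our own illustrative choices, not part of the model or the Baseten client.

```python
from typing import Optional


def accept_answer(result: dict, min_score: float = 0.3) -> Optional[str]:
    """Return the extracted answer, or None when the score is below min_score.

    `result` is assumed to look like the sample response shown above.
    """
    if result.get("score", 0.0) < min_score:
        return None
    return result.get("answer")


# The sample response from this post, used as test input.
sample = {"answer": "us-001", "end": 16, "score": 0.42514994740486145, "start": 16}

print(accept_answer(sample))                 # accepted with the default threshold
print(accept_answer(sample, min_score=0.9))  # rejected with a stricter threshold
```

The right threshold depends on your documents, so it's worth checking a handful of known invoices before settling on a value.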

Deploying LayoutLM Document QA on Baseten

Deploy LayoutLM Document QA on Baseten in just a couple of clicks when you deploy directly from the Baseten model library.

Generate SQL from natural language

One of the many capabilities of LLMs is the ability to generate code from natural language prompts, and we’d be remiss if we didn’t talk about NSQL 350M. Released earlier in July by Numbers Station, NSQL 350M is part of a broader family of open source SQL-generating foundation models.

Numbers Station Approach in NSQL Model

NSQL is a specialist foundation model

It’s common practice to train foundation models on a wide variety of sources, creating large generalist models that are adequate at a wide variety of tasks, but require further manipulation through techniques such as prompting or fine-tuning to be adapted to specific use cases. Take StarCoder as an example. Trained on over 80 programming languages, StarCoder is a fantastic coding assistant, and it becomes even more powerful when it’s been adapted to a specific language. 

Three techniques to adapt LLMs to any use case

NSQL, however, was trained specifically on SQL code from the web: it was pretrained on SQL queries using self-supervised learning, then fine-tuned on instruction-to-SQL pairs. This means that NSQL can generate complex and functional SQL queries without further adaptation. What NSQL gives up in flexibility and adaptability in generating code in a wide variety of languages it gains in being accurate and precise in SQL generation.

Sample NSQL 350M results


What is the maximum, the average, and the minimum capacity of stadiums?


import baseten
# You can retrieve your deployed model ID from the UI
model = baseten.deployed_model_version_id('YOUR_MODEL_ID')

schema = """CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
)

CREATE TABLE singer (
    singer_id number,
    name text,
    country text,
    song_name text,
    song_release_year text,
    age number,
    is_male others
)

CREATE TABLE concert (
    concert_id number,
    concert_name text,
    theme text,
    stadium_id text,
    year text
)

CREATE TABLE singer_in_concert (
    concert_id number,
    singer_id text
)"""

request = {
    "schema": schema,
    "query": "What is the maximum, the average, and the minimum capacity of stadiums?"
}

response = model.predict(request)


SELECT MAX(capacity), AVG(capacity), MIN(capacity) FROM stadium;
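Before trusting a generated query, it can help to run it against a throwaway database built from the same schema. The sketch below does that with Python's built-in sqlite3 module, using only the stadium table from the example. Treating SQLite as the target dialect is an assumption on our part, the two sample rows are made up purely for illustration, and `run_generated_sql` is a hypothetical helper name.

```python
import sqlite3

# The stadium table from the example schema above.
SCHEMA = """
CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
);
"""

# The query text returned by the model in the sample above.
GENERATED_SQL = "SELECT MAX(capacity), AVG(capacity), MIN(capacity) FROM stadium;"

# Made-up rows, for illustration only.
SAMPLE_ROWS = [
    (1, "Town A", "North Park", 1000, 0, 0, 0),
    (2, "Town B", "South Park", 3000, 0, 0, 0),
]


def run_generated_sql(schema: str, sql: str, rows: list) -> tuple:
    """Run a generated query against an in-memory database and return the first row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.executemany("INSERT INTO stadium VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(sql).fetchone()
    finally:
        conn.close()


result = run_generated_sql(SCHEMA, GENERATED_SQL, SAMPLE_ROWS)
print(result)
```

If the generated SQL references a table or column that isn't in the schema, sqlite3 raises an `OperationalError`, which makes this a cheap first check before a query goes anywhere near real data.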

Deploying NSQL 350M on Baseten

We’ve made it easy to deploy NSQL 350M on Baseten in just a couple of clicks when you use the Baseten model library!

Talk to us!

If there are open source foundation models you’d like to see in the Baseten model library, let us know on Twitter or Threads! And if you have any questions or run into issues while deploying your model on Baseten, please reach out to us. We’d love to hear from you!