The Proper Way to Generate Structured Data with LLMs

Every application that uses ML models, and LLMs in particular, needs to parse the model's response somehow. Programs usually need clear interfaces to interact with other systems, so we expect at least a somewhat structured output from the LLM.

Usually, people write long prompts that describe the expected structure of the output and include examples in order to get JSON back.

Here is, for example, what we usually do to get JSON output from an LLM:

import openai
import os
import json

# Load your API key from an environment variable or secret management service
openai.api_key = os.getenv("OPENAI_API_KEY")

def generate_json(prompt):
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",  # Any completion-capable model works here
        prompt=prompt,
        max_tokens=150,  # Adjust as needed to get the full JSON
    )

    # Extract the generated text from the response
    generated_text = response.choices[0].text.strip()

    try:
        # Parse the generated text as JSON
        generated_json = json.loads(generated_text)
    except json.JSONDecodeError:
        print("Failed to parse JSON. Generated text:")
        print(generated_text)
        generated_json = {}

    return generated_json

# Define a prompt for generating JSON with a detailed schema and example
prompt = """
Generate a JSON object representing a user's profile. The JSON schema should include the following fields:
- name (string)
- age (integer)
- email (string)
- interests (array of strings)

Example:
{
    "name": "John Doe",
    "age": 30,
    "email": "",
    "interests": ["reading", "hiking", "coding"]
}

Provide a JSON object based on this schema.
"""

# Generate the JSON
generated_json = generate_json(prompt)

# Print the generated JSON
print(json.dumps(generated_json, indent=4))

However, there's no guarantee that you'll get valid JSON, so you also need to implement a lot of validation and retry logic.
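As a sketch, that retry logic often looks something like this (the `generate_fn` callable, the repair prompt, and the retry count are illustrative assumptions, not a prescribed API):

```python
import json

def generate_with_retries(generate_fn, prompt, max_retries=3):
    """Call an LLM wrapper repeatedly until it returns parseable JSON.

    `generate_fn` is any function that takes a prompt and returns raw text;
    it stands in for whatever client call your application makes.
    """
    for attempt in range(max_retries):
        raw_text = generate_fn(prompt)
        try:
            return json.loads(raw_text)  # Success: valid JSON
        except json.JSONDecodeError:
            # Ask the model to fix its own output on the next attempt
            prompt = (
                "The previous output was not valid JSON:\n"
                f"{raw_text}\n"
                "Return only valid JSON."
            )
    return {}  # Give up after max_retries failures

# Example with a stubbed "model" that fails once, then succeeds
responses = iter(['not json', '{"name": "John"}'])
result = generate_with_retries(lambda p: next(responses), "Generate a user")
print(result)  # {'name': 'John'}
```

Every retry costs another round trip to the model, which is exactly the overhead structured generation avoids.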

What is Structured Generation?

Structured generation is a method used in natural language processing to make sure that the output from a language model follows a specific format or structure. This is especially useful when creating data that needs to fit into a predefined schema, like JSON or XML. The main goal is to produce outputs that are both correct in terms of structure and meaning.

To achieve this, developers define the rules and structure that the generated output must follow. They then ensure the output meets these rules by checking its correctness. Various methods, like regular expressions and Finite State Machines (FSMs), guide the generation process to stick to the required structure. Additionally, optimization techniques are used to make the generation process efficient, minimizing unnecessary computations while ensuring the output is correct.

By using structured generation, developers can create outputs that are consistent, valid, and ready for use in other applications or systems.
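To make the regex idea concrete, here is a minimal sketch. The pattern below is a toy, hand-written one for a two-field object; in practice the regex would be derived from a schema automatically:

```python
import re

# A toy regex for objects shaped like {"name": "<letters>", "age": <digits>}
USER_PATTERN = re.compile(r'\{"name": "[A-Za-z ]+", "age": [0-9]+\}')

candidates = [
    '{"name": "John Doe", "age": 30}',          # structurally valid
    '{"name": "John Doe", "age": }',            # missing value
    'Sure! Here is the JSON you asked for...',  # chatty preamble
]

for text in candidates:
    ok = USER_PATTERN.fullmatch(text) is not None
    print(f"{ok}: {text}")
```

Checking the output after the fact, as above, only tells you generation failed. Structured generation instead uses the pattern *during* decoding, so an invalid string can never be produced in the first place.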


Now, I want to show you that there is a way to do structured generation properly. The solution is completely open-source and works with any open-source LLM as well as with OpenAI.

Outlines is a library for structured neural text generation.

When I first heard about the tool, I thought it was just another prompting library that takes your JSON structure and generates a long prompt for the LLM, making it slow and expensive. But it turns out that's not true.

  • So let's see some of its features.

  • Then some examples of what you can do with it.

  • And then we will look into how it works underneath, so those of you who are interested in learning will be pleased. I know I was.

Generate from Pydantic Model

Pydantic is a Python library used for data validation and settings management using Python type annotations. It allows developers to define data models with type annotations and automatically validates and converts input data to the specified types. Pydantic is often used in applications where data integrity is crucial, such as API development, configuration management, and data parsing.
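As a quick refresher (this snippet is independent of Outlines), a Pydantic model validates and coerces input data like this:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    last_name: str
    id: int

# Valid input: the string "11" is coerced to the integer 11
user = User(name="John", last_name="Doe", id="11")
print(user.id)  # 11

# Invalid input raises a ValidationError instead of silently passing through
try:
    User(name="John", last_name="Doe", id="not a number")
except ValidationError:
    print("validation failed")
```

This is why Pydantic models make a natural target for structured generation: the schema for valid output is already written down as a class.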

Here's how we can generate data with Outlines from a Pydantic model:

from pydantic import BaseModel

from outlines import models, generate

class User(BaseModel):
    name: str
    last_name: str
    id: int

model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.json(model, User)
result = generator(
    "Create a user profile with the fields name, last_name and id"
)
# User(name="John", last_name="Doe", id=11)

Generate from JSON Schema

from outlines import models
from outlines import generate

model = models.transformers("mistralai/Mistral-7B-v0.1")

schema = """{
  "title": "User",
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "last_name": {"type": "string"},
    "id": {"type": "integer"}
  }
}"""

generator = generate.json(model, schema)
result = generator(
    "Create a user profile with the fields name, last_name and id"
)
# {"name": "John", "last_name": "Doe", "id": 11}

How is it Different?

You may ask: how is this different from just writing a large prompt?

On the Outlines GitHub page, we can see these features:

  • It doesn't add any overhead during inference (cost-free)

  • It speeds up inference

  • It improves the performance of base models (GSM8K)

  • It improves the performance of finetuned models (CoNLL)

  • It improves model efficiency (fewer examples needed)

How Outlines Works Behind the Scenes

  1. Converting JSON to a Regular Expression:

    • JSON schema is transformed into a regular expression.

    • This regex ensures the generated output adheres to the JSON schema, making it parseable by Pydantic.

  2. Translating JSON Regex to a Finite State Machine (FSM):

    • The JSON regex is converted into an FSM.

    • The FSM controls the structured generation process by defining valid state transitions.

  3. Generating Valid JSON Using FSM:

    • Start from an initial state and generate allowed transition characters.

    • Follow transitions to new states until reaching a final state.

    • This guarantees that any generated string is valid according to the JSON schema.

  4. Optimization with Naive Character Merging:

    • Compress nodes with only one transition to skip unnecessary sampling steps.

    • This reduces the number of transitions, leading to faster generation.

  5. Working with Tokens:

    • Instead of individual characters, the FSM is adapted to work with tokens (fragments of words used by LLMs).

    • This requires deterministic transformation of the FSM to handle tokens efficiently.

  6. Coalescence for Speedup:

    • Identify and merge redundant paths in the FSM that lead to the same output.

    • Always append the longest token when multiple tokens share the same prefix.

    • This drastically reduces the number of calls to the LLM, resulting in a significant speedup (up to 5x).

  7. Maintaining Correctness:

    • Ensure that optimization does not prevent more likely sequences from being generated.

    • Properly balance speed and quality by considering the conditional probability distribution of sequences.
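The character-level idea in steps 2 and 3 can be sketched roughly like this. The FSM below is a toy, hand-built one that accepts only strings like `{"a": <digit>}`; it is not the real machinery Outlines derives from a schema:

```python
# Toy character-level FSM: each state maps an allowed character to the next state.
FSM = {
    0: {'{': 1},
    1: {'"': 2},
    2: {'a': 3},
    3: {'"': 4},
    4: {':': 5},
    5: {' ': 6},
    6: {d: 7 for d in "0123456789"},  # only here is there a real choice
    7: {'}': 8},
}
FINAL_STATE = 8

def allowed_chars(state):
    """Characters the model is allowed to emit in this state."""
    return set(FSM.get(state, {}))

def accepts(text):
    """Walk the FSM; the string is valid only if we end in the final state."""
    state = 0
    for ch in text:
        if ch not in FSM.get(state, {}):
            return False
        state = FSM[state][ch]
    return state == FINAL_STATE

print(accepts('{"a": 7}'))   # True
print(accepts('{"a": x}'))   # False
print(sorted(allowed_chars(6)))  # the ten digit characters
```

Notice that states 0 through 5 each have exactly one outgoing transition, so a merging optimization (step 4) could emit the whole prefix `{"a": ` in a single step without sampling from the model at all; only state 6 genuinely requires the LLM's opinion.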

Long story short

Outlines optimizes JSON generation by reducing calls to the LLM when there's only one possible next token in the Finite State Machine (FSM).

Otherwise, it uses the logits output by the LLM, which represent the probability distribution over the next token, to decide which token to pick from the options the FSM allows.

This approach ensures valid JSON output while enhancing performance by minimizing unnecessary calls to the LLM.
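The token-selection step can be sketched as simple logit masking. The tiny vocabulary and logit values below are made up for illustration:

```python
import math

# Hypothetical logits over a tiny vocabulary (higher = more likely)
logits = {'{': 1.2, '}': 0.3, '"': 2.5, 'hello': 3.1, '7': 0.9}

def pick_next_token(logits, allowed_tokens):
    """Mask disallowed tokens to -inf, then greedily take the best survivor."""
    masked = {
        tok: (score if tok in allowed_tokens else -math.inf)
        for tok, score in logits.items()
    }
    return max(masked, key=masked.get)

# The FSM says only '{' or '"' may come next, so 'hello' is masked out
# even though it has the highest raw logit.
print(pick_next_token(logits, allowed_tokens={'{', '"'}))
```

Because the mask is applied before sampling, the model never gets the chance to emit a token that would break the structure.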

Is It Worth Using Outlines?

Yes, it's worth using Outlines. It ensures your generated data is valid and well-structured, while also speeding up the process by reducing unnecessary calls to the language model. This makes your applications more efficient and reliable.


In conclusion, Outlines provides an efficient and reliable way to generate valid JSON from large language models by leveraging Finite State Machines (FSMs) and optimizing token selection. By reducing unnecessary calls to the LLM and using the probability distribution of tokens, Outlines ensures both performance and correctness in structured generation. This innovative approach makes it a powerful tool for developers working with JSON and machine learning models.

For a more detailed explanation, check out the paper here: