Boundary

HomeBlogPodcastTeamJobs
DocsStar on GitHub
2,107
Researchover 1 year ago9 min read

Prompting vs JSON Mode vs Function Calling vs Constrained Generation vs SAP

A technical explanation of every way to extract structed data from an LLM

Vaibhav Gupta

Vaibhav Gupta

Pre-reading: What is structured generation?

Structured generation is the idea of coercing the LLM to generate some data that you can parse into a data model and then programatically use.

The most common way to extract structured data / do function calling out of an LLM is to somehow get the LLM to output JSON, and then call JSON.parse.

However, there is no reason to assume that JSON, the prevalant serialization for Web APIs, should be the ideal serialization for LLMs. Given the stochastic nature of LLMs, it might even be true that all strict serialization formats are suboptimal, since a single error can cause the entire serialization to be invalid.

In this article, we'll:

  1. Explain how every current technique of structured data extraction works
  2. Discuss the pros and cons of each technique
  3. Introduce a new technique, SAP (Schema-Aligned Parsing), that achieves state-of-the-art accuracy on the Berkeley Function Calling Leaderboard (Jump to SAP)

Problem Space

Problem Definition
Given a QUERY, and a SCHEMA, the three things we can do to change the likelihood of generating an output are:
PromptChange how we construct the prompt and render the schema
PromptChange how tokens are generated
PromptChange how we parse the output of the model into our desired structure

All 9 Techniques

CategoryTechnique
PromptNaive Approach
Prompt Engineering
Prompt Engineering + Parsing
ModelJSON Mode
Constrained Generation
Function Calling
ParserLLM Retries
AST Parsing
SAP

Technique Comparison

We ran the most popular techniques on the Berkeley Function Calling Leaderboard dataset. Here are the results:

ModelFunction CallingPython AST ParserSAP
gpt-3.5-turbo87.5%75.8%92%
gpt-4o87.4%82.1%93%
claude-3-haiku57.3%82.6%91.7%
gpt-4o-mini19.8%51.8%92.4%
claude-3-5-sonnet78.1%93.8%94.4%
llama-3.17b-60.9%76.8%
Dataset had n=1000 per model and comes from Berkeley Function Calling Leaderboard thanks to the Gorilla Team.

Technique Breakdown

Technique : Naive Approach
Inject the question and the schema as JSON schema to the model, call JSON.parse(..) on the response
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Complete Propmt
Generate a resume in JSON format based on the SCHEMA defined below:

{
 type: "object",
 properties: {
    name: {
      type: "string",
      required: true,
    },
    contact: {
      type: "object",
      properties: {
        email: {
          type: "string",
          required: true,
        },
        phone: {
          type: "string",
          required: false,
        }
      },
      required: true,
    },
    education: {
      type: "array",
      items: {
        type: "object",
        properties: {
          institution: {
            type: "string",
            required: true,
          },
          degree: {
            type: "string",
            required: true,
          },
          year: {
            type: "string",
            required: true,
          }
        }
      },
      required: true,
    },
    experience: {
      type: "array",
      items: {
        type: "object",
        properties: {
          company: {
            type: "string",
            required: true,
          },
          role: {
            type: "string",
            required: true,
          },
          duration: {
            type: "string",
            required: true,
          }
        }
      },
      required: true,
    },
    skills: {
      type: "array",
      items: {
        type: "string"
      },
      required: true,
    }
 }
}
Prompt
Technique : Prompt Engineering
Try and better explain the desired format the model. Example: Ask it to not make common JSON mistakes
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Prompt
Technique : Prompt Engineering + Parsing
Add some programatic robustness. Only parse with JSON.parse conditionally.
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Model
Technique : JSON Mode
Restrict the tokens the model is allowed to generate to only those that would be JSON parseable. Stop once the model completes the JSON object.
Example: After already generating { "key", the LLM must choose a token that starts with : to ensure valid JSON.
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Model
Technique : Constrained Generation
The more general cousin of JSON mode. Instead of only allowing tokens which would produce valid JSON, only allow very specific tokens at any given step of token generation.
Example:
Take a grammar restriction: [0-9]{1,2}\.[0-9]{0,2} - a regular expression that matches numbers with one or two digits before the decimal point and zero to two digits after the decimal point.
We first only allow the LLM to pick tokens that match numbers.
After say, 83, the LLM would be forced to pick tokens that started with ..
Tip
In this scenario, due to Constrained Generation, the LLM could never generate a number larger than 99.99, since the grammar would remove any 3+ digit numbers.
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Model
Technique : Tools / Function Calling

The idea here is to fine-tune the model to intelligently trigger when to use JSON mode.
Example Generation:

  1. Teach the model a new special token USE_TOOL.
  2. Whenever the model generates the USE_TOOL token, switch to JSON mode for all subsequent tokens.
  3. Once the JSON is complete (detect this programatically, not by the model), allow the model to pick from all tokens, including USE_TOOL.
  4. Loop until the model emits the END_TOKEN token.

LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Parser
Technique : LLM Retries
Retry the model until it produces something that is parseable or pass parsing errors to the model and hope that when it tries again, you can get good results. This is the technique libraries like Langchain, Instructor, and Marvin use to get structured data reliably.
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Parser
Technique : Language-Specific AST parsers
Rely on the model's inherent ability to output code, and existing Abstract Syntax Tree Parsers for reading code, and then transform that into JSON. Example output: [GetTriangleArea(base=5, height=10)] (Note, that this is valid python syntax) After python_to_json: { "GetTriangleArea": {"base": 5, "base": 10} }
LLM Hammer
My Personal Rating: N/A / 5
No notes available.
Parser
Technique : SAP
Instead of relying on the model to strictly understand our desired format, write a parser that generously reads the output text and applies error correction techniques with knowledge of the original schema.
LLM Hammer
My Personal Rating: N/A / 5
No notes available.

What is SAP and why does it work so well?

The key idea behind SAP is to assume that the model will make mistakes, and to build a parser that is robust enough to handle them. This would be virtually impossible for some tasks, but in the context of structured data extraction, we have a schema to guide us. A key inspiration to us was Postel's Law, coined by Jon Postel, the creator of TCP/IP:

Be conservative in what you do, be liberal in what you accept from others.

We'll do a future post that outlines exactly how our SAP algorithm works. At a very high-level, you can think of the leetcode problem "Edit Distance" but instead of comparing two strings, we ask: "What is least cost edit I need to make to get from the model's output to something parseable by a schema?" The simplest cost function could be Levenshtein distance, but we use a custom cost function that takes into account the schema. (The code is open-source, so you can check it out here)

Meanwhile, here are three examples to show some of the error correction techniques we use:

Handling Invalid JSON sequence
SchemaSAP
LLM ResponseSAP
Post-SAPSAP
Mistakes by the LLM fixed by SAP:
  • Included a comment
  • Used a fraction instead of a float
  • Forgot quotes around stands_for
  • Didn't escape the newline or " within the string
  • Included a trailing comma
Handling Invalid JSON sequence
SchemaSAP
LLM OutputSAP
Post-SAPSAP
Mistakes by the LLM fixed by SAP:
  • "Amazon" was returned as a string, but Founder.prior_jobs should be a string[]
Handling Yapping
LLM OutputSAP
Post-SAPSAP
Mistakes by the LLM fixed by SAP:
  • Included a lot of prefix and suffix text that is not relevant to our desired output, but may be relevant to the LLM to generate the output

Some more error correction techniques we use in SAP include:

  • Unquoted strings
  • Unescaped quotes and new lines in strings
  • Missing commas
  • Missing colons
  • Missing brackets
  • Misnamed keys
  • Cast fractions to floats
  • Remove superfluous keys in objects
  • Strip yapping
  • Picking the best of many possible candidates in case LLM produces multiple outputs
  • Complete partial objects (due to streaming)
  • and more

Using SAP today

We wanted to offer SAP in all languages of your choice, so we've written it in Rust and provide a native interface for Python, Typescript, and Ruby. (We're working on more languages, but we're a small team!)

Since all interfaces use the same Rust code, you can expect the same performance and accuracy across all languages.

You can try it on our online playground: https://promptfiddle.com or add it to your code base already. (Go to documentation)

Code Snippet

  1. Write a schema in BAML
// my_app/baml_src/my_schema.baml
class Resume {
  name string @description("first and last name")
  email string?
  experience Experience[]
}

class Experience {
  title string
  company string
}
  1. Write an LLM prompt in BAML
// my_app/baml_src/my_schema.baml
// ...

function ExtractResume(text: string) -> Resume {
  client "openai/gpt-4o"
  prompt #"
    Describe this resume.
    {{ ctx.output_format }}

    {{ _.role('user') }}
    {{ text }}
  "#
}
  1. Create bindings in your language of choice and use the BAML defined function as if it were a native function (with autocomplete and types!).

Python

$ pip install baml-py
$ baml-cli generate --from /path/to/my_schema.baml --target "python/pydantic"
from baml_client import b

# resume will always be a Pydantic model of type Resume
resume = b.ExtractResume("""
  Vaibhav Gupta
  vbv@boundaryml.com
  - Founder @ BoundaryML
""")

# BAML will automatically validate the response via SAP and cast it to a Pydantic model

Typescript

$ npm install @boundaryml/baml
$ ./node_modules/.bin/baml-cli generate --from /path/to/my_schema.baml --target "typescript"
import { b } from './baml_client'

// resume will always be a TypeScript interface of type Resume
const resume = await b.ExtractResume(`
  Vaibhav Gupta
  vbv@boundaryml.com
  - Founder @ BoundaryML
`);

// BAML will automatically validate the response via SAP and cast it to a TypeScript interface

Ruby

$ gem install baml
$ baml-cli generate --from /path/to/my_schema.baml --target "ruby/sorbet"
require 'baml_client'

# resume will always be a Sorbet model of type Resume
resume = b.ExtractResume(<<~TEXT)
  Vaibhav Gupta
  vbv@boundaryml.com
  - Founder @ BoundaryML
TEXT

# BAML will automatically validate the response via SAP and cast it to a Sorbet model

Does this really matter since models will get better?

I think one could make an arguement for no. If models get better, then JSON mode will be sufficient.

But I personally think performance always matters. If I had some technique that can get the same quality 50% faster or 50% cheaper, then I would obviously use it. Besides, as models get better, I would much rather they spend their training data on being better at understanding the world, rather than understanding my schema. The schema part we can solve with engineering.

Mandatory Blurb 🤝

Our mission at Boundary is to provide the best possible developer experience for shipping AI pipelines. We started with BAML, our new programming language for providing a boundary between stochastic AI models, and deterministic, type-safe code.

What's next? Likely we'll be showing off demos of how to combine function calling and structured generation with SAP to get the best of both worlds. Stay tuned!

If you enjoyed this article, please consider giving us a star on GitHub.

Boundary

Open source toolkit for AI development. Build type-safe AI applications with your team - all with confidence and reliability.

  • Company
  • About Us
  • Why BAML?
  • Privacy Policy
  • Terms of Service
  • Resources
  • Changelog
  • Docs
  • Jobs
  • Social
  • GitHub
  • Twitter
  • Discord
  • LinkedIn
  • YouTube