#1
5 months ago
🦄 Large Scale Classification
LLMs are great at classification across 5, 10, maybe even 50 categories. But how do we handle situations where there are over 1,000? What if the list of categories is constantly changing?
Project Details
Running this code
# Install dependencies
uv sync
# Convert BAML files -> Python
uv run baml-cli generate
# Run the code
uv run hello.py
Follow-up Exercise - Tool Selection from Hundreds of Tools
If you want to experiment with this code and extend it, try this exercise:
- Skim the file at ./tools.json
- Load the list of tools as `Category` objects, or create a similar class for `Tool`
- Implement `f(tool) -> string` for embedding text and `g(tool) -> string` for LLM text
- Update the code to embed the user query and search for the top-k most likely tools
- Try different user inputs that target ambiguous tools and see how accurate you can make the selection
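A minimal sketch of the embed-and-search step, assuming each entry in ./tools.json is an object with `name` and `description` fields (an assumption; check the real file). To stay self-contained it swaps a real embedding model for a hypothetical bag-of-words `embed` function scored with cosine similarity; `load_tools`, `embed`, `cosine`, and `top_k_tools` are illustrative names, not part of the project.

```python
import json
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tool:
    # Hypothetical schema: adjust these fields to match the real ./tools.json
    name: str
    description: str


def load_tools(path: str) -> list[Tool]:
    """Load tools from a JSON array of {"name": ..., "description": ...} objects."""
    with open(path) as fh:
        return [Tool(name=t["name"], description=t["description"]) for t in json.load(fh)]


def f(tool: Tool) -> str:
    """Text to embed: name plus description, so either can match a query."""
    return f"{tool.name}: {tool.description}"


def g(tool: Tool) -> str:
    """Text shown to the LLM: one compact bullet per candidate tool."""
    return f"- {tool.name}: {tool.description}"


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(query: str, tools: list[Tool], k: int = 5) -> list[Tool]:
    """Rank tools by similarity between the query and f(tool); keep the best k."""
    query_vec = embed(query)
    ranked = sorted(tools, key=lambda t: cosine(query_vec, embed(f(t))), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Inline sample tools so the sketch runs without ./tools.json
    tools = [
        Tool("github_create_issue", "Create a new issue in a GitHub repository"),
        Tool("slack_send_message", "Send a message to a Slack channel"),
        Tool("github_list_pull_requests", "List open pull requests in a GitHub repository"),
    ]
    shortlist = top_k_tools("open an issue on my github repo", tools, k=2)
    # Only the shortlist is rendered into the LLM prompt
    print("\n".join(g(t) for t in shortlist))
```

With the inline sample, the two GitHub tools make the shortlist and the Slack tool is dropped. Keeping `f` and `g` separate lets the embedding text be keyword-rich while the LLM text stays short; note that the toy `embed` cannot match "repo" to "repository", which is exactly the gap a real embedding model closes.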
If you want to add more MCP servers or other tools, the code that generates the JSON is at https://github.com/dexhorthy/thousands-of-tools-mcp
Follow-up Exercise - Post-LLM Probe
- Change the core LLM prompt to select a `Category[]` instead of a single `Category`
- Add a follow-up step (deterministic or LLM-based) that takes the `Category[]` list and selects a final `Category`
- Write some examples where the final probe can distinguish between closely overlapping categories
- If you completed the tool selection exercise, you can use `Tool` instead of `Category` if you prefer
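One way to sketch the deterministic variant of the follow-up step, assuming the LLM has already returned a short `Category[]` list and that each category carries a `name` and `description` (hypothetical fields here). The illustrative `probe` function scores each candidate by how many query words appear in its description; an LLM-based probe would replace that scoring with a second, narrower prompt over just the shortlist.

```python
import re
from dataclasses import dataclass


@dataclass
class Category:
    # Hypothetical fields: mirror whatever your BAML Category class defines
    name: str
    description: str


def tokens(text: str) -> set[str]:
    """Lowercased word set, used for a simple overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def probe(query: str, candidates: list[Category]) -> Category:
    """Deterministically pick one final Category from the LLM's shortlist.

    Scores each candidate by how many query words appear in its description.
    max() returns the first maximal item, so ties keep the LLM's ordering.
    """
    if not candidates:
        raise ValueError("the LLM returned no candidate categories")
    query_tokens = tokens(query)
    return max(candidates, key=lambda c: len(query_tokens & tokens(c.description)))


if __name__ == "__main__":
    # Two closely overlapping categories the LLM might return together
    shortlist = [
        Category("billing_refund", "Customer wants money back for a charge"),
        Category("billing_dispute", "Customer disputes an unrecognized charge on their card"),
    ]
    final = probe("I don't recognize this charge on my card", shortlist)
    print(final.name)
```

Here `billing_dispute` wins because its description shares "charge", "on", and "card" with the query, while `billing_refund` shares only "charge". Falling back to the LLM's ordering on ties means the probe can only refine the model's first guess, never make it worse on inputs where the overlap score carries no signal.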