TensorZero is an open-source stack for industrial-grade LLM applications.
Take what you need, adopt incrementally, and complement with other tools.
Website · Docs · Twitter · Slack · Discord

Quick Start (5min) · Deployment Guide · API Reference · Configuration Reference
[!NOTE]
Coming Soon: TensorZero Autopilot
TensorZero Autopilot is an automated AI engineer (powered by the TensorZero Stack) that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Learn more · Join the waitlist
Integrate with TensorZero once and access every major LLM provider.
Supported Model Providers: Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI (Grok). Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).
You can access any provider using the TensorZero Python SDK.
pip install tensorzero

from tensorzero import TensorZeroGateway  # or AsyncTensorZeroGateway

with TensorZeroGateway.build_embedded(...) as t0:
    response = t0.inference(
        model_name="openai::gpt-4o-mini",
        # Try other providers easily: "anthropic::claude-sonnet-4-5"
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about TensorZero.",
                }
            ]
        },
    )
See Quick Start for more information.
You can access any provider using the OpenAI Python SDK with TensorZero.
pip install tensorzero

from openai import OpenAI  # or AsyncOpenAI
from tensorzero import patch_openai_client

client = OpenAI()
patch_openai_client(client, ...)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o-mini",
    # Try other providers easily: "tensorzero::model_name::anthropic::claude-sonnet-4-5"
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)
See Quick Start for more information.
You can access any provider using the OpenAI Node SDK with TensorZero.
Deploy tensorzero/gateway using Docker. Detailed instructions →

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o-mini",
  // Try other providers easily: "tensorzero::model_name::anthropic::claude-sonnet-4-5"
  messages: [
    {
      role: "user",
      content: "Write a haiku about TensorZero.",
    },
  ],
});
See Quick Start for more information.
TensorZero supports virtually any programming language or platform via its HTTP API.
Deploy tensorzero/gateway using Docker. Detailed instructions →

curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about TensorZero."
        }
      ]
    }
  }'
See Quick Start for more information.
Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.
(Screenshots: Observability » UI · Observability » Programmatic)
Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.
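For example, you can attach feedback to a specific inference programmatically via the gateway's feedback API. Here's a minimal Python sketch, assuming a boolean metric named task_success has already been defined in your configuration:

from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_embedded(...) as t0:
    response = t0.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Write a haiku about TensorZero."}]},
    )

    # Record a boolean metric for this inference; "task_success" is a
    # hypothetical metric name you'd define in your TensorZero configuration.
    t0.feedback(
        metric_name="task_success",
        inference_id=response.inference_id,
        value=True,
    )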
Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.
(Screenshots: Evaluation » UI · Evaluation » CLI)
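Conceptually, an evaluator is just a function that scores an output. A simplified sketch of both flavors (not TensorZero's actual interface), where llm is a hypothetical callable wrapping an inference call:

def exact_match(output: str, reference: str) -> bool:
    # Heuristic evaluator: a deterministic check against a reference answer.
    return output.strip() == reference.strip()

def llm_judge(output: str, reference: str, llm) -> bool:
    # LLM-judge evaluator: ask a model to grade the output.
    prompt = (
        f"Reference answer: {reference}\n"
        f"Candidate answer: {output}\n"
        "Is the candidate consistent with the reference? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")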
Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.
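Experimentation works by calling a TensorZero function instead of a model directly: the gateway samples one of the variants (prompt and model combinations) defined in your configuration, so you can compare them on live traffic. A minimal Python sketch, where draft_email is a hypothetical function name from your configuration:

from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_embedded(...) as t0:
    # The gateway picks one of draft_email's configured variants for this
    # inference; feedback sent later is attributed to that variant.
    response = t0.inference(
        function_name="draft_email",  # hypothetical function defined in tensorzero.toml
        input={"messages": [{"role": "user", "content": "Draft a follow-up email."}]},
    )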
Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.
How is TensorZero different from other LLM frameworks?
Can I use TensorZero with ___?
Yes. Every major programming language is supported. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM.
Is TensorZero production-ready?
Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 50.
Here's a case study: Automating Code Changelogs at a Large Bank with LLMs
How much does TensorZero cost?
The TensorZero Stack (our LLMOps platform) is 100% self-hosted and open-source.
TensorZero Autopilot (our automated AI engineer) is a complementary paid product powered by the TensorZero Stack.
Who is building TensorZero?
Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and coverage from VentureBeat. We're hiring in NYC.
How do I get started?
You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.
Watch LLMs get better at data extraction in real-time with TensorZero!
Dynamic in-context learning (DICL) is a powerful inference-time optimization available out of the box with TensorZero. It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb
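Conceptually, DICL amounts to retrieving similar past examples and splicing them into the prompt as demonstrations. A simplified sketch of the idea (not TensorZero's implementation):

import numpy as np

def dicl_examples(query_embedding, example_store, k=3):
    # example_store is a hypothetical list of (embedding, input, output)
    # tuples harvested from past inferences that received good feedback.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(example_store, key=lambda ex: cosine(query_embedding, ex[0]), reverse=True)
    messages = []
    for _, past_input, past_output in ranked[:k]:
        # Prepend each retrieved example as a few-shot demonstration.
        messages.append({"role": "user", "content": past_input})
        messages.append({"role": "assistant", "content": past_output})
    return messages  # the caller appends the new user message and runs inference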
Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero.
Questions? Ask us on Slack or Discord.
Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).
We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.
Optimizing Data Extraction (NER) with TensorZero
This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
Agentic RAG — Multi-Hop Question Answering with LLMs
This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
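A simplified sketch of the control loop behind such an agent, where search and llm_decide are hypothetical callables standing in for Wikipedia search and a TensorZero inference call:

def multi_hop_answer(question, search, llm_decide, max_hops=5):
    # Gather context over multiple hops until the model decides it can answer.
    context = []
    for _ in range(max_hops):
        decision = llm_decide(question, context)  # returns {"action": ..., ...}
        if decision["action"] == "answer":
            return decision["answer"]
        context.append(search(decision["query"]))  # another retrieval hop
    # Out of hops: force a final answer with whatever context was gathered.
    return llm_decide(question, context, force_answer=True)["answer"]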
Writing Haikus to Satisfy a Judge with Hidden Preferences
This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progressive improvement as you fine-tune the LLM multiple times.
Image Data Extraction — Multimodal (Vision) Fine-tuning
This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).
Improving LLM Chess Ability with Best-of-N Sampling
This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
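The core of best-of-N sampling fits in a few lines. A simplified sketch, where generate and judge are hypothetical callables (in TensorZero, this strategy is available as a built-in inference-time configuration rather than hand-rolled code):

def best_of_n(prompt, generate, judge, n=5):
    # Draw n candidate completions and keep the one the judge scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=judge)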
We write about LLM engineering on the TensorZero Blog.