Announcing $8M in Series A funding
Intuitive evals for intelligent applications
Test generative AI across teams. Automate evaluation for reliable LLM products and agents.
The LLM evaluation platform
for AI teams who care about quality
LLM products evolve daily. Homegrown eval pipelines don't.
Stakeholders can't contribute, so evals go stale and stay siloed in code.
Without reliable evals, teams can't ship changes confidently, and LLM product quality suffers.
The first collaborative
LLM product testing environment
Gentrace provides a frontend for testing your actual application, enabling teams to write evals without siloing them in code.
Evaluation
Build LLM, code, or human evals. Manage datasets and run tests in seconds, from code or the UI.
Experiments
Run test jobs to tune prompts, retrieval systems, and model parameters.
Reports
Convert evals into dashboards for comparing experiments and tracking progress with your team.
Tracing
Monitor and debug LLM apps. Isolate and resolve failures in RAG pipelines and agents.
Environments
Reuse evals across environments. Run the same architecture in local, staging, and production.
How customers
are using Gentrace
Eval-driven development
Gentrace provides collaborative, UI-first testing
connected to your actual application code.
Enterprise scale & compliance
Self-host in your infrastructure
Role-based access control
SOC 2 Type II & ISO 27001
Autoscaling on Kubernetes
SSO and SCIM provisioning
High-volume analytics
Gentrace makes evals a team sport at Webflow. With support for multimodal outputs and running experiments, Gentrace is an essential part of our AI engineering stack. Gentrace helps us bring product and engineering teams together for last-mile tuning so we can build AI features that delight our users.
Bryant Chou
Co-founder and Chief Architect at Webflow