# Syntropylabs — LLM Evaluation & Observability Platform
> Instrument, trace, and evaluate your LLM applications in production.

## What is Syntropylabs?

Syntropylabs is a platform for AI engineering teams to trace, evaluate, and improve LLM applications.
It provides distributed tracing, automated evaluation, and a developer SDK that instruments your code in minutes.

## Core Features

### EvalKit SDK
- Python SDK (evalkit) and TypeScript SDK (@evalkit/sdk)
- A single init() call auto-instruments OpenAI, Anthropic, HTTP requests, Postgres, Redis, Mongoose, and more
- W3C traceparent propagation to stitch frontend and backend spans into one trace
- Manual span creation with evalkit.start_span() for custom operations
- Works with NestJS, Express, FastAPI, and plain scripts
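
To make the traceparent bullet concrete, here is a minimal sketch of how a W3C `traceparent` header (version `00`) can be generated, parsed, and continued so that a backend span joins the trace a frontend started. It uses only the Python standard library and is not the EvalKit SDK's implementation; the function names `new_traceparent`, `parse_traceparent`, and `child_traceparent` are hypothetical and chosen for illustration.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Build a header value that starts a brand-new trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the header's fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: keep its trace ID, mint a new span ID."""
    ctx = parse_traceparent(incoming)
    if ctx is None:
        # Malformed or missing context: start a fresh trace instead.
        return new_traceparent()
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"
```

Because the child header reuses the incoming trace ID, every span created downstream can be grouped under the same trace, which is what allows frontend and backend spans to appear in one waterfall.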

### Distributed Tracing
- Every LLM call, tool invocation, HTTP request, and DB query captured as a span
- Waterfall view shows span hierarchy, latency, token usage, and stop reasons
- Trace projects, each with its own subscription key and tenant isolation
- Latency percentiles (p50, p95, p99) and traces-over-time charts in the dashboard
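
As an illustration of what the p50, p95, and p99 figures mean, the sketch below computes them from a list of span latencies using the nearest-rank method. This is an assumption for illustration only: the dashboard may use a different percentile definition (for example, interpolation), and `percentile` and `latency_summary` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize latencies with the three percentiles named above."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


# Five hypothetical span latencies in milliseconds, one of them an outlier.
summary = latency_summary([120, 80, 200, 95, 3000])
print(summary)  # {'p50': 120, 'p95': 3000, 'p99': 3000}
```

The example shows why tail percentiles matter: the median stays at 120 ms while a single slow call dominates p95 and p99.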

### Online Evaluation (Automatic)
- Configure evaluation rules and a judge model per trace project
- The platform polls for new traces and automatically scores them as they arrive
- Auto-evaluated traces show a badge in the dashboard
- Configurable polling interval (1–60 minutes)
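
The poll-and-score behavior described above can be pictured roughly as follows. This is a conceptual sketch, not the platform's actual scheduler: `fetch_trace_ids` and `score_trace` are hypothetical callables standing in for "list recent traces" and "run the judge model on a trace", and only the 1 to 60 minute range comes from the feature description.

```python
from typing import Callable, Iterable, List, Set

MIN_INTERVAL_MIN, MAX_INTERVAL_MIN = 1, 60  # range stated in the list above


def validate_interval(minutes: int) -> int:
    """Reject polling intervals outside the supported 1-60 minute range."""
    if not MIN_INTERVAL_MIN <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError(
            f"interval must be {MIN_INTERVAL_MIN}-{MAX_INTERVAL_MIN} minutes"
        )
    return minutes


def poll_once(
    fetch_trace_ids: Callable[[], Iterable[str]],
    score_trace: Callable[[str], object],
    already_scored: Set[str],
) -> List[str]:
    """Score every trace not seen in an earlier pass.

    Returns the IDs scored in this pass and records them in already_scored
    so the next pass skips them.
    """
    newly_scored = []
    for trace_id in fetch_trace_ids():
        if trace_id in already_scored:
            continue
        score_trace(trace_id)
        already_scored.add(trace_id)
        newly_scored.append(trace_id)
    return newly_scored
```

A driver loop would call `poll_once(...)` and then sleep for `validate_interval(n) * 60` seconds. Tracking already-scored IDs keeps each trace from being judged twice when it appears in consecutive polls.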

### Offline Evaluation (Manual)
- Select any trace or batch of traces in the dashboard and run on-demand evaluation
- Results are persisted and shown in the Evaluation tab of each trace
- Full per-rule score breakdown with reasoning from the judge model

### Evaluation Rules & Collections
- Write custom LLM-as-judge prompts with the `{{trace}}` template variable
- Rules return a score (0.0–1.0), pass/fail, and reasoning
- Group rules into collections and apply them all at once
- Built-in support for custom criteria, safety, relevance, and tone checks
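
To show how a rule with a `{{trace}}` placeholder might turn into a score, pass/fail flag, and reasoning, here is a small sketch. Several details are assumptions made for illustration and are not confirmed by the platform: the judge is assumed to reply with a JSON object containing `score` and `reasoning`, pass/fail is assumed to come from a threshold (`pass_threshold`), and `RuleResult`, `render_rule`, and `parse_judge_reply` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class RuleResult:
    score: float      # 0.0-1.0, as described above
    passed: bool      # pass/fail verdict
    reasoning: str    # the judge model's explanation


def render_rule(prompt_template: str, trace_text: str) -> str:
    """Substitute the serialized trace for the {{trace}} placeholder."""
    return prompt_template.replace("{{trace}}", trace_text)


def parse_judge_reply(reply: str, pass_threshold: float = 0.5) -> RuleResult:
    """Parse an assumed JSON reply of the form
    {"score": <float>, "reasoning": <str>} into a RuleResult."""
    data = json.loads(reply)
    score = float(data["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside 0.0-1.0")
    # Assumption: the verdict is derived by comparing score to a threshold.
    return RuleResult(score, score >= pass_threshold, str(data.get("reasoning", "")))


# Hypothetical rule text and judge reply.
prompt = render_rule(
    "Rate the relevance of the answer in this trace:\n{{trace}}",
    '{"input": "What is 2+2?", "output": "4"}',
)
result = parse_judge_reply('{"score": 0.9, "reasoning": "Directly answers the question."}')
print(result.passed)  # True
```

Validating the score range before deriving the verdict guards against a judge model that drifts from the requested output format.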

### Unified Model Runner
- Run benchmarks across OpenAI, Anthropic, and Gemini via one API
- Compare latency, cost, and quality side-by-side
- Dataset-based testing with CSV/JSON uploads

### Security
- AES-256 encrypted storage for all provider API keys
- JWT authentication with httpOnly cookies
- Per-project subscription keys with rotation support

## Getting Started

- Docs: https://syntropylabs.ai/docs
- Sign up: https://syntropylabs.ai/auth
- Blog: https://syntropylabs.ai/blog

## Blog

- [Demystifying Agent Harness Evaluation](https://syntropylabs.ai/blog/demystifying-agent-harness-evaluation/raw) [AI Agents, Agent Evaluation, LLM Evaluation, AI Eval Harness, Multi Turn Evals, Agentic AI, AI Infrastructure, LLM Ops, AI Testing, AI Benchmarking, AI Safety, AI Tool Use, Evaluation Metrics, Offline Evaluation, Online Evaluation, LLM as a Judge, DeepEval, LangSmith, Ragas, Promptfoo, Research Agents, Voice Agents, Agent Trajectory, Tool Trajectory, AI Observability, AI Reliability, Production Monitoring, AI Workflows, Autonomous Agents, Generative AI]: An agent is a model plus a scaffold — tools, memory, skills, control flow. To evaluate one well, you have to evaluate the whole thing, not just the model's fina
- [Agent Skills-Dynamic Context Engineering](https://syntropylabs.ai/blog/agent-skills-dynamic-context/raw) [AI, Agents, LLM, Architecture, Engineering]: Agent Skills enable dynamic context loading, reducing prompt bloat by fetching only relevant instructions, improving efficiency, modularity, and cost in AI syst

## Links

- [Documentation](https://syntropylabs.ai/docs)
- [Blog index](https://syntropylabs.ai/blog)
- [Sitemap](https://syntropylabs.ai/sitemap.xml)