How It Works

Go from zero to production-grade evaluation in minutes. Four simple steps to rigorous LLM testing.

Step 01

Connect Your Model

Point YetixAI at any LLM — OpenAI, Anthropic, open-source, or your own fine-tuned model. Just provide an API endpoint and we handle the rest. Supports streaming, batch, and async inference modes.

  • One-line SDK integration
  • Support for all major providers
  • Custom model endpoints via REST
  • Automatic rate limiting and retries
step01.py
from yetixai import YetixClient

client = YetixClient(api_key="your-key")

# Register your model
client.models.add(
    name="my-gpt4",
    provider="openai",
    model="gpt-4o"
)
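How the SDK implements the automatic rate limiting and retries above isn't documented here; a minimal sketch of the usual pattern (exponential backoff with a capped attempt count — `call_with_retries` and its parameters are illustrative, not part of the YetixAI API):

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Each failed attempt doubles the wait, which spreads retries out so a rate-limited endpoint gets room to recover instead of being hammered immediately.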
Step 02

Configure Eval Suites

Choose from built-in evaluation templates for common tasks, or define custom test suites with your own datasets, metrics, and scoring rubrics.

  • Pre-built templates for QA, summarization, RAG
  • Custom metrics with Python or YAML
  • Dataset upload via CSV, JSON, or API
  • LLM-as-judge configuration
step02.py
# Use a built-in suite
suite = client.suites.get("hallucination-v2")

# Or define your own
suite = client.suites.create(
    name="my-qa-tests",
    dataset="./test_cases.json",
    metrics=["accuracy", "relevance"],
    threshold=90  # minimum passing score (%)
)
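The exact interface for custom Python metrics isn't shown here; as a sketch, assuming a metric is simply a callable that scores a (prediction, reference) pair between 0 and 1 (`exact_match` and `suite_score` are illustrative names, not YetixAI API):

```python
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized answers match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def suite_score(cases: list[tuple[str, str]]) -> float:
    """Average metric score across test cases, as a percentage."""
    return 100 * sum(exact_match(p, r) for p, r in cases) / len(cases)
```

Averaging per-case scores into a percentage matches the percent-style scores and thresholds used in the steps below.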
Step 03

Run Evaluations

Execute evaluation runs on demand, on a schedule, or triggered by CI/CD events. Test across thousands of prompts in parallel with detailed per-case results.

  • Parallel execution across test cases
  • Scheduled and event-triggered runs
  • Real-time progress monitoring
  • Automatic result caching
step03.py
# Run evaluation
results = client.evaluate(
    model="my-gpt4",
    suite="my-qa-tests"
)

# Stream progress
for update in results.stream():
    print(f"{update.completed}/{update.total}")
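How YetixAI fans test cases out in parallel is internal to the platform; a plausible sketch with Python's standard thread pool (`evaluate_case` is a stand-in for a real model call and scorer, not part of the SDK):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate_case(case: str) -> dict:
    """Stand-in for sending one prompt to the model and scoring the reply."""
    return {"case": case, "passed": True}

def run_parallel(cases: list[str], workers: int = 8) -> list[dict]:
    """Fan test cases out across a thread pool, collect per-case results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(evaluate_case, c) for c in cases]
        return [f.result() for f in as_completed(futures)]
```

A thread pool suits this workload because each case is dominated by waiting on a network call, so many cases can be in flight at once.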
Step 04

Analyze & Improve

Review dashboards, drill into failures, compare model versions, and track quality trends over time. Export reports, set alerts, and ship better models faster.

  • Interactive failure analysis
  • Model version comparison
  • Trend charts and regression alerts
  • Export to PDF, CSV, or API
step04.py
# Check results
print(f"Score: {results.score}%")
print(f"Passed: {results.passed}/{results.total}")

# Fail CI on regression
assert results.score > 90

# Export report
results.export("report.pdf")
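The assert above gates CI on a fixed score. For regression alerts against a previous run, a tolerance-based comparison is a common alternative; a minimal sketch in plain Python (`has_regressed` is an illustrative helper, not part of the YetixAI API):

```python
def has_regressed(current: float, baseline: float, tolerance: float = 1.0) -> bool:
    """True when the current score drops more than `tolerance` points below baseline."""
    return current < baseline - tolerance

# A small dip within tolerance passes; a larger drop trips the gate.
```

Comparing against the previous run's score rather than a fixed number keeps the gate meaningful as the model's baseline quality improves.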