How It Works

Go from zero to production-grade evaluation in minutes. Four simple steps to rigorous LLM testing.

Step 01

Connect Your Model

Point YetixAI at any LLM — OpenAI, Anthropic, open-source, or your own fine-tuned model. Just provide an API endpoint and we handle the rest. Supports streaming, batch, and async inference modes.

  • One-line SDK integration
  • Support for all major providers
  • Custom model endpoints via REST
  • Automatic rate limiting and retries
step01.py
from yetixai import YetixClient

client = YetixClient(api_key="your-key")

# Register your model
client.models.add(
    name="my-gpt4",
    provider="openai",
    model="gpt-4o"
)
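How the SDK implements the automatic rate limiting and retries above isn't documented here; a minimal sketch of the usual pattern (exponential backoff with a capped attempt count — `call_with_retries` and its parameters are illustrative, not part of the YetixAI API):

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Each failed attempt doubles the wait, which spreads retries out so a rate-limited endpoint gets room to recover instead of being hammered immediately.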
Step 02

Configure Eval Suites

Choose from built-in evaluation templates for common tasks, or define custom test suites with your own datasets, metrics, and scoring rubrics.

  • Pre-built templates for QA, summarization, RAG
  • Custom metrics with Python or YAML
  • Dataset upload via CSV, JSON, or API
  • LLM-as-judge configuration
step02.py
# Use a built-in suite
suite = client.suites.get("hallucination-v2")

# Or define your own
suite = client.suites.create(
    name="my-qa-tests",
    dataset="./test_cases.json",
    metrics=["accuracy", "relevance"],
    threshold=90  # minimum passing score (%)
)
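The exact interface for custom Python metrics isn't shown here; as a sketch, assuming a metric is simply a callable that scores a (prediction, reference) pair between 0 and 1 (`exact_match` and `suite_score` are illustrative names, not YetixAI API):

```python
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized answers match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def suite_score(cases: list[tuple[str, str]]) -> float:
    """Average metric score across test cases, as a percentage."""
    return 100 * sum(exact_match(p, r) for p, r in cases) / len(cases)
```

Averaging per-case scores into a percentage matches the percent-style scores and thresholds used in the steps below.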
Step 03

Run Evaluations

Execute evaluation runs on demand, on a schedule, or triggered by CI/CD events. Test across thousands of prompts in parallel with detailed per-case results.

  • Parallel execution across test cases
  • Scheduled and event-triggered runs
  • Real-time progress monitoring
  • Automatic result caching
step03.py
# Run evaluation
results = client.evaluate(
    model="my-gpt4",
    suite="my-qa-tests"
)

# Stream progress
for update in results.stream():
    print(f"{update.completed}/{update.total}")
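How YetixAI fans test cases out in parallel is internal to the platform; a plausible sketch with Python's standard thread pool (`evaluate_case` is a stand-in for a real model call and scorer, not part of the SDK):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate_case(case: str) -> dict:
    """Stand-in for sending one prompt to the model and scoring the reply."""
    return {"case": case, "passed": True}

def run_parallel(cases: list[str], workers: int = 8) -> list[dict]:
    """Fan test cases out across a thread pool, collect per-case results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(evaluate_case, c) for c in cases]
        return [f.result() for f in as_completed(futures)]
```

A thread pool suits this workload because each case is dominated by waiting on a network call, so many cases can be in flight at once.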
Step 04

Analyze & Improve

Review dashboards, drill into failures, compare model versions, and track quality trends over time. Export reports, set alerts, and ship better models faster.

  • Interactive failure analysis
  • Model version comparison
  • Trend charts and regression alerts
  • Export to PDF, CSV, or API
step04.py
# Check results
print(f"Score: {results.score}%")
print(f"Passed: {results.passed}/{results.total}")

# Fail CI on regression
assert results.score > 90

# Export report
results.export("report.pdf")
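The assert above gates CI on a fixed score. For regression alerts against a previous run, a tolerance-based comparison is a common alternative; a minimal sketch in plain Python (`has_regressed` is an illustrative helper, not part of the YetixAI API):

```python
def has_regressed(current: float, baseline: float, tolerance: float = 1.0) -> bool:
    """True when the current score drops more than `tolerance` points below baseline."""
    return current < baseline - tolerance

# A small dip within tolerance passes; a larger drop trips the gate.
```

Comparing against the previous run's score rather than a fixed number keeps the gate meaningful as the model's baseline quality improves.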