# Getting Started with Geval
Geval is an eval-driven release enforcement system that turns your evaluation results into deterministic go/no-go decisions inside CI/CD.

Geval does NOT:

- Run evals
- Define correctness
- Optimize prompts

Geval DOES:

- Consume eval outputs from any tool
- Apply explicit contracts
- Enforce decisions in PRs and CI/CD
- Block unverified AI changes before production
## Installation
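A typical install, assuming Geval ships as an npm package under the same name as the CLI (the published package name may differ):

```bash
# Assumes an npm package named "geval"; adjust if the published name differs.
npm install -g geval

# Verify the CLI is on your PATH
geval --help
```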
## Quick Start (5 minutes)

### Step 1: Create an Eval Contract
An eval contract defines what “acceptable” means for your AI system. Create `eval-contract.yaml`:
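A minimal sketch built from the fields this guide mentions elsewhere (`required_evals`, `warn`/`block`, the aggregation methods); the threshold field names are assumptions, not the authoritative schema:

```yaml
# eval-contract.yaml - illustrative sketch, not the authoritative schema
required_evals:
  - name: accuracy            # must match evalName in your results
    metric: accuracy
    min: 0.90                 # assumed threshold field
    action: block             # fail the check if violated
  - name: latency
    metric: latency_ms
    aggregation: p95          # see Aggregation Methods below
    max: 2000
    action: warn              # report, but do not block
```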
### Step 2: Prepare Your Eval Results
Your eval results should be in JSON format with metrics. Create `eval-results.json`:
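A shape that matches the contract sketch above; only the `evalName` field is confirmed by this guide (see Troubleshooting), the rest is illustrative:

```json
{
  "evals": [
    { "evalName": "accuracy", "metrics": { "accuracy": 0.94 } },
    { "evalName": "latency",  "metrics": { "latency_ms": 1480 } }
  ]
}
```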
### Step 3: Run Geval Check
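Assuming `--contract`/`--results` style flags (check `geval check --help` for the exact interface):

```bash
geval check --contract eval-contract.yaml --results eval-results.json
echo $?   # 0 = PASS, 1 = BLOCK (see the full exit-code list below)
```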
## Working with CSV Files (LangSmith, Braintrust, etc.)

Many eval tools export CSV files. Geval can parse any CSV directly in the CLI by adding a `sources` section to your contract.

### The Best Approach: Inline Source Config (Recommended for CI)

Add a `sources` section to your contract - no extra files needed!
Contract with inline CSV config (`eval-contract.yaml`):
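A sketch of what an inline `sources` block might look like; only the `sources.csv` key and the aggregation names are documented on this page, so the column-mapping field names here are assumptions:

```yaml
# eval-contract.yaml - sketch with an inline CSV source config
sources:
  csv:
    name_column: eval_name      # assumed: which column names the eval
    metric_column: score        # assumed: which column holds the value
    aggregation: avg            # any method from the table below

required_evals:
  - name: accuracy
    metric: score
    min: 0.90
    action: block
```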
Your eval tool’s CSV export (`eval-results.csv`):
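Illustrative rows; any columns work as long as the `sources.csv` mapping points at them:

```csv
eval_name,input,output,score
accuracy,"What is 2+2?","4",1.0
accuracy,"Capital of France?","Paris",1.0
accuracy,"Largest planet?","Saturn",0.0
```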
Point `geval check` at this file and Geval automatically:

- Detects the `.csv` extension
- Uses the `sources.csv` config from your contract
- Aggregates metrics (avg, p95, pass_rate, etc.)
- Compares against your rules
- Returns PASS or BLOCK
### CI/CD Workflow (Fully Automated)
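In CI the whole loop is two steps: produce the CSV, then gate on it. A sketch, where the export command is a placeholder for your eval tool’s own:

```bash
# 1. Run evals and export a CSV (replace with your tool's real export step)
your-eval-tool run --export-csv eval-results.csv   # placeholder command

# 2. Gate the pipeline; a non-zero exit code fails the job
geval check --contract eval-contract.yaml --results eval-results.csv
```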
### Alternative: Programmatic Usage
If you need more control, use the core library directly; see the sketch under Programmatic Usage below.

## CLI Commands Reference

### geval validate
Validate a contract file:
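For example (assuming the contract path is passed as a positional argument):

```bash
geval validate eval-contract.yaml
```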
### geval check
Check eval results against a contract:
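Same invocation as in the Quick Start (flag names assumed):

```bash
geval check --contract eval-contract.yaml --results eval-results.json
```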
Exit codes:

- `0` - PASS (safe to merge/deploy)
- `1` - BLOCK (do not merge/deploy)
- `2` - REQUIRES_APPROVAL (needs human review)
- `3` - ERROR (something went wrong)
### geval diff
Compare eval results between runs:
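For example (argument order assumed, baseline first):

```bash
geval diff baseline-results.json current-results.json
```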
### geval explain
Get detailed explanation of a decision:
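For example (same assumed flags as `geval check`):

```bash
geval explain --contract eval-contract.yaml --results eval-results.json
```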
## Programmatic Usage (Node.js/TypeScript)

### Basic Example
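A minimal sketch; the package name and every exported function here are assumptions inferred from the CLI’s behavior, not a documented API:

```typescript
import { readFileSync } from "node:fs";
// Hypothetical imports: check the package's actual exports before use.
import { loadContract, check } from "geval";

// Load the same artifacts the CLI consumes.
const contract = loadContract("eval-contract.yaml");
const results = JSON.parse(readFileSync("eval-results.json", "utf8"));

// Evaluate the contract and act on the decision.
const decision = check(contract, results);
if (decision.status === "BLOCK") {
  console.error("Release blocked:", decision.reasons);
  process.exit(1);
}
console.log("PASS: safe to merge/deploy");
```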
### Working with Different Eval Tools
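For CSV exports (LangSmith, Braintrust, etc.), one plausible shape is to reuse the contract’s inline `sources.csv` mapping; `parseResults` here is hypothetical:

```typescript
// Hypothetical API: parseResults would apply the contract's sources.csv
// column mapping and aggregation before the rules run.
import { loadContract, parseResults, check } from "geval";

const contract = loadContract("eval-contract.yaml");
const results = parseResults("eval-results.csv", contract);
console.log(check(contract, results).status);
```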
### Comparing Eval Runs
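A sketch mirroring the `geval diff` command (the `loadResults`/`diff` exports are hypothetical):

```typescript
// Hypothetical API mirroring the `geval diff` command.
import { loadResults, diff } from "geval";

const baseline = loadResults("baseline-results.json");
const current = loadResults("current-results.json");

// Print a per-metric delta between the two runs.
for (const change of diff(baseline, current).changes) {
  console.log(`${change.metric}: ${change.before} -> ${change.after}`);
}
```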
## CI/CD Integration

### GitHub Actions
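A sketch workflow that blocks a PR when `geval check` exits non-zero; the eval step and package name are placeholders for your own setup:

```yaml
# .github/workflows/eval-gate.yml - sketch; adjust steps and paths to your setup
name: Eval Gate
on: pull_request

jobs:
  eval-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g geval        # assumed package name
      - run: npm run evals               # your own eval step, producing eval-results.json
      - run: geval check --contract eval-contract.yaml --results eval-results.json
```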
### GitLab CI
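The equivalent sketch for `.gitlab-ci.yml`; a non-zero exit from `geval check` fails the job:

```yaml
# .gitlab-ci.yml - sketch; replace the eval step with your own
eval-gate:
  image: node:20
  script:
    - npm install -g geval             # assumed package name
    - npm run evals                    # produce eval-results.json
    - geval check --contract eval-contract.yaml --results eval-results.json
```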
## Contract Schema Reference
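Pulling the fields referenced throughout this guide into one annotated sketch; anything not mentioned elsewhere on this page (the threshold and column-mapping field names) is an assumption, not the authoritative schema:

```yaml
required_evals:            # evals that must be present and pass (see Troubleshooting)
  - name: accuracy         # must match evalName in the results
    metric: accuracy       # which metric to read from the results
    aggregation: avg       # any method from the table below
    min: 0.90              # assumed threshold field
    action: block          # block | warn

sources:                   # optional inline parsing config for CSV inputs
  csv:
    metric_column: score   # assumed column-mapping field
    aggregation: avg
```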
### Aggregation Methods
When parsing CSV/JSON with custom column mapping, you can use these aggregation methods:

| Method | Description |
|---|---|
| `avg` | Average of all values (default) |
| `sum` | Sum of all values |
| `min` | Minimum value |
| `max` | Maximum value |
| `count` | Count of non-null values |
| `p50` | 50th percentile (median) |
| `p90` | 90th percentile |
| `p95` | 95th percentile |
| `p99` | 99th percentile |
| `pass_rate` | % of values that are truthy/“success”/“pass” |
| `fail_rate` | % of values that are falsy/“error”/“fail” |
| `first` | First value |
| `last` | Last value |
## Best Practices
### 1. Start Simple

Begin with fixed thresholds. Add relative baselines later.

### 2. Version Your Contracts

Store contracts in git alongside your code. Review contract changes like code changes.

### 3. Use Descriptive Names

Name evals and metrics after what they measure, so a BLOCK decision is self-explanatory in CI logs.

### 4. Set Appropriate Thresholds

Don’t set thresholds too tight initially. Adjust based on real data.

### 5. Block on Critical Metrics Only

Use `warn` for non-critical metrics and `block` for critical ones.
## Troubleshooting

### “Required eval not found”
The `evalName` in your results must match the name in `required_evals`.
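For example, with the illustrative files from the Quick Start, both sides must use the exact same string:

```yaml
# eval-contract.yaml
required_evals:
  - name: accuracy   # "accuracy", not "Accuracy" or "accuracy-v2"
```

```json
{ "evals": [{ "evalName": "accuracy", "metrics": { "accuracy": 0.94 } }] }
```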