
Calibrax

Unified Benchmarking Framework

A unified benchmarking framework for the JAX scientific ML ecosystem with profiling, statistical analysis, regression detection, and CI integration.

Repository Coming Soon

This project is under active development and will be open-sourced soon.

Overview

Calibrax (Calibrate + JAX) is a unified benchmarking framework for the JAX scientific ML ecosystem. It extracts and consolidates shared benchmarking, profiling, and statistical analysis functionality from Datarax, Artifex, and Opifex into a single, reusable package.

The framework provides a full profiling suite including timing with warm-up awareness, resource monitoring, GPU memory/clock/power tracking, energy measurement, FLOPS counting, roofline analysis, XLA compilation profiling, complexity analysis, hardware detection, and carbon tracking. All measurements come with rigorous statistical analysis — bootstrap confidence intervals, hypothesis testing, effect sizes, and outlier detection.

Calibrax includes direction-aware regression detection with configurable severity levels, cross-configuration comparison with Pareto front analysis and aggregate scoring, and a validation framework for convergence analysis and accuracy assessment. Results can be stored in a JSON-per-run file backend with baseline management, and exported to W&B, MLflow, or publication-ready LaTeX/HTML/CSV tables and matplotlib plots.

For CI integration, Calibrax provides a regression gate with git bisect automation, and production monitoring with configurable alerting thresholds. The full CLI supports ingest, export, check, baseline, trend, summary, and profile commands.

Key Features

Profiling

Timing with warm-up awareness, GPU memory/clock/power tracking, FLOPS counting, roofline analysis, XLA compilation profiling, and carbon tracking.
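Calibrax's API is not yet public, so the sketch below only illustrates the idea behind warm-up-aware timing (the helper name `time_with_warmup` is hypothetical). In JAX, the first call to a jitted function triggers XLA compilation and dispatch is asynchronous, so honest timing must discard warm-up runs and force completion before stopping the clock:

```python
import time

def time_with_warmup(fn, *args, warmup=3, repeats=10, sync=lambda x: x):
    """Time fn, discarding warm-up runs.

    In JAX, the first call to a jitted function pays JIT compilation, so
    those samples would skew the distribution. `sync` should force async
    work to finish before the clock stops; for JAX arrays, pass
    sync=jax.block_until_ready.
    """
    for _ in range(warmup):
        sync(fn(*args))  # absorb compilation and cache warm-up
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        sync(fn(*args))
        samples.append(time.perf_counter() - t0)
    return samples

# Plain-Python stand-in; with JAX you would time a jitted function instead.
samples = time_with_warmup(lambda n: sum(range(n)), 100_000, warmup=2, repeats=5)
print(len(samples))  # 5 timing samples, warm-up excluded
```

Returning the raw samples, rather than a single mean, is what makes the downstream statistical analysis (confidence intervals, outlier detection) possible.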

Statistical Analysis

Bootstrap confidence intervals, hypothesis testing, effect sizes, and outlier detection for rigorous benchmarking results.
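The bootstrap idea behind such confidence intervals is standard and easy to sketch without Calibrax's own (not yet published) API: resample the timing samples with replacement many times, compute the statistic on each resample, and read the interval off the percentiles. A minimal stdlib-only version:

```python
import random
import statistics

def bootstrap_ci(samples, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for `stat` over `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    boots = sorted(
        stat([samples[rng.randrange(n)] for _ in range(n)])  # one resample
        for _ in range(n_boot)
    )
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

timings = [10.1, 10.3, 9.8, 10.5, 10.0, 10.2, 9.9, 10.4]  # ms, illustrative
lo, hi = bootstrap_ci(timings)
print(f"mean 95% CI: [{lo:.2f}, {hi:.2f}] ms")
```

In practice a SciPy-backed implementation (hence the `calibrax[stats]` extra) would be preferable; the percentile method shown here is the simplest variant.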

Regression Detection

Direction-aware performance regression detection with configurable severity levels. Catch performance regressions before they ship.
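"Direction-aware" means the detector knows which way is bad for each metric: a throughput drop and a latency rise are both regressions, despite opposite signs. A hedged sketch of that logic (the function and thresholds are illustrative, not Calibrax's actual interface):

```python
def detect_regression(baseline, current, higher_is_better,
                      thresholds=(0.02, 0.05, 0.10)):
    """Classify a metric change as none/minor/moderate/severe.

    Direction-aware: the relative change is flipped for higher-is-better
    metrics so that a positive `regression` value always means "worse".
    Threshold values are illustrative, not Calibrax defaults.
    """
    change = (current - baseline) / baseline
    regression = -change if higher_is_better else change
    minor, moderate, severe = thresholds
    if regression >= severe:
        return "severe"
    if regression >= moderate:
        return "moderate"
    if regression >= minor:
        return "minor"
    return "none"

print(detect_regression(100.0, 92.0, higher_is_better=True))   # throughput -8% -> moderate
print(detect_regression(10.0, 10.1, higher_is_better=False))   # latency +1% -> none
```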

Comparison & Ranking

Cross-configuration comparison, Pareto front analysis, aggregate scoring, and scaling analysis for informed architecture decisions.
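Pareto front analysis keeps the configurations no other configuration beats on every metric at once. Assuming two lower-is-better metrics (say latency and peak memory; the metric choice here is illustrative), a minimal version looks like:

```python
def pareto_front(points):
    """Return names of non-dominated (name, latency, memory) tuples.

    Lower is better on both axes. A point is dominated if some other
    point is <= on both metrics and strictly < on at least one.
    """
    front = []
    for name, lat, mem in points:
        dominated = any(
            (l2 <= lat and m2 <= mem) and (l2 < lat or m2 < mem)
            for _, l2, m2 in points
        )
        if not dominated:
            front.append(name)
    return front

configs = [
    ("a", 1.0, 4.0),  # fastest, most memory
    ("b", 2.0, 2.0),  # balanced
    ("c", 3.0, 1.0),  # slowest, least memory
    ("d", 3.0, 3.0),  # dominated by b
]
print(pareto_front(configs))  # ['a', 'b', 'c']
```

The front exposes real trade-offs; aggregate scoring would then collapse it to a single ranking by weighting the metrics.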

Publication Export

W&B and MLflow integration, publication-ready LaTeX/HTML/CSV tables, and matplotlib plots for papers and reports.

CLI & CI Integration

Full CLI (ingest, export, check, baseline, trend, summary, profile) with CI regression gate and git bisect automation.

Use Cases

1. Benchmarking JAX model performance across hardware configurations
2. Detecting performance regressions in CI/CD pipelines
3. Comparative analysis across Artifex, Datarax, and Opifex projects
4. Publication-ready performance tables and plots for research papers
5. GPU memory and energy profiling for resource optimization
6. Roofline analysis for identifying computational bottlenecks
7. Statistical validation of performance improvements
8. Production monitoring with configurable alerting thresholds
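Roofline analysis (use case 6 above) bounds a kernel's attainable throughput by the lesser of peak compute and arithmetic intensity times memory bandwidth. The arithmetic is simple enough to sketch directly; the hardware numbers below are hypothetical, not any real accelerator's specs:

```python
def roofline_bound(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s for a kernel under the roofline model."""
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    return min(peak_gflops, intensity * bandwidth_gbs)

# Hypothetical accelerator: 100 GFLOP/s peak, 40 GB/s memory bandwidth.
# Ridge point = 100 / 40 = 2.5 FLOPs/byte.
print(roofline_bound(flops=1e9,  bytes_moved=1e9, peak_gflops=100, bandwidth_gbs=40))  # 40.0 (memory-bound)
print(roofline_bound(flops=1e10, bytes_moved=1e9, peak_gflops=100, bandwidth_gbs=40))  # 100 (compute-bound)
```

A kernel below the ridge point is memory-bound, so optimizing its FLOPs is wasted effort; that diagnosis is what makes the roofline useful for finding bottlenecks.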

Installation

# Basic installation
uv pip install calibrax

# With statistical analysis (scipy)
uv pip install "calibrax[stats]"

# With GPU monitoring
uv pip install "calibrax[gpu]"

# With publication export (matplotlib)
uv pip install "calibrax[publication]"

Quick Start

# Clone and set up development environment
git clone https://github.com/avitai/calibrax.git
cd calibrax

# Automatic setup with GPU detection
./setup.sh
source ./activate.sh

# CLI usage
calibrax profile --model my_model.py
calibrax check --baseline main
calibrax summary --format html
calibrax trend --metric throughput --window 30d

Built With

JAX, SciPy, NumPy, Matplotlib, Weights & Biases, MLflow

Ready to Get Started?

Explore the documentation, try examples, or contribute to the project.