DiffBio
Differentiable Bioinformatics Pipelines
End-to-end differentiable bioinformatics pipelines built on JAX. Replace discrete operations with differentiable relaxations for gradient-based optimization.
Repository Coming Soon
This project is under active development and will be open-sourced soon.
Overview
DiffBio is a framework for building end-to-end differentiable bioinformatics pipelines. Traditional bioinformatics pipelines use discrete operations (hard thresholds, argmax decisions) that block gradient flow. DiffBio addresses this by replacing these operations with differentiable relaxations, enabling gradient-based optimization through entire analysis workflows.
The framework provides 35+ differentiable operators covering alignment, variant calling, single-cell analysis, epigenomics, RNA-seq, preprocessing, normalization, and multi-omics. Key innovations include soft quality filtering using sigmoid-based weights instead of hard cutoffs, differentiable pileup with soft position assignments via temperature-controlled softmax, and soft alignment scoring replacing discrete Smith-Waterman with continuous relaxations.
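The soft position assignment mentioned above can be illustrated with a short sketch. The helper name `soft_pileup` and its signature are made up for illustration; DiffBio's actual operator API may differ.

```python
import jax
import jax.numpy as jnp


def soft_pileup(read_positions, read_weights, ref_length, temperature=1.0):
    """Soft pileup: each read spreads its weight over reference positions
    via a temperature-controlled softmax over negative squared distance.

    As temperature -> 0 this approaches a hard one-position assignment,
    but it stays differentiable in read_positions for any temperature > 0.
    """
    grid = jnp.arange(ref_length)                          # (L,)
    d2 = (read_positions[:, None] - grid[None, :]) ** 2    # (R, L) distances
    assignment = jax.nn.softmax(-d2 / temperature, axis=-1)  # soft assignment
    return (read_weights[:, None] * assignment).sum(axis=0)  # per-position depth


positions = jnp.array([2.0, 2.0, 5.0])  # fractional read positions
weights = jnp.ones(3)                   # e.g. soft quality weights
depth = soft_pileup(positions, weights, ref_length=10, temperature=0.1)
```

Because each read's softmax weights sum to one, total depth is conserved regardless of temperature; lowering the temperature only sharpens where that depth lands.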
DiffBio includes 5 end-to-end pipelines for variant calling, single-cell analysis, differential expression, and preprocessing. Each pipeline can be trained using gradient descent with custom loss functions, gradient clipping, and synthetic data generation for bootstrapping. This enables learning optimal pipeline parameters directly from data rather than manual tuning.
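A toy version of such a training loop is sketched below, using a single learnable quality threshold, synthetic bootstrapping data, and manually clipped gradients. This is illustrative only and does not use DiffBio's actual training utilities; the "true" cutoff of 25 is invented for the example.

```python
import jax
import jax.numpy as jnp


def soft_filter(threshold, quals, temperature=1.0):
    # Sigmoid relaxation of "keep reads with quality >= threshold"
    return jax.nn.sigmoid((quals - threshold) / temperature)


def loss_fn(threshold, quals, targets):
    return jnp.mean((soft_filter(threshold, quals) - targets) ** 2)


# Synthetic bootstrapping data: the data-generating cutoff is 25
key = jax.random.PRNGKey(0)
quals = jax.random.uniform(key, (256,), minval=0.0, maxval=40.0)
targets = (quals >= 25.0).astype(jnp.float32)

threshold = jnp.array(10.0)  # deliberately poor initial guess
grad_fn = jax.jit(jax.grad(loss_fn))
lr, max_grad = 2.0, 1.0
for _ in range(1000):
    g = grad_fn(threshold, quals, targets)
    g = jnp.clip(g, -max_grad, max_grad)  # gradient clipping
    threshold = threshold - lr * g
# threshold drifts toward the data-generating cutoff of 25
```

Because the sigmoid relaxation has a nonzero gradient near the threshold, plain gradient descent recovers the cutoff that a hard filter could only find by grid search.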
Built on Datarax's operator framework and powered by JAX/Flax NNX, DiffBio inherits composable architecture with automatic vectorization, batch processing, and GPU acceleration. Each operator implements the standard apply interface, enabling seamless composition into complex analysis workflows.
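The composition pattern can be sketched as follows. The `Compose` helper and the toy `AddOne` operator are hypothetical, written only to show the shared apply interface in the style of the Quick Start call `pipeline.apply(batch_data, {}, None)`; Datarax's real composition API may differ.

```python
class Compose:
    """Hypothetical sketch: chain operators that all implement
    apply(data, metadata, state) -> (data, metadata, state)."""

    def __init__(self, *operators):
        self.operators = operators

    def apply(self, data, metadata, state):
        for op in self.operators:
            data, metadata, state = op.apply(data, metadata, state)
        return data, metadata, state


class AddOne:
    """Toy operator used only to demonstrate the chaining pattern."""

    def apply(self, data, metadata, state):
        return data + 1, metadata, state


pipeline = Compose(AddOne(), AddOne())
out, _, _ = pipeline.apply(0, {}, None)
```

Threading `(data, metadata, state)` through every stage is what lets a whole chain of operators be treated as one operator and dropped into a larger pipeline.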
Key Features
35+ Differentiable Operators
Covering alignment, variant calling, single-cell analysis, epigenomics, RNA-seq, preprocessing, normalization, and multi-omics.
Soft Quality Filtering
Sigmoid-based weights instead of hard cutoffs. Learnable thresholds allow gradient-based optimization of quality control parameters.
Differentiable Alignment
Soft Smith-Waterman scoring replacing discrete alignments with continuous relaxations. Temperature-controlled softmax for smooth gradient flow.
End-to-End Pipelines
5 ready-to-use pipelines for variant calling, single-cell analysis, differential expression, and preprocessing — all trainable with gradient descent.
GPU-Accelerated
Built on JAX for XLA-compiled computation. Process large genomic datasets efficiently on GPUs and TPUs.
Built on Datarax
Composable architecture using the Datarax operator framework. Chain operators into pipelines with automatic vectorization and batch processing.
Use Cases
Variant calling with learnable quality thresholds and pileup parameters
Single-cell RNA-seq analysis with differentiable preprocessing
Differential expression analysis with end-to-end optimization
Epigenomics peak calling with soft boundary detection
Learning optimal pipeline parameters directly from labeled data
Multi-omics integration with gradient-based feature selection
Benchmarking differentiable vs. discrete bioinformatics approaches
Training custom bioinformatics operators with task-specific losses
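As one concrete example from the list above, soft boundary detection for peak calling can be sketched as the product of two sigmoids (the `soft_peak_mask` helper is illustrative, not DiffBio's actual operator):

```python
import jax
import jax.numpy as jnp


def soft_peak_mask(positions, start, end, temperature=1.0):
    """Soft indicator of membership in [start, end]: a product of two
    sigmoids. Differentiable in start and end, so peak boundaries can
    be optimized by gradient descent instead of hard thresholding."""
    left = jax.nn.sigmoid((positions - start) / temperature)
    right = jax.nn.sigmoid((end - positions) / temperature)
    return left * right


positions = jnp.arange(100.0)
mask = soft_peak_mask(positions, start=30.0, end=60.0)

# Gradients with respect to the boundaries are well defined:
d_start = jax.grad(lambda s: soft_peak_mask(positions, s, 60.0).sum())(30.0)
```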
Installation
# Clone the repository
git clone https://github.com/avitai/DiffBio.git
cd DiffBio
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .

Quick Start
import jax.numpy as jnp
from flax import nnx

from diffbio.operators import (
    DifferentiableQualityFilter,
    DifferentiablePileup,
)
from diffbio.pipelines import create_variant_calling_pipeline

# Quality filtering with learnable threshold
quality_filter = DifferentiableQualityFilter(
    threshold=20.0,
    temperature=1.0,
    rngs=nnx.Rngs(0),
)

# Create end-to-end variant calling pipeline
pipeline = create_variant_calling_pipeline(
    reference_length=100,
    num_classes=3,  # ref, SNP, indel
    hidden_dim=32,
    seed=42,
)

# Process reads — result contains per-position variant predictions
result, _, _ = pipeline.apply(batch_data, {}, None)

Built With
Ready to Get Started?
Explore the documentation, try examples, or contribute to the project.