Newt vs toad Benchmark
This benchmark validates Newt's AUC, KS, IV, and PSI implementations against toad using the bundled sample dataset at examples/data/test_data/all_data.pq.
For broader release-level context, see the project README.md.
What It Checks
AUCandKSonscore_pandscore_old_pPSIfor:train -> testtrain -> oot- the score columns above
- all numeric features
IVfor all candidate features- numeric columns are first aligned to fixed 10-quantile bins
- categorical columns are compared directly
The benchmark also records median runtimes for:
- Newt
AUCandKS - Newt and toad
PSI - Newt single-column
IV - Newt batch
IVwith the Python engine - Newt batch
IVwith the Rust engine when available - toad's
quality(..., iv_only=True)path
Run It
Run the benchmark from the benchmark environment:
python benchmarks/metric_vs_toad.py
Output
The command writes:
out/benchmarks/metric_vs_toad.jsonout/benchmarks/metric_vs_toad.md
The JSON file contains the full machine-readable report. The Markdown file is a short human-readable summary.
Python and Dependency Notes
- Newt core remains compatible with Python
3.8.5through3.12.x - The benchmark environment is
.venv-benchmark-3.10 - That environment includes
toad==0.1.5through thebenchmarkdependency group - The benchmark runs
toaddirectly in the same environment; no worker venv or Python fallback is used
Current Comparison Policy
AUCandKScompare the public metric functions directlyPSIuses Newt's reference quantile bucket logic on both sides so the comparison measures the metric calculation, not bucket-definition driftIVuses the same prepared feature frame for both libraries to avoid mixing metric differences with different default binning rules
Latest Verified Run
Latest benchmark run (generated on 2026-04-04T21:22:44) from .venv-benchmark-3.10:
- IV mean absolute diff:
4.949931178722187e-16 - IV max absolute diff:
1.4033219031261979e-13 userinfo_24IV:- Newt:
1.727905 - toad:
1.727905
This confirms the current IV implementation is aligned with toad on the prepared benchmark input.