
Newt vs toad Benchmark

This benchmark validates Newt's AUC, KS, IV, and PSI implementations against toad using the bundled sample dataset at examples/data/test_data/all_data.pq.

For broader release-level context, see the project README.md.

What It Checks

  • AUC and KS on score_p and score_old_p
  • PSI for:
      • train -> test
      • train -> oot
      • the score columns above
      • all numeric features
  • IV for all candidate features
      • numeric columns are first aligned to fixed 10-quantile bins
      • categorical columns are compared directly
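To make the IV preparation step concrete, here is a minimal pure-Python sketch of binning a numeric column into 10 quantile bins and computing IV on the result. It is an illustration only, not the code of Newt or toad: the helper names `to_quantile_bins` and `information_value`, the `eps` floor for empty bins, and the convention that `1` marks the bad class are all assumptions.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from statistics import quantiles


def to_quantile_bins(values, n_bins=10):
    """Map each numeric value to the index of its quantile bin."""
    # Interior cut points; duplicates are dropped so ties collapse into one bin.
    edges = sorted(set(quantiles(values, n=n_bins)))
    return [bisect_right(edges, v) for v in values]


def information_value(bins, target, eps=1e-6):
    """IV over arbitrary bin labels; target is 1 for bad, 0 for good (assumed)."""
    good = defaultdict(int)
    bad = defaultdict(int)
    for label, y in zip(bins, target):
        if y == 1:
            bad[label] += 1
        else:
            good[label] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    if total_good == 0 or total_bad == 0:
        raise ValueError("target must contain both classes")
    iv = 0.0
    for label in set(good) | set(bad):
        # Floor each share at eps so an empty side keeps the log finite.
        g = max(good[label] / total_good, eps)
        b = max(bad[label] / total_bad, eps)
        iv += (g - b) * math.log(g / b)
    return iv
```

Because `information_value` accepts any hashable bin label, a categorical column can be passed to it directly, while a numeric column goes through `to_quantile_bins` first. This mirrors the two cases in the list above.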

The benchmark also records median runtimes for:

  • Newt AUC and KS
  • Newt and toad PSI
  • Newt single-column IV
  • Newt batch IV with the Python engine
  • Newt batch IV with the Rust engine when available
  • toad's quality(..., iv_only=True) path
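A median runtime of this kind can be measured with a small helper like the one below. This is a hedged sketch using only the standard library. The function name `median_runtime` and the default of five repeats are assumptions for illustration, not settings taken from the benchmark script.

```python
import statistics
import time


def median_runtime(fn, *args, repeats=5, **kwargs):
    """Return the median wall-clock time in seconds over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

The median is less sensitive than the mean to a single slow call, such as a first call that pays import or cache warm-up costs.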

Run It

Run the benchmark from the benchmark environment:

python benchmarks/metric_vs_toad.py

Output

The command writes:

  • out/benchmarks/metric_vs_toad.json
  • out/benchmarks/metric_vs_toad.md

The JSON file contains the full machine-readable report. The Markdown file is a short human-readable summary.

Python and Dependency Notes

  • Newt core remains compatible with Python 3.8.5 through 3.12.x
  • The benchmark environment is .venv-benchmark-3.10
  • That environment includes toad==0.1.5 through the benchmark dependency group
  • The benchmark runs toad directly in the same environment; no worker venv or Python fallback is used

Current Comparison Policy

  • AUC and KS compare the public metric functions directly
  • PSI uses Newt's reference quantile bucket logic on both sides so the comparison measures the metric calculation, not bucket-definition drift
  • IV uses the same prepared feature frame for both libraries to avoid mixing metric differences with different default binning rules
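The PSI policy can be pictured as follows: bucket edges are derived once from the reference sample and then reused for both samples, so any difference between two implementations comes from the metric arithmetic alone. The sketch below is a pure-Python illustration under that assumption. The names `reference_edges`, `bucket_shares`, and `psi_from_edges`, and the `eps` floor, are hypothetical and do not describe Newt's actual bucket logic.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def reference_edges(reference, n_bins=10):
    """Interior cut points taken from the reference sample's quantiles."""
    return sorted(set(quantiles(reference, n=n_bins)))


def bucket_shares(values, edges, eps=1e-6):
    """Share of values per bucket, floored at eps to keep the log finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi_from_edges(expected, actual, edges):
    """PSI between two samples bucketed with the same precomputed edges."""
    e = bucket_shares(expected, edges)
    a = bucket_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

For a train -> test comparison, the edges would come from the train column and be passed unchanged to every PSI implementation under test.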

Latest Verified Run

Latest benchmark run (generated on 2026-04-04T21:22:44) from .venv-benchmark-3.10:

  • IV mean absolute diff: 4.949931178722187e-16
  • IV max absolute diff: 1.4033219031261979e-13
  • userinfo_24 IV:
      • Newt: 1.727905
      • toad: 1.727905

Both differences are within floating-point rounding error, which confirms that the current IV implementation is aligned with toad on the prepared benchmark input.
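Summary figures like the mean and max absolute diff can be derived from two per-feature IV mappings, as sketched below. The function name `abs_diff_summary` and the returned keys are assumptions for illustration. They are not the schema of metric_vs_toad.json.

```python
def abs_diff_summary(newt_iv, toad_iv):
    """Summarize absolute differences over the features both mappings share."""
    shared = sorted(set(newt_iv) & set(toad_iv))
    if not shared:
        raise ValueError("no shared features to compare")
    diffs = {name: abs(newt_iv[name] - toad_iv[name]) for name in shared}
    worst = max(diffs, key=diffs.get)
    return {
        "mean_abs_diff": sum(diffs.values()) / len(diffs),
        "max_abs_diff": diffs[worst],
        "worst_feature": worst,
    }
```

Reporting the feature with the largest difference alongside the aggregate numbers makes it easier to see whether a mismatch is isolated to one column or spread across all of them.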