Writing — Karan Mohindroo

I Built the LLM Inference Correctness Tool That Should Already Exist

Every benchmark measures tokens per second. Nobody measures whether the tokens are correct. I built infer-check to fix that, then ran it against Llama-3.1-8B and Qwen3.5-4B on Apple Silicon.

8 min read

Mar 2026