Who I am
About
I'm Karan Mohindroo, a CS and Behavioral Neuroscience undergrad at Northeastern University building open-source tooling for LLM inference correctness and reliability. I study neural networks from both sides: how artificial ones compute and how biological ones do.
My current focus is quantization correctness testing and cross-backend differential analysis on Apple Silicon. I built infer-check, a CLI tool that catches the correctness bugs benchmarks miss. It runs differential tests across mlx-lm, vllm-mlx, and llama.cpp, quantization degradation sweeps, and determinism verification. The tool is on PyPI and has produced the first published quantization data for Qwen3.5-4B on the Gated DeltaNet architecture.
My research background is in inference optimization. In summer 2024 I interned at UC Santa Cruz in Prof. Chen Qian's lab, where I co-authored CALID, a confidence-based filter decoding system for routing LLM requests between small and large models. We presented it as a poster at BayLearn 2024 at Apple.
I contribute to open-source projects including Homebrew and MacPorts, and I ship tools on PyPI. I care about the parts of ML infrastructure that fail quietly — serving-layer faithfulness, reasoning-token handling, and the correctness gaps between inference backends that only show up at scale.
Skills & Tools

Experience
Creator & Maintainer
infer-check
Built a cross-backend differential testing CLI for LLM inference correctness — quantization sweeps, determinism verification, and serving-layer faithfulness across mlx-lm, vllm-mlx, and llama.cpp. Published on PyPI.
Creator & Maintainer
MiddleDrag
Native macOS app adding three-finger trackpad gestures for middle-click and middle-drag. Distributed via Homebrew and MacPorts with signed installers, auto-updates, and crash reporting.
Undergraduate
Northeastern University
Dual major in Computer Science and Behavioral Neuroscience — studying neural computation from both the engineering and biological perspectives.
Research Intern
UC Santa Cruz — Prof. Chen Qian’s Lab
Co-authored CALID, a confidence-based filter decoding system for LLM inference that routes requests between small and large models using NLL-based confidence scores. Poster presented at BayLearn 2024 at Apple.