Who I am

About

I'm Karan Mohindroo, a CS and Behavioral Neuroscience undergrad at Northeastern University building open-source tooling for LLM inference correctness and reliability. I study neural networks from both sides: how artificial ones compute and how biological ones do.

My current focus is quantization correctness testing and cross-backend differential analysis on Apple Silicon. I built infer-check, a CLI tool that catches the correctness bugs benchmarks miss. It covers three things: differential testing across mlx-lm, vllm-mlx, and llama.cpp; quantization degradation sweeps; and determinism verification. The tool is on PyPI and has produced the first published quantization data for Qwen3.5-4B on the Gated DeltaNet architecture.

My research background is in inference optimization. In summer 2024 I interned at UC Santa Cruz in Prof. Chen Qian's lab, co-authoring CALID — a confidence-based filter decoding system for routing LLM requests between small and large models. We presented at BayLearn 2024 at Apple.

I contribute to open-source projects including Homebrew and MacPorts, and I ship tools on PyPI. I care about the parts of ML infrastructure that fail quietly — serving-layer faithfulness, reasoning-token handling, and the correctness gaps between inference backends that only show up at scale.

Skills & Tools

ML & Inference
MLX, vLLM, llama.cpp, PyTorch, HuggingFace
Quantization
GPTQ, AWQ, MLX Native Quant, GGUF
Languages
Python, TypeScript, Bash, Java, JavaScript, Swift, SQL, C#
Tools
Git, GitHub Actions, Docker, PyPI, Jupyter Notebook, Next.js

Experience

  1. Creator & Maintainer

    infer-check

    Built a cross-backend differential testing CLI for LLM inference correctness — quantization sweeps, determinism verification, and serving-layer faithfulness across mlx-lm, vllm-mlx, and llama.cpp. Published on PyPI.

  2. Creator & Maintainer

    MiddleDrag

    Native macOS app adding three-finger trackpad gestures for middle-click and middle-drag. Distributed via Homebrew and MacPorts with signed installers, auto-updates, and crash reporting.

  3. Undergraduate

    Northeastern University

    Dual major in Computer Science and Behavioral Neuroscience — studying neural computation from both the engineering and biological perspectives.

  4. Research Intern

    UC Santa Cruz — Prof. Chen Qian’s Lab

    Co-authored CALID, a confidence-based filter decoding system for LLM inference that routes requests between small and large models using NLL-based confidence scores. Poster presented at BayLearn 2024 at Apple.