Work
Cross-backend differential-testing CLI for LLM inference correctness: quantization sweeps, determinism verification, and serving-layer faithfulness checks across mlx-lm, vllm-mlx, and llama.cpp.
Native macOS app that adds three-finger trackpad gestures for middle-click and middle-drag. Distributed via Homebrew and MacPorts.
Confidence-based filter decoding system that routes LLM inference requests between small and large models to reduce compute costs while maintaining output quality. BayLearn 2024 poster at Apple.