Review any Nix config, get structured findings.
Paste a flake, a NixOS module, or a home-manager config. statix and deadnix run first; findings are augmented with RAG over 98k+ nixpkgs entries and 16k NixOS options; a small local LLM turns it all into line-by-line review comments. Open source, Apache-2.0.
Paste your config
flake.nix · configuration.nix · home.nix — up to 128 KB
Benchmark — every version on the same 25 cases
We run the same 25-case test suite against every version. No version ships unless it's an honest step forward on real metrics — line-exactness, severity match, no hallucinated options.
| Metric | v0 base Qwen 1.5B | v0 live (hermes3:3b) | v0.1 LoRA | v0.2 LoRA (3 ep.) | v0.2a LoRA | v0.4 LoRA · live |
|---|---|---|---|---|---|---|
| schema_valid | 100%† | 96% | 96% | 88% | 96% | 100% |
| no_hallucinated_options | n/a | 88.9% | 100% | n/a | 100% | 100% |
| line_exact | 0% | 20% | 90% | 50% | 90% | 95% |
| severity_match | 20% | 45% | 75% | 40% | 70% | 80% |
| message_keywords_hit | 0% | 25% | 45% | 40% | 45% | 80% |
| empty_on_negative | 0% | 0% | 0% | 100%‡ | 60% | 60% |
| avg latency§ | 5.5 s | 18 s | 3.3 s | 2.0 s | 2.9 s | 3.5 s |
† Base Qwen's 100% is illusory: every output triggered the review pipeline's escape-hatch fallback, so no real review was emitted and the shape was valid only by coincidence.
‡ v0.2 achieved 100% on negatives but over-fit to refusal (3 epochs), regressing line_exact from 90% to 50%.
§ fp16 adapter latency on Arc XPU; production GGUF/Q4 on CPU is comparable across versions.
v0.4 is the first version that matches or beats v0.2a on every review metric: +5 line_exact, +10 severity_match, +35 message_keywords_hit, with empty_on_negative unchanged at 60%, all from adding a single new mutation class (option_renamed_across_channels).
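The page does not spell out how each metric is scored, so the following is only a minimal sketch of one plausible reading of line_exact, severity_match, and empty_on_negative. The `Finding` type, the one-expected-finding-per-case assumption, and the helper names are all hypothetical; message_keywords_hit and the other metrics are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    line: int
    severity: str
    message: str


def score_case(expected, predicted):
    """Score one benchmark case; both arguments are lists of Finding."""
    if not expected:
        # Negative case: a clean config should produce no findings at all.
        return {"empty_on_negative": predicted == []}
    target = expected[0]  # assumption: one expected finding per positive case
    by_line = {f.line: f for f in predicted}
    hit = by_line.get(target.line)
    return {
        "line_exact": hit is not None,
        "severity_match": hit is not None and hit.severity == target.severity,
    }


def aggregate(results):
    """Percentage of passing cases per metric, over cases where it applies."""
    totals = {}
    for result in results:
        for name, ok in result.items():
            passed, seen = totals.get(name, (0, 0))
            totals[name] = (passed + int(ok), seen + 1)
    return {name: 100.0 * p / n for name, (p, n) in totals.items()}


# Three toy cases: a correct review, a hallucinated one, and a clean config.
cases = [
    ([Finding(3, "error", "unknown attr")], [Finding(3, "error", "`fooo` is not a nixpkgs attribute")]),
    ([Finding(3, "error", "unknown attr")], [Finding(0, "hint", "texlive")]),
    ([], []),
]
summary = aggregate([score_case(e, p) for e, p in cases])
print(summary)
```

Scoring negatives separately from positives is what lets a model that always refuses reach 100% on empty_on_negative while collapsing on line_exact, which is the failure mode the ‡ footnote describes.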
environment.systemPackages = [ fooo ];
line 0 · hint texlive: Consider using texliveConTeXt for TeX Live environment.
Never mentions fooo. Hallucinates texlive. Classic slop.
line 3 · error `fooo` is not a nixpkgs attribute — did you mean `vim`?
Right line. Right severity. Names the bug. Suggests the fix.
How it works
Deterministic linters catch redundant defaults, unused bindings, deprecated attrs, useless parens before the LLM runs.
98,382 packages + 16,095 NixOS options pre-embedded. At review time, findings + attr-paths are searched to pull relevant context into the prompt.
Fine-tuned on 445 synthesized (broken_config, review) pairs. Base: Qwen2.5-Coder-1.5B-Instruct. Served via Ollama on the xnode.
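As a rough illustration of the retrieval step, the sketch below ranks a tiny in-memory index by cosine similarity against a finding's text. Everything here is an assumption made for illustration: the token-count "embedding" stands in for whatever embedding model the project actually uses, and the three index entries and their descriptions are hypothetical stand-ins for the pre-embedded packages and options.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: token counts. A real system would use a learned model."""
    tokens = text.lower().replace(".", " ").replace("-", " ").split()
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, index, k=2):
    """Return the names of the k index entries most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical mini-index; names and descriptions are illustrative only.
ENTRIES = {
    "services.openssh.enable": "Whether to enable the OpenSSH secure shell daemon",
    "networking.firewall.enable": "Whether to enable the firewall",
    "pkgs.vim": "The most popular clone of the VI editor",
}
INDEX = [(name, embed(name + " " + desc)) for name, desc in ENTRIES.items()]

# A finding mentioning a misspelled option path pulls in the nearest real option.
top = retrieve("services.openssh.enabled unknown option", INDEX, k=1)
print(top)
```

The retrieved entries would then be placed in the prompt next to the linter findings, so the model can name a real attribute instead of inventing one.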
The Nix evaluator is the teacher.
Every training pair is synthesized: we take a pattern (e.g. a typo'd package name), generate an original Nix config exhibiting it, and run nix eval inside a Docker container to capture the real error message, line number, and column. The model learns to reproduce what the Nix evaluator itself says. No forum content is reproduced: the dataset is entirely synthetic and Apache-2.0 clean.
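The capture step described above could look roughly like this: take the evaluator's stderr and pull out the message, line, and column for use as a training label. This is a sketch under assumptions, not the project's actual code. The sample stderr is hand-written in the style of recent Nix error output, the path and column are made up, and the label shape simply mirrors the review schema used elsewhere on this page.

```python
import re

# Hand-written sample in the style of Nix evaluator output (illustrative only).
SAMPLE_STDERR = """\
error: undefined variable 'fooo'

       at /work/configuration.nix:3:35:
"""

ERROR_RE = re.compile(r"^error: (?P<message>.+)$", re.MULTILINE)
LOCATION_RE = re.compile(r"\bat (?P<path>\S+?):(?P<line>\d+):(?P<col>\d+)")


def parse_eval_error(stderr):
    """Extract a review-style label from evaluator stderr, or None if absent."""
    err = ERROR_RE.search(stderr)
    loc = LOCATION_RE.search(stderr)
    if not err or not loc:
        return None
    return {
        "line": int(loc["line"]),
        "column": int(loc["col"]),
        "severity": "error",
        "message": err["message"].strip(),
    }


label = parse_eval_error(SAMPLE_STDERR)
print(label)
```

Pairing each generated config with a label derived this way keeps the line numbers grounded in what the evaluator reported rather than in what a human or another model guessed.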
Qwen2.5-Coder-1.5B-Instruct
LoRA (r=16, α=32)
Intel Arc 140T iGPU (48 GB)
Run the model on your own machine
The Q4_K_M quantized build is ~986 MB and runs on CPU. If you have Ollama installed, it's two commands.
ollama pull hf.co/OpenxAILabs/nix-reviewer-1.5b-GGUF:Q4_K_M
One-time download. ~986 MB.
ollama run hf.co/OpenxAILabs/nix-reviewer-1.5b-GGUF:Q4_K_M \
'{ pkgs, ... }:
{
environment.systemPackages = with pkgs; [ vim vvim ];
}'
Expected: [{"line": 4, "severity": "error", "message": "`vvim` is not..."}]
curl http://localhost:11434/api/chat -d '{
"model": "hf.co/OpenxAILabs/nix-reviewer-1.5b-GGUF:Q4_K_M",
"stream": false,
"messages": [
{"role": "system", "content": "You are nix-assistant. Review the Nix config and output ONLY a JSON array: [{\"line\":int,\"severity\":\"error\"|\"warning\"|\"hint\",\"message\":str}]"},
{"role": "user", "content": "{ pkgs, ... }: { environment.systemPackages = with pkgs; [ vim vvim ]; }"}
]
}'
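For scripted use, the same request can be sent from Python with only the standard library. The sketch below assumes a local Ollama server at the URL used in the curl command and reads the reply from the non-streaming chat response's `message.content` field; the helper names (`build_payload`, `parse_findings`, `review`) are hypothetical. Because a small model can still emit malformed output, the parser checks the shape before returning anything.

```python
import json
import urllib.request

MODEL = "hf.co/OpenxAILabs/nix-reviewer-1.5b-GGUF:Q4_K_M"
SYSTEM_PROMPT = (
    "You are nix-assistant. Review the Nix config and output ONLY a JSON array: "
    '[{"line":int,"severity":"error"|"warning"|"hint","message":str}]'
)
SEVERITIES = {"error", "warning", "hint"}


def build_payload(config_text):
    """Build the chat request body, mirroring the curl example."""
    return {
        "model": MODEL,
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": config_text},
        ],
    }


def parse_findings(content):
    """Parse the model's reply and check it against the review schema.

    Raises ValueError on invalid JSON or on any item with the wrong shape.
    """
    data = json.loads(content)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    for item in data:
        ok = (
            isinstance(item, dict)
            and isinstance(item.get("line"), int)
            and not isinstance(item.get("line"), bool)
            and item.get("severity") in SEVERITIES
            and isinstance(item.get("message"), str)
        )
        if not ok:
            raise ValueError(f"malformed finding: {item!r}")
    return data


def review(config_text, url="http://localhost:11434/api/chat"):
    """Send a config to a locally running Ollama server and return findings."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(config_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return parse_findings(body["message"]["content"])
```

With the model pulled and Ollama running, `review("{ pkgs, ... }: { environment.systemPackages = with pkgs; [ vim vvim ]; }")` would return a list of finding dictionaries, or raise `ValueError` if the reply does not match the schema.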
Send a message
Bug reports, misses, requests, or "your model hallucinated X" — all useful. Humans and AI agents both welcome. Text only (markdown fine), 2000 chars max, 3 submissions per hour per IP.