Two-node compute setup. Your local machine handles fast, lightweight agents.
Your server handles the heavy thinking — Orchestrator and Planner.
Both nodes run Ollama and communicate over your local network.
⚡ PERFORMANCE NOTE
Qwen2.5-Coder:7B fits entirely in your 6GB VRAM in Q4_K_M quantization — that's ~4.5GB loaded.
Throughput is ~2–4 tokens/sec for code generation. Fast enough for an agent loop.
The server's 100GB RAM can run Qwen2.5-Coder:32B comfortably in CPU inference (~8–10 tokens/sec). We hit the server only for Orchestrator + Planner — heavy reasoning tasks that happen once per job, not in the tight loop.
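The split above can be sketched as a small routing layer: heavy roles go to the server's 32B model, everything else stays on the local 7B. This is a minimal sketch, not the actual implementation — the hostnames, ports, and role names are assumptions; only the Ollama `/api/generate` endpoint shape is standard.

```python
import json
import urllib.request

# Assumed endpoints — replace with your actual LAN addresses.
# Both nodes run Ollama on its default port (11434).
NODES = {
    "local":  {"url": "http://127.0.0.1:11434",    "model": "qwen2.5-coder:7b"},
    "server": {"url": "http://192.168.1.50:11434", "model": "qwen2.5-coder:32b"},
}

# Heavy reasoning roles hit the server once per job, not in the tight loop.
HEAVY_ROLES = {"orchestrator", "planner"}

def route(role: str) -> dict:
    """Pick the node/model pair for a given agent role."""
    node = "server" if role.lower() in HEAVY_ROLES else "local"
    return NODES[node]

def generate(role: str, prompt: str) -> str:
    """Call Ollama's /api/generate on the routed node (non-streaming)."""
    target = route(role)
    payload = json.dumps({
        "model": target["model"],
        "prompt": prompt,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{target['url']}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Any fast-loop agent (coder, reviewer, etc.) resolves to the local 7B endpoint; only `orchestrator` and `planner` cross the network, which keeps LAN latency out of the inner loop.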