MULTI-AGENT DEV SYSTEM
Tech Stack v2 — Hardware Corrected
Python · Go · Rust · JS  ·  Local 7B + Server 32B  ·  LangGraph + Custom Tools
LOCAL
RTX 3050
6GB VRAM · 16GB RAM
SERVER
100GB RAM
Heavy models via Ollama API
Two-node compute setup. Your local machine handles fast, lightweight agents. Your server handles the heavy thinking — Orchestrator and Planner. Both nodes run Ollama and communicate over your local network.
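Talking to both nodes can be as simple as hitting each machine's Ollama HTTP API. A minimal sketch, assuming default Ollama port 11434 and a placeholder LAN address for the server (adjust both to your network):

```python
import json
import urllib.request

# Assumed endpoints; the server IP is a placeholder for your LAN address.
NODES = {
    "local":  "http://localhost:11434",     # RTX 3050 box -> fast 7B agents
    "server": "http://192.168.1.50:11434",  # 100GB-RAM box -> heavy 32B models
}

def build_request(node: str, model: str, prompt: str) -> tuple[str, dict]:
    """URL and payload for Ollama's non-streaming /api/generate endpoint."""
    url = f"{NODES[node]}/api/generate"
    return url, {"model": model, "prompt": prompt, "stream": False}

def generate(node: str, model: str, prompt: str) -> str:
    """POST the prompt to the chosen node and return the completed text."""
    url, payload = build_request(node, model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the endpoints in one table means agents only ever name a node, never a URL, so swapping hardware later is a one-line change.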
⚡ PERFORMANCE NOTE
Qwen2.5-Coder:7B in Q4_K_M quantization fits entirely in your 6GB VRAM (about 4.5GB loaded). Generation speed is ~2–4 tokens/sec for code generation. Fast enough for an agent loop.
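The ~4.5GB figure follows from a back-of-envelope calculation: parameter count times bits per weight. A sketch, assuming Q4_K_M averages roughly 4.8 bits/weight (the exact average varies with the per-layer quant mix) and ~7.6B parameters for the 7B model:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight footprint of a quantized model, ignoring KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumption: Q4_K_M ~4.8 bits/weight on average, ~7.6B params.
print(round(quantized_size_gb(7.6, 4.8), 2))  # ~4.56 GB
```

Note the KV cache sits on top of this, so long contexts eat into the remaining ~1.5GB of headroom.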

The server's 100GB of RAM can run Qwen2.5-Coder:32B comfortably on CPU inference (~8–10 tokens/sec). We hit the server only for the Orchestrator and Planner: heavy reasoning tasks that happen once per job, not in the tight loop.
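The once-per-job vs. tight-loop split suggests routing by agent role. A minimal sketch, assuming hypothetical role names and the two model tags above (the routing table is illustrative, not a fixed API):

```python
# Assumed role -> (node, model) routing; role names are illustrative.
ROUTES = {
    "orchestrator": ("server", "qwen2.5-coder:32b"),  # heavy, once per job
    "planner":      ("server", "qwen2.5-coder:32b"),
    "coder":        ("local",  "qwen2.5-coder:7b"),   # tight loop, low latency
    "reviewer":     ("local",  "qwen2.5-coder:7b"),
}

def route(role: str) -> tuple[str, str]:
    """Pick the node and model for an agent role; default to the local 7B."""
    return ROUTES.get(role, ("local", "qwen2.5-coder:7b"))
```

Defaulting unknown roles to the local node keeps an accidental new agent from saturating the server's CPU inference.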