Comment I still can't get Gemma4 to beat Qwen3.6 (Score 1) 54
My best results in intelligence and speed in my MacBook Pro 2019 with an i7-9750H and 32GB RAM are with Qwen3.6 MoE version. I use it for my Hermes Agent:
llama-server \
--hf-repo majentik/Qwen3.6-35B-A3B-RotorQuant-GGUF-Q2_K \
--hf-file Qwen3.6-35B-A3B-Q2_K.gguf \
--host 0.0.0.0 \
--port 8080 \
--threads 4 \
-tb 4 \
-b 2048 \
-ub 2048 \
--ctx-size 65536 \
--n-gpu-layers 0 \
--flash-attn on \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--reasoning off \
--batch-size 2048
It takes around 20 minutes for Hermes to start using this as a local model. But after that it is quite good.