Date: June 5th, 2026 8:41 PM Author: Dan Bilzerian
I decided to run Qwen3.6 27B UNQUANTIZED. The model file is 55 GB, and then I needed some more VRAM on top of that for the 262.1k context window. I was afraid this would be slow as fuck, but it's surprisingly usable. Maybe MTP is actually doing something here.
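Quick sanity check on that file size, as a back-of-envelope sketch: it assumes "unquantized" means BF16 at 2 bytes per parameter and that "27B" is a round 27e9 parameters. The function name is just for illustration. It only covers weights, not the KV cache for the context window.

```python
GB = 1e9  # decimal gigabytes, the way file sizes are usually quoted


def weight_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return n_params * bytes_per_param / GB


# Assumption: 27e9 parameters, BF16 = 2 bytes each, Q8 = roughly 1 byte each.
bf16_gb = weight_size_gb(27e9, 2)
q8_gb = weight_size_gb(27e9, 1)

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"Q8 weights:   ~{q8_gb:.0f} GB")
```

That lands at about 54 GB for BF16, which is in line with the 55 GB file once you allow for a parameter count that isn't exactly 27e9 plus some metadata.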
Date: June 5th, 2026 9:59 PM Author: Dan Bilzerian
Ok this MTP shit must be magic because I am not feeling any memory bandwidth constraints ATM. I mean I'm just not feeling it. To keep it all-Blackwell on this server I had to toss in two 5060 Tis, and I thought for SURE they would make it feel dogshit slow. But I can't detect any speed hit at all. It may even be faster, if that's physically possible. Dual 3090s running Q8 are WAY slower than this 4-GPU Blackwell array running BF16.
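A rough sketch of why MTP could hide the bandwidth cost, assuming MTP here means multi-token prediction and that decode is purely memory-bandwidth-bound (every forward pass reads all the weights once). The 1000 GB/s bandwidth and the tokens-per-pass values are placeholders for illustration, not measurements from this server, and the model ignores KV cache reads and multi-GPU overhead.

```python
import math


def decode_tokens_per_sec(weight_gb: float,
                          bandwidth_gb_s: float,
                          tokens_per_pass: float = 1.0) -> float:
    """Upper bound on decode speed when weight reads are the bottleneck.

    tokens_per_pass is the average number of tokens accepted per forward
    pass: 1.0 for plain decoding, more if multi-token prediction lands
    extra tokens.
    """
    passes_per_sec = bandwidth_gb_s / weight_gb
    return passes_per_sec * tokens_per_pass


BANDWIDTH = 1000.0  # GB/s, hypothetical placeholder

# Q8 weights (~27 GB), plain one-token-per-pass decoding.
q8_plain = decode_tokens_per_sec(27.0, BANDWIDTH, tokens_per_pass=1.0)

# BF16 weights (~54 GB), same decoding: twice the bytes per pass.
bf16_plain = decode_tokens_per_sec(54.0, BANDWIDTH, tokens_per_pass=1.0)

# BF16 with a hypothetical 2 accepted tokens per pass.
bf16_mtp = decode_tokens_per_sec(54.0, BANDWIDTH, tokens_per_pass=2.0)

print(f"Q8 plain:   {q8_plain:.1f} tok/s")
print(f"BF16 plain: {bf16_plain:.1f} tok/s")
print(f"BF16 + MTP: {bf16_mtp:.1f} tok/s")
```

Under these assumptions BF16 at the same bandwidth is half the speed of Q8, and it takes about 2 accepted tokens per pass to break even. Anything beyond that would come out ahead, which would fit with BF16 not feeling slower.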