NSAM's built the AI from hell with some Nvidia GPUs, go ahead and doubt it
Date: February 27th, 2026 9:28 AM Author: Jared Baumeister
PS: if you're using llama.cpp, you can ask Claude how to tune the parameters for your particular setup.
You should also have all drivers installed before you compile llama.cpp, so that the build detects them and enables the right backends.
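On a recent checkout the CUDA build is roughly this (GGML_CUDA is the current flag name; older trees called it LLAMA_CUBLAS):

  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  cmake -B build -DGGML_CUDA=ON
  cmake --build build --config Release -j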
Also, if you have a mix of GPUs with different amounts of VRAM, you have to tell llama.cpp how many layers to offload to each one and how many layers (if any) stay on the CPU (see the example below). It's such a grind that I'm making a spreadsheet of scripts for launching different models in different configurations.
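As a sketch of what one of those launch lines looks like (the model path, layer count, and split ratios are placeholders, roughly proportional to VRAM, not my exact numbers):

  # offload 40 layers to the GPUs, split 5090 : 3080 : 3080 : 5060 by VRAM;
  # whatever -ngl doesn't cover stays on the CPU
  ./build/bin/llama-server -m ./models/some-model.gguf \
    -ngl 40 \
    --tensor-split 32,10,10,8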
This is the PCIe bifurcation card I use. Even though it's listed as PCIe 4.0, it shows up as 5.0 x8 in nvtop.
https://a.co/d/07XNtyYc
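You can confirm the negotiated link for each card without nvtop:

  nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current --format=csv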
Finally, all of these GPUs can be drastically power limited so that they run off one PSU. The 3080s can run at 200W each (400W combined), the 5060 uses 170W, so you're at 570W, and power limiting the 5090 to 380W brings total GPU draw to only 950W. My CPU can only pull 65W, so it's not a problem to run them on one PSU. I'm using a 1600W PSU but could probably get by with less. Performance isn't an issue unless you're gaming, and cooler temps extend longevity.
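The limits themselves are one nvidia-smi call per card (the GPU indices here are illustrative; check yours with nvidia-smi -L, and note the limits reset on reboot, so put these in a startup script):

  sudo nvidia-smi -i 0 -pl 380   # 5090
  sudo nvidia-smi -i 1 -pl 200   # 3080
  sudo nvidia-smi -i 2 -pl 200   # 3080
  # the 5060 already draws ~170W stock, so it doesn't need a limit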
(http://www.autoadmit.com/thread.php?thread_id=5838844&forum_id=2most#49699135)