Hi @Jay Namdeo , sorry for the late response!
As I understand it, Ollama stores model weights on disk. When a model starts, those files are read from disk and copied into system RAM, then into GPU VRAM. After that, most operations happen in VRAM, and the disk is no longer involved unless you unload and reload the model.
So your choice really depends on how often the model is restarted. Since you mentioned using auto-shutdown and auto-start, it sounds like the model gets reloaded from disk quite frequently. In that case, I'd strongly recommend going with NVMe. Keeping the model files on a separate NVMe disk not only speeds up load times but also gives you persistence, since the data survives independently of the system disk. That makes things a lot easier to manage.
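If it helps, here's a minimal sketch of how you could point Ollama at a separate NVMe disk and reduce how often models are unloaded. `OLLAMA_MODELS` and `OLLAMA_KEEP_ALIVE` are Ollama's documented environment variables; the mount path below is just an example and depends on your setup:

```shell
# Example only: adjust the path to wherever your NVMe disk is mounted.
# OLLAMA_MODELS tells Ollama where to store and load model files.
export OLLAMA_MODELS=/mnt/nvme/ollama/models

# Keep models loaded in VRAM longer between requests so restarts
# (and therefore disk reads) happen less often. Accepts durations
# like "30m", or "-1" to keep the model loaded indefinitely.
export OLLAMA_KEEP_ALIVE=30m

# Start the server with the settings above.
ollama serve
```

If you run Ollama as a systemd service instead, the same variables can go in the service's `Environment=` entries.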
I hope this helps you get things back on track quickly! If this solves your issue, feel free to mark the answer as accepted.
Thank you!