Hi @Jay Namdeo , sorry for the late response!
As I understand it, Ollama stores model weights on disk. When a model starts, those files are read from disk and copied into system RAM, then into GPU VRAM. After that, most operations happen in VRAM, and the disk is no longer involved unless you unload and reload the model.
So your choice really depends on how often the model is restarted. Since you mentioned using auto-shutdown and auto-start, it sounds like the model gets reloaded from disk quite frequently. In that case, I'd strongly recommend going with NVMe. Keeping the model files on a separate NVMe disk not only speeds up load times but also gives you persistence, since the data survives independently of the system disk. That makes things a lot easier to manage.
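If it helps, here's a minimal sketch of how you could point Ollama at a separate NVMe disk and reduce how often models are unloaded. `OLLAMA_MODELS` and `OLLAMA_KEEP_ALIVE` are Ollama's documented environment variables; the mount path below is just an example and depends on your setup:

```shell
# Example only: adjust the path to wherever your NVMe disk is mounted.
# OLLAMA_MODELS tells Ollama where to store and load model files.
export OLLAMA_MODELS=/mnt/nvme/ollama/models

# Keep models loaded in VRAM longer between requests so restarts
# (and therefore disk reads) happen less often. Accepts durations
# like "30m", or "-1" to keep the model loaded indefinitely.
export OLLAMA_KEEP_ALIVE=30m

# Start the server with the settings above.
ollama serve
```

If you run Ollama as a systemd service instead, the same variables can go in the service's `Environment=` entries.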
I hope this helps you get things back on track quickly! If this solves your issue, feel free to mark the answer as accepted.
Thank you!