Fixing GPU Passthrough On Windows 10 VM In Proxmox

With the recent update to Proxmox 6.2, I ran into some trouble with GPU passthrough on my Windows 10 VM (which I refer to as my "Windows 10 VDI"): namely, my Windows 10 guest would no longer boot.

I believe many Proxmox users are impacted by this, so I'm offering the following notes to get help to those users as soon as possible.

A formal guide on how I fixed this problem will follow on the blog at a later time.

Below is the issue summary from my Reddit post.


Issue Summary: NVIDIA GPU Passthrough No Longer Working with Windows 10 Guest VM; Windows 10 VM No Longer Boots on Proxmox with GPU Passthrough

Issue:

I have successfully been passing my 1070Ti to a Windows 10 guest for over a year. However, with the latest Proxmox updates, my Windows 10 VM will no longer boot with the GPU passed through. Is anyone else experiencing the same issue? And if not, any suggestions on where I should go from here?

Technical Notes:

  • With the GPU passed through, and an HDMI cable from the GPU to a monitor, the BIOS screen appears just fine, but the screen goes black when Windows should start booting (I know Windows isn't booting when this happens because the VM keeps restarting itself, and eventually the physical display from the GPU will load the Windows troubleshooting screen).
  • With the GPU passed through, and on a clean Windows 10 install, Windows loads just fine and even displays on the monitor. It's only when a "real" driver, either from NVIDIA or from Windows Update, is installed that Windows crashes (no green screen or BSOD, however). It will even crash during the install of the driver. This suggests to me that my Proxmox GPU passthrough configuration is correct, since I am successfully passing through my GPU initially, and that this is a Windows/NVIDIA driver issue. However, see below.

Things I Have Tried:

  • I saw some sporadic comments about issues with the Windows 10 May 2020 Update and GPU drivers. Thankfully, I had a backup from two weeks ago (when the VM was still working), but restoring it didn't help. This makes me believe this isn't a Windows/driver issue.
  • Additionally, I have installed Windows 10 from scratch using both the latest Insider ISO and the latest stable ISO, then reinstalled the driver. No dice again.

Troubleshooting Steps

It appears that I am not the only person who has faced this issue; on that same thread, several users have reported the same problem. One Proxmox user replied with a video guide showing how he had resolved the problem after updating Proxmox:

The above guide didn't work for me, but I will include it in case it helps someone else. The guide covers GPU passthrough for Proxmox 6.1 and then additional steps for Proxmox 6.2. I was coming from Proxmox 6.1 (where I originally had it working), but, according to Allen (the guide's author), there were some changes in Proxmox 6.2 (namely a kernel update) that caused it to stop working for him.

The biggest change was that the new kernel includes vfio_iommu_type1, so it is no longer needed in /etc/modules; Allen suggested that this was leading to his IOMMU errors. I had no such errors, but I decided to try his fix anyway and removed options vfio_iommu_type1 allow_unsafe_interrupts=1 from /etc/modprobe.d/iommu_unsafe_interrupts.conf (I actually just removed iommu_unsafe_interrupts.conf altogether with rm /etc/modprobe.d/iommu_unsafe_interrupts.conf). It was unclear from the video whether this step was actually necessary. (I did not remove vfio_iommu_type1 from /etc/modules.)
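
For reference, a typical /etc/modules for VFIO passthrough on Proxmox looks something like the lines below (yours may differ); per Allen, the vfio_iommu_type1 line is the one the newer kernel no longer needs:

# /etc/modules: modules loaded at boot for VFIO passthrough
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd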

As part of that guide, I updated my grub file's GRUB_CMDLINE_LINUX_DEFAULT (using nano /etc/default/grub) to include vfio_iommu_type1.allow_unsafe_interrupts=1, as Allen suggests. My GRUB_CMDLINE_LINUX_DEFAULT currently looks like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off transparent_hugepage=always intremap=no_x2apic_optout"
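
On a host that boots with GRUB (as mine does), the change only takes effect after regenerating the GRUB configuration and rebooting (hosts that boot with systemd-boot instead need a different refresh step):

update-grub
reboot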

Fixing the Windows 10 VM Boot Failure

Alas, my issue persisted and my Proxmox Windows 10 VM continued to crash. So here’s what fixed it for me.

On my VM config, from the Proxmox web GUI, I re-enabled the display, setting it to “Standard VGA”:

Adding Standard VGA display to Proxmox VM

I also gimped my GPU (attached PCI device) by disabling it as my primary GPU:
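
For reference, the equivalent lines in the VM config file would look roughly like this (these reflect my setup; your PCI address and romfile will differ, and unchecking "Primary GPU" in the GUI corresponds to dropping x-vga=1 from the hostpci line):

vga: std
hostpci0: 02:00,pcie=1,romfile=nvidia1070.rom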

I was greeted with the following GSOD (green screen of death). I use Windows Insider editions, so this will likely appear as a BSOD (blue screen of death) for you:

Finally! A real error! I don’t think I’ve ever been so excited to see an error before. I can do something with an error!

That stop code, VIDEO_TDR_FAILURE in nvlddmkm.sys, sounds familiar. Something about interrupts?

If I recall correctly, Nvidia tries to force line-based interrupts on us for some reason, even though it's 2020. What we want are message-signaled interrupts (MSI). But to fix that issue, I need to get Windows to boot in the first place…

Getting Back Into our VM

Okay, honestly, this is the hardest part. We need to get into our VM to make some changes, but right now we can’t do that because our VM won’t even boot. So let’s cripple our GPU even further. I think the problem here is that the GPU is loading (more or less, okay maybe less) successfully. Let’s force a code 43 error by removing the ROM file; that will make Windows disable the device so we can boot. At least Nvidia’s draconian drivers can be useful for something…

To force a code 43, I remove my romfile from the VM config with nano /etc/pve/qemu-server/110.conf (note that your VM ID will be different) and change the line from:

hostpci0: 02:00,pcie=1,romfile=nvidia1070.rom

to:

hostpci0: 02:00,pcie=1

This allowed me to at least boot my VM and get to the console.

Update (6/4/20): Alternatively, after a few failed boots, Windows will boot into recovery, at which point you can enable safe mode and make the regedit changes below from there.

Enabling MSI (Message Signaled Interrupts) on an Nvidia GPU in Windows 10

This step will fix the VIDEO_TDR_FAILURE in nvlddmkm.sys error.

From the console/remote desktop, open Device Manager. Right-click on your display adapter (i.e., your GPU), navigate to the Details tab, and select the "Device instance path" property:

Copy that value. You’ll need it in a second.

Next, open regedit and navigate to that regkey you found above:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\PCI\<yourValueHereVEN_10DE...>

Then navigate to the subkey Device Parameters\Interrupt Management. If the device were in MSI mode, we'd see another subkey, Device Parameters\Interrupt Management\MessageSignaledInterruptProperties, but it isn't, so we'll need to add the "MessageSignaledInterruptProperties" subkey ourselves:

And inside our new "MessageSignaledInterruptProperties" subkey, we'll need to add a DWORD (32-bit) value named MSISupported:

And, finally, set its value to 1:

(Sorry, I took the screenshots after the fact, so the subkeys already exist.)
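
If you'd rather make (or script) the same change from an elevated command prompt instead of regedit, a reg.exe one-liner along these lines should do it; substitute your own device instance path, and note that regedit may still be needed if the key's permissions get in the way:

reg add "HKLM\SYSTEM\CurrentControlSet\Enum\PCI\<yourValueHereVEN_10DE...>\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties" /v MSISupported /t REG_DWORD /d 1 /f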

Shut down your VM.

Then add the romfile back to your VM config:

hostpci0: 02:00,pcie=1,romfile=nvidia1070.rom

And start your VM back up.
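
If you prefer the host shell over the GUI for this shutdown/edit/start cycle, the equivalent with my VM ID (110; yours will differ) is roughly:

qm shutdown 110                      # stop the VM cleanly
nano /etc/pve/qemu-server/110.conf   # re-add the romfile to hostpci0
qm start 110                         # boot it back up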

Success!

Your Windows 10 VM should now boot flawlessly with your Nvidia GPU enabled:

I hope this helps some of you,
TorqueWrench

Analysis: What Went Wrong with Windows 10 VMs on Proxmox 6.2?

Now that I have my Windows 10 VM working, I think it's time to investigate and do a root cause analysis (RCA) on what went wrong with GPU passthrough for Windows VMs on Proxmox.

When I went back into my VDI VM to enable MSI on my GPU, I noticed the following in regedit:

I could see that my prior changes were still there; they were just under a different Device Instance Path. So, presumably, something in the Proxmox update caused the Device Instance Path to change, and that is what caused the problem!

Loving the guide, although I am stuck. I am unable to get into safe mode, but I can get to a command prompt. How can I find the VEN number without having access to Device Manager?

I would still try to get into recovery mode; you should be able to, and it's by far the easiest/cleanest way to fix this problem. Are you at least getting the green screen of death (likely a BSOD for you if you're using a regular, non-Insider edition of Windows)?
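
That said, if all you can reach is a command prompt, something like this may at least list the device instance path (I haven't tested it from the recovery environment, so treat it as a suggestion):

wmic path Win32_VideoController get PNPDeviceID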

Booting into Windows Recovery Mode in Proxmox

Starting with both a display enabled in the VM hardware (I use VGA) and your dedicated GPU still attached with its romfile, Windows should take you to recovery mode after two failed boot attempts.

Select the troubleshoot option:

From there, select “advanced options”:

Startup settings:

Restart:

After the reboot, you should now be prompted to select which safe mode you wish to use. Choose “Enable Safe Mode with Networking”:

Let me know if you still run into any trouble.

-TorqueWrench


I think I am using a different Windows version. I moved to Proxmox 6.1, and it has no issues with GPU passthrough.


Finally found a solution.
If you don't need HDMI audio, you can install a customized NVIDIA driver with the HDMI audio driver removed; it works like a charm.

What I did:

Install Proxmox 6.1 and install Windows with GPU passthrough using the custom NVIDIA driver (you can also install the official drivers with MSI enabled at this point; refer to the first post).

Once you're satisfied that the NVIDIA driver is working, go back to the console and run apt update && apt full-upgrade (you'll need the Proxmox free/no-subscription repo enabled).
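
For reference, the no-subscription repository line for Proxmox 6.x on Debian Buster looks like this, in case it isn't already in your APT sources:

deb http://download.proxmox.com/debian/pve buster pve-no-subscription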

After the update, apply the GRUB config from the first post and remove the extra modules from /etc/modules; you'll be on the latest version with no BSOD/GSOD issues.

The best part of this approach is that you can still select the old kernel and old initramfs from the Proxmox boot screen.
