Over the past few days, some new guests on my whitebox vmhost (which has been running for almost a year now) started behaving badly! The symptoms are listed below, and I’m sure there were more. There wasn’t really any pattern to the crashing; it was very random. Things went awry after I added my ninth guest and the host’s RAM usage climbed into the 14-16GB range. My existing guests running PRTG, Zoneminder, VPN and NGINX were all fine and stable. Only a few new Windows guests I had been using for testing were crashing. I couldn’t even reinstall Windows without a BSOD while loading.
- VMware ESX unrecoverable error: (vcpu-0)
- MONITOR PANIC: Unable to decompress PPN from swap slot for VM
- Windows crashing during startup in Msrpc.sys
- Windows MMC consoles not loading or crashing
I ran memtest86+ on the full 32 gigs with no failures (only one pass, which was probably not a good idea) and figured it couldn’t be the RAM.
Googling took me down some rabbit holes, including unloading custom NIC drivers I’d added, but the issue still persisted. I was almost ready to reinstall ESXi. Finally, I decided to run memtest inside the VM that was having trouble. Errors within seconds. I pulled half the RAM and ran it again… no issues. Looks like it’s bad RAM. RMA time.
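For anyone curious what a memory test is doing at its core, here is a rough sketch of the write-a-pattern, read-it-back idea. This is not how memtest86+ works internally and it is no substitute for it: it runs in user space, only touches whatever memory the OS hands it, and the function name `pattern_test` and the chosen patterns are my own illustrative assumptions.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each byte pattern, read it back, and
    return a list of (offset, expected, actual) mismatches.

    Hypothetical helper for illustration only; a real tester works
    on physical memory and uses far more thorough access patterns.
    """
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        expected = bytes([p]) * size_bytes
        buf[:] = expected              # write the pattern
        if buf != expected:            # read back and compare
            for i, b in enumerate(buf):
                if b != p:
                    errors.append((i, p, b))
    return errors


if __name__ == "__main__":
    # 64 MiB is an arbitrary size picked for the example.
    bad = pattern_test(64 * 1024 * 1024)
    print("mismatches:", len(bad))
```

On healthy memory the returned list should be empty. Any mismatch at all is a reason to go run a proper tester for several passes.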
So an issue I saw many times back in my Geek Squad days in college got me. A reminder: even in 2016, don’t rule out bad memory so quickly. It would have saved me a few hours tonight.
And finally, the purple screen!