So, I came in today to fire alarms going off and people planning evacuations. ‘VMware is down,’ I hear someone mutter as I enter the room.
Sure enough, at some point last night one of my ESXi 4.1 boxes lost its mind and disconnected from vCenter. Several hours later I got it back, and here is what I learned:
I logged into vCenter: one of the boxes was connected, the other was not. I tried to reconnect it from vCenter, and that failed. So began the process every system admin goes through when stuff is broken: we work down our hierarchy of checks. I think most of us have this stored in our brains.
- Can I ping it? Yes
- Can I ssh to it? Yes
- Can I log into it with the vSphere Client? No. Hmm…
- Are the servers on that box still running? Unknown (I couldn’t get this to work from the CLI: vim-cmd vmsvc/getallvms)
- Can I ping the servers that were running on that box? No
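The first few rungs of that checklist can be scripted from an admin workstation. This is a rough sketch, assuming a Linux or macOS machine with ping and ssh, and that key-based root SSH to the ESXi host is already set up; the function name triage_host is made up for illustration. It stops at the first rung that fails, the same way the list above is worked through.

```shell
#!/bin/sh
# Rough triage sketch for a misbehaving ESXi host, run from an admin workstation.
# Assumptions (not from the post): Linux/macOS ping syntax, key-based root SSH,
# and that vim-cmd exits non-zero when it cannot log in to hostd.
# triage_host is a hypothetical name. Return codes: 1 = no ping, 2 = no ssh, 3 = hostd not answering.

triage_host() {
    host="$1"

    # Rung 1: does the management interface answer at all?
    if ping -c 3 "$host" >/dev/null 2>&1; then
        echo "ping: yes"
    else
        echo "ping: no"
        return 1
    fi

    # Rung 2: is sshd up? BatchMode keeps ssh from hanging on a password prompt.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$host" true >/dev/null 2>&1; then
        echo "ssh: yes"
    else
        echo "ssh: no"
        return 2
    fi

    # Rung 3: can hostd list the VMs? This is the call that failed in the post.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$host" vim-cmd vmsvc/getallvms; then
        echo "hostd: yes"
    else
        echo "hostd: no (management agents are the next suspect)"
        return 3
    fi
}
```

Usage would be something like triage_host esx01.example.local (a placeholder hostname); in the situation described here it would get as far as rung 3 and return 3.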
So off to Google I went; I’ll leave links below to the pages that helped me get to the solution.
First Error Message: Call “Datacenter.QueryConnectionInfo” for object “<DataCenter name>” on vCenter Server “<vCenter name>” failed.
OK, so at this point we can probably assume the box is jacked up really good. But we don’t know whether the VMs are still running, and rebooting the box would be rude to the guests.
We could try restarting the management agents: /sbin/services.sh restart
OK, still can’t connect. Someone else suggests restarting the vCenter agent: /etc/opt/init.d/vmware-vpxa restart
Still no dice, and I’m getting frustrated. That didn’t work either, and I get something like “Failed to login” when I run vim-cmd vmsvc/getallvms or any other commands.
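The two restarts above, with a re-check after each, might look like this in the ESXi 4.1 shell. It is a sketch under assumptions: that vim-cmd exits non-zero when it cannot log in to hostd, and that a 60-second wait is long enough for the agents to come back. Only the two paths come from this post; the variable and function names are hypothetical, and the paths are kept in variables so they can be overridden.

```shell
#!/bin/sh
# Sketch: escalate through the two restarts tried above, re-checking hostd after each.
# Assumptions (not from the post): vim-cmd returns non-zero when it cannot log in,
# and WAIT_SECS is long enough for the agents to restart. Names are hypothetical.
SERVICES_SH="${SERVICES_SH:-/sbin/services.sh}"
VPXA_INIT="${VPXA_INIT:-/etc/opt/init.d/vmware-vpxa}"
WAIT_SECS="${WAIT_SECS:-60}"

# True if hostd answers a simple VM inventory query.
hostd_answers() {
    vim-cmd vmsvc/getallvms >/dev/null 2>&1
}

escalate_restarts() {
    hostd_answers && { echo "hostd already answering"; return 0; }

    # Step 1: restart the management agents (hostd and vpxa among them).
    "$SERVICES_SH" restart
    sleep "$WAIT_SECS"
    hostd_answers && { echo "fixed by services.sh"; return 0; }

    # Step 2: restart only the vCenter agent.
    "$VPXA_INIT" restart
    sleep "$WAIT_SECS"
    hostd_answers && { echo "fixed by vpxa restart"; return 0; }

    echo "still failing: check /etc/vmware/hostd/proxy.xml"
    return 1
}
```

Neither restart touches the file that turned out to be the culprit here, which is why this sketch falls through to its final message in this scenario.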
Finally, I came across some things to test:
ls -l /etc/vmware/hostd/proxy.xml: if this reports a file size of 0, that’s your problem. Replace this file with a copy from another ESX server and you should be good.
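The same check can be done without eyeballing the ls output. This is a small sketch that uses the shell's -s test (true only when a file exists and is larger than zero bytes); the function name check_proxy_xml is hypothetical, and the default path is the one from this post.

```shell
#!/bin/sh
# Sketch: flag a zero-byte hostd proxy.xml, the symptom described above.
# check_proxy_xml is a hypothetical name. Return codes:
# 0 = file present and non-empty, 1 = file is empty, 2 = file is missing.

check_proxy_xml() {
    f="${1:-/etc/vmware/hostd/proxy.xml}"

    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 2
    fi

    # [ -s FILE ] is true only when FILE exists and its size is greater than zero.
    if [ -s "$f" ]; then
        echo "ok: $f"
        return 0
    fi

    echo "EMPTY: $f (replace it with a copy from another ESX server)"
    return 1
}
```

On the broken host, running check_proxy_xml with no argument would print the EMPTY line and return 1. How you get the replacement copy across (scp, the datastore browser, or similar) depends on your setup; that part is left to you.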