So I recently had an issue where we had to put a host into maintenance mode quickly to accommodate an emergency change for the network team.
Now, I personally prefer scaling up to scaling out. There are pros and cons to both.
We run DL580s in a stretched cluster and each host holds about 60 VMs. The hosts are equipped with 10Gb cards, which help.
So putting the host into maintenance mode kicked off a bit of a vMotion storm as the VMs were evacuated, and since we are using 10Gb cards it can suck up as much as 8Gb of the bandwidth.
Shortly after kicking off this process the host became disconnected from the vCenter server.
Right-click -> Connect didn’t work. The VMs were up and the host was responding to pings.
I also couldn’t connect directly using the client.
Connecting to the console through the iLO I was presented with the familiar yellow and grey screen.
I logged in and turned on local tech support mode.
Neither /sbin/services.sh restart nor /etc/init.d/hostd restart got the host back. Some googling and KB surfing later, I came across VMware KB 1005566, which discusses manually killing the hostd process and then running /sbin/services.sh restart and /etc/init.d/hostd restart again. And like magic, the host was back.
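For anyone who wants the sequence in one place, here is a rough sketch of what I ended up doing from the tech support mode shell. It is not lifted from the KB. The function names (run, hostd_pids, recover_hostd) and the DRY_RUN switch are my own inventions, the script paths are assumed to be /sbin/services.sh and /etc/init.d/hostd, and the way it picks the process ID out of the ps output is an assumption you should check against your own host before trusting it.

```shell
#!/bin/sh
# Sketch only: kill a hung hostd, then restart the management agents.
# DRY_RUN defaults to 1, so nothing is killed or restarted unless you opt in.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Read `ps` output on stdin and print the first column of every hostd line.
# Assumption: the first column is the ID that `kill` expects. The ps layout
# differs between ESXi builds, so verify this by eye first.
hostd_pids() {
    grep hostd | grep -v grep | awk '{ print $1 }'
}

# Kill whatever hostd processes are found, then restart the agents in the
# same order as the commands mentioned above.
recover_hostd() {
    for pid in $(ps | hostd_pids); do
        run kill -9 "$pid"
    done
    run /sbin/services.sh restart
    run /etc/init.d/hostd restart
}

recover_hostd
```

Run as is, it only prints what it would do. Once the printed commands look right for your host, run it again with DRY_RUN=0 to actually kill hostd and restart the agents.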