Welcome To NetTech Solutions

A warm welcome to every visitor. We offer technical support here, and the posts you find should help you solve the problems you face day to day.
Everyone is welcome to comment on our posts.

Thursday, March 6, 2014

VMware CPU and Memory Reservations: Fixing Insufficient resources to satisfy configured failover level for HA


This post comes from a few days of poring over manuals and working with technical support. This is a good one. When we tried to power on a VM in our VMware cluster, we got this error:
“Insufficient resources to satisfy configured failover level for HA”


And this alert on our cluster:
“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”
Our first thought was that we had to power one VM off to power another one on.
But that didn’t work.
Here is the actual solution. (P.S. Great VMware HA education for me on this one!)
PROBLEM SOURCE: VMware HA is turned on and you are violating constraints
VMware HA is turned on, and you have it configured so that a certain amount of resources is held in reserve for failover. Powering on this VM would dip into that reserve, so VMware is telling you “Nope, not turning it on….”
There is a quick fix to get the VM turned on (one good way, one bad way), and then there are two long-term fixes for you to consider. In my case, the first long-term fix was faster, while the second one was better for my environment.
My VMware environment
Datacenter: vmTST
Cluster: vmCluster
OS: ESXi 4.1.0
Five (5) servers in a cluster.
My VMware Cluster Errors
As mentioned above:
“Insufficient resources to satisfy configured failover level for HA”
and
“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”
TWO WAYS TO DO QUICK FIX
1. Turning off HA (popular, and I would say WRONG)
2. Disabling Admission Control (much better!!)
#1: Turning off HA (though I recommend against it)
This is the solution I saw on some forums (including the VMware forums). After looking at it more, I recommend against it and I’ll explain why, but here it is:
vSphere Client: Browse Inventory -> Hosts and Clusters
Edit VMware cluster settings
Right Click on Cluster name -> Edit Settings
Turning off HA
While this works, whenever you turn HA back on it has to redo the HA failover calculation. That is bad, especially for testing or temporary power-ons.
#2: Disable “Admission Control” (better IMO)
It is better to disable “Admission Control” so that VMs will power on despite violating availability constraints. This way your HA is still on. In the long run, though, it is better to fix the underlying issue.
The setting is in the same window, under the next item in the list on the left:
LONG TERM FIX: TWO WAYS
There are two things I ended up having to look at. The first was a pretty good long-term fix that I had found suggested on forums, including the VMware forums.
The second was the actual fix for my problem, and the best one in the long term.
FIX #1: Change from “Host Failures Cluster Tolerates” to “Percentage of cluster resources reserved as failover spare capacity”
In other words, instead of telling VMware you want to have enough resource reserve so that you can lose one host, you are telling VMware you want to have a certain percentage of resources unused for failover.
We had it configured to tolerate the loss of one host, so switching to a percentage was a quick and easy fix for my environment.
VMware HA: Host failures cluster tolerates (?)
So if we look at the “VMware HA” window, you’ll see that my “Host failures cluster tolerates” was set to 1. With five servers you would think that means “20%” is held back, but that’s not so. Suppose one (or more) of your VMs for whatever reason took up 75% of your resources: by the worst-case calculation you could then only have one VM on your five-node cluster.
A worst-case calculation based on your largest VM determines what’s called the “slot” size. VMware HA then calculates how many total “slots” are available, which determines how many VMs you can have powered on.
When this option is chosen, from what I’ve read on VMware forums, the calculations are VERY conservative.
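To make the slot idea concrete, here is a minimal Python sketch of a worst-case slot calculation. It is a simplification and not VMware's actual algorithm: it assumes the slot is just the largest CPU reservation paired with the largest memory reservation, that each host holds as many slots as its scarcer resource allows, and it ignores memory overhead and any default minimums. The `VM` and `Host` classes, the VM names, and the host capacities are all hypothetical, picked only so the figures resemble the ones in this post.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


def slot_size(vms):
    """Worst case: largest CPU reservation and largest memory reservation.

    Assumes at least one VM has a nonzero reservation of each kind.
    """
    cpu = max(vm.cpu_reservation_mhz for vm in vms)
    mem = max(vm.mem_reservation_mb for vm in vms)
    return cpu, mem


def slots_per_host(host, slot):
    """A host holds as many slots as its scarcer resource allows."""
    cpu_slot, mem_slot = slot
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def cluster_slots(hosts, slot):
    """Total slots across all hosts in the cluster."""
    return sum(slots_per_host(h, slot) for h in hosts)


# Hypothetical inventory: the two largest reservations sit on different VMs.
vms = [
    VM("vm-a", 2507, 1024),
    VM("vm-b", 0, 4256),
    VM("vm-c", 500, 512),
]
# Hypothetical hosts: five identical servers.
hosts = [Host(f"esx{i}", 27600, 65536) for i in range(1, 6)]

slot = slot_size(vms)
print(slot, cluster_slots(hosts, slot))
```

Note how two different VMs set the slot size between them: one large CPU reservation and one large memory reservation are enough to make every slot that big, no matter how small the other VMs are.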
Find Your Slot Size: VMware Cluster Summary -> Advanced Runtime Info
VMware Advanced Runtime Info: Slot sizes
So you can see above that, worst-case scenario, one slot is 2507 MHz, 4256 MB. With that in mind, there are 55 slots available on my five-node cluster. There are a total of 156 VMs powered on against those 55 slots.
This means I would have to power off 102 VMs to get down to 54 powered-on VMs, leaving one slot open to power the new one on… (YIKES!)
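The arithmetic behind that number, as a small Python sketch. It follows this post's simple accounting of one slot per powered-on VM; the function name is hypothetical and this is not how vCenter itself reports the figure.

```python
def vms_to_power_off(powered_on, total_slots, new_vms=1):
    """How many VMs must be shut down so that `new_vms` more can start."""
    shortfall = powered_on + new_vms - total_slots
    return max(0, shortfall)


# 156 powered-on VMs, 55 slots, and one more VM we want to start.
print(vms_to_power_off(powered_on=156, total_slots=55))
```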
Changing To Percentage: First Check Resource Usage
Out of curiosity, I checked the actual resource usage in my cluster
If you tally up all the green bars in CPU, I could fit all the CPU usage of every VM on one host.
If you tally up all the green bars in Memory, I could fit all the memory usage in about three hosts.
So why can’t I power on a VM? Because the calculation is *THAT CONSERVATIVE* for the “Host failures cluster tolerates” option.
VMware HA: Switch to percentage
Now, the first time I did this, I chose “20%”, which works out to one server out of the five being kept free.
And I was able to power on a VM.
On a whim, I kept upping the percentage and I got as high as 75% before I decided to stop, thinking I was doing something wrong.
Part of it was that the VM I was powering on was very small in resource usage (and, I found out later, it also had a reservation of 0 configured), which is probably why it powered on even at 75% failover spare capacity.
Anyhow, in a pinch this is one way to configure some amount of reserve AND still be able to power on your VMs, at least if your resource usage somewhat mirrors mine (see previous picture).
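A rough Python sketch of why a VM with no reservation can still power on at a high percentage. It assumes, as a simplification, that the percentage policy compares the share of cluster CPU and memory not claimed by reservations against the configured percentage, and it ignores memory overhead and any default per-VM minimums. The function names and all cluster numbers are hypothetical.

```python
def failover_capacity(total, reserved):
    """Fraction of a resource not claimed by reservations."""
    return (total - reserved) / total


def can_power_on(total_cpu_mhz, total_mem_mb,
                 reserved_cpu_mhz, reserved_mem_mb,
                 new_cpu_mhz, new_mem_mb, reserve_pct):
    """True if both CPU and memory stay at or above the reserve percentage
    once the new VM's reservations are added."""
    threshold = reserve_pct / 100
    cpu_left = failover_capacity(total_cpu_mhz, reserved_cpu_mhz + new_cpu_mhz)
    mem_left = failover_capacity(total_mem_mb, reserved_mem_mb + new_mem_mb)
    return cpu_left >= threshold and mem_left >= threshold


# Hypothetical cluster totals and existing reservations.
cluster = dict(total_cpu_mhz=138000, total_mem_mb=327680,
               reserved_cpu_mhz=20000, reserved_mem_mb=60000)

# A VM with zero reservations, checked against a 75% reserve.
print(can_power_on(**cluster, new_cpu_mhz=0, new_mem_mb=0, reserve_pct=75))
# The same check for a VM that reserves 30000 MB of memory.
print(can_power_on(**cluster, new_cpu_mhz=0, new_mem_mb=30000, reserve_pct=75))
```

Under these assumptions the check looks only at reservations, not at actual usage, so a VM that reserves nothing barely moves the needle.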
FIX #2: Best Long Term Fix: Determine WHY the cluster resource reserve is so high and see if it is actually needed, or if it is just poorly configured
In the end this was the actual fix for us, because it got at the actual source of the problem, which meant finding out:
WHY the heck was our VM slot size so BIG?
Because obviously all five hosts combined were using VERY LITTLE CPU and RAM. Less than 20% on CPU (it could fit all on one server), and less than 50% on RAM (it could fit on two to three servers).
It turns out: The slot size is not based on usage, it is based on a VM resource reservation.
So here is how to check the resource reservations for your VMs.
VMware Cluster: Resource Allocation for CPU and Memory
CPU
(The dashed lines are my VM names which I blanked out)
Click on the “CPU” button and look for the “Reservation” column and sort by largest to smallest.
Memory
(The dashed lines are my VM names which I blanked out)
Click on the “Memory” button and look for the “Reservation” column and sort by largest to smallest.
As you can see, many VMs have a resource reservation. This means that as soon as the VM is powered on, it will reserve this much resource REGARDLESS OF WHETHER IT IS NEEDED OR NOT!
But as you can see from actual usage, we are nowhere near capacity, so there is no real reason for us to reserve that much.
One of the culprits: it turns out many of the templates we use to clone/deploy VMs had resource reservations already set, so each new VM we made came with a resource reservation.
VMware Cluster: Virtual Machines Actual Usage
Now go to the “Virtual Machines” tab and you can see actual usage. There are columns “Host CPU - MHz” and “Guest Mem - %”. These show actual usage by the VM.
I sorted alphabetically here, took the VMs with the highest reservations from the previous two pictures, and checked this list for their actual usage. Sure enough, many of our VMs were not using anywhere near that much (as you can tell from the earlier graphs).
Next step: contact the VM owners to confirm that what you see is the VM’s typical usage. If so, get permission to turn the reservation down or even off.
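If you would rather not eyeball two tabs, the comparison can be sketched in a few lines of Python. This assumes you have copied the reservation and usage figures out of the client by hand into records like the ones below; the `VMStats` type, the VM names, the numbers, and the 2x threshold are all hypothetical, and memory usage is assumed to be already converted from a percentage to MB.

```python
from typing import NamedTuple


class VMStats(NamedTuple):
    name: str
    cpu_reservation_mhz: int
    cpu_usage_mhz: int
    mem_reservation_mb: int
    mem_usage_mb: int


def over_reserved(vms, ratio=2.0):
    """VMs reserving more than `ratio` times what they use (CPU or memory),
    largest reservations first."""
    flagged = [
        vm for vm in vms
        if vm.cpu_reservation_mhz > ratio * vm.cpu_usage_mhz
        or vm.mem_reservation_mb > ratio * vm.mem_usage_mb
    ]
    return sorted(
        flagged,
        key=lambda vm: (vm.cpu_reservation_mhz, vm.mem_reservation_mb),
        reverse=True,
    )


# Hypothetical figures copied from the Resource Allocation and
# Virtual Machines tabs.
stats = [
    VMStats("vm-a", 2507, 300, 4256, 1024),
    VMStats("vm-b", 0, 150, 0, 512),
    VMStats("vm-c", 1000, 900, 2048, 1800),
]

for vm in over_reserved(stats):
    print(vm.name, vm.cpu_reservation_mhz, vm.mem_reservation_mb)
```

The flagged list is only a starting point for the conversation with VM owners, not a decision on its own.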
VMware: Right Click -> Edit Settings
To configure the resource reservation, right-click on the VM and choose Edit Settings.
VMware: CPU reservation and Memory reservation
Here I turned the CPU reservation and memory reservation down low or to zero.
REMEMBER TO CONSULT YOUR USER FIRST TO SEE IF VM IS IN TYPICAL USE
VMware HA: Advanced Runtime Info Results
Now go back to your Advanced Runtime Info… (you might have to switch VMware HA back to “Host failures cluster tolerates” if you had changed it to the percentage as an intermediate fix)
When all was said and done, I went from 55 slots to 550 slots.
And from being in the “red”, with 102 VMs I’d need to power off to power one on, to being in the “green” with 394 VM slots available.
CPU slot size went down by a factor of 10
Memory slot size went down by a factor of 20
NICE!!!
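One way to see why slots went up roughly tenfold rather than twentyfold, as a hedged Python sketch: if each host holds as many slots as its scarcer resource allows, the smaller of the two improvements is the one that counts. The host capacities and the "after" slot sizes below are hypothetical (the "after" sizes are just 2507 and 4256 divided by about 10 and 20), and memory overhead and default minimums are ignored.

```python
def slots_per_host(cpu_mhz, mem_mb, slot_cpu_mhz, slot_mem_mb):
    """Slots on one host, bounded by whichever resource runs out first."""
    return min(cpu_mhz // slot_cpu_mhz, mem_mb // slot_mem_mb)


# Hypothetical hosts: five identical servers.
HOSTS = 5
HOST_CPU_MHZ = 27600
HOST_MEM_MB = 65536
POWERED_ON = 156  # from the post

before = HOSTS * slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, 2507, 4256)
# Hypothetical slot sizes after trimming reservations.
after = HOSTS * slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, 250, 213)

print(before, after, after - POWERED_ON)
```

With these made-up hosts, CPU is the binding resource both before and after, so the slot count tracks the CPU slot size.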
Hope this has been helpful!

2 comments:

Anonymous said...

You stole this article from someone else word for word.

http://geekswing.com/geek/vmware-cpu-and-ram-reservations-fixing-insufficient-resources-to-satisfy-configured-failover-level-for-ha/

Anonymous said...

Dude, stop stealing people's content, you copied it word for word and even used images off of his article too.