This post comes from a few days of poring over manuals, plus some help from technical support. This is a good one. We were trying to power on a VM in our VMware cluster and kept getting this error:
“Insufficient resources to satisfy configured failover level for HA”
And this alert on our cluster:
“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”
Our way of thinking was that we had to power one VM off to power another one on.
But that didn’t work.
Here is the actual solution. (P.S. Great VMware HA education for me on this one!)
PROBLEM SOURCE:
VMware HA is turned on and you are violating constraints
VMware HA is turned on, and you have it configured so that a certain amount of resources is held in reserve for failover. Turning on this VM would dip into that reserve, so VMware is telling you “Nope, not turning it on….”
There is a quick fix to get the VM turned on (one good way, one bad way), and then there are two long-term fixes for you to consider. In my case, the first one was faster, while the second one was better for my environment.
My VMware environment
Datacenter: vmTST
Cluster: vmCluster
OS: ESXi 4.1.0
Five (5) servers in a cluster.
My VMware Cluster Errors
As mentioned above:
“Insufficient resources to satisfy configured failover level for HA”
and
“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”
TWO WAYS TO DO QUICK FIX
1. Turning off HA (popular, and I would say WRONG)
2. Disable Admission Control (much better!!)
#1: Turning off HA (though I recommend against it)
This is the solution I saw on some forums (including the VMware forum). After looking at it more, I recommend against it and I’ll explain why, but here it is:
vSphere Client: Browse Inventory -> Hosts and Clusters
Edit VMware cluster settings
Right-click on the cluster name -> Edit Settings
Turning off HA
While this works, whenever you turn HA back on it has to redo the HA failover calculation. Bad, especially for testing or doing temporary power-ons.
#2: Disable “Admission Control” (better IMO)
It is better to disable “Admission Control” so VMs will power on despite violating availability constraints. This way your HA is still on. In the long run, though, it is better to fix the underlying issue.
Same window, but the next bullet item on the left:
LONG TERM FIX: TWO WAYS
There are two things I ended up having to look at. The first is a pretty good long-term fix that I had found suggested on forums, including the VMware forums.
The second is the actual fix to my problem, and the best one in the long term.
FIX #1: Change from “Host Failures Cluster Tolerates” to “Percentage of cluster resources reserved as failover spare capacity”
In other words, instead of telling VMware you want enough resources held in reserve to lose one host, you are telling VMware you want a certain percentage of resources kept unused for failover.
We had it configured to tolerate losing one host, so switching to a percentage was a quick and easy fix for my environment.
VMware HA: Host failures cluster tolerates (?)
So if we look at the “VMware HA” window, you’ll see that my “Host failures cluster tolerates” was set to 1. Now with 5 servers you would think that means “20%”, but that’s not so. What if one (or more) of your VMs for whatever reason took up 75% of your resources? Then by worst-case calculation you could only have one VM on your five-node cluster.
A worst-case calculation based on your largest VM determines what’s called a “slot” size. VMware HA then calculates how many total “slots” can be used, which determines how many total VMs you can have powered on.
When this option is chosen, from what I’ve read on VMware forums, the calculations are VERY conservative.
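To make the slot idea concrete, here is a rough Python sketch of how I picture the counting. The slot size is the one from my cluster; the host capacities and the function names are made up for illustration, and the real HA calculation has more details (memory overhead, default minimums) that this ignores.

```python
# Slot size as reported for my cluster (CPU in MHz, memory in MB).
SLOT_CPU_MHZ = 2507
SLOT_MEM_MB = 4256

# HYPOTHETICAL host capacities, picked only so the totals resemble my cluster.
hosts = [{"cpu_mhz": 28000, "mem_mb": 65536} for _ in range(5)]


def slots_on_host(host, slot_cpu_mhz, slot_mem_mb):
    """A host holds as many whole slots as its scarcer resource allows."""
    cpu_slots = host["cpu_mhz"] // slot_cpu_mhz
    mem_slots = host["mem_mb"] // slot_mem_mb
    return min(cpu_slots, mem_slots)


def total_slots(hosts, slot_cpu_mhz, slot_mem_mb):
    """Sum the per-host slot counts across the cluster."""
    return sum(slots_on_host(h, slot_cpu_mhz, slot_mem_mb) for h in hosts)


per_host = slots_on_host(hosts[0], SLOT_CPU_MHZ, SLOT_MEM_MB)
cluster_total = total_slots(hosts, SLOT_CPU_MHZ, SLOT_MEM_MB)
print(per_host, cluster_total)
```

With these made-up hosts, each one holds 11 slots (CPU is the limiting resource), for 55 in the cluster. The point is that one big slot size shrinks the count on every host at once.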
Find Your Slot Size: VMware Cluster Summary -> Advanced Runtime Info
VMware Advanced Runtime Info: Slot sizes
So you can see above that, worst case, one slot is 2507 MHz and 4256 MB. With that in mind, there are 55 slots available on my five-node cluster, and a total of 156 VMs trying to fit into those 55 slots.
This means I would have to power off 102 VMs to get down to 54 powered-on VMs, leaving one slot open to power the new one on… (YIKES!)
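The arithmetic behind that number, as a small sketch. It assumes every powered-on VM occupies exactly one slot, which is how I read the slot counts; the function name is my own.

```python
def vms_to_power_off(powered_on, total_slots, new_vms=1):
    """How many running VMs must go so `new_vms` more fit within `total_slots`."""
    return max(0, powered_on + new_vms - total_slots)


# Numbers from my cluster: 156 VMs against 55 slots.
needed = vms_to_power_off(powered_on=156, total_slots=55)
print(needed)
```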
Changing To Percentage: First Check Resource Usage
Out of curiosity, I checked the actual resource usage in my cluster.
If you tally up all the green bars in CPU, I could fit the CPU usage of every VM on one host.
If you tally up all the green bars in Memory, I could fit all the memory usage in about three hosts.
So why can’t I power on a VM? Because the calculation is *THAT CONSERVATIVE* for the “Host failures cluster tolerates” option.
VMware HA: Switch to percentage
Now, the first time I did this, I chose “20%”, which works out to one server out of the five being kept free.
And I was able to power on a VM.
On a whim, I kept upping the percentage and got as high as 75% before I decided to stop, thinking I was doing something wrong.
Part of it was that the VM I was powering on was very, very small in resource usage (and later I found out it also had a reservation of 0 configured), which is probably why it powered on even at 75% failover spare capacity.
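Here is a simplified sketch of how I understand the percentage check, and why a VM with no reservation sails through it. The cluster totals and the reserved amounts below are hypothetical, the function names are mine, and the real check also accounts for memory overhead and default minimums that I leave out.

```python
def failover_capacity_pct(total, reserved):
    """Percentage of a resource not claimed by VM reservations."""
    return (total - reserved) / total * 100


def can_power_on(cluster, reserved, new_vm, spare_pct):
    """Allow the power-on only if CPU and memory both stay at or above spare_pct."""
    for res in ("cpu_mhz", "mem_mb"):
        after = failover_capacity_pct(cluster[res], reserved[res] + new_vm[res])
        if after < spare_pct:
            return False
    return True


# HYPOTHETICAL cluster totals and current reservations.
cluster = {"cpu_mhz": 140000, "mem_mb": 327680}
reserved = {"cpu_mhz": 20000, "mem_mb": 60000}

tiny_vm = {"cpu_mhz": 0, "mem_mb": 0}          # no reservation configured
big_vm = {"cpu_mhz": 20000, "mem_mb": 40000}   # large reservation

print(can_power_on(cluster, reserved, tiny_vm, 75))
print(can_power_on(cluster, reserved, big_vm, 75))
```

In this model the zero-reservation VM changes nothing in the sums, so it is admitted whenever the cluster already meets the percentage, while the VM with a large reservation is refused at the same setting.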
Anyhow, in a pinch, this is one way to configure some amount of reserve AND be able to power on your VMs, at least if your resource usage somewhat mirrors mine (see previous picture).
FIX #2: Best Long Term Fix: Determine WHY the cluster resource reserve is so high and see if it is actually needed, or if it is just poorly configured
In the end this was the actual fix for us, because it got at the actual source of the problem, which was to find out:
WHY the heck was our VM slot size so BIG?
Because obviously all five hosts combined were using VERY LITTLE CPU and RAM: less than 20% on CPU (it could all fit on one server), and less than 50% on RAM (it could fit on two to three servers).
It turns out: the slot size is not based on usage, it is based on VM resource reservations.
So here is how to check the resource reservations for your VMs.
VMware Cluster: Resource Allocation for CPU and Memory
CPU
(The dashed lines are my VM names, which I blanked out.)
Click on the “CPU” button, look for the “Reservation” column, and sort from largest to smallest.
Memory
(The dashed lines are my VM names, which I blanked out.)
Click on the “Memory” button, look for the “Reservation” column, and sort from largest to smallest.
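If you export those two lists, the same sort is easy to reproduce. This is a small sketch with made-up VM names and reservations (only the two largest values echo my slot size); it shows that the biggest CPU reservation and the biggest memory reservation can come from different VMs.

```python
# HYPOTHETICAL reservations; names and most values are invented for illustration.
vms = [
    {"name": "vm-a", "cpu_mhz": 2507, "mem_mb": 1024},
    {"name": "vm-b", "cpu_mhz": 500, "mem_mb": 4256},
    {"name": "vm-c", "cpu_mhz": 0, "mem_mb": 0},
]


def by_reservation(vms, key):
    """Sort VMs from largest to smallest reservation for the given resource."""
    return sorted(vms, key=lambda vm: vm[key], reverse=True)


top_cpu = by_reservation(vms, "cpu_mhz")[0]
top_mem = by_reservation(vms, "mem_mb")[0]

# The worst case across both lists is what drives the slot size.
slot_size = (top_cpu["cpu_mhz"], top_mem["mem_mb"])
print(top_cpu["name"], top_mem["name"], slot_size)
```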
As you can see, there are many VMs with resource reservations. This means that as soon as the VM is powered on, it will reserve this much resource REGARDLESS OF WHETHER IT IS NEEDED OR NOT!
But as you can see from actual usage, we are not even near capacity, so there is no real reason for us to reserve that much.
One of the culprits: it turns out many of the templates we use to clone/deploy VMs had resource reservations already set, so each time we made a new VM it came with a resource reservation.
VMware Cluster: Virtual Machines Actual Usage
Go to the “Virtual Machines” tab now and you can see actual usage. There are columns “HOST CPU – Mhz” and “Guest Mem – %”. These show actual usage by the VM.
I sorted alphabetically here, took the VMs with the highest reservations from the previous two pictures, and then checked this list to see their actual usage. Sure enough, many of our VMs were not using that much resource (as you can tell from the earlier graphs).
Next step: contact the VM owners to see if the VM was in typical usage. If so, get permission to turn the resource reservation down or even off.
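The cross-check I did by eye could be sketched like this. All the names and numbers are hypothetical, the factor of 2 is an arbitrary threshold I picked, and memory usage is assumed to be already converted from a percentage to MB.

```python
def over_reserved(vms, factor=2.0):
    """Names of VMs whose CPU or memory reservation exceeds `factor` x actual usage."""
    flagged = []
    for vm in vms:
        cpu_over = vm["cpu_reserved_mhz"] > factor * vm["cpu_used_mhz"]
        mem_over = vm["mem_reserved_mb"] > factor * vm["mem_used_mb"]
        if cpu_over or mem_over:
            flagged.append(vm["name"])
    return flagged


# HYPOTHETICAL reservation and usage figures.
vms = [
    {"name": "vm-a", "cpu_reserved_mhz": 2507, "cpu_used_mhz": 150,
     "mem_reserved_mb": 4256, "mem_used_mb": 900},
    {"name": "vm-b", "cpu_reserved_mhz": 500, "cpu_used_mhz": 450,
     "mem_reserved_mb": 1024, "mem_used_mb": 800},
    {"name": "vm-c", "cpu_reserved_mhz": 0, "cpu_used_mhz": 300,
     "mem_reserved_mb": 0, "mem_used_mb": 512},
]

candidates = over_reserved(vms)
print(candidates)
```

The flagged VMs are only candidates; the owner still has to confirm that the usage you saw is typical before you touch the reservation.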
VMware: Right Click -> Edit Settings
To configure the resource reservation, right-click on the VM and choose Edit Settings.
VMware: CPU reservation and Memory reservation
Here I turned the CPU reservation and memory reservation down low or to zero.
REMEMBER TO CONSULT YOUR USER FIRST TO SEE IF VM IS IN TYPICAL USE
VMware HA: Advanced Runtime Info Results
Now go back to your Advanced Runtime Info results… (you might have to switch VMware HA back to “Host failures cluster tolerates” if you had changed it to the percentage as an intermediate fix).
When all was said and done, I went from 55 slots to 550 slots.
And from being in the “red” by 101 slots (156 VMs against 55 slots) to being in the “green” with 394 slots available.
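The before and after numbers are just slots minus powered-on VMs. A quick sketch, again assuming one slot per powered-on VM; the function name is mine.

```python
def slot_headroom(total_slots, powered_on):
    """Free slots left over; negative means the cluster is over-committed."""
    return total_slots - powered_on


POWERED_ON = 156  # VMs in my cluster

before = slot_headroom(55, POWERED_ON)
after = slot_headroom(550, POWERED_ON)
print(before, after)
```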
CPU slot size went down by a factor of 10
Memory slot size went down by a factor of 20
NICE!!!
Hope this has been helpful!
2 comments:
You stole this article from someone else word for word.
http://geekswing.com/geek/vmware-cpu-and-ram-reservations-fixing-insufficient-resources-to-satisfy-configured-failover-level-for-ha/
Dude, stop stealing people's content, you copied it word for word and even used images off of his article too.