On our recent SLES 12 SP4 servers, we noticed that the VMware balloon driver in the guest is not properly releasing memory after ballooning.
The host does not show any ballooning:
> vmware-toolbox-cmd stat balloon
0 MB
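To compare this host-side view with the guest driver programmatically, the `stat balloon` output can be parsed into megabytes. This is a minimal sketch that assumes the output is always a single `<number> MB` line, as shown above. `parse_toolbox_balloon_mb` and `host_balloon_mb` are hypothetical helper names, and the second one assumes `vmware-toolbox-cmd` (open-vm-tools) is installed in the guest.

```python
import re
import subprocess


def parse_toolbox_balloon_mb(output: str) -> int:
    """Extract the balloon size in MB from `vmware-toolbox-cmd stat balloon`.

    Assumes the output looks like "0 MB" (the format seen above) and raises
    ValueError for anything else rather than guessing.
    """
    match = re.fullmatch(r"\s*(\d+)\s*MB\s*", output)
    if match is None:
        raise ValueError(f"unexpected balloon output: {output!r}")
    return int(match.group(1))


def host_balloon_mb() -> int:
    """Run the VMware Tools CLI inside the guest and return the size in MB."""
    result = subprocess.run(
        ["vmware-toolbox-cmd", "stat", "balloon"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command is missing or fails
    )
    return parse_toolbox_balloon_mb(result.stdout)
```

Failing loudly on an unexpected format avoids silently treating a changed output as "0 MB".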
However, the guest driver still holds about 31 GiB (target and current are both 8204474 pages, assuming 4 KiB pages) and is stuck with "is resetting" set:
> cat /sys/kernel/debug/vmmemctl
balloon capabilities: 0x1e
used capabilities: 0x1e
is resetting: y
target: 8204474 pages
current: 8204474 pages
timer: 40802
doorbell: 2467
start: 1 ( 0 failed)
guestType: 1 ( 0 failed)
2m-lock: 905 ( 0 failed)
lock: 21771 ( 0 failed)
2m-unlock: 1461 ( 0 failed)
unlock: 20602 ( 0 failed)
target: 40801 ( 1 failed)
prim2mAlloc: 240083 ( 94 failed)
primNoSleepAlloc: 10905780 ( 4 failed)
primCanSleepAlloc: 218318 ( 0 failed)
prim2mFree: 225282
primFree: 10449504
err2mAlloc: 0
errAlloc: 0
err2mFree: 0
errFree: 21
doorbellSet: 1
doorbellUnset: 2
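The mismatch between the two views can be quantified with a short parser for the debugfs file. This is a minimal sketch, assuming the debugfs layout shown above and that the `target`/`current` counters are in 4 KiB pages. `parse_vmmemctl`, `pages_to_gib` and `balloon_mismatch_mb` are hypothetical helper names. Reading the real file requires root and a mounted debugfs.

```python
import re

PAGE_SIZE = 4096  # assumption: balloon counters are 4 KiB pages

# Matches only the header lines "target: N pages" / "current: N pages",
# not the later "target: N ( M failed)" statistics line.
_PAGES_RE = re.compile(r"\s*(target|current):\s+(\d+)\s+pages\s*$")


def parse_vmmemctl(text: str) -> dict:
    """Extract target/current pages and the reset flag from vmmemctl output."""
    info = {}
    for line in text.splitlines():
        m = _PAGES_RE.match(line)
        if m:
            info[m.group(1)] = int(m.group(2))
        elif line.strip().startswith("is resetting:"):
            info["resetting"] = line.split(":", 1)[1].strip() == "y"
    return info


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB, assuming PAGE_SIZE-byte pages."""
    return pages * PAGE_SIZE / 2**30


def balloon_mismatch_mb(vmmemctl_text: str, host_mb: int) -> int:
    """MB the guest driver still holds beyond what the host reports."""
    guest_mb = parse_vmmemctl(vmmemctl_text)["current"] * PAGE_SIZE // 2**20
    return guest_mb - host_mb
```

With the values above, `current` of 8204474 pages is about 31.3 GiB (32048 MB), so against the host's 0 MB the mismatch is roughly 32 GB. Sampling `parse_vmmemctl(open("/sys/kernel/debug/vmmemctl").read())` periodically would show whether `resetting` ever clears and `current` ever drops without a reboot.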
This leaks a huge amount of memory and causes all kinds of instability. The only workaround we have found is to reboot the guest.
Are we doing anything wrong, or is there anything we can do to investigate further?