I'm now running VMware Server 1.0.1 on an HP DL385 (2 dual-core Athlon) box running CentOS 4.4.
I have two guest OSs running on my install: one Windows, one CentOS 4.4.
If I shut down a guest, then from the moment it should "power off" and return to the status tab, I either have to wait 8 minutes before I can connect to the guest again, or it locks up completely and never returns.
Here are the logs of a guest starting:
==> /var/log/vmware/vmware-serverd.log <==
Feb 16 20:23:27: app| Adding to list of running vms: /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:23:27: app| Attempting to launch vmx : /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:23:28: app| New connection on socket server-vmxvmdb from host localhost (ip address: local) , user: root
Feb 16 20:23:28: app| Connection from : /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:23:28: app| Setting up autoDetect info.
Feb 16 20:23:28: app| VMServerdConnect: connecting to /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:23:28: app| Connected to /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:23:28: app| SP: Retrieved username: root
Feb 16 20:23:28: app| VM suddenly changed state: poweredOn.
Feb 16 20:23:28: app| SP: Retrieved username: root
Feb 16 20:23:28: app| SP: Retrieved username: root
Feb 16 20:23:28: app| cleanup: cleaned up 1 objects
==> /var/log/messages <==
Feb 16 20:23:28 secw8vm kernel: eth0.7: dev_set_promiscuity(master, 1)
Feb 16 20:23:28 secw8vm kernel: device eth0 entered promiscuous mode
Feb 16 20:23:28 secw8vm kernel: audit(1171657408.205:6): dev=eth0 prom=256 old_prom=0 auid=4294967295
Feb 16 20:23:28 secw8vm kernel: device eth0.7 entered promiscuous mode
Feb 16 20:23:28 secw8vm kernel: audit(1171657408.205:7): dev=eth0.7 prom=256 old_prom=0 auid=4294967295
Feb 16 20:23:28 secw8vm kernel: bridge-eth0.7: enabled promiscuous mode
==> /var/log/vmware/vmware-serverd.log <==
Feb 16 20:24:03: app| VmsdRegister: Config file has changed: /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
All's well, the machine has started. I can see it in the tab of my vmware-console. All is happy.
I'll now issue a "poweroff" command in my CentOS guest at 20:30.
The clock in the guest is incredibly slow (1 second every 12 seconds, I think), even with VMware Tools installed and clock=pit.
Now the OS has "powered off" and the console is a black box.
==> /var/log/messages <==
Feb 16 20:31:44 secw8vm kernel: eth0.7: dev_set_promiscuity(master, -1)
Feb 16 20:31:44 secw8vm kernel: device eth0 left promiscuous mode
Feb 16 20:31:44 secw8vm kernel: audit(1171657904.264:8): dev=eth0 prom=0 old_prom=256 auid=4294967295
Feb 16 20:31:44 secw8vm kernel: device eth0.7 left promiscuous mode
Feb 16 20:31:44 secw8vm kernel: audit(1171657904.264:9): dev=eth0.7 prom=0 old_prom=256 auid=4294967295
Feb 16 20:31:44 secw8vm kernel: bridge-eth0.7: disabled promiscuous mode
That is the only log output so far. Notice there is nothing in vmware-serverd.log under /var/log, nor in the log file for the actual virtual machine in /var/lib/vmware/Viru.../vmware.log
If I try to close the tab with the virtual machine still displayed (as a black window, not the start/edit virtual machine view) I find I can't. The client has locked up!
The time is 20:34. Now the client has woken up but I am disconnected, all the tabs have vanished. I'll reconnect...
Select "Connect to a host", select "Local host", press connect...
Hangs... about 30 seconds later..
Window appears: "The local VMware Server is not installed, or is not currently running"
Really?
# ps -ef|grep serverd
root 4895 1 0 19:44 ? 00:00:01 /usr/sbin/vmware-serverd -s -d
So it's locked up perhaps? A brief strace reveals..
read(33, 0x8c53260, 1024) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=33, events=POLLIN}], 1, 100) = 0
# ls -l /proc/4895/fd/33
lrwx------ 1 root root 64 Feb 16 20:43 /proc/4895/fd/33 -> socket:[25622]
Don't think that socket is open.
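For anyone who wants to check what that socket actually is, one option is to look the inode (25622 here) up in the kernel's socket tables. Below is a rough Python sketch, not something I ran on this box. The helper names are made up for illustration, and the column positions for /proc/net/unix and /proc/net/tcp are assumptions based on the usual Linux layout, so verify them against the header line on your own kernel.

```python
# Column that holds the inode in each table (0-based, whitespace-split).
# These positions are assumed from the usual Linux layout.
INODE_COLUMN = {"/proc/net/unix": 6, "/proc/net/tcp": 9}


def find_inode(table_text, inode, inode_column):
    """Return the rows of a /proc/net/* table whose inode column matches."""
    matches = []
    for line in table_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > inode_column and fields[inode_column] == str(inode):
            matches.append(fields)
    return matches


def lookup(inode):
    """Search the known tables and return {table_path: matching_rows}."""
    found = {}
    for path, column in INODE_COLUMN.items():
        try:
            with open(path) as handle:
                rows = find_inode(handle.read(), inode, column)
        except OSError:
            continue  # table not present or not readable on this system
        if rows:
            found[path] = rows
    return found
```

Calling lookup(25622) as root would show which table the inode lives in. For a UNIX socket the last field of the row is the bound path, if there is one. An empty result would suggest the socket is of some other type or has already gone away.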
A brief strace of the vmware-vmx process hangs. I only see this when a process is stuck waiting for a kernel function to return. Chances are it's waiting on /dev/vmmon, as I previously noticed.
So in this example the virtual machine NEVER returns. I can't reboot (cleanly) since I cannot kill the vmware-vmx process that is stuck waiting on /dev/vmmon.
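A process that cannot be killed or straced is usually in uninterruptible sleep (state "D"), which would fit with it being stuck inside a kernel call. Here is a small Python sketch for confirming that from /proc. The function names are mine, and I am assuming /proc/<pid>/wchan is available on this kernel, so the code falls back gracefully if it is not.

```python
def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split on the last ')' first.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe(pid):
    """Return (state, wchan) for a pid; wchan names the kernel wait point."""
    with open(f"/proc/{pid}/stat") as handle:
        state = parse_state(handle.read())
    try:
        with open(f"/proc/{pid}/wchan") as handle:
            wchan = handle.read().strip()
    except OSError:
        wchan = "(unavailable)"  # assumption: not every kernel exposes wchan
    return state, wchan
```

Running describe() on the pid of the stuck vmware-vmx process and getting "D" back would confirm the uninterruptible sleep, and the wchan value might hint at which kernel routine it is sitting in.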
Just for reference, here's what happened when it did eventually wake up after 8 minutes on a previous test:
==> vmware.log <==
Feb 16 20:12:10: mks| SOCKET 5 new update req with non-empty pending: logical error on server/client?
Feb 16 20:12:10: vcpu-0| PIIX4: PM Soft Off. Good-bye.
Feb 16 20:12:10: vmx| Stopping VCPU threads...
Feb 16 20:12:10: mks| Async MKS thread is exiting
Feb 16 20:12:10: vmx| DnD rpc already set to 0
Feb 16 20:12:10: vmx| TOOLS received request in VMX to set option 'enableDnD' -> '0'
Feb 16 20:12:10: vmx| SOCKET 5 close VNC socket on VNCBackendDestroy
Feb 16 20:12:10: vmx| MKS local poweroff
Feb 16 20:12:10: vmx| Lock before MKS lock created. Early poweroff?
Feb 16 20:12:10: vmx| Unlock before MKS lock created. Early poweroff?
Feb 16 20:12:10: vmx| Msg_Hint: msg.tools.toolsImage (not shown)
Feb 16 20:12:10: vmx| scsi0:0: numIOs = 5119 numMergedIOs = 0 numSplitIOs = 0 ( 0.0%)
Feb 16 20:12:10: vmx| FILEIO: Cannot remove lock file /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmem.WRITELOCK (No such file or directory).
Feb 16 20:12:10: vmx| FILEIO: Failed to unlock /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmem.
Feb 16 20:12:10: IO#1| AIOGNRC: thread #1 exiting (904)
Feb 16 20:12:10: IO#4| AIOGNRC: thread #4 exiting (883)
Feb 16 20:12:10: IO#0| AIOGNRC: thread #0 exiting (878)
Feb 16 20:12:10: IO#5| AIOGNRC: thread #5 exiting (861)
Feb 16 20:12:10: IO#3| AIOGNRC: thread #3 exiting (880)
Feb 16 20:12:10: IO#2| AIOGNRC: thread #2 exiting (874)
Feb 16 20:12:10: vmx| AIOGNRC: asyncOps=5280 syncOps=0 maxPending=57 maxCompleted=2
==> /var/log/messages <==
Feb 16 20:12:10 secw8vm kernel: eth0.7: dev_set_promiscuity(master, -1)
Feb 16 20:12:10 secw8vm kernel: device eth0 left promiscuous mode
Feb 16 20:12:10 secw8vm kernel: audit(1171656730.450:4): dev=eth0 prom=0 old_prom=256 auid=4294967295
Feb 16 20:12:10 secw8vm kernel: device eth0.7 left promiscuous mode
Feb 16 20:12:10 secw8vm kernel: audit(1171656730.450:5): dev=eth0.7 prom=0 old_prom=256 auid=4294967295
Feb 16 20:12:10 secw8vm kernel: bridge-eth0.7: disabled promiscuous mode
THEN WE WAIT FOR 8 MINUTES WHILST NOTHING RESPONDS
==> vmware.log <==
Feb 16 20:20:15: vmx| DUMPER: Deleted checkpoint file /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmss
Feb 16 20:20:15: vmx| VMX idle exit
Feb 16 20:20:15: vmx| VMX IPC closed the connection with thread servercontrol (0x84db360)
Feb 16 20:20:15: vmx| VMX: Remote VMControl client servercontrol disconnected.
Feb 16 20:20:15: vmx| Flushing VMX VMDB connections
Feb 16 20:20:15: vmx| IPC_exit: disconnecting all threads
Feb 16 20:20:15: vmx| VMX exit.
Feb 16 20:20:15: vmx| AIOMGR-S : stat o=7 r=4 w=0 i=0 br=2044 bw=0
==> /var/log/vmware/vmware-serverd.log <==
Feb 16 20:20:15: app| vmdbPipe_Streams Couldn't read: OVL_STATUS_EOF
Feb 16 20:20:15: app| VMServerd IPC closed the connection with thread /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx (0x8302a50)
Feb 16 20:20:15: app| Lost connection to /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx (/var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx) unexpectedly.
Feb 16 20:20:15: app| VM suddenly changed state: poweredOff.
Feb 16 20:20:15: app| Removing from running vm list: /var/lib/vmware/Virtual Machines/W8/fwbuilder/FWBuilder.vmx
Feb 16 20:20:15: app| VM suddenly changed state: poweredOff.
Feb 16 20:20:15: app| VM suddenly changed state: poweredOff.
Feb 16 20:20:15: app| VM suddenly changed state: poweredOff.
Feb 16 20:20:15: app| SP: Retrieved username: root
Feb 16 20:20:15: app| cleanup: cleaned up 1 objects
Feb 16 20:20:15: app| SP: Retrieved username: root
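To put a number on the stall rather than eyeballing it, the merged log lines can be scanned for the largest jump between consecutive timestamps. This is only a sketch: the function names are hypothetical, the logs carry no year so one has to be supplied (2007 is my assumption, matching the audit epoch values above), and month parsing with %b assumes an English/C locale.

```python
import re
from datetime import datetime

# Matches the leading "Feb 16 20:12:10" used by both vmware and syslog lines.
STAMP = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year=2007):
    """Return a datetime for the line's leading timestamp, or None."""
    match = STAMP.match(line)
    if not match:
        return None  # e.g. the "==> file <==" separators from tail
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2007):
    """Return (gap_seconds, line_before, line_after) for the biggest jump."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best
```

Fed the power-off lines above in the order they were logged, this should report the jump from 20:12:10 to 20:20:15, which is 485 seconds, along with the two lines on either side of the silence.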
I notice that this forum tends only to attract attention when things are relatively simple or really interesting. Does anyone have any idea why it is simply so unreliable on CentOS 4.4 on this hardware? Does anyone else have a success story running 1.0.1 on CentOS 4.4 on a multi-core system (in 32-bit mode!)?
Cheers
Chris