Hi All,
I just rebuilt a nice production VM server as follows:
Tyan Thunder K8SD-Pro
Dual 2.0 GHz Opterons
3 GB PC3200 DDR ECC
3ware 9550SX-4LP controller
4x Seagate 250 GB SATA drives in RAID 5
sda = 10 GB (boot 100 MB, root 5 GB, swap 5 GB)
sdb = 689 GB (split into a separate LV for each VM)
CentOS Server 4.3
VMware Server 1.0.1
bonnie++ gives pretty good results (103 MB/s block writes, 129 MB/s block reads).
I have six virtual machines running on it. The heaviest use currently is a mail server with maybe 20 non-concurrent users. No VM has more than 512 MB of RAM allocated; most have 384 MB. If I try to do any 'significant' I/O between guests, the host and all the VMs grind to a halt. The VM logs fill up with messages like these:
Sep 25 21:51:48: vmx| SCSI0:1: Command WRITE(10) took 1.617 seconds (ok)
Sep 25 21:51:49: vmx| SCSI0:1: Command WRITE(10) took 2.642 seconds (ok)
Sep 25 21:51:49: vmx| SCSI0:1: Command WRITE(10) took 2.549 seconds (ok)
Sep 25 21:51:49: vmx| SCSI0:1: Command WRITE(10) took 2.669 seconds (ok)
Sep 25 21:51:50: vmx| SCSI0:1: Command WRITE(10) took 2.806 seconds (ok)
Sep 25 21:51:50: vmx| SCSI0:1: Command WRITE(10) took 2.816 seconds (ok)
Sep 25 21:51:50: vmx| SCSI0:1: Command WRITE(10) took 3.112 seconds (ok)
Sep 25 21:51:50: vmx| SCSI0:1: Command WRITE(10) took 3.056 seconds (ok)
Sep 25 21:51:50: vmx| SCSI0:1: Command WRITE(10) took 3.056 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 4.959 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 4.959 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 5.052 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 5.181 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 5.254 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 4.200 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 4.200 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 5.456 seconds (ok)
Sep 25 21:51:52: vmx| SCSI0:1: Command WRITE(10) took 5.363 seconds (ok)
Sep 25 21:51:53: vmx| SCSI0:1: Command WRITE(10) took 5.808 seconds (ok)
Sep 25 21:51:54: vmx| SCSI0:1: Command WRITE(10) took 5.609 seconds (ok)
Sep 25 21:51:54: vmx| SCSI0:1: Command WRITE(10) took 6.231 seconds (ok)
Sep 25 21:51:59: vmx| SCSI0:1: Command WRITE(10) took 12.304 seconds (ok)
Sep 25 21:52:02: vmx| SCSI0:1: Command WRITE(10) took 15.461 seconds (ok)
Sep 25 21:52:02: vmx| SCSI0:1: Command WRITE(10) took 14.205 seconds (ok)
Sep 25 21:52:05: vmx| SCSI0:1: Command WRITE(10) took 15.731 seconds (ok)
Sep 25 21:52:15: vmx| SCSI0:1: Command WRITE(10) took 25.476 seconds (ok)
Sep 25 21:52:16: vmx| SCSI0:1: Command WRITE(10) took 26.980 seconds (ok)
Sep 25 21:52:17: vmx| SCSI0:1: Command WRITE(10) took 27.702 seconds (ok)
Sep 25 21:52:20: vmx| SCSI0:1: Command WRITE(10) took 30.326 seconds (ok)
Sep 25 21:52:22: vmx| SCSI0:1: Command WRITE(10) took 32.545 seconds (ok)
Sep 25 21:52:24: vmx| SCSI0:1: Command WRITE(10) took 33.939 seconds (ok)
Sep 25 21:52:26: vmx| SCSI0:1: Command WRITE(10) took 35.888 seconds (ok)
Sep 25 21:52:29: vmx| SCSI0:1: Command WRITE(10) took 37.060 seconds (ok)
Sep 25 21:52:32: vmx| SCSI0:1: Command WRITE(10) took 40.101 seconds (ok)
Sep 25 21:52:32: vmx| SCSI0:1: Command WRITE(10) took 40.116 seconds (ok)
Sep 25 21:52:33: vmx| SCSI0:1: Command WRITE(10) took 41.201 seconds (ok)
Sep 25 21:52:35: vmx| SCSI0:1: Command WRITE(10) took 42.727 seconds (ok)
Sep 25 21:52:36: vmx| SCSI0:1: Command WRITE(10) took 47.764 seconds (ok)
Sep 25 21:52:39: vmx| SCSI0:1: Command WRITE(10) took 46.462 seconds (ok)
Sep 25 21:52:42: vmx| SCSI0:1: Command WRITE(10) took 49.949 seconds (ok)
Sep 25 21:52:44: vmx| SCSI0:1: Command WRITE(10) took 56.397 seconds (ok)
Sep 25 21:52:46: vmx| SCSI0:1: Command WRITE(10) took 53.971 seconds (ok)
Sep 25 21:52:46: vmx| SCSI0:1: Command WRITE(10) took 58.182 seconds (ok)
Sep 25 21:52:47: vmx| SCSI0:1: Command WRITE(10) took 59.407 seconds (ok)
Sep 25 21:52:52: vmx| SCSI0:1: Command WRITE(10) took 49.300 seconds (ok)
Sep 25 21:52:52: vmx| SCSI0:1: Command WRITE(10) took 63.597 seconds (ok)
Sep 25 21:52:52: vmx| SCSI0:0: Command WRITE(10) took 46.021 seconds (ok)
Sep 25 21:52:54: vcpu-1| SCSI0:1: Command WRITE(10) took 52.159 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 74.284 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 74.284 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 74.284 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 73.924 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 70.080 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 69.731 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 59.987 seconds (ok)
Sep 25 21:53:02: vcpu-1| SCSI0:1: Command WRITE(10) took 59.987 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 52.595 seconds (ok)
Sep 25 21:53:04: vmx| SCSI0:0: Command WRITE(10) took 44.486 seconds (ok)
Sep 25 21:53:05: vmx| SCSI0:0: Command WRITE(10) took 53.836 seconds (ok)
Sep 25 21:53:06: vmx| SCSI0:0: Command WRITE(10) took 54.307 seconds (ok)
Sep 25 21:53:08: vmx| SCSI0:0: Command WRITE(10) took 48.982 seconds (ok)
Sep 25 21:53:09: vmx| SCSI0:0: Command WRITE(10) took 57.906 seconds (ok)
Sep 25 21:53:09: vmx| SCSI0:0: Command WRITE(10) took 57.940 seconds (ok)
Sep 25 21:53:11: vmx| SCSI0:0: Command WRITE(10) took 51.876 seconds (ok)
Sep 25 21:53:14: vmx| SCSI0:0: Command WRITE(10) took 62.442 seconds (ok)
Sep 25 21:53:14: vmx| SCSI0:0: Command WRITE(10) took 54.334 seconds (ok)
Sep 25 21:53:17: vmx| SCSI0:0: Command WRITE(10) took 65.991 seconds (ok)
Sep 25 21:53:17: vmx| SCSI0:0: Command WRITE(10) took 57.883 seconds (ok)
Sep 25 21:53:17: vmx| SCSI0:0: Command WRITE(10) took 57.883 seconds (ok)
Sep 25 21:53:25: vmx| SCSI0:0: Command WRITE(10) took 65.384 seconds (ok)
Sep 25 21:53:26: vmx| SCSI0:0: Command WRITE(10) took 66.637 seconds (ok)
Sep 25 21:53:26: vmx| SCSI0:0: Command WRITE(10) took 66.637 seconds (ok)
Sep 25 21:53:28: vmx| SCSI0:0: Command WRITE(10) took 68.198 seconds (ok)
Sep 25 21:53:29: vmx| SCSI0:0: Command WRITE(10) took 69.222 seconds (ok)
Sep 25 21:53:31: vmx| SCSI0:0: Command WRITE(10) took 71.870 seconds (ok)
Sep 25 21:53:34: vmx| SCSI0:0: Command WRITE(10) took 74.066 seconds (ok)
Sep 25 21:53:35: vmx| SCSI0:0: Command WRITE(10) took 75.285 seconds (ok)
Sep 25 21:53:37: vmx| SCSI0:0: Command WRITE(10) took 77.963 seconds (ok)
Sep 25 21:53:37: vmx| SCSI0:0: Command WRITE(10) took 77.963 seconds (ok)
Sep 25 21:53:42: vmx| SCSI0:0: Command WRITE(10) took 82.693 seconds (ok)
Sep 25 21:53:46: vmx| SCSI0:0: Command WRITE(10) took 86.458 seconds (ok)
Sep 25 21:53:46: vmx| SCSI0:0: Command WRITE(10) took 86.458 seconds (ok)
Sep 25 21:53:52: vcpu-0| Msg_Hint: msg.monitorevent.halt (not shown)
Sep 25 21:58:36: vmx| VMXVmdbCbVmVmxExecState: Exec state change requested to state poweredOff without reset
Sep 25 21:58:36: vmx| VMX: attempted to do a soft halt while not in the correct state. Ignored...
Sep 25 21:58:36: vmx| Stopping VCPU threads...
Sep 25 21:58:36: mks| Async MKS thread is exiting
Sep 25 21:58:36: vmx| DnD rpc already set to 0
Sep 25 21:58:36: vmx| TOOLS received request in VMX to set option 'enableDnD' -> '0'
Sep 25 21:58:36: vmx| SOCKET 3 close VNC socket on VNCBackendDestroy
Sep 25 21:58:36: vmx| MKS local poweroff
Sep 25 21:58:36: vmx| Lock before MKS lock created. Early poweroff?
Sep 25 21:58:36: vmx| Unlock before MKS lock created. Early poweroff?
Sep 25 21:58:37: vmx| scsi0:1: numIOs = 143493 numMergedIOs = 12283 numSplitIOs = 640 ( 5.0%)
Sep 25 21:58:37: vmx| scsi0:0: numIOs = 107466 numMergedIOs = 13680 numSplitIOs = 2573 (15.8%)
Sep 25 21:58:38: IO#3| AIOGNRC: thread #3 exiting (33029)
Sep 25 21:58:38: IO#9| AIOGNRC: thread #9 exiting (33022)
Sep 25 21:58:38: IO#2| AIOGNRC: thread #2 exiting (33184)
Sep 25 21:58:38: IO#0| AIOGNRC: thread #0 exiting (32972)
Sep 25 21:58:38: IO#7| AIOGNRC: thread #7 exiting (32702)
Sep 25 21:58:38: IO#5| AIOGNRC: thread #5 exiting (33344)
Sep 25 21:58:38: IO#8| AIOGNRC: thread #8 exiting (32425)
Sep 25 21:58:38: IO#6| AIOGNRC: thread #6 exiting (32878)
Sep 25 21:58:38: IO#4| AIOGNRC: thread #4 exiting (32817)
Sep 25 21:58:38: IO#1| AIOGNRC: thread #1 exiting (33135)
Sep 25 21:58:38: vmx| AIOGNRC: asyncOps=253741 syncOps=116 maxPending=57 maxCompleted=39
Sep 25 21:58:39: vmx| VMX idle exit
Sep 25 21:58:39: vmx| VMX IPC closed the connection with thread servercontrol (0x84db360)
Sep 25 21:58:39: vmx| VMX: Remote VMControl client servercontrol disconnected.
Sep 25 21:58:39: vmx| Flushing VMX VMDB connections
Sep 25 21:58:39: vmx| IPC_exit: disconnecting all threads
Sep 25 21:58:39: vmx| VMX exit.
Sep 25 21:58:39: vmx| AIOMGR-S : stat o=16 r=24 w=0 i=0 br=12462 bw=0
...
I included the part of the log where I had to hard power off the guest. Every VM logs these messages regularly, but this is an extreme (and frustrating) example.
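To put numbers on how often this happens, a minimal Python sketch like the one below could tally the slow-command entries per virtual disk and report the worst latency seen. It assumes every such line follows the vmware.log format shown above (timestamp, thread name, SCSI device, command, duration); the script name slowio.py and the function summarize_slow_writes are hypothetical, not part of VMware Server.

```python
import re
import sys
from collections import defaultdict

# Assumed line format (taken from the log excerpt above), e.g.:
# Sep 25 21:51:48: vmx| SCSI0:1: Command WRITE(10) took 1.617 seconds (ok)
SLOW_CMD = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]+): [\w-]+\| "
    r"(?P<dev>SCSI\d+:\d+): Command (?P<cmd>\S+) took (?P<secs>[\d.]+) seconds"
)


def summarize_slow_writes(lines):
    """Return {device: (count, worst_seconds)} for slow-command log lines."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = SLOW_CMD.match(line)
        if not m:
            continue  # skip other messages (power-off, AIO threads, etc.)
        entry = stats[m.group("dev")]
        entry[0] += 1
        entry[1] = max(entry[1], float(m.group("secs")))
    return {dev: (count, worst) for dev, (count, worst) in stats.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python slowio.py /path/to/vmware.log
    with open(sys.argv[1]) as f:
        for dev, (count, worst) in sorted(summarize_slow_writes(f).items()):
            print(f"{dev}: {count} slow commands, worst {worst:.3f} s")
```

Running it against each guest's log before and after a tuning change would give a rough, comparable count rather than an impression from scrolling the log.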
Any thoughts on how to approach troubleshooting this problem? I'm fairly new to Linux, but holding my own.