Playing evil with VMWare ESX Server

During the past month, I've been working on migrating some servers onto a VMWare ESX Server. I tested VMWare Workstation a long time ago, and have had VMWare Server on my work laptop since it became free (as in beer), but ESX Server is something else.

It seems to me, though, that apart from being able to run unmodified OSes without VT/Pacifica technology, it is not technically superior to what you can do with Xen and other free software. For instance, it may be possible to do better virtual switching setups with "standard" Linux bridges and ebtables. But from the administrator's perspective, ESX Server has the advantage of its administration console. I'm still waiting to see nice, featureful, free (as in speech) configuration software for Xen (or maybe, dear lazyweb, you could show me some good URLs I missed).

We don't have the full VMWare Infrastructure, though, so I can't speak of VMotion or VMHA, but that sounds neat, on paper. I never tested Xen migration either, so, I won't say it's better :)

Anyways, VMWare ESX Server is a pretty good product, but there are quite a few quirks, or even really painful misfeatures:

  • Sometimes, the console shows more items than what the user is supposed to see. Though you still can't act on these, you wouldn't expect them to show up (for example, I sometimes see items that are only supposed to appear if you have a Virtual Center, which we don't have).
  • If you rename a virtual switch, you have to go through all the VMs that were connected to it to change their network configuration accordingly.
  • You now have to edit the settings to connect/disconnect the CD drive or the network. That used to be less annoying with the console software in version 2.5.
  • You can't display more than an hour of performance graphs without a Virtual Center. Pretty painful when you only have one ESX server. (also known as the "buy more of my products business plan")
  • VMFS doesn't maintain coherency between readdir() and stat(). The d_ino that readdir() returns in its struct dirent and the st_ino in stat()'s struct stat don't match. This is especially annoying with Legato Networker, which checks that coherency and doesn't back up files that "changed inode". To circumvent this misfeature, I installed fuse and slightly modified libfuse and the fusexmp_fh example so that I can mount a mirror of /vmfs with coherent inodes. Now VM disks can be safely backed up.
  • It's impossible to create a loopback device on a file residing on VMFS. The filesystem doesn't accept the LOOP_SET_FD request. This means that, while the VM disk files are basically raw disk images, you can't directly mount their filesystems on the service console with a loopback device. Again, with the modified fusexmp_fh program, this is now possible.
  • While ESX Server 2.5 (which we also tested before upgrading to version 3) had a way to mount filesystems from VM disks with vmware-mount (quite broken, as in kernel freeze), the only "official" way I found to do this with version 3 is vcbMount, which requires a VMWare Consolidated Backup server (free neither as in speech nor as in beer; seems to be another instance of the "buy more of my products business plan") and an extra server connected to the SAN.
  • ...
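
The readdir()/stat() mismatch from the list above can be checked for with a few lines of Python. This is only a sketch, assuming a modern Python 3 (os.scandir did not exist on the service console of that era); the function name find_inode_mismatches and its lstat parameter are made up for illustration and are not part of any VMWare or Networker tool.

```python
import os


def find_inode_mismatches(directory, lstat=os.lstat):
    """Return sorted (name, d_ino, st_ino) tuples for entries whose inode
    number from the directory listing differs from the one lstat() reports.

    The lstat parameter is a hypothetical hook, there only so that the
    check can be exercised without a misbehaving filesystem at hand.
    """
    mismatches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # On Unix, DirEntry.inode() is the d_ino filled in by readdir().
            d_ino = entry.inode()
            # A separate lstat() call gives the st_ino for the same path.
            st_ino = lstat(entry.path).st_ino
            if d_ino != st_ino:
                mismatches.append((entry.name, d_ino, st_ino))
    return sorted(mismatches)
```

Calling find_inode_mismatches("/vmfs/volumes/somevolume") (a placeholder path) on a coherent filesystem should normally give an empty list. Note that on Linux a directory entry that is a mount point will usually show up as a mismatch too, which is expected and harmless.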

As said above, we can now set up a loopback device on the VM disk files (which needs a little trickery with offsets to get the partition positions right, but that's not very hard). While it's possible to mount filesystems from an offline disk this way, it's not a good idea to mount an ext3 filesystem from a running VM, because the filesystem is flagged for recovery, and the service console kernel would want to replay the journal, which may have nasty side effects. I don't know about NTFS yet.
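
For the offset trickery, here is a rough sketch of how the partition positions could be computed, assuming the disk image starts with a classic MBR partition table and uses 512-byte sectors (both are assumptions, and logical partitions inside an extended partition are not handled). The function name partition_offsets is made up for illustration.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def partition_offsets(mbr):
    """Parse the four primary entries of an MBR partition table.

    Takes the first 512 bytes of a raw disk image and returns a list of
    (partition number, type, byte offset, byte size) tuples, skipping
    unused slots.
    """
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature found")
    result = []
    for index in range(4):
        # The table starts at byte 446; each entry is 16 bytes long.
        entry = mbr[446 + 16 * index:446 + 16 * (index + 1)]
        part_type = entry[4]
        # Start LBA and sector count are little-endian 32-bit integers.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type == 0 or num_sectors == 0:
            continue  # empty slot
        result.append((index + 1, part_type,
                       start_lba * SECTOR_SIZE, num_sectors * SECTOR_SIZE))
    return result
```

Feed it the first 512 bytes of the -flat.vmdk file (opened in binary mode), and the byte offset of the wanted partition is what would then go to losetup's -o option.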

There may be a solution to "cleanly" mount filesystems from an online disk, though (not yet tested):

  • Snapshot the VM which the disk is on. After that, the -flat.vmdk file is frozen (only if it's the first snapshot).
  • Use a unionfs or funionfs (over fuse) to keep the real -flat.vmdk file read-only while still allowing writes on top of it, so that the service console kernel can replay the journal on the writable part of the unionfs.
  • Loopback mount partitions from the image in the unionfs.

That would be pretty perverse (ext3 over loopback over unionfs over vmfs), but it should just work. I'll post a detailed procedure.

2006-12-16 00:15:55+0900



3 Responses to “Playing evil with VMWare ESX Server”

  1. Jim Crilly Says:

    Another option for mounting the ext3 filesystem might be the “noload” option; according to the mount man page, it won’t load the journal on mount. I haven’t tried it, though, so I don’t know how well the ext3 driver would like a filesystem being mounted twice like that.

  2. glandium Says:

    Jim: That’s the first thing I tried… but it doesn’t work…

  3. Jim Crilly Says:

    I didn’t hold high hopes for that, ext3 just isn’t made to be mounted more than once like that. LVM snapshots, possibly combined with xfs_freeze, might be your best bet.