Thursday, May 29, 2008

Keeping tabs on virtual machines

OK, I'm close to admitting defeat. I can't keep track of all the virtual machines that are created around me. The fact that we are using VMware ESX, VMware Server, Xen, and KVM side by side does not make the job easier. Just keeping an overview of what runs on which server, which IP addresses are used, and which credentials are needed to access them already fills a big table on a wiki page, and keeping that page updated is a struggle. There are of course specific tools (some nicer, some not so) for each of the virtualization solutions, but I have yet to find the One Tool To Rule Them All, One Tool To Find Them, One Tool To Bring Them All And In The Darkness Bind Them!

So right now, more often than not, I end up going through all the different server consoles to find the machine I'm looking for. Also, while VMware VirtualCenter is nice and easy to use, it does not show you the IP addresses of the machines, and logging in to each of them through the console is time-consuming. Using IP addresses as names helps though, so that is what I started doing (obviously only for machines with static IPs).

Getting IP addresses for Xen guests is easier because the configuration files actually contain the IP addresses. There is a way to get current IP addresses from VMware guests as well, but they have to be running and they have to have VMware Tools installed. VMware Tools are a high hurdle, especially for non-mainstream operating systems where you have to download, compile, and install the tools yourself. An alternative is to look into the ARP and forwarding tables of the upstream switch and search for the IP address based on the MAC address (which you can get, e.g., through the VC client). The machine has to be in the ARP table for this to work, though, which creates a chicken-and-egg problem. If you know at least the subnet of the machine and that subnet is not too big (a /24 works fine), you can fill the table by running fping over the range.
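
The MAC-to-IP lookup can be sketched in a few lines of Python. This is only a rough sketch: it assumes the ARP table is already available as text in the format that "arp -an" prints on a Linux box (a switch will format its table differently), and the sample data and the function name ip_for_mac are made up for illustration.

```python
import re

# Sample ARP table in the format assumed here (Linux "arp -an" style).
# The addresses are invented for illustration.
SAMPLE_ARP = """\
? (192.168.1.10) at 00:0c:29:3e:5b:a1 [ether] on eth0
? (192.168.1.11) at 00:16:3e:12:34:56 [ether] on eth0
? (192.168.1.12) at <incomplete> on eth0
"""

# One entry: "(IP) at MAC". Incomplete entries have no MAC and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9a-fA-F:]{17})"
)


def ip_for_mac(arp_text, mac):
    """Return all IP addresses listed for the given MAC (case-insensitive)."""
    wanted = mac.lower()
    return [
        m.group("ip")
        for m in ARP_LINE.finditer(arp_text)
        if m.group("mac").lower() == wanted
    ]


if __name__ == "__main__":
    print(ip_for_mac(SAMPLE_ARP, "00:0C:29:3E:5B:A1"))
```

If the lookup comes back empty, the machine is probably not in the table yet, which is where pinging the whole range first comes in.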

Thursday, May 22, 2008

Rebooting virtual machines

Managing virtual machines is getting really easy these days with all the nice graphical user interfaces available. Unfortunately, some of them (most notably the VI client) are only available for Windows right now. I also learned the hard way that the nice "poweroff" buttons really do just that: they cut the power to the virtual machine. That results in a lengthy disk check during the next startup and can result in lost files etc. Of course, we all know that you should always shut down a machine and not just pull the plug, but logging in to every running VM and shutting it down cleanly is not fun if you have lots of them running, and for some of them, I did not even know the root/administrator password. So, wouldn't it be nice if the "poweroff" button did the right thing? It seems the answer is to make sure the VMware tools are installed and running in the guests. The ESX server sends a message to the VMware toolbox running in the guest during power state changes (before poweroff or suspend, after startup or resume).

The trick to making clean shutdowns really work under Linux is to make sure that the toolbox is running as root, or else the poweroff script won't have permission to actually shut down the machine. However, running an unknown binary as root always makes me nervous - even more so if that binary is listening on network ports. Thankfully, VMware open sourced their tools and recently moved from Subversion to a git repository, so I will spend one day of the upcoming Memorial Day weekend studying the code before I install it on my virtual machines.

Thursday, May 8, 2008

Network Fencing in VMware Lab Manager

VMware Lab Manager has a nice feature called "Network Fencing". The background for fencing is that you often want to run multiple instances of the same configuration (i.e., a group of VMs). This could entail different test cycles for a product or recreating a customer scenario. Normally, the VMs in this setup have fixed IP addresses, and changing them for every instance is painful because these IP addresses are also part of configuration files, etc. One solution would be to set up every instance with its own vswitch without an uplink. This nicely isolates the instances, but unfortunately also prevents access to them from the outside (e.g., from your desktop).

Network fencing solves that problem by deploying a virtual router (VR) that connects the vswitch to the outside world. The VR is automatically configured to NAT the internal IP addresses of the "fenced" VMs to unique external IP addresses. This gives you the best of both worlds: the VMs of the configuration can talk to each other using their internal (but not system-wide unique) addresses, and you can still access the VMs from the outside using their external addresses.
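
To make the address mapping concrete, here is a small Python sketch of one-to-one NAT for two instances of the same configuration. It is a toy model, not how Lab Manager implements fencing; the address ranges and the function name build_fence are assumptions chosen for illustration.

```python
import ipaddress


def build_fence(internal_ips, external_pool):
    """Give each internal address the next free external address (1:1 NAT)."""
    return {ip: str(next(external_pool)) for ip in internal_ips}


# Both instances use the same fixed internal addresses (hypothetical values).
internal = ["192.168.0.10", "192.168.0.11"]

# Shared pool of unique external addresses (hypothetical range).
pool = iter(ipaddress.ip_network("10.0.5.0/28").hosts())

instance_a = build_fence(internal, pool)
instance_b = build_fence(internal, pool)

if __name__ == "__main__":
    print(instance_a)
    print(instance_b)
```

Inside each fence the VMs keep talking to 192.168.0.10 and 192.168.0.11, while from the outside every VM is reachable under its own address from the pool.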

The current version of Lab Manager has the limitation that it only creates one VR per instance. Therefore, all instance members have to reside on the same host (i.e., an instance can't be spread across multiple hosts). It would be interesting to see if it's possible to overcome this limitation by using TBD Networks VirtualFirewall (which does NAT as well as VLANs) instead of the VR.

Wednesday, May 7, 2008

Accessing network configuration of ESX servers

One of my recent tasks involved reading and processing networking configurations from an ESX server. As a concrete example, suppose you want to list all the VLAN tags currently in use. There are multiple ways to do that. The first choice is the handy console program "esxcfg-vswitch -l", which prints information about all vswitches, including port groups with their VLAN tags. This works nicely, but needs a parser or some regular expression matching to extract the required information.
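
As a rough idea of what that parsing could look like, here is a Python sketch. The sample text reflects the column layout I assume "esxcfg-vswitch -l" produces; treat the layout, the sample values, and the function name vlan_tags as assumptions, since the output may differ between ESX versions.

```python
import re

# Sample text in the layout assumed for "esxcfg-vswitch -l" (an assumption;
# check it against the output of your own ESX server).
SAMPLE = """\
Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       64          5           64                1500    vmnic0

  PortGroup Name      VLAN ID  Used Ports  Uplinks
  VM Network          0        2           vmnic0
  Service Console     0        1           vmnic0
  Lab 42              42       3           vmnic0
"""

# A port group row: indented name (may contain single spaces), then at least
# two spaces, then the VLAN ID and the used-port count.
PORTGROUP_ROW = re.compile(
    r"^\s+(?P<name>.+?)\s{2,}(?P<vlan>\d+)\s+(?P<used>\d+)\b"
)


def vlan_tags(text):
    """Return a dict mapping port group name to VLAN ID."""
    tags = {}
    in_portgroups = False
    for line in text.splitlines():
        if not line.strip():
            in_portgroups = False  # a blank line ends the port group table
        elif line.lstrip().startswith("PortGroup Name"):
            in_portgroups = True   # header row, data rows follow
        elif in_portgroups:
            match = PORTGROUP_ROW.match(line)
            if match:
                tags[match.group("name")] = int(match.group("vlan"))
    return tags


if __name__ == "__main__":
    print(sorted(set(vlan_tags(SAMPLE).values())))
```

The regular expression relies on columns being separated by at least two spaces, so a port group name with two consecutive spaces in it would break this sketch.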

Another tool is "esxcfg-info", which prints lots of information about the ESX server. By default, it prints the whole configuration as (very long) formatted text. Reading through that is possible, but not very pleasant. So the first step is to reduce the amount of data by using the "-n" option, which prints only the network part (other options include "-s" for storage-related information; "esxcfg-info -h" prints a list of all available options). But even "esxcfg-info -n" still prints a vast amount of information compared with "esxcfg-vswitch -l", so why even consider using it? The answer is the "-F" option, which can change the output to either XML or Perl. Unfortunately, the XML output is currently broken (see this thread for details), so that leaves us with the Perl output format. "esxcfg-info -n -F perl" prints the same information as "esxcfg-info -n", but as one big Perl data structure. To use it, simply evaluate the result within your Perl script and then navigate through the big nested hashmap. No more parsing needed!

Granted, if the goal had really been just to list all the VLAN tags, parsing the output of "esxcfg-vswitch -l" would still be simpler than navigating through the data. However, if you need to answer more complex questions about the network, a single line of Perl gets you a nice, pre-populated data structure that contains all there is to know.

There are of course more options (aren't there always?). Leaving the crazy idea of parsing the HTML output of the web interface aside, there are SDKs provided by VMware that allow you to access the same information remotely. More about this next time...

Thursday, May 1, 2008

Port groups in VMware ESX

ESX servers have a whole virtual network within them: guest machines connect to virtual switches, and uplinks connect these switches to the outside. One term that is not used outside of ESX, however, is "port group." After reading a bit about them and looking at the various tools that the ESX console provides, I think the best way for a network engineer to understand port groups is to see them as network hubs connected to a single vswitch port. This actually makes sense for multiple reasons:

  • All members of a port group share common attributes like a VLAN tag
  • All members of a port group can see all of the packets sent by other members of this port group
  • A port group is always connected to a single vswitch

Actually, it even makes sense to think of the VLAN tag as being applied to the vswitch port that is connected to the uplink of the virtual hub. In physical terms, a vswitch with a port group "PG1" that has two members "VG1" and "VG2" would be built using a pswitch and a 3-port hub, with the uplink of the hub connected to a pswitch port. Applying a VLAN tag to that port group then corresponds to configuring the VLAN on the pswitch port.
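
The hub analogy can also be written down as a tiny Python model. This only illustrates the mental picture and says nothing about ESX internals; the class name PortGroup, its fields, and the VLAN ID 100 are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    """A port group seen as a hub: shared VLAN tag, shared traffic."""
    name: str
    vlan_id: int                      # one tag shared by all members
    members: list = field(default_factory=list)

    def receivers(self, sender):
        """Hub behaviour: every other member sees the sender's packets."""
        return [m for m in self.members if m != sender]


# The example from the text: "PG1" with members "VG1" and "VG2".
# The VLAN ID is a hypothetical value.
pg1 = PortGroup(name="PG1", vlan_id=100, members=["VG1", "VG2"])

if __name__ == "__main__":
    print(pg1.receivers("VG1"))
```

The VLAN tag lives on the port group and not on the individual members, which matches the idea of configuring it once on the pswitch port behind the hub.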

Port groups in ESX are identified by their name, which must be unique within an ESX server. Having the same port group names in different ESX servers, however, makes a lot of sense, especially when moving guests around between them. More on this later.