Server Virtualization: Physical to Virtual Migration

Despite being around for quite some time now (VMware released the first version of VMware Workstation in 1999), server virtualization seems to be all the rage these days. Hardly a day goes by without virtualization being mentioned in a newsletter, magazine or web site.

The first virtualization software product I used was indeed VMware Workstation, some time around 2002. I used it mainly for testing and development, and the fact that it supported most major operating systems (e.g. Linux) was very helpful. A couple of years later we got our first dedicated server, on which we ran VMware GSX Server (now called VMware Server). At that time I also evaluated Microsoft Virtual Server, but found it inferior in both management and performance, and it also didn’t support non-Microsoft operating systems. Microsoft Virtual Server has since improved, however, and is a viable alternative today.

As virtualization has become more and more important for our business, we have since migrated to VMware ESX Server, which ships with its own Linux-based operating system and thus offers slightly better performance, among other benefits. I know this is starting to sound a lot like a VMware advertisement, so I’ll try to get to the point. Don’t get me wrong, we’ve definitely found problems in VMware’s products over the years, but they have proven to be a stable and reliable platform overall. For example, installing and upgrading ESX Server has not caused us any problems or difficulties so far. Do read Michael’s review of VMware Server 2, however; he doesn’t appear to have liked the latest release of VMware Server very much.

One thing that is often overlooked, in my opinion, is the ability to migrate physical machines to virtual machines using the free VMware Converter. With the converter, you can turn a physical machine into a virtual machine that can run on any of VMware’s products. How many IT departments are running servers that are barely used anymore, yet cannot be turned off because a handful of users occasionally need to access old data on them? Those machines usually don’t require fast hardware and might even be running on systems that are no longer supported by the manufacturer. Systems like these are ideal candidates for virtualization.

Here are just some of the benefits you get by moving a legacy or underutilized machine to a virtual server:

  1. If you retire (i.e. recycle) the original hardware, you save money on power, since your data center draws less electricity and needs less A/C.
  2. You can cancel any maintenance agreements on the hardware if it is retired.
  3. You might speed up the application if the server hosting the virtual machines is more powerful than the original box the software was running on.
  4. If the migration fails or causes unexpected problems, you have nothing to fear, since the original server isn’t modified.
  5. The migration is done remotely, so you don’t even have to sit in front of the computer being migrated.
  6. Virtual machines can be suspended, which frees up RAM on the host machine while they are suspended.
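To put a rough number on the first benefit: watts times hours per year, divided by 1000, gives kWh, and that times your electricity rate gives the yearly cost. The Python sketch below is only a back-of-the-envelope estimate; the 300 W draw, the $0.10/kWh rate and the cooling multiplier are hypothetical placeholders, not figures from our data center. Treat the result as an upper bound, since the virtual machine still adds some load to the host.

```python
# Back-of-the-envelope estimate of what an always-on legacy server costs to power.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_power_cost(watts: float, price_per_kwh: float, cooling_factor: float = 1.0) -> float:
    """Yearly electricity cost of a box running 24/7.

    cooling_factor is a hypothetical multiplier for the extra A/C load:
    1.0 ignores cooling, 2.0 assumes every watt of IT load needs another watt of cooling.
    """
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_per_kwh * cooling_factor


# Hypothetical legacy server: 300 W, $0.10 per kWh, cooling doubling the load.
print(f"${annual_power_cost(300, 0.10, cooling_factor=2.0):.2f} per year")  # $525.60 per year
```

Plug in the nameplate or measured wattage of your own box and your actual utility rate to see what retiring it is worth.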

We performed a similar migration a couple of months ago when we switched to a new support ticketing system. Since we didn’t migrate any data from the previous system to the new one, we wanted to be able to log in and search old tickets periodically, so shutting off and formatting the server was not an option. Of course, keeping the server running 24/7 seemed like a waste as well, especially since we wouldn’t need to access the machine more than once a week. Hence, migrating it to a virtual machine seemed like the best option, and the server has lived on our ESX Server ever since. The physical server was initially just turned off for a few weeks, but has since found new use in a different project.

So if you’re planning to move to virtualization or have already begun, don’t just think about new machines but also consider “virtualizing” existing physical machines.

And remember that machines running inside VMware or Microsoft Virtual Server can be monitored by EventSentry just like physical machines can. 🙂

1983: Coleco Adam

If you’re past 30, then you’ve probably heard of the Commodore C64, the Amiga, the IBM XT and so forth. Another, lesser-known computer, released around the same time IBM released its “IBM Personal Computer XT”, was the Coleco Adam.

Why is this funny?

To find out why this is beyond funny, read the Wikipedia article about the Coleco Adam, in particular the “Problems” section. We found the 1st and 3rd problems most amusing.

The tale of the dying capacitors

We were recently helping out a company in our building with a server issue. They had noticed that the server had rebooted out of nowhere a couple of times within a week. None of the event logs showed anything, there was no crash dump file, and there was practically no trace on the software side. Luckily, EventSentry had sent us event 6009 from the System log, letting us know that the server had rebooted. Knowing when this event occurs is great, especially at night or on weekends when users may not notice.
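If you want to review a server’s reboot history by hand, the same information can be pulled from an export of the System log. The Python sketch below is not how EventSentry works internally; it simply assumes a CSV export with hypothetical column names Date, Source and EventID (adjust them to whatever your export contains) and lists boots (event 6009) alongside unexpected shutdowns (event 6008), both logged by the EventLog source.

```python
import csv
import io
from typing import List, Tuple

BOOT = 6009        # logged by the EventLog source at every system start
UNEXPECTED = 6008  # "The previous system shutdown ... was unexpected."


def summarize_boots(csv_text: str) -> Tuple[List[str], List[str]]:
    """Return (boot times, unexpected-shutdown times) from a System log export.

    Assumes hypothetical columns named Date, Source and EventID.
    """
    boots: List[str] = []
    unexpected: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Source"] != "EventLog":
            continue  # only the EventLog source reports boots and dirty shutdowns
        event_id = int(row["EventID"])
        if event_id == BOOT:
            boots.append(row["Date"])
        elif event_id == UNEXPECTED:
            unexpected.append(row["Date"])
    return boots, unexpected


# Made-up sample rows, only to show the shape of the input.
sample = """Date,Source,EventID
2008-05-03 02:14:07,EventLog,6008
2008-05-03 02:14:07,EventLog,6009
2008-05-03 02:14:07,EventLog,6005
2008-05-05 09:00:00,Service Control Manager,7036
"""
boots, unexpected = summarize_boots(sample)
print(boots)       # ['2008-05-03 02:14:07']
print(unexpected)  # ['2008-05-03 02:14:07']
```

A cluster of 6008 events with nothing else logged around them is exactly the "no trace software wise" pattern described above, and a good hint to start looking at hardware.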

I was 99% sure it was a hardware issue, since the reboots came out of the blue with no recent hardware or software changes. We ran some basic diagnostics, including the ones from Dell, and everything kept coming back clean. When we contacted Dell, they recommended re-seating all the RAM, the CPUs, the VRMs etc. They had seen problems in the past with CPUs working loose from their sockets when the heatsink compound dried up, which caused the same symptoms. I should have spotted the problem right then, but we will get to that.

None of this helped, and the reboots were becoming more and more frequent. The server was not under warranty, so Dell couldn’t do much more than that; I was actually amazed at how helpful they were at all, given that it wasn’t. Their last suggestion was to start disabling hardware until we got to the root of the problem.

I went into Device Manager and disabled everything I could. The system became much more stable, although useless without the devices. I then started re-enabling hardware one device at a time. After I enabled the built-in NIC, the computer crashed. We threw in a PCI network card and disabled the onboard NIC in the BIOS. The server booted up and all was great. For about 3 days…

The crashes started again, and this time Windows couldn’t even finish loading before the server rebooted. We opened the server again, and this time I instantly saw what was wrong. I had seen this in a workstation before, so I couldn’t believe I had missed it: almost all the capacitors on the board were bulging at the top.

This has become so common lately that I highly recommend checking for it right away on any critical server you have. A few motherboard makers were even sued over this.

Some makers, like Gigabyte, are using solid state capacitors instead of the cheaper, more common electrolytic ones for some of their boards. I’m sure it costs them a little more, but for reliability I think it is completely worth it.

We ordered a new motherboard for the server, and sure enough, it had a completely different brand of capacitors. Since we swapped it in and booted it up, the server has been running smoothly. An extra $5 for some quality capacitors would probably have prevented the whole situation.

Here are some pictures of what to look for:

Image: bulging capacitors (taken from http://img.photobucket.com/albums/v711/whurd/Bad.jpg)

The tops should be completely flat. If there is any bulging at all, the capacitor is most likely on its way out. The picture below shows leaking capacitors, also not a good thing.

Image: iMac logic board with leaking capacitors (taken from http://macmedics.com/images/imac-logicaboard-with-leaking-capacitors.jpg)

So, next time one of your servers starts acting up out of the blue, without any recent hardware or software changes, take a close look at those capacitors 🙂

Vista/Win2k8 Event Log Changes #2: .evtx Format

In my previous post I mentioned that Vista and Windows Server 2008 introduced many changes to the Windows Event Log, and the event log backup files with the familiar .evt extension are no exception. If you back up event logs in the .evt format and plan on moving to Vista and/or Windows 2008, then you should familiarize yourself with the basic changes and the “new” EVTX format.

The new event viewer on Vista and Win2k8 can export an event log in EVTX, XML, TXT or CSV format. If you select the EVTX format, then you will not be able to import or load the file on a pre-Vista/Win2k8 machine, since the old event viewer does not understand the EVTX format.

So far so good; this is to be expected. As I mentioned in my previous post, Vista and later still provide the legacy event log APIs, so that applications developed for Windows 2003 and earlier can still access and back up the event log. The next paragraphs get a bit confusing, so only read on if you are interested in the details ;-).

Windows 2003 and earlier provide two API calls to back up and/or clear the event log: ClearEventLog() and BackupEventLog(). If you use either of these functions to back up an event log on Vista and later, you still get a .evt file. I would expect this file to open on any computer that understands the EVT format; however, this is not the case. Even when you export an event log using these legacy API calls, the resulting file can only be opened with the new event viewer on Vista or later. From now on, I will refer to event log backup files created on Vista and later with the legacy API calls as the new EVT format.

This becomes clearer when you compare the contents of a new-format EVT file with an EVTX file. While the two files differ for the exact same event log backup, their overall structure is quite similar. You can even rename a new-format EVT file to the .evtx extension, and the new event viewer will open it correctly. The format of a classic EVT file, on the other hand, is very different from that of an EVTX file.

So the bottom line is that you can, in theory, create three types of event log backup files:

1. EVT Format
These files are created on Windows Server 2003 and earlier. Vista and later refer to these files as “Classic Event Log Files”, and you can open and read EVT files on any NT-based OS including Vista and later.

2. EVT Format (when created on Vista and later)
These files can only be created on Vista and later, using the legacy API calls ClearEventLog() and BackupEventLog(). It is important to point out that even though these files have the .evt extension, they unfortunately cannot be read on Windows Server 2003 or earlier; their format is similar to the EVTX format.

3. EVTX Format
These files can only be created and viewed on Vista and later.
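Since the extension alone does not tell you which of the three types you are dealing with, it can help to look at the first bytes of the file. The Python sketch below relies on two commonly documented signatures: EVTX files start with "ElfFile" followed by a NUL byte, and classic EVT files carry "LfLe" at offset 4 of their header. It also assumes that type 2 files carry the EVTX signature, which is consistent with them opening fine when renamed to .evtx but is not something this post has verified; check it against your own backups.

```python
from pathlib import Path

EVTX_MAGIC = b"ElfFile\x00"  # first 8 bytes of an EVTX file header
CLASSIC_MAGIC = b"LfLe"      # bytes 4-7 of a classic (pre-Vista) EVT header


def classify_header(filename: str, header: bytes) -> int:
    """Map a backup file to type 1, 2 or 3 from the list above (0 = unrecognized)."""
    if header[4:8] == CLASSIC_MAGIC:
        return 1  # classic EVT: readable on any NT-based OS
    if header[:8] == EVTX_MAGIC:
        # Assumed: types 2 and 3 share the EVTX signature; only the extension differs.
        return 2 if Path(filename).suffix.lower() == ".evt" else 3
    return 0


def classify_backup(path: str) -> int:
    """Read the first 8 bytes of an event log backup file and classify it."""
    with open(path, "rb") as f:
        return classify_header(path, f.read(8))
```

For example, `classify_backup(r"D:\Backups\System.evt")` (a hypothetical path) returning 2 would flag a file that only Vista or later can read, even though its extension suggests otherwise.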

Note on EventSentry: If you are backing up event logs with EventSentry v2.72, v2.80 or v2.81 on Vista or Windows 2008, then EventSentry will create EVT files (#2) that can only be viewed on Vista or later. We are switching to the native EVTX format for event log backups with the upcoming v2.90 release of EventSentry.

Plink – or – Issuing SSH Commands on Demand

We have a Linux server running Samba on our network which we use mostly to store ISO images which can be mounted and served on-demand through Samba.

I was looking for a way to issue commands on the Linux machine through SSH yesterday, when the Winbind daemon (which is part of Samba and ensures that Linux users are authenticated against our domain controller) was acting up again. Every time we reboot our Windows 2003 domain controller (fortunately not very often, although security updates usually require it), the Winbind daemon starts logging a particular error message every 5 minutes to the local Syslog daemon, which in turn forwards it to EventSentry.

Since warnings and errors are forwarded to me via email, getting this error message every 5 minutes gets old after about half an hour, especially when I’m out of the office and receive them on my phone. Logging on to the Linux box and restarting the Winbind daemon solves the problem, however, and that is what I had been doing for a long time. Well, until recently.

I thought to myself that if there were a utility that could issue commands through SSH from a Windows box, then I could configure EventSentry to automatically restart the Winbind daemon as soon as the Syslog packet containing the error message is received.

I have been using the free SSH client PuTTY for quite some time now, but didn’t know that it “included” Plink, an SSH utility that lets you issue commands over an SSH connection and even see the output of the remote command. Perfect!

Setting up EventSentry to automatically restart Winbind using Plink is a straightforward three-step process, assuming you already have the Syslog daemon in EventSentry up and running:

1. Create a batch file that issues the command you need to run. The batch file I created looks like this:

C:\Batch\plink.exe root@mylinuxhost -pw SecretPass "/etc/init.d/winbind restart"

Make sure you run the script once from the command line to ensure that it works.

2. In EventSentry, create a process action that references the above script. You do this by right-clicking the Actions container and selecting Add Action. Then just select the Process tab and point to the batch file you just created.

3. Under the Event Log Packages container, add a filter to an existing package or create a new package. The filter should match the Syslog event that you want to trigger the script. The event source for that filter will always be Application, and the event ID should be 9999. Since we don’t want the process to be triggered every time a Syslog event comes in, we also specify text from the Syslog message, *winbindd*: cli_nt_setup_creds: request challenge failed* in my case. Then just select the process action you created in step 2 and you are all set.
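The three criteria in step 3 act as a simple AND: source, event ID and message text must all match before the action fires. The Python sketch below models that logic with the standard fnmatch module, on the assumption that EventSentry’s * wildcard behaves like a shell-style wildcard and that matching is case-sensitive. The filter dictionary and sample messages are hypothetical, and this is not EventSentry’s actual matching code; it is just a quick way to sanity-check a wildcard pattern against real Syslog lines before putting it into a filter.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the filter from step 3.
WINBIND_FILTER = {
    "source": "Application",
    "event_id": 9999,
    "text": "*winbindd*: cli_nt_setup_creds: request challenge failed*",
}


def should_trigger(source: str, event_id: int, message: str) -> bool:
    """True only if source, event ID and wildcard text all match the filter."""
    return (
        source == WINBIND_FILTER["source"]
        and event_id == WINBIND_FILTER["event_id"]
        and fnmatchcase(message, WINBIND_FILTER["text"])  # '*' matches any run of characters
    )


print(should_trigger("Application", 9999,
                     "winbindd[2817]: cli_nt_setup_creds: request challenge failed"))  # True
print(should_trigger("Application", 9999, "sshd[99]: Accepted password for root"))     # False
```

The leading * matters: Syslog lines usually carry a timestamp and host name before the process name, and without it the pattern would never match.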

There are a couple of things I need to point out, of course. First, make sure that the batch file is secure, as it contains the username and password for your Linux host; the appropriate NTFS permissions might be enough in most cases. If you cannot keep it secure, then you should create a user on the Linux box that is used only for issuing particular commands through SSH. Second, make sure that plink.exe is present on the host where the EventSentry Syslog daemon is running, as the batch file will be executed on that host.
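If you would rather not keep a password in the batch file at all, Plink can also authenticate with a PuTTY private key (.ppk) via its -i option. The Python sketch below assembles the same command either way and adds -batch, so that Plink fails instead of waiting for a prompt nobody will answer when it runs unattended (for example, if the host key is not yet cached for the account that runs it). The account name svc-restart and the key path are hypothetical; a non-root account would additionally need permission on the Linux box to restart the service.

```python
import subprocess
from typing import List, Optional, Tuple


def build_plink_command(host: str, user: str, remote_command: str,
                        plink: str = "plink.exe",
                        keyfile: Optional[str] = None,
                        password: Optional[str] = None) -> List[str]:
    """Assemble a plink command line, preferring a private key over a password."""
    cmd = [plink, "-batch", "-ssh"]  # -batch: never wait for an interactive prompt
    if keyfile:
        cmd += ["-i", keyfile]       # authenticate with a PuTTY .ppk private key
    elif password:
        cmd += ["-pw", password]     # same approach as the batch file (less secure)
    cmd += [f"{user}@{host}", remote_command]
    return cmd


def run_remote(cmd: List[str]) -> Tuple[int, str]:
    """Run plink and return its exit code plus everything it printed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# Hypothetical dedicated account and key file; this only builds the command.
cmd = build_plink_command("mylinuxhost", "svc-restart", "/etc/init.d/winbind restart",
                          plink=r"C:\Batch\plink.exe", keyfile=r"C:\Batch\restart.ppk")
print(cmd)
```

Calling `run_remote(cmd)` on the machine that hosts plink.exe would execute the restart and hand back the remote output, which is handy to log when the action fires.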

Plink of course is a great utility for automation in any case, regardless of whether you use EventSentry to consolidate Syslog messages. I hope this helps automate some tasks in Windows/Linux environments.