The tale of the dying capacitors

We were recently helping out a company in the same building as ours with a server issue they were having. They noticed it had rebooted out of nowhere a couple of times within a week. The event logs showed nothing, there was no crash dump file, and there was pretty much no trace software-wise. Luckily EventSentry sent us a 6009 event from the system log, letting us know that the server had rebooted. Knowing when this event occurs is great, especially on nights or weekends when users may not notice.

I was 99% sure it was a hardware issue since it was out of the blue with no recent hardware or software changes. We ran some basic diagnostics, including the ones from Dell. Everything kept coming back clean. After contacting Dell, they recommended re-seating all the RAM, the CPUs, VRMs etc. They have had problems in the past with CPUs coming out of the sockets from the heatsink compound drying up and causing the same issue. I should have instantly noticed the problem then, but we will get to that.

None of this was helping and the reboots were becoming more and more frequent. The server was not under warranty so Dell couldn’t help out much more than that. I was actually amazed at how helpful they were, given that it wasn’t. Their last suggestion was to start disabling hardware until we got to the root of the problem.

I went into device manager and disabled anything I could. The system became much more stable, although useless without the devices. I started enabling hardware again one at a time. After I enabled the built in NIC, the computer crashed. We threw in a PCI network card, and disabled the onboard NIC in the BIOS. The server booted up and all was great. For about 3 days…

The crashes started again, and this time Windows couldn’t even finish loading before it would reboot. We opened the server again and this time I instantly saw what was wrong. I had seen this in a workstation before so I couldn’t believe I missed it. Almost all the capacitors on the board were bulging at the top.

This has become so common lately that I highly recommend looking for it right away on any critical server you have. There were even a few motherboard makers sued over this.

Some makers, like Gigabyte, are using solid state capacitors instead of the cheaper, more common electrolytic ones for some of their boards. I’m sure it costs them a little more, but for reliability I think it is completely worth it.

We ordered a new motherboard for the server, and sure enough it had a completely different brand of capacitors. Since we swapped it out and booted it up, the server has been running smoothly. An extra $5 for some quality capacitors would probably have prevented the whole situation.

Here are some pictures of what to look for:

Taken from http://img.photobucket.com/albums/v711/whurd/Bad.jpg

Bad.jpg

The tops should be completely flat. If there is any bulging at all, the capacitor is most likely on its way out. The picture below shows leaking capacitors, also not a good thing.

Taken from http://macmedics.com/images/imac-logicaboard-with-leaking-capacitors.jpg

imac-logicaboard-with-leaking-capacitors.jpg

So, next time one of your servers starts acting up out of the blue, without any recent hardware or software changes, take a close look at those capacitors 🙂

Vista/Win2k8 Event Log Changes #2: .evtx Format

In my previous post I already mentioned that Vista and Windows Server 2008 introduced many changes to the Windows Event Log, and the event log backup files with the familiar .evt extension are no exception. If you back up event logs in the .evt format and plan on moving to Vista and/or Windows 2008 then you should familiarize yourself with the basic changes and the “new” EVTX format.

The new event viewer on Vista and Win2k8 supports exporting an event log in either the EVTX, XML, TXT or CSV format. If you select the EVTX format then you will not be able to import/load this file on a Pre-Vista/Win2k8 machine, because the old event viewer does not understand the new EVTX format.

So far so good, this is to be expected. Like I mentioned in my previous post, Vista and later also still provide the legacy event log APIs so that applications that were developed for Windows 2003 and earlier are still able to access and back up the event log. The next paragraphs get a bit confusing, so only read on if you are interested in more details ;-).

Windows 2003 and earlier provide two API calls to back up and/or clear the event log: ClearEventLog() and BackupEventLog(). If you use either of these functions to back up an event log on Vista and later, then you are still able to create a .evt file. I would expect that this file could be opened on any computer that understands the EVT format, however this is not the case. Even when you export an event log using the aforementioned legacy API calls, the resulting file can still only be opened with the new event viewer on Vista or later. I will refer to event log backup files that were created on Vista and later with the legacy API calls as the new EVT format from now on.

This becomes clearer when you compare the contents of the new EVT format with the EVTX format. While the two files differ for the exact same event log backup, their overall structure is quite similar. You can also rename a file with the new EVT format to the EVTX extension and the new event viewer will open this file correctly. The format of a classic EVT file, on the other hand, is very different from that of an EVTX file.
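For illustration, here is a minimal Python sketch of calling the legacy backup API through ctypes. The log name and backup path passed in are placeholders, the helper name backup_event_log is made up for this example, and the code only does real work on Windows; it is a sketch of the call sequence, not how EventSentry does it.

```python
import ctypes
import sys


def backup_event_log(log_name, backup_path):
    """Back up an event log with the legacy OpenEventLog/BackupEventLog calls."""
    if sys.platform != "win32":
        raise OSError("the legacy event log API is only available on Windows")

    from ctypes import wintypes

    advapi32 = ctypes.WinDLL("advapi32", use_last_error=True)
    advapi32.OpenEventLogW.argtypes = [wintypes.LPCWSTR, wintypes.LPCWSTR]
    advapi32.OpenEventLogW.restype = wintypes.HANDLE
    advapi32.BackupEventLogW.argtypes = [wintypes.HANDLE, wintypes.LPCWSTR]
    advapi32.BackupEventLogW.restype = wintypes.BOOL
    advapi32.CloseEventLog.argtypes = [wintypes.HANDLE]
    advapi32.CloseEventLog.restype = wintypes.BOOL

    # None as the server name means "the local computer".
    handle = advapi32.OpenEventLogW(None, log_name)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        if not advapi32.BackupEventLogW(handle, backup_path):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        advapi32.CloseEventLog(handle)
```

Calling something like backup_event_log("Application", r"C:\Backup\app.evt") on Vista or later should produce a file with the .evt extension that, per the behavior described here, still only opens in the new event viewer.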

So the bottom line is that you can, in theory, create three types of event log backup files:

1. EVT Format
These files are created on Windows Server 2003 and earlier. Vista and later refer to these files as “Classic Event Log Files”, and you can open and read EVT files on any NT-based OS including Vista and later.

2. EVT Format (when created on Vista and later)
These files can only be created on Vista and later by using the legacy API calls ClearEventLog() and BackupEventLog(). It is important to point out that even though these files have the .evt extension, they unfortunately cannot be read on Windows Server 2003 or earlier and the format of this file is similar to the new EVTX format.

3. EVTX Format
These files can only be created and viewed on Vista and later.
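Since the extension alone does not tell you which of the three types you have, one way to check is to look at the first bytes of the file. The Python sketch below assumes the commonly documented signatures: classic EVT files start with a 0x30 header size followed by "LfLe", and EVTX files start with "ElfFile" plus a null byte. It also assumes that type #2 files carry the EVTX signature, which the renaming experiment above suggests but which I have not confirmed byte for byte. The function names are made up for this example.

```python
import struct

EVTX_SIGNATURE = b"ElfFile\x00"  # first 8 bytes of an EVTX file
EVT_SIGNATURE = b"LfLe"          # bytes 4-7 of a classic EVT header
EVT_HEADER_SIZE = 0x30           # first DWORD of a classic EVT header


def classify_event_log(header):
    """Classify an event log backup by its leading bytes, ignoring the extension."""
    if header[:8] == EVTX_SIGNATURE:
        return "evtx-style"
    if len(header) >= 8:
        (size,) = struct.unpack_from("<I", header, 0)
        if size == EVT_HEADER_SIZE and header[4:8] == EVT_SIGNATURE:
            return "classic-evt"
    return "unknown"


def classify_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return classify_event_log(handle.read(8))
```

A file named backup.evt that comes back as "evtx-style" would be type #2 from the list above, and will need Vista or later to open.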

Note on EventSentry: If you are backing up event logs with EventSentry v2.72, v2.80 or v2.81 on Vista or Windows 2008, then EventSentry will create EVT files (#2) that can only be viewed on Vista or later. We are switching to the native EVTX format for event log backups with the upcoming v2.90 release of EventSentry.

Plink – or – Issuing SSH Commands on Demand

We have a Linux server running Samba on our network which we use mostly to store ISO images which can be mounted and served on-demand through Samba.

I was looking for a way to issue commands on the Linux machine through SSH yesterday when the Winbind daemon (which is part of Samba and ensures that Linux users are authenticated against our domain controller) on the machine was acting up again. Every time we reboot our Windows 2003 domain controller (which is fortunately not very often but security updates usually require this), the Winbind daemon starts logging a particular error message every 5 minutes to the Syslog daemon which in turn is forwarded to EventSentry by the Linux Syslog daemon.

Since warnings and errors are forwarded to me via email, getting this particular error message every 5 minutes starts getting old after about half an hour – especially when I’m out of the office and get them on my phone. Logging on to the Linux box and restarting the Winbind daemon however solves the problem – and this is what I have been doing for a long time now. Well, until recently.

I thought to myself that if there were a utility that could issue commands through SSH from a Windows box, then I could configure EventSentry to automatically restart the Winbind daemon as soon as the Syslog packet containing the error message is received.

I have been using the free SSH-Client PuTTY for quite some time now, but didn’t know that it “included” Plink, an SSH utility that allows you to issue commands through the SSH tunnel and even see the output from the remote command. Perfect!

Setting up EventSentry to automatically restart winbind using plink is a straightforward 3-step process, assuming you already have the Syslog Daemon in EventSentry up and running:

1. Create a batch file that issues the command you need to run. The batch file I created looks like this:

C:\Batch\plink.exe root@mylinuxhost -pw SecretPass "/etc/init.d/winbind restart"

Make sure you run the script once from the command-line to ensure that it is working.

2. In EventSentry, create a process action that references the above script. You do this by right-clicking the Actions container and selecting Add Action. Then just select the Process tab and point to the batch file you just created.

3. Under the Event Log Packages container, add a filter in an existing package or create a new package. The filter will match the Syslog event that you want to trigger our script. The event source for that filter will always be Application, and the event id should be 9999. Since we don’t want the process to be triggered every time a Syslog event comes in, we will also specify the text from the Syslog event – *winbindd*: cli_nt_setup_creds: request challenge failed* in my case. Then just select the process action you created in step 2 and you are all set.
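To get a feel for what the wildcard text in step 3 will and will not catch, here is a small Python sketch that treats it as a shell-style pattern. This only approximates the filter: EventSentry's own matching rules (case sensitivity, for instance) may differ, and the sample messages in the usage note are invented.

```python
from fnmatch import fnmatchcase

# Filter text from step 3, treated here as a shell-style wildcard pattern.
FILTER_TEXT = "*winbindd*: cli_nt_setup_creds: request challenge failed*"


def should_trigger(message, pattern=FILTER_TEXT):
    """Return True if the Syslog message text matches the wildcard filter."""
    return fnmatchcase(message, pattern)
```

With this, a message such as "winbindd[2231]: cli_nt_setup_creds: request challenge failed" matches, while an unrelated line such as "sshd[99]: Accepted password for root" does not, so the restart script only fires for the one error we care about.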

There are a couple of things I need to point out of course. First, make sure that the batch file is secure as it contains the username and password to your Linux host – the appropriate NTFS permission might be enough in most cases. If you cannot keep it secure then you should create a user on the Linux box that is just used for the purpose of issuing particular commands through SSH. Second, make sure that plink.exe is present on the host where the EventSentry Syslog daemon is running, as the file will be executed on that host.
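If you would rather not keep a password in a file at all, Plink also accepts a private key with -i, and -batch stops it from waiting on interactive prompts. The Python sketch below builds such a command line; the user name, key path and helper names are hypothetical, and the key would have to be set up on the Linux box first.

```python
import subprocess


def build_plink_command(host, user, remote_command, key_file,
                        plink_path=r"C:\Batch\plink.exe"):
    """Build a Plink command line that uses a key file instead of -pw."""
    return [
        plink_path,
        "-ssh",            # force the SSH protocol
        "-batch",          # never prompt; fail instead of hanging
        "-i", key_file,    # PuTTY private key (.ppk), hypothetical path
        f"{user}@{host}",
        remote_command,
    ]


def run_remote(host, user, remote_command, key_file):
    """Run the command and return the completed process with captured output."""
    command = build_plink_command(host, user, remote_command, key_file)
    return subprocess.run(command, capture_output=True, text=True, timeout=60)
```

Because -batch refuses to answer the host key prompt, the host key needs to be cached already for the account that runs the command, which is one more reason to run it once by hand first.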

Plink of course is a great utility for automation in any case, regardless of whether you use EventSentry to consolidate Syslog messages. I hope this helps automate some tasks in Windows/Linux environments.

Vista Event Log Changes

As you may already know, Microsoft significantly changed the Windows event log in Windows Vista. I always found the Windows event log to be a very well designed logging infrastructure, at least compared to the logging facilities that are available in other major network operating systems. The Windows event log however hasn’t changed much since it was originally included in Windows NT. Windows NT 3.1 was first released in 1993, so the core event log service and event viewer have actually gone 13 1/2 years without a major improvement, other than updated security event IDs to accommodate new events related to various security components. I have never actually seen the event viewer in Windows NT 3.1, but Windows NT 3.51’s event viewer, for example, was not too different from Windows 2003’s.

So it appears that Microsoft finally realized that a good system can be improved (especially with compliance becoming more and more important over the last years), and so the event log subsystem, including the event viewer, appears to have been rewritten completely in Windows Vista and of course the upcoming Windows Server 2008.

As we are continuing to improve Vista support in EventSentry, I will cover the changes that I believe are relevant to IT professionals that need to manage their event logs. Just as a side note, EventSentry has been monitoring the Vista event log since the end of 2006, however ES 2.81 currently accesses the Vista event log through the legacy API that Vista (fortunately) still provides to pre-Vista event log software.

Before I dig into the technical details about the new event log, I need to point out that Microsoft made a large number of changes to the event log, and didn’t leave a stone unturned. While the overall logic is the same (you have event logs and events 🙂 ), a lot has changed under the hood.

While this will affect IT professionals that need to manage event logs (since they need to make sure that their software works with Vista & Windows 2008), it will affect software developers even more. While I like a lot of the changes that were introduced, and we all know improvements were long overdue, I personally feel that the new event log has been over-engineered. A lot of the features that were added are a bit of overkill, and accessing the event log, especially writing to it, is significantly more involved with the new version (at least if you take advantage of the new XML functionality). I think it would have been better to gradually introduce improvements over the last 10 years, rather than ignoring the event log for a long time and then introducing a myriad of new functionality to it, some of which has yet to make sense (I will back this up below and in future posts).

In any case, I will get off my soapbox now and focus on the relevant changes that were introduced.

Keywords
One of the new fields added to the properties of an event is called “Keywords”. The most interesting thing about this field is that security events now have their severity stored in the Keywords field instead of the Type field (Type was renamed to Level in Vista and later). As you know, events in Windows Server 2003 and earlier used to have their severity stored in the Type property of an event (Information, Warning, Error, Audit Success, Audit Failure), but in Vista and later the severity of security events (Audit Success, Audit Failure) has been moved to the new Keywords field.

This of course leaves the question what the Level is set to for audit events. Well, the answer is Information. All Audit Success and Audit Failure events have their main severity stored in the Keywords field, whereas the Level field is always set to Information. An Audit Failure event that is informational, yeah – that makes a lot of sense!

So in theory it would be possible to have an Audit Failure event logged with a level of Information/Warning/Error, but I am not sure how useful this would be. After all, an Audit Failure is an Audit Failure.

Why was this changed? I am not sure. After asking the head of the Windows Auditing Team at Microsoft I received an explanation that, unfortunately, failed to eliminate my confusion. The original Type field could obviously accommodate the two attributes (since they had always been there), and there would have been room for even more. There was some consensus between the two of us that the Keywords field, at least in combination with the security events, was maybe not implemented in the best way.

In EventSentry we currently ignore the Keywords field and merge it with the original Type field, so that you can search across Pre-Vista machines and Vista machines using the same field name.
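To make the idea of merging the two fields concrete, here is a rough Python sketch that derives a pre-Vista style Type from a Vista event's Level and Keywords. This is not EventSentry's actual code. The two keyword bit values are the audit constants from the Windows SDK header winmeta.h; folding the new Critical level into Error, and treating any other level as Information, are assumptions made for this example.

```python
# Audit keyword bits as defined in the Windows SDK (winmeta.h).
AUDIT_FAILURE = 0x0010000000000000
AUDIT_SUCCESS = 0x0020000000000000

# Vista levels mapped onto the classic Type names.
# Mapping Critical (1) to Error is an assumption of this sketch.
LEVEL_TO_TYPE = {1: "Error", 2: "Error", 3: "Warning", 4: "Information"}


def legacy_type(level, keywords):
    """Return the pre-Vista Type string for a Vista-style Level/Keywords pair."""
    if keywords & AUDIT_FAILURE:
        return "Audit Failure"
    if keywords & AUDIT_SUCCESS:
        return "Audit Success"
    # Anything else (including level 0) is treated as Information here.
    return LEVEL_TO_TYPE.get(level, "Information")
```

The audit bits are checked first, so an Audit Failure keeps showing up as Audit Failure no matter what its Level happens to say.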

So this is it for now, we will cover a lot more about the new Windows Event Log here in the future. As always, let us know if you have any questions or feedback.

Who Is In My Server Room?

As some of you already know, EventSentry allows you to use different environment sensors to be alerted about changes in your server room. One of these happens to be a motion sensor (scroll down).

It is great to be alerted when somebody is moving around in there, but it would also be helpful to know who it is. We picked up an Axis 207 network enabled camera from Axis Communications so we can take a peek in there through any available web browser. This works great as long as we are near a computer at the time we get the motion alerts from EventSentry, but it is not very useful if we aren’t.

Luckily, our Axis camera has a pretty good API that you can access. It has the ability to grab a .jpg image by going to a URL (http://cameraIP/jpg/image.jpg). I needed a way to attach this .jpg to an email so that not only am I alerted, but I also have an image of who or what caused it.
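As a rough idea of what grabbing a snapshot looks like in code, here is a short Python sketch using only the standard library. It assumes the camera serves the image at the URL pattern above without requiring a login; the function names and the IP address in the usage note are made up.

```python
from urllib.request import urlopen


def snapshot_url(camera_ip):
    """Build the snapshot URL for the camera, following the pattern above."""
    return f"http://{camera_ip}/jpg/image.jpg"


def fetch_snapshot(camera_ip, timeout=10):
    """Download one snapshot and return the raw bytes."""
    with urlopen(snapshot_url(camera_ip), timeout=timeout) as response:
        return response.read()


def looks_like_jpeg(data):
    """Cheap sanity check: JPEG data starts with the bytes FF D8."""
    return data[:2] == b"\xff\xd8"
```

Something like fetch_snapshot("192.168.1.50") would return the image bytes, and the looks_like_jpeg check helps catch the case where the camera answered with an error page instead of a picture.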

There may be other cameras out there that can do this as well. If you know of one please post it in the comments section.

I came up with a batch file that uses some free utilities to accomplish this task. For good measure, I also decided to allow you to grab a series of pictures, put them to a web site directory, thumbnail them, and finally create an HTML page that displays them.
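The last step, building the HTML page, is the easiest to show in isolation. The Python sketch below is not the batch file itself, just an illustration of the same idea: given a list of captured image names and the web location, it produces a page of linked thumbnails. The thumb_ file name prefix and the function name are assumptions for this example.

```python
from html import escape


def build_gallery_html(image_names, net_location, title="Motion alert"):
    """Return a simple HTML page linking each thumbnail to its full image."""
    base = net_location.rstrip("/")
    links = []
    for name in image_names:
        full = f"{base}/{name}"
        thumb = f"{base}/thumb_{name}"  # hypothetical thumbnail naming scheme
        links.append(
            f'<a href="{escape(full)}"><img src="{escape(thumb)}" alt="{escape(name)}"></a>'
        )
    body = "\n".join(links)
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Writing the returned string to an .html file in the image directory gives you one page per alert that you can link to from the email.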

Building maintenance entering a server room at night. Image quality depends on lighting and camera quality.

This could probably have been done more easily using Perl or another scripting language, but I had already started with a batch file and wanted to just finish it! Feel free to come up with a better way.

The tools needed are included in this zip file:

  • gethttp.exe – Taken from our free EventSentry SysAdmin Tools, used to grab the image from the camera
  • sleep.exe – Also taken from EventSentry SysAdmin Tools. Allows you to put pauses in your script
  • blat.exe – Blat is a great command line utility that allows you to send emails
  • printf.exe – Taken from the GNU tools for Windows. A lot more flexibility than using ECHO
  • convert.exe – Command line utility from ImageMagick. Used to create the thumbnails.

The zip file also contains the actual script used named “getimages.cmd”. You will need to change some of the settings inside of it to get started. Most are self-explanatory and include:

  • cameraIP – IP address of the camera
  • binPath – Path to the needed utilities above
  • imagePath – Where you want the images stored
  • numImages – The number of images you want to capture each time
  • timePause – Milliseconds to wait between images
  • netLocation – URL to your web server hosting the images
  • eMail – Email address you want the alerts sent to. Comma separate for multiple people.
  • eSender – Address email comes from
  • subj – The subject for the email
  • server – Your SMTP server
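For comparison, this is roughly what the email step looks like with a scripting language instead of blat.exe. The Python sketch reuses the setting names from the list above as parameters (eMail, eSender, subj, server); the function names, the default attachment name and the message text are invented, and the SMTP server is assumed to accept mail without authentication.

```python
import smtplib
from email.message import EmailMessage


def build_alert(e_mail, e_sender, subj, jpeg_bytes, filename="motion.jpg"):
    """Build an alert email with one JPEG snapshot attached."""
    msg = EmailMessage()
    msg["To"] = e_mail        # comma separated for multiple recipients
    msg["From"] = e_sender
    msg["Subject"] = subj
    msg.set_content("Motion was detected. A snapshot is attached.")
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename=filename)
    return msg


def send_alert(msg, server):
    """Send the message through the given SMTP server."""
    with smtplib.SMTP(server) as smtp:
        smtp.send_message(msg)
```

Keeping the message building separate from the sending makes it easy to check the result without a mail server at hand.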

Now to make it run when EventSentry detects motion. To do this, create a new action in EventSentry. I named mine “Motion Alert”. Go to the “Process” tab at the top and put in the path to “getimages.cmd”.

Next, we will need an event filter to trigger the action. Here are the settings you need:

  • Event Log: Application
  • Type: Error
  • Source: EventSentry
  • Category: Environment Sensors
  • Event ID: 10912

That is it, from now on you should know who is setting off your motion sensor.

You can download the entire package from here.

If you have any comments or suggestions, we would love to hear them.