Creating your very own event message DLL

If you’ve ever written code to log to the Windows event log before (e.g. through Perl, Python, …), then you might have run into a problem similar to the one I described in an earlier post: Either the events don’t look right in the event log, you are restricted to a small range of event ids (as is the case with eventcreate.exe), or you cannot use insertion strings.

In this blog post I’ll be showing you how to build a custom event message DLL, from beginning to end. We’ll start with creating the DLL using Visual Studio (Express) and finish up with some example scripts, including Perl of course, that use the DLL to log elegantly to the event log.

Let’s say you are running custom scripts on a regular basis in your network – maybe with Perl, Python, Ruby etc. Your tasks, binary as they are, usually do one of two things: They run successfully, or they fail. To make troubleshooting easier, you want to log any results to the event log – in a clean manner. Maybe you even have sysadmins in other countries and want to give them the ability to translate standard error messages. Logging to the event log has a number of benefits: It gives you a centralized record of your tasks, allows for translation, and gives you the ability to respond to errors immediately (well, I’m of course assuming you are using an event log monitoring solution such as EventSentry). Sounds interesting? Read on!

Yes, you can do all this, and impress your peers, by creating your own event message file. Even better, you can do so using only free tools. Once you have your very own event message file, you can use it from any application that logs to the event log, be it a PowerShell/Perl/Python/… script or a C/C++/C#/… application.

To create an event message file, you need two applications:

1. Visual Studio (Express)
2. The Windows platform SDK

You need the platform SDK because Visual Studio Express does not ship with the Message Compiler, mc.exe, for some reason. The message compiler is essential; without it there is no event message file. When installing the platform SDK, you can deselect all options except for “Developer Tools -> Windows Development Tools -> Win32 Development Tools” if you want to conserve space. This is the only component of the SDK that’s needed.

An event message file is essentially a specific type of resource that can be embedded in either a DLL file or executable. In EventSentry, we originally embedded the message file resources in a separate DLL, but eventually moved it into the executable, mostly for cleaner and easier deployment. We’ll probably go back to a separate message DLL again in the future, mostly because processes (e.g. the Windows Event Viewer) can lock the event message file (the executable in our case), making it difficult to update the file.

Since embedding an event message file in a DLL is more flexible and significantly easier to accomplish, I’ll be covering this scenario here. The DLL won’t actually contain any executable code; it will simply serve as a container for the event definitions that will be stored inside the .dll file. While it may sound a little bit involved to build a DLL just for the purpose of having an event message file (especially to non-developers), you will see that it is actually surprisingly easy. There is absolutely no C/C++ coding required, and I also made a sample project available for download, which has everything set up and ready to go.

In a nutshell, the basic steps of creating an event message file are as follows:

1. Create a message file (e.g. messagefile.mc)
2. Convert the message file into a DLL, using mc.exe, rc.exe and link.exe

Once we have the message file, we will also need to register the event message file in the registry, and associate it with an event source. Keep in mind that the event source is not hard-coded into the message file itself, and in theory a single event message file could be associated with multiple event sources (as is the case with many event sources from Windows).

So let’s start by creating a working folder for the project, and I will call it “myapp_msgfile”. Inside that directory we’ll create the message file, let’s call it myapp_msgfile.mc. This file is a simple text file, and you can edit it with your favorite text editor (such as Ultraedit, Notepad2 or Notepad++).

The file with the .mc extension is the main message file that we’ll be editing – here we define our event ids, categories and so forth. Below is an example, based on the scenario from before. Explanations are shown inline.


MessageIdTypedef=WORD

LanguageNames=(
English=0x409:MSG00409
German=0x407:MSG00407
)

Here we define which languages we support, and by which files these languages will be backed. You will have to look up the language id for other languages if you plan on supporting more, and you can remove German if you only plan on supporting English.


MessageId=1
SymbolicName=MYTOOL_CATEGORY_GENERAL
Language=English
Tasks
.
Language=German
Jobs
.

Our first event id, #1, will be used for categories. Categories work in the exact same way as event ids. When we log an event to the event log and want to include a category, then we only log the number – 1 in this case.


MessageId=100
SymbolicName=TASK_OK
Language=English
Task %1 (%2) completed successfully.
.
Language=German
Job %1 (%2) war erfolgreich.
.

This is the first event description. The “MessageId” field specifies the event id, and the symbolic name is a descriptive and unique name for the event. The language specifies one of the supported languages, followed by the event message text. You end the event description with a single period – that period has to be the only character on its line.


MessageId=101
SymbolicName=TASK_ERROR
Language=English
Task %1 (%2) failed to complete due to error “%3”.
.
Language=German
Job %1 (%2) konnte wegen Fehler “%3” nicht abgeschlossen werden.
.

MessageId=102
SymbolicName=TASK_INFO
Language=English
Task Information: %1
.
Language=German
Job Information: %1
.

Since we’re trying to create events for a “custom task engine”, we need both success and failure events here. And voila, our event message file now has events 100 – 102, plus an id for a category.
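To get a feel for how the insertion strings fit into these definitions, here is a small Python sketch that mimics the substitution that happens when an event is displayed. The `render_message` helper and the `TEMPLATES` dictionary are hypothetical names made up for this illustration; Windows does the real work itself with the help of the message DLL, and this sketch only handles plain %1, %2, … placeholders.

```python
import re

# English message texts from the .mc file above, keyed by event id.
TEMPLATES = {
    100: "Task %1 (%2) completed successfully.",
    101: 'Task %1 (%2) failed to complete due to error "%3".',
    102: "Task Information: %1",
}

def render_message(event_id, strings):
    """Replace %1, %2, ... with the matching insertion string (1-based)."""
    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched if no string was supplied for it.
        if 0 <= index < len(strings):
            return strings[index]
        return match.group(0)
    return re.sub(r"%(\d+)", substitute, TEMPLATES[event_id])

print(render_message(100, ["Database Backup", "Monitoring Database"]))
print(render_message(102, ["Step 1/3 complete"]))
```

The point to take away is that the application only ever logs the event id plus a list of strings; the sentence around them lives in the message file and can be translated without touching the scripts.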

So now that we have our events defined, we need to convert that into a DLL. The first step now is to use the message compiler, mc.exe, to create a .rc file as well as the .bin files. The message compiler will create a .bin file for every language that is defined in the mc file. Open the “Visual Studio Command Prompt (2010)” in order for the following commands to work:


mc.exe myapp_msgfile.mc

will create (for the .mc file depicted above):


myapp_msgfile.rc
msg00407.bin
msg00409.bin

With those files created, we can now create a .res (resource) file with the resource compiler rc.exe:


rc.exe /r myapp_msgfile.rc

which will create the


myapp_msgfile.res

file. The “/r” option instructs the resource compiler to emit a .res file. Now we’re almost done; we’re going to let the linker do the rest of the work for us:


link -dll -noentry -out:myapp_msgfile.dll myapp_msgfile.res

The myapp_msgfile.res file is the only input file to the linker; normally one would supply object (.obj) files to the linker to create a binary file. The “-noentry” option tells the linker that the DLL does not have an entry point, meaning that we do not need to supply a DllMain() function – thus the linker is satisfied even without any object files. This is of course desired, since we’re not looking to create a DLL that has any code or logic in it.

After running link.exe, we’ll end up with the long awaited myapp_msgfile.dll file.
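If you rebuild the DLL often, the three steps can be chained in a small script. Below is a minimal Python sketch that only wraps the exact commands shown above; the function names `build_commands` and `build_message_dll` are my own invention, and the script assumes it is started from the Visual Studio command prompt so that mc.exe, rc.exe and link.exe are on the PATH.

```python
import subprocess

def build_commands(base_name):
    """Return the three command lines used above, in the order they must run."""
    return [
        ["mc.exe", f"{base_name}.mc"],
        ["rc.exe", "/r", f"{base_name}.rc"],
        ["link.exe", "-dll", "-noentry", f"-out:{base_name}.dll", f"{base_name}.res"],
    ]

def build_message_dll(base_name):
    """Run each step; check=True stops at the first tool that reports an error."""
    for command in build_commands(base_name):
        subprocess.run(command, check=True)

# Example call, to be run inside the project folder:
# build_message_dll("myapp_msgfile")
```

Keeping the command lines in a separate function makes it easy to print them first and verify them before anything is actually executed.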

The end. Well, almost. Our message file is at this point just a lone accumulation of zeros and ones, so we need to tell Windows that this is actually a message file for a particular event log and source. That’s done through the registry, as follows:

Open the registry editor regedit.exe. Be extremely careful here, the registry editor is a powerful tool, and needs to be used responsibly :-).

All event message files are registered under the following key:


HKLM\System\CurrentControlSet\Services\eventlog

Under this key, you will find a key for every event log as well as subkeys for every registered event source. So in essence, the path to an event source looks like this:


HKLM\System\CurrentControlSet\Services\eventlog\EVENTLOG\EVENTSOURCE

I’m going to assume here that we are going to be logging to the application event log, so we’d need to create the following key:


HKLM\System\CurrentControlSet\Services\eventlog\Application\MyApp

In this key, we need the following values:


TypesSupported (REG_DWORD)
EventMessageFile (REG_EXPAND_SZ)

TypesSupported is usually 7, indicating that the application will log either Information, Warning or Error events (you get 7 if you OR 1[error], 2[warning] and 4[information] together).

EventMessageFile is the path to your message DLL. Since the type is REG_EXPAND_SZ, the path may contain environment variables.

If you plan on utilizing categories as well, which I highly recommend (and for which our message file is already set up), then you need two additional values:


CategoryCount (REG_DWORD)
CategoryMessageFile (REG_EXPAND_SZ)

CategoryCount simply contains the total number of categories in your message file (1, in our case), and the CategoryMessageFile points to our message DLL. Make sure that your message file does not contain any sequence gaps, so if your CategoryCount is set to 10, then you need to have an entry for every id from 1 to 10 in the message file.

We could create separate message files for messages and categories, but that would be overkill for a small project like this.
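If you would rather script the registration than click through regedit, the following Python sketch shows one way to do it with the standard `winreg` module. The helper names `source_values` and `register_source` are hypothetical, the DLL path in the example call is a made-up location, and writing under HKLM requires administrative rights, so treat this as a sketch rather than a finished installer.

```python
# Bit flags for the event types, as listed above.
EVENTLOG_ERROR_TYPE = 1
EVENTLOG_WARNING_TYPE = 2
EVENTLOG_INFORMATION_TYPE = 4

def source_values(dll_path, category_count):
    """Build the four registry values for one event source."""
    types_supported = (EVENTLOG_ERROR_TYPE
                       | EVENTLOG_WARNING_TYPE
                       | EVENTLOG_INFORMATION_TYPE)   # 1 | 2 | 4 == 7
    return {
        "TypesSupported": ("REG_DWORD", types_supported),
        "EventMessageFile": ("REG_EXPAND_SZ", dll_path),
        "CategoryCount": ("REG_DWORD", category_count),
        "CategoryMessageFile": ("REG_EXPAND_SZ", dll_path),
    }

def register_source(log, source, dll_path, category_count):
    """Write the values to the registry (Windows only, needs admin rights)."""
    import winreg  # imported here so the rest of the sketch loads on any OS
    key_path = "\\".join(
        ["SYSTEM", "CurrentControlSet", "Services", "EventLog", log, source])
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        for name, (type_name, value) in source_values(dll_path, category_count).items():
            winreg.SetValueEx(key, name, 0, getattr(winreg, type_name), value)

# Example call; the path is purely illustrative:
# register_source("Application", "MyApp", r"%ProgramFiles%\MyApp\myapp_msgfile.dll", 1)
```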

Now that we have that fancy message DLL ready to go, we need to start logging. Below are some examples of how you can log to the event log with a scripting language. I’ll be covering Perl, Kix, and Python. Me being an old Perl fan and veteran, I’ll cover that first.

PERL
The nice thing about Perl is that you can take full advantage of insertion strings, so it can support event definitions containing more than one insertion string.


use strict;
use Win32::EventLog;


# Call this function to log an event

sub logMessage
{
    my ($eventID, $eventType, @eventDetails) = @_;

    # The source name has to match the event source registered in the registry
    my $evtHandle = Win32::EventLog->new("MyApp");

    my %eventProperties;

    # Category is optional, specify only if message file contains entries for categories

    $eventProperties{Category}      = 0;
    $eventProperties{EventID}       = $eventID;
    $eventProperties{EventType}     = $eventType;
    # Insertion strings are passed as one string, separated by NUL characters
    $eventProperties{Strings}       = join("\0", @eventDetails);

    $evtHandle->Report(\%eventProperties);

    $evtHandle->Close;
}


# This is what you would use in your scripts to log to the event log. The insertion strings
# are passed as a list, so even if you only have one string, you would need to pass it
# within parentheses ("This is my message") as the last parameter

logMessage(100, EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database", "Complete"));
logMessage(102, EVENTLOG_INFORMATION_TYPE, ("Step 1/3 Complete"));

PYTHON

Python supports event logging very well too, including multiple insertion strings. See the sample code below:


import win32evtlogutil
import win32evtlog


# Here we define our event source and category, which we consider static throughout
# the application. You can change this if the category is different

eventDetails = {'Source': 'MyApp',
                'Category': 1}    # this is the id from the message file
                                  # which was set aside for the category


# Call this function to log an event

def logMessage(eventID, eventType, message, eventDetails):
    # Accept a single string as well as a tuple of insertion strings
    if isinstance(message, str):
        message = (message,)
    win32evtlogutil.ReportEvent(eventDetails['Source'], eventID, eventDetails['Category'], eventType, tuple(message))

logMessage(100, win32evtlog.EVENTLOG_INFORMATION_TYPE, ("Database Backup", "Monitoring Database"), eventDetails)
logMessage(102, win32evtlog.EVENTLOG_INFORMATION_TYPE, ("Step 1/3 complete"), eventDetails)

KIXTART
The pro: Logging to the event log using KiXtart is so easy it’s almost scary. The con: It only supports message files that use one insertion string.


LOGEVENT(4, 102, "Database Backup", "", "MyApp")

Curiosity Kills the Cat

25 years ago, on July 23rd 1985, the Amiga 1000 was introduced in New York City (check out the ad). Coincidentally, the Amiga 500 was my first computer and I loved playing games on the Rock Lobster – despite the 7.15909 MHz processor. Well, those were the good old days, the days before mainstream email, the days before spam. Or were they? Believe it or not, in 1985 it had already been 7 years since the first spam email was sent by Gary Thuerk over the ARPAnet.


I don’t know about you, but 32 years later I still get spam delivered to my inbox on a daily basis, and that’s despite having 2-3 spam filters in place. What’s more, I still get legitimate email caught by the spam filter, mostly to the dismay of the sender.

Now, of course WE all know not to open spam – or to even look at it – as it will potentially confirm receipt (if you display images from non-trusted sources) and could also trigger malware (again depending on your email reader’s configuration).

But, we’ve all seen spam emails and I can’t help but wonder who actually reads these emails (for purposes other than to get a chuckle), much less opens them! Let’s not even think about who opens attachments or clicks links (yikes!) from spam emails.


The Facts

So WHO are those people opening and clicking spam? Well, it turns out that the MAAWG, the Messaging Anti-Abuse Working Group, determines exactly that (and presumably other things too) – every year. Better yet, they publish that information for our enjoyment.

It’s been a few months since the latest findings were published, but I’d consider them relevant today nevertheless (and a year from now for that matter).

In a nutshell, the group surveyed the behavior of consumers both in North America and Europe, and published key findings in regards to awareness, consumer confidence and so forth.

Before I give the link to the full PDF (see the Resources section below), here are what I think are some of the most interesting facts:

  • Half of all users in North America and Europe have “confessed” to opening or accessing spam. 46% of those who opened spam did so intentionally, to unsubscribe or out of some untameable sense of curiosity. Some were even interested in the products “advertised” to them! Bottom Line: Roughly 1 out of 4 people (46% of 50% is 23%) open spam emails because they want to know more, or want to unsubscribe.
  • In more detail, 19% of all users surveyed either clicked on a link from an email (11%) or opened an attachment from an email (8%) that they themselves suspected to be spam. I found that to be one of the most revealing numbers in the report.
  • Young users (under 35) consider themselves more experienced, yet at the same time engage in more risky behavior than other age groups. In Germany, 33% of all users consider themselves to be experts. Compare that to France, where only 8% of all users think they are pros.
  • Less than half of users think that stopping spam or viruses is their responsibility. Instead, they feel that the responsibility lies mainly with the ISP and A/V companies. 48% of all respondents do realize that it is their responsibility. The report doesn’t state whether this particular question, which lists 10 choices, was a multiple choice question.
  • When asked about bots, 84% of users were familiar with the possibility that software, say a virus, can control their computer. At the same time, only 47% were familiar with the terms “bot” or “botnet”.
  • On the upside, 94% of all users are running A/V software that is up-to-date, which is a comforting fact. I can only imagine that OS X users, given Apple’s market share, account for most of the remaining 6%. My opinion: OS X users are probably still oblivious and don’t see the need to install A/V or any other type of security software on their computers. Still, some PC users apparently don’t install AntiVirus/AntiMalware on their computers either, despite many free options being available today.

Wow, that’s a lot of bad news to digest. So if I may summarize – the reason we keep getting spam in our inboxes is that every 5th person with a computer clicks on links or opens attachments (ah!) from spam emails, and that 6% of all users with a computer don’t run security software. Given the number of people that dwell in the western hemisphere, that amounts to a lot of people.

Well, at least I know now why I keep getting those nuisance emails in my inbox. But somehow I don’t feel any better about them.

Training Day

I think what this report shows us is the importance of user education. While people are apparently aware of spam, it doesn’t look like the average Joe is aware of the implications that a simple click in an email can have.

If you are reading this email, then you are probably a network professional working in an organization. With that, you have a unique opportunity to organize a simple workshop with your employees to educate them about the potential threats, and remind them that it’s not a good idea to do anything with suspect emails.


There is a wealth of information available on the web about educating users on spam and general computer security. We all know that software can only do so much – it’s a constant cat & mouse game between the researchers and the bad guys. It’s simply not possible, at least not today, to make the computers we use on a daily basis 100% secure.

While securing computers in a corporation is possible to some extent using whitelisting, content filters and such, doing the same thing for home computers is much more difficult. And it’s those computers that are most likely to be part of a botnet.

I can only imagine that the average user does not know that botnets can span thousands, if not millions, of computers. The Conficker botnet alone infected around 10 million computers and has the capacity to send 10 billion emails per day.

Let’s face it, the situation will not improve as long as people click links in emails and open attachments from suspicious senders.

I encourage you to organize a training session with your users on a regular basis. If your organization is large, then you might want to start with the key employees first, and maybe create a tiered training structure.

Our Network is Safe

You might think that your network is safe. You have AntiVirus, white listing, AntiMalware, firewalls in every corner, web content filters and more. Scheduling a training session to tell your users not to do the obvious is probably the last thing on your mind.

But read on.

Risky behavior by your end users will not only affect global spam rates, but your organization as well. Corporate espionage is growing, and spies (whether they are from a foreign government or corporation) often use email to initially get access to an individual’s computer. See SANS Corporate Espionage 201 (PDF) for some techniques being employed.

For example, pretty much every organization has people working from home. If a malicious attacker can compromise a home computer that is used to access a corporate network (even if it’s just used to access emails) and install a key logger, then they will most likely have gotten access to your corporate network. Once they have their foot in the door, it’s only a matter of time.

There are plenty of resources available on the net on how to educate users on security, spam and so forth. A short training session of 20 minutes is probably enough. The message to convey is simple, and if you keep a few points in mind the session can even be fun. Consider the following for the training session:

  • Be sure to interact with your users. Start off by asking them if they use A/V software or AntiMalware software at home.
  • Tell them about botnets, and ask whether they would be happy knowing that their computer is part of a 10-million-computer botnet controlled by people in the Ukraine.
  • Be sure to explain that a single user’s actions can compromise their corporate network.
  • Explain that technology cannot provide 100% security against intruders.

Of course, user education alone is not the answer to solving security problems like viruses, phishing and the like. Encryption, digital signatures (especially for corporate emails) and white-listing should all be employed regardless of user education.

Resources

2010 MAAWG Consumer Survey Key Findings Report (6 pages)
2010 MAAWG Consumer Survey Full Report (87 pages)

Using Cartoons to Teach Internet Security
Get IT Done: IT pros offer tips for teaching users

 

UNICODE – ONE code to rule them all

If you live in an English-speaking country like the United States, United Kingdom or Australia, then you are in the lucky position where every character in your language can be represented by the ASCII table. Many other languages aren’t as lucky unfortunately, which is no surprise given that over 1000 written languages exist. Most of these languages cannot be represented in ASCII, most notably Asian and Arabic languages.

Take the text below for example, ASCII would be struggling with this a bit (to say the least):

النمسا

Understanding UNICODE is no easy feat however – just the mere abbreviations out there can be mind-boggling: UTF-7, 8, 16, 32, UCS-2, BOM, BMP, code points, Big-Endian, Little-Endian and so forth. UNICODE support is particularly interesting when dealing with different platforms, such as Windows, Unix and OS X.

It’s not all that bad though, and once the dust settles it can all make sense. No, really. As such, the purpose of this article is to give you a basic understanding of UNICODE, enough so that the mention of the word UNICODE doesn’t give you cold shivers down your back.

Unicode is essentially one large character set that includes all characters of written languages, including special characters like symbols and so forth. The goal – and this goal is reality today – is to have one character set for all languages.

Back in 1963, when the first draft of ASCII was published, internationalization was probably not at the top of the committee members’ minds. Understandable, considering that not too many people were using computers back then. Things have changed since then, as computers are turning up in pretty much every electrical device (maybe with the exception of stoves and blenders).

The easiest way to start is, of course, with ASCII (American Standard Code for Information Interchange). Gosh, were things simple back in the 60s. If you wanted to represent a character digitally, you would simply map it to a number between 0 and 127. Voila, all set. Time to drive home in your Chevrolet, and listen to a Bob Dylan, Beach Boys or Beatles record. I won’t go into the details now, but for the sake of completeness I will include the ASCII representation of the string “Bob Dylan”:


String:      B        o        b                 D        y        l        a        n
Decimal:     66       111      98       32       68       121      108      97       110
Hexadecimal: 0x42     0x6F     0x62     0x20     0x44     0x79     0x6C     0x61     0x6E
Binary:      01000010 01101111 01100010 00100000 01000100 01111001 01101100 01100001 01101110

Computers, plain and simple as they are, store everything as numbers of course, and as such we need a way to map numbers to letters, and vice versa. This is of course the purpose of the ASCII table, which tells our computers to display a “B” instead of 66.
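If you want to check such a mapping yourself, a few lines of Python will do. This is just a quick sketch using the built-in `ord()` and `chr()` functions; the variable names are arbitrary.

```python
text = "Bob Dylan"

# ord() maps a character to its number, chr() maps the number back.
decimal = [ord(ch) for ch in text]
hexadecimal = [f"0x{n:02X}" for n in decimal]
binary = [f"{n:08b}" for n in decimal]

print(decimal)
print(hexadecimal)
print(binary)

# Round trip: turning the numbers back into characters yields the original text.
print("".join(chr(n) for n in decimal))
```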

Since the 7-bit ASCII table has a maximum of 128 characters (0 through 127), any ASCII character can be represented using 7 bits (though they usually consume 8 bits now). This makes it quite easy to calculate how long a string is, for example. In C programs, ASCII characters are represented using chars, which use 1 byte (=8 bits) of storage. Here is an example in C:


char author[] = "The Beatles";
size_t authorLen = strlen(author);     // authorLen = 11 (strlen is declared in <string.h>)
size_t authorSize = sizeof(author);    // authorSize = 12

The only reason the two variables are different is that C automatically appends a 0x0 character at the end of a string (to indicate where it terminates), and as such the size will always be one char(acter) longer than the length.

So, this is all fine and well if we only deal with “simple” languages like English. Once we try to represent a more complex language, Japanese for example, things start to get more challenging. The biggest problem is the sheer number of characters – there are simply more than 128 characters in the world’s written languages. ASCII was extended to 8-bit (primarily to accommodate European languages), but this still only scratches the surface when you consider Asian and Arabic languages.

Hence, a big problem with ASCII is that it is essentially a fixed-length, 8-bit encoding, which makes it impossible to represent complex languages. This is where the Unicode standard comes in: It gives each character a unique code point (number), and defines variable-length encodings as well as fixed-length encodings of 2 or more bytes.

But before we go too deep into Unicode, we’ll just blatantly pretend that Unicode doesn’t exist and think of a different way to store Japanese text. Yes! Let us enter a world where every language uses a different encoding! No matter what they want to make you believe – having countless encodings around is fun and exciting. Well, actually it’s not, but let’s take a look here why.

The ASCII characters end at 127, leaving another 128 values (128 through 255) for other languages. Even though I’m not a linguist, I know that there are more than 128 characters in the rest of the world. Additionally, many Asian languages have significantly more than 256 characters, making a multi-byte encoding necessary (since you cannot represent every character with one byte).

This is where encodings come in (or better, “came” in before Unicode was established), which are basically like stencils. Let’s use Japanese for our code page example. I don’t speak Japanese unfortunately, but let’s take a look at this word, which means “Farewell” in Japanese (you are probably familiar with the pronunciation – “sayōnara”):

さようなら

The ASCII table obviously has no representation for these characters, so we would need a new table. As it turns out, there are two main encodings for Japanese: Shift_JIS and EUC-JP. Yes, as if it’s not bad enough to have one encoding per language!

So code pages serve the same purpose as the ASCII table: they map numbers to letters. The problem with code pages – as opposed to Unicode – is that both the author and the reader need to view the text in the same code page. Otherwise, the text will just be garbled. This is what “sayōnara” looks like in the aforementioned encodings:



EUC-JP

0xA4 B5 A4 E8 A4 A6 A4 CA A4 E9

Shift_JIS
0x82 B3 82 E6 82 A4 82 C8 82 E7

The numerical representations in EUC-JP and Shift_JIS are, as is to be expected, completely different – so knowing the encoding is vital. If the encodings don’t match, then the text will be meaningless. And meaningless text is useless.

You can imagine that things can get out of hand when one party (party can be an Operating System, Email client, etc.) uses EUC-JP, and the other Shift_JIS for example. They both represent Japanese characters, but in a completely different way.
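You can reproduce the mismatch with a few lines of Python, which ships with codecs for both encodings (`euc_jp` and `shift_jis`). This is only a small sketch; decoding with `errors="replace"` is my choice so that bytes that are invalid in the wrong encoding become replacement characters instead of raising an exception.

```python
text = "\u3055\u3088\u3046\u306a\u3089"   # さようなら

euc_jp = text.encode("euc_jp")
shift_jis = text.encode("shift_jis")

# Same word, two completely different byte sequences.
print("EUC-JP:   ", euc_jp.hex(" ").upper())
print("Shift_JIS:", shift_jis.hex(" ").upper())

# One party writes Shift_JIS, the other one reads the bytes as EUC-JP.
garbled = shift_jis.decode("euc_jp", errors="replace")
print(ascii(garbled))           # escaped so it prints on any console
print(garbled == text)
```

The last line shows that the reader does not get the original word back, which is exactly the problem described above.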

Encodings can either (to a certain degree) be auto-detected, or specified as some sort of meta information. Below is an HTML page with the same Japanese word, Shift_JIS encoded:


<HTML>
<HEAD>
<TITLE>Shift_JIS Encoded Page</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift_JIS">
</HEAD>
<BODY>
さようなら
</BODY>
</HTML>

You can paste this into an editor, save it as a .html file, and then view it in your favorite browser. Try changing “Shift_JIS” to “EUC-JP”, fun things await you.

But I am getting carried away, after all this post is about Unicode, not encodings. So, Unicode solves these problems by giving every character from every language a unique code point. No more “Shift_JIS”, no more “EUC-JP” (not even to mention all the other encodings out there), just UNICODE.

Once a document is encoded in Unicode, specifying a code page is no longer necessary – as long as the client (reader) supports the particular Unicode encoding (e.g. UTF-8) the text is encoded with.

The five major Unicode encodings are:

UTF-8
UCS-2
UTF-16 (an extension of UCS-2)
UTF-32
UTF-7

All of these encodings are Unicode, and represent Unicode characters. That is, UTF-8 is just as capable as UTF-16 or UTF-32. The number in the name of a UTF encoding represents the minimum number of bits that are required to store a single Unicode code point (the 2 in UCS-2, on the other hand, stands for bytes). As such, UTF-32 can potentially require 4 times as much storage as UTF-8 – depending on the text that is being encoded. I will be ignoring UTF-7 going forward, as its use is not recommended and it’s not widely used anymore.

The biggest difference between UTF-8 and UCS-2/UTF-16/UTF-32 is that UTF-8 is a variable length encoding, opposed to the others being fixed-length encodings. OK, that was a lie. UCS-2, the predecessor of UTF-16, is indeed a fixed length encoding, whereas UTF-16 is a variable length encoding. In most use cases however, UTF-16 uses 2 bytes and is essentially a fixed length encoding. UTF-32 on the other hand, and that is not a lie, is a fixed-length encoding that always uses 4 bytes to store a character.

Let’s look at this table which lists the 4 major encodings and some of their properties:

Encoding   Variable/Fixed   Min Bytes   Max Bytes
UTF-8      variable         1           4
UCS-2      fixed            2           2
UTF-16     variable         2           4
UTF-32     fixed            4           4

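What this means is that in order to represent a Unicode character (e.g. さ), a variable length encoding might require more than 1 byte, and in UTF-8’s case up to 4 bytes. UTF-8 potentially needs more bytes than UTF-16 for the same character, since it maintains backward compatibility with ASCII and, in a multi-byte sequence, spends some bits of every byte on marking where the sequence starts and continues, leaving fewer bits for the code point itself.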
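To see the minimum and maximum sizes in practice, here is a short Python sketch that encodes four sample characters and counts the bytes. The sample characters are my own picks (one ASCII letter, one accented Latin letter, one Japanese character and one emoji from a supplementary plane), and I use the big-endian codec variants so that Python doesn’t prepend a byte order mark.

```python
# A, é, さ and an emoji, written as escapes to avoid any ambiguity
samples = ["A", "\u00e9", "\u3055", "\U0001F600"]
encodings = ["utf-8", "utf-16-be", "utf-32-be"]

def byte_lengths(ch):
    """Number of bytes a single character needs in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```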

Windows uses UTF-16 to store strings internally, as do most Unicode frameworks such as ICU and Qt’s QString. Most Unixes on the other hand use UTF-8, and it’s also the most commonly found encoding on the web. Mac OS X is a bit of a different beast: due to it using a BSD kernel, all BSD system functions use UTF-8, whereas Apple’s Cocoa framework uses UTF-16.

UCS-2 or UTF-16
I had already mentioned that UTF-16 is an extension of UCS-2, so how does it extend it and why does it extend it?

You see, Unicode is so comprehensive now that it encompasses more than what you can store in 2 bytes. All characters (code points) from 0x0000 to 0xFFFF are in the “BMP”, the “Basic Multilingual Plane”. This is the plane that holds most of the character assignments, but additional planes exist, and here is a list of the most important planes:

•    The “BMP”, “Basic Multilingual Plane”, 0x0000 -> 0xFFFF
•    The “SMP”, “Supplementary Multilingual Plane”, 0x10000 -> 0x1FFFF
•    The “SIP”, “Supplementary Ideographic Plane”, 0x20000 -> 0x2FFFF
•    The “SSP”, “Supplementary Special-purpose Plane”, 0xE0000 -> 0xEFFFF

So technically, having 2 bytes available is not even enough anymore to cover all the available code points; you can only cover the BMP. And this is the main difference between UCS-2 and UTF-16: UCS-2 only supports code points in the BMP, whereas UTF-16 supports code points in the supplementary planes as well, through something called “surrogate pairs”.
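For the curious, this is roughly how a surrogate pair is calculated. The Python sketch below follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits each); the function name `to_surrogate_pair` and the sample code point U+1F600 are my own choices for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into two 16-bit code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")

# Python's own UTF-16 encoder produces the same two code units.
print("\U0001F600".encode("utf-16-be").hex(" ").upper())
```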

Representation in Unicode
So let’s look at the above sample text in Unicode, shall we? Sayonara Shift_JIS & EUC-JP! The site http://rishida.net/tools/conversion/ has some great online tools for Unicode, one of which is called “Uniview”. It shows us the actual Unicode code points, the symbol itself and the official description:

[Screenshot: Uniview showing the code points for さようなら]

The official Unicode notation for the above characters uses the U+ syntax (U+ followed by the hexadecimal code point), so for the above letters we would write:


U+3055 U+3088 U+3046 U+306A U+3089

With this information, we can now apply one of the UTF encodings to see the difference:


UTF-8

E3 81 95 E3 82 88 E3 81 86 E3 81 AA E3 82 89

UTF-16
30 55 30 88 30 46 30 6A 30 89

UTF-32
00 00 30 55 00 00 30 88 00 00 30 46 00 00 30 6A 00 00 30 89

So UTF-8 uses 5 more bytes than UCS-2/UTF-16 to represent the exact same characters. Remember that UCS-2 and UTF-16 are identical for this text, since all characters are in the BMP. UTF-32 uses yet another 5 bytes more than UTF-8 and requires the most storage space, as is to be expected.

What you can also see here, is that UTF-16 essentially mirrors the U+ notation.
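If you want to reproduce these byte sequences yourself, a few lines of Python will do. This is only a sketch; the helper name is my own, and I am using the big endian codecs without a BOM so the output lines up with the listings above:

```python
# The five code points from the example: U+3055 U+3088 U+3046 U+306A U+3089
text = "\u3055\u3088\u3046\u306A\u3089"


def hex_bytes(data):
    """Format raw bytes as space-separated, upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


for encoding in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = text.encode(encoding)
    print(encoding, len(encoded), "bytes:", hex_bytes(encoded))
```

The three lines of output should show 15, 10 and 20 bytes respectively.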

Fixed Length or Variable Length?
Both encoding types have their advantages and disadvantages, and I will be comparing two popular Unicode encodings, UTF-8 and UCS-2, here:

Variable Length UTF-8:
•    ASCII-compatible
•    Uses potentially less space, especially when storing ASCII
•    String analysis/manipulation (e.g. length calculation) is more CPU-intensive

Fixed Length UCS-2:
•    Potentially wastes space, since it always uses a fixed amount of storage
•    String analysis/manipulation is usually less CPU intensive
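The length calculation is a good illustration of the CPU trade-off. In the rough Python sketch below (both function names are hypothetical), counting characters in UTF-8 means looking at every byte, whereas for UCS-2 it is a single division. Python has no dedicated UCS-2 codec, so I am assuming BMP-only text encoded as UTF-16, which is byte-for-byte the same thing:

```python
def count_code_points_utf8(data):
    # Continuation bytes have the bit pattern 10xxxxxx; every other byte
    # starts a new character, so the whole buffer has to be scanned.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_code_points_ucs2(data):
    # Fixed width: always two bytes per character, no scan needed.
    return len(data) // 2


text = "\u3055\u3088\u3046\u306A\u3089"
print(count_code_points_utf8(text.encode("utf-8")))      # 5
print(count_code_points_ucs2(text.encode("utf-16-be")))  # 5
```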

Which encoding to use will depend on the application. If you are creating a web site, then you should probably choose UTF-8. If you are storing data in a database however, then it will depend on the type of strings that will be stored. For example, if you are only storing languages that cannot be represented through ASCII, then it is probably better to use UCS-2. If you are storing both ASCII and languages that require Unicode, then UTF-8 is probably a better choice. An extreme example would be storing English-Only text in a UCS-2 database – it would essentially use twice as much storage as an ASCII version, without any tangible benefits.

One of the strongest suits of UTF-8, at least in my opinion, is its backward compatibility with ASCII. UTF-8 never uses byte values of 127 (0x7F) or below as part of a multi-byte sequence, since those are – well – reserved for ASCII characters. This means that all ASCII text is automatically valid UTF-8, since any UTF-8 parser will recognize those bytes as ASCII and render them appropriately.
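The ASCII compatibility is easy to verify. The following Python sketch (the helper name and the sample string are made up for illustration) checks that ASCII text encodes to identical bytes either way, and that a non-ASCII character only ever produces bytes above 0x7F:

```python
def all_bytes_high(data):
    """Return True if every byte lies outside the ASCII range (>= 0x80)."""
    return all(b >= 0x80 for b in data)


ascii_text = "Event Log"
# Pure ASCII text produces the very same bytes in both encodings.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A non-ASCII character (U+3055) becomes three bytes, none of which
# can be mistaken for an ASCII character.
multi_byte = "\u3055".encode("utf-8")
print(all_bytes_high(multi_byte))   # True
```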

The BOM
And this brings us to the next topic – the BOM (header). BOM stands for “Byte Order Mark”, and is usually a 2-4 byte long header in the beginning of a Unicode text stream, e.g. a text file. If a text editor does not recognize a BOM header, then it will usually display the BOM header as either the þÿ or ÿþ characters.

The purpose of the BOM header is to describe the Unicode encoding, including the endianness, of the document. Note that a BOM is usually not used for UTF-8.

Let’s revisit the example from earlier, the UTF-16 encoding looked like this:


30 55 30 88 30 46 30 6A 30 89

If we wanted to store this text in a file, including a BOM header, then it could also look like this:


FF FE 55 30 88 30 46 30 6A 30 89 30

“FF FE” is the BOM header, and in this case indicates that a UTF-16 Little Endian encoding is used. The same text in UTF-16 Big Endian would look like this:


FE FF 30 55 30 88 30 46 30 6A 30 89

The BOM header is generally only useful when Unicode encoded documents are being exchanged between systems that use different Unicode encodings, but given the extremely small overhead it certainly doesn’t hurt to add it to any UTF-16 encoded document. As such, Windows tools such as Notepad add a 2-byte BOM header to text documents saved as UTF-16. It is the responsibility of the text reader (e.g. an editor) to interpret the BOM header correctly. Linux on the other hand, being a UTF-8 fan and all, does not need to (and does not) use a BOM header – at least not by default.
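For the curious, here is roughly what a text reader has to do with a BOM. This is a simplified Python sketch with a function name of my own choosing; it relies on the BOM constants from the standard codecs module and does not attempt to guess the encoding when no BOM is present:

```python
import codecs

# The UTF-32 BOMs are checked first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding name, BOM length), or (None, 0) if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# The little endian example from above, including its BOM.
raw = bytes.fromhex("FF FE 55 30 88 30 46 30 6A 30 89 30")
encoding, bom_length = detect_bom(raw)
print(encoding)                                        # utf-16-le
print(len(raw[bom_length:].decode(encoding)), "characters")
```

One caveat of this simple ordering: a UTF-16 LE document whose first character is U+0000 would be mistaken for UTF-32 LE, which should be rare in real text files.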

Tools & Resources
There are a variety of resources and tools available to help with Unicode authoring, conversions, and so forth.

I personally like Ultraedit, which lets me convert documents to and from UTF-8 and UTF-16, and also supports the BOM headers. GEdit on Linux is also very capable, and supports different code pages (if you ever need to use those) as well. Babelpad is an editor designed specifically for Unicode, and seems to support every possible encoding. I have not actually used this editor though.

A nifty online converter that I already mentioned earlier can be found at http://rishida.net/tools/conversion/, and also check out UniView: http://rishida.net/scripts/uniview/.

The official Unicode website is of course a great resource too, though potentially overwhelming to mere mortals that only have to deal with Unicode occasionally. The best place to start is probably their basic FAQ: http://www.unicode.org/faq/basic_q.html.

I hope this provides some clarification for those who know that Unicode exists, but are not entirely comfortable with the details.

さようなら!

Event 4964: Special Groups Feature for Vista + Windows 2008

There is certainly a lot of talk about the benefits of using Vista, but a lot of administrators and users seem to be avoiding it and instead hold on to Windows XP – which now appears to have a better reputation than ever! Well, here is a small reason to upgrade to Vista or Windows Server 2008.

Microsoft introduced a new event, 4964, called the Special Groups Feature. The purpose of this feature is to log event 4964 to the security event log when a member of a group you specify logs on to a computer.

So let’s say you want to know when a member of a local Administrator group logs on to a computer (and with EventSentry you could get an email when that happens for example), then you can accomplish that with the special groups feature.

In order to use this feature you need to do three things:

  • Determine the SID of the group(s) you want to monitor
  • Specify the SID(s) of the groups you want to monitor in a registry key
  • Ensure that you are auditing the Special Logon Feature (enabled by default)

One way to obtain the SID of a group is to use the getsid.exe tool which is part of the Windows XP SP2 Support Tools and other Microsoft Resource Kits. Note that the primary purpose of this tool is to compare the SID of two user accounts (so it requires you to specify two user/group accounts), but you can just enter the same group name twice to get around this. Here is an example output of the tool:

getsid \\mydc "Domain Admins" \\mydc "Domain Admins"

The SID for account BUILTIN\Domain Admins matches account BUILTIN\Domain Admins
The SID for account BUILTIN\Domain Admins is S-1-5-21-9817441204-4587651373-9817264971-512
The SID for account BUILTIN\Domain Admins is S-1-5-21-9817441204-4587651373-9817264971-512

As you can see, you need to point the tool to a computer where the group exists. In our case I used a domain controller, since I want to monitor whether somebody from the Domain Admins group logs on to the computer. If you monitor a built-in group (e.g. Administrators) then you will see that the SID is much shorter and the same across all your computers.

Now that we know the SID, we can specify it in the registry. Navigate to key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Audit and create a new String with the name SpecialGroups.

The value for this new string will be the SID of the group you want to monitor, and you can separate multiple SIDs with a semicolon. For example:

S-1-5-32-544;S-1-5-32-123-54-65
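If you would rather script this than click through regedit, something along these lines could work. Treat it as a sketch under a few assumptions: it uses Python's standard winreg module (Windows only, and it needs administrative rights), the function names are made up, and the SID check is only a loose sanity test of the string format:

```python
import re

AUDIT_KEY = r"System\CurrentControlSet\Control\Lsa\Audit"
# Loose format check only: "S-1-", an authority, and one or more sub-authorities.
_SID_PATTERN = re.compile(r"S-1-\d+(-\d+)+")


def build_special_groups_value(sids):
    """Join SID strings with semicolons, rejecting anything that is not SID-shaped."""
    for sid in sids:
        if not _SID_PATTERN.fullmatch(sid):
            raise ValueError(f"not a SID string: {sid!r}")
    return ";".join(sids)


def write_special_groups(sids):
    """Write the SpecialGroups string value (Windows only)."""
    import winreg  # imported here so the helper above also works elsewhere

    value = build_special_groups_value(sids)
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, AUDIT_KEY, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "SpecialGroups", 0, winreg.REG_SZ, value)


print(build_special_groups_value(["S-1-5-32-544", "S-1-5-32-123-54-65"]))
```

Calling write_special_groups(["S-1-5-32-544"]) from an elevated prompt would then set the value for the built-in Administrators group.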

You do not have to reboot after making this change; it is effective immediately, starting with the next logon. The event that is being logged will look similar to this (screen shot from the EventSentry Web Reports):

[Screenshot: Special Groups Logon 4964]

The relevant information is shown in the lower part of the event in the New Logon section. Security ID shows the user that logged on, and Special Groups Assigned shows the group the account is a member of (of course this group has to be specified in the registry).

Voila. This feature probably makes most sense on critical servers, though I would recommend enabling it on all workstations as well since you probably want to know if a member of the local Administrators group logs on. But of course this also means that you need to be running Vista on your network :-).

Since this feature needs to be activated using the registry, you can use AutoAdministrator to push this registry change to multiple computers. AutoAdministrator has actually been rewritten from scratch and we will be releasing a new version 2.0 very soon.

Event Log Message Files (The description for Event ID … cannot be found)

Anybody who has used the built-in event viewer that comes with Windows more than once, has probably seen the message “The description for Event ID ( 50 ) in Source ( SomeService ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer.” when viewing certain events. This message occurs more often when viewing events on a remote event log, but it appears often enough on the local machine as well.

[Image: event_message_id_cannot_be_found.png]

I will explain this dubious error message here, but before I do I will explain how messages are in fact logged to the event log. After reading this you should have a much clearer picture about how applications log to the event log and how you go about troubleshooting this “error”.

The framework that Microsoft created for the event log, back in the NT 3.51 days, was actually quite sophisticated in many ways – especially when compared with the more simplistic Syslog capabilities (though Syslog still has some unique features).

A key feature of event logging in Windows is the fact that an application, at least when using the event log framework in the way it was intended to be used, will never actually directly write the actual message to the event log – instead it will log only the event source and event id, along with some properties such as category and insertion strings. The framework also supports multiple languages, so if you open an event on a French Windows, then the event will display in French (of course assuming that the message file from the vendor supports that) instead of English.

Let’s look at an example – using EventSentry – to understand this better. When EventSentry detects a service status change, it will log the event 10100 to the event log that reads something like this:

The service Print Spooler (Spooler) changed its status from RUNNING to STOPPED.

When EventSentry logs this event to the event log, you would expect that the application does (in a simplified manner) something like this:

LogToEventLog("EventSentry", 10100,
    "The service Print Spooler (Spooler) changed its status from RUNNING to STOPPED.");

However, this is NOT the case. The application logging to the event log never actually logs the message to the event log, instead the application would log something similar to this:

LogToEventLog("EventSentry", 10100,
    "Print Spooler", "Spooler", "RUNNING", "STOPPED");

(Note that the above example is for illustration purposes only, the actual code is somewhat more complicated)

So, our actual string from the event message is nowhere to be found, and that’s because the string is embedded in what is referred to as the “Event Message File”. The event message file contains a list of all events that an application could potentially log to the event log. Here is what an event message file looks like before it is compiled:

MessageId=10100
SymbolicName=EVENTSENTRY_SVC_STATUSCHANGE
Language=English
The status for service %1 (%2) changed from %3 to %4.
.
Language=German
Der Dienststatus von Dienst %1 (%2) aenderte sich von %3 auf %4.
.

Notice the numbers contained in the string that start with the percentage sign. These are placeholders for so-called insertion strings, and they make it possible to make the event log message dynamic, since an application developer can’t possibly account for all imaginable error messages or information that might be accumulated during the runtime of the application. For example, an application might log the name of a file that is being monitored to the event log; clearly this can’t be embedded into the event message file.

Instead, the application can insert strings (hence, insertion strings) into the event message during run time. Those strings are then stored in the actual event log, along with all the other static properties of the event, such as the event id and the event source.
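Conceptually, whoever displays the event later has to merge the stored insertion strings back into the template from the message file. The Python sketch below mimics just that step. It is my own simplified stand-in, not how Windows does it internally; the real formatting supports more syntax than plain %1 to %n placeholders:

```python
import re


def format_event_message(template, insertion_strings):
    """Replace %1, %2, ... in a message template with the insertion strings."""
    def substitute(match):
        index = int(match.group(1)) - 1        # %1 refers to the first string
        if 0 <= index < len(insertion_strings):
            return insertion_strings[index]
        return match.group(0)                  # no such string: leave the placeholder

    return re.sub(r"%([0-9]+)", substitute, template)


template = "The status for service %1 (%2) changed from %3 to %4."
strings = ["Print Spooler", "Spooler", "RUNNING", "STOPPED"]
print(format_event_message(template, strings))
```

Swap in the German template and the very same four insertion strings produce a German message, which is the whole point of keeping the text out of the event log itself.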

Event message files are usually DLL files, but event resources can also be embedded in executables – as is the case in EventSentry, where all events are contained in the eventsentry_svc.exe file. This is generally a good idea, since it reduces the number of files that have to be shipped with the software and it also prevents you from “losing” the message DLL.

You can browse through all embedded events in a message file by using the event message browser that is included in the free EventSentry SysAdmin Tools which you can download here. Simply launch the application, select an event log (e.g. Application), select an event source (e.g. EventSentry), and browse through all the registered event messages, sorted by the ID.

So now that we know how Windows handles event messages internally, we can go back to the original problem: “The description for Event ID ( 50 ) in Source ( SomeService ) cannot be found.”. The Windows Event Viewer displays this message for one of the following reasons:

* No message file is registered for the source (e.g. SomeService)
* The registered message file does not exist or cannot be accessed
* The specified event id is not included in the message file

If the message file is not registered, then this is probably because the application wasn’t installed correctly, or because it has already been uninstalled by the time you are trying to view the event message. For example, if the event message was logged before the application was uninstalled, but you are viewing the event after the application was uninstalled, then you will see this message.

If the event you are trying to view is important, then you can try to fix the problem yourself by either fixing the registry entry or locating the missing event message file.

The registry location depends on only two factors: The event log [EVENTLOG] the event was logged to as well as the event source [EVENTSOURCE].

HKLM\System\CurrentControlSet\Services\Eventlog\[EVENTLOG]\[EVENTSOURCE]

(Replace [EVENTLOG] and [EVENTSOURCE] with the respective values, and view/add/edit the value EventMessageFile. This is the value that points to the message file)

If this value doesn’t exist, then you can add it as either a REG_SZ or a REG_EXPAND_SZ value. You can specify multiple message files by separating them with semicolons.
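To check which message files are registered for a given source without opening regedit, a small script can read the value for you. The following is a sketch that assumes Python's standard winreg module on Windows; the function names are hypothetical and error handling is left out:

```python
EVENTLOG_ROOT = r"System\CurrentControlSet\Services\Eventlog"


def event_source_key(event_log, event_source):
    """Build the registry path (below HKLM) for an event log and event source."""
    return "\\".join((EVENTLOG_ROOT, event_log, event_source))


def split_message_files(value):
    """Split an EventMessageFile value into its individual file paths."""
    return [part.strip() for part in value.split(";") if part.strip()]


def read_message_files(event_log, event_source):
    """Return the registered message files for a source (Windows only)."""
    import winreg  # imported here so the helpers above also work elsewhere

    path = event_source_key(event_log, event_source)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "EventMessageFile")
    # REG_EXPAND_SZ values may contain variables such as %SystemRoot%.
    return split_message_files(winreg.ExpandEnvironmentStrings(value))


print(event_source_key("Application", "EventSentry"))
```

If the key or the value is missing, winreg raises an OSError, which corresponds to the first cause in the list above. Checking each returned path with os.path.exists would cover the second.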

[Image: regedit_eventmessagefile.png]

If the message file specified in the value doesn’t exist, then you can simply copy it into the appropriate location – assuming you can get a hold of it that is :-). Oracle is notorious for not including the message file, in particular with the Express Edition.

A final note on message files for those of you who haven’t had enough yet: You can use message files not only to translate event messages, but also for categories, GUIDs and more. Some of the values you might find (mostly in the security event log) are CategoryMessageFile, GuidMessageFile and ParameterMessageFile.

Well, this article turned out a lot longer than I had anticipated, but hopefully you will have a better understanding as to why this message is logged and what you can do about it.