We recently had to troubleshoot an interesting problem on a Windows XP workstation that had just been installed. There was nothing unusual about that computer: it was a member of a domain and had all the latest patches, antivirus software and, of course, the EventSentry agent installed.
What happened daily was this: the computer would boot up without any problems, but at some point several Windows-related error messages would be emailed to us by EventSentry, after which remote access to the computer (with the exception of a basic ping) was impossible. This made troubleshooting particularly difficult, since the computer was at a remote location. The user of that workstation never actually reported any problems, but the wealth of error messages we received from the event log confirmed that something was wrong on that computer. And, since we believe in preventative maintenance, we decided to take a look and get to the bottom of it.
Further investigation of the computer showed that a number of critical services (e.g. Server service) would be stopped a couple of hours after the computer had booted, explaining why we couldn’t access the computer remotely anymore. Of course we didn’t yet know why these services were stopping.
We briefly considered re-installing the computer in question, but since it had been installed less than a month ago, the problem would probably just resurface later. A search for malware didn’t yield anything either.
At this point I started to review the event log history of the computer in more detail through the EventSentry Web Reports. Since we were collecting event logs from that computer (which worked well, even when we couldn’t access it remotely), viewing and searching for events was fast and easy (even though the computer was across a WAN and essentially unreachable).
I didn’t expect to find much (critical events had already been emailed to us), but I browsed through the application and system event logs anyway and came across an interesting event:
Event Log: Application
Event Type: Error
Event Source: Application Error
Event ID: 1000
Message: Faulting application svchost.exe, version 5.1.2600.5512, faulting module xxTSP3x.tsp, version 1.0.0.1, fault address 0x000f1528.
Even though this was an error event, we didn’t actually receive it via email since we had earlier decided to exclude all “Application Error” events – due to the overwhelming noise that various crashing executables on workstations usually generate.
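The "Application Error" message shown above has a regular enough shape that it can be picked apart programmatically, which helps when sifting through many such events for one particular faulting module. Below is a minimal Python sketch; the pattern is modelled only on the single sample message quoted here, and the names FaultInfo and parse_fault_message are made up for illustration, so other Windows versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional


class FaultInfo(NamedTuple):
    """Fields extracted from an 'Application Error' (event ID 1000) message."""
    application: str
    application_version: str
    module: str
    module_version: str
    fault_address: int


# Assumption: the message follows the wording of the sample event above.
_FAULT_PATTERN = re.compile(
    r"Faulting application (?P<app>.+?), version (?P<app_ver>[\d.]+), "
    r"faulting module (?P<mod>.+?), version (?P<mod_ver>[\d.]+), "
    r"fault address (?P<addr>0x[0-9a-fA-F]+)"
)


def parse_fault_message(message: str) -> Optional[FaultInfo]:
    """Return the parsed fields, or None if the message has another shape."""
    match = _FAULT_PATTERN.search(message)
    if match is None:
        return None
    return FaultInfo(
        application=match.group("app"),
        application_version=match.group("app_ver"),
        module=match.group("mod"),
        module_version=match.group("mod_ver"),
        fault_address=int(match.group("addr"), 16),  # hex string to integer
    )
```

Calling parse_fault_message on the message above gives svchost.exe as the application and xxTSP3x.tsp as the module, which is the piece of information that matters for the rest of this story.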
Svchost.exe is a generic host process, and Windows XP (and later) run multiple services as part of a single svchost.exe process. On Vista for example, a single svchost.exe process might host as many as 18 services – all part of a single process. Windows usually runs multiple svchost.exe processes, all “hosting” one or more services. This makes troubleshooting problems with the svchost.exe process somewhat difficult, since a faulting svchost.exe process can potentially point to dozens of services. My Vista machine runs 67 services inside only 16 svchost.exe processes. Using the tasklist.exe command, you can list all running svchost processes as well as the services running inside each of them:
tasklist /SVC /FI "IMAGENAME eq svchost.exe"
Image Name                     PID Services
========================= ======== ============================================
svchost.exe                    912 DcomLaunch, PlugPlay
svchost.exe                   1008 RpcSs
svchost.exe                   1072 WinDefend
svchost.exe                   1148 Audiosrv, Dhcp, Eventlog, lmhosts, wscsvc
svchost.exe                   1180 AudioEndpointBuilder, CscService, hidserv,
                                   Netman, PcaSvc, SysMain,
                                   TabletInputService, TrkWks, UxSms,
                                   WdiSystemHost, Wlansvc, WPDBusEnum, wudfsvc
svchost.exe                   1216 AeLookupSvc, BITS, Browser, EapHost,
                                   IKEEXT, iphlpsvc, LanmanServer, MMCSS,
                                   ProfSvc, RasMan, Schedule, seclogon, SENS,
                                   SharedAccess, ShellHWDetection, Themes,
                                   Winmgmt, wuauserv
svchost.exe                   1364 gpsvc
svchost.exe                   1480 EventSystem, FDResPub, LanmanWorkstation,
                                   netprofm, nsi, SSDPSRV, SstpSvc, TBS,
                                   upnphost, W32Time, WebClient
svchost.exe                   1600 CryptSvc, Dnscache, KtmRm, NlaSvc, TapiSrv,
                                   TermService
svchost.exe                   1872 BFE, DPS, MpsSvc
svchost.exe                    856 BthServ
svchost.exe                   2228 Net Driver HPZ12
svchost.exe                   2280 Pml Driver HPZ12
svchost.exe                   2304 PolicyAgent
svchost.exe                   2364 stisvc
svchost.exe                   2788 WerSvc
Note that the grouping of services varies from OS to OS: Windows Server 2003, for example, groups services differently than Windows XP does.
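If you want to work with this listing in a script (for example, to compare groupings across machines), the tasklist output can be parsed into a mapping from PID to service names. The following Python sketch assumes the fixed-width table layout that tasklist prints, with a row of "=" characters under the header and wrapped service lists continued on indented lines; parse_tasklist_svc is a name made up for this example.

```python
from typing import Dict, List


def parse_tasklist_svc(output: str) -> Dict[int, List[str]]:
    """Parse 'tasklist /SVC' table output into {pid: [service, ...]}."""
    lines = output.splitlines()

    # The row of '=' characters tells us how wide each column is.
    sep_index = next(
        (i for i, line in enumerate(lines) if line.startswith("=")), None
    )
    if sep_index is None:
        raise ValueError("no '=====' separator row found in tasklist output")

    widths = [len(part) for part in lines[sep_index].split()]
    pid_start = widths[0] + 1              # one space between columns
    svc_start = pid_start + widths[1] + 1

    services_by_pid: Dict[int, List[str]] = {}
    current_pid = None
    for line in lines[sep_index + 1:]:
        if not line.strip():
            continue
        image = line[:widths[0]].strip()
        if image:
            # A new process row: the PID sits in the second column.
            current_pid = int(line[pid_start:svc_start].strip())
            services_by_pid[current_pid] = []
        if current_pid is None:
            continue
        # Continuation rows leave the first two columns blank.
        for name in line[svc_start:].split(","):
            name = name.strip()
            if name and name != "N/A":     # 'N/A' marks a process without services
                services_by_pid[current_pid].append(name)
    return services_by_pid
```

On a Windows machine the input could come from subprocess.run(["tasklist", "/SVC", "/FI", "IMAGENAME eq svchost.exe"], capture_output=True, text=True).stdout. Splitting on column positions rather than on whitespace keeps service names that contain spaces, such as "Net Driver HPZ12", intact.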
Back to our problem: the error event fortunately contains additional information, such as the module where the process crashed: xxTSP3x.tsp. If you are a bit familiar with TAPI, the Microsoft Telephony API, then you might know that files with the .tsp extension are TAPI Service Providers, essentially drivers that communicate directly with the phone hardware. Bingo: it was a problem with that TSP driver that caused the svchost.exe process to fail, which in turn killed all other services running inside that same process. On a Vista machine, for example, a crashing Telephony (TapiSrv) service would mean that the CryptSvc, Dnscache, KtmRm, NlaSvc, TapiSrv and TermService services would all terminate. What solidarity.
Coincidentally, the computer(s) in question were running a VoIP application that used this TSP driver, and that application was in fact having problems. No kidding, you might say, if the underlying driver crashes. Fortunately we were able to get an update from the developers, which ultimately resolved the problem.
Now, I couldn’t help but wonder whether I could change the grouping of services. Let’s just pretend that we hadn’t been able to get an update for the driver quickly and needed to isolate the Telephony service, so that a crash of a TSP driver wouldn’t affect the LanmanServer service (on XP the Telephony service is in a group with most critical system services, something that was changed in Vista). All I would have to do was create a new group that includes only the Telephony service, and then change the Telephony service itself to point to that group. Turns out that this is possible!
As always, you might want to back up any registry keys that you modify before you make substantial changes like the ones listed below:
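To make that "blast radius" concrete, here is a small Python sketch that answers the question "which other services go down if this one crashes its host process?". The grouping data is copied from two rows of the Vista listing above; the function name cohosted_services and the VISTA_GROUPS constant are made up for illustration.

```python
from typing import Dict, List

# Two svchost.exe processes taken from the Vista tasklist output above.
VISTA_GROUPS: Dict[int, List[str]] = {
    1600: ["CryptSvc", "Dnscache", "KtmRm", "NlaSvc", "TapiSrv", "TermService"],
    1872: ["BFE", "DPS", "MpsSvc"],
}


def cohosted_services(services_by_pid: Dict[int, List[str]], service: str) -> List[str]:
    """Return the services sharing a svchost.exe process with `service`.

    Service names are compared case-insensitively. An empty list means the
    service either runs alone or was not found.
    """
    wanted = service.lower()
    for services in services_by_pid.values():
        if any(name.lower() == wanted for name in services):
            return [name for name in services if name.lower() != wanted]
    return []
```

With this data, cohosted_services(VISTA_GROUPS, "TapiSrv") lists CryptSvc, Dnscache, KtmRm, NlaSvc and TermService as the services that share TapiSrv's fate.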
1. Create a new svchost group called Telephony
2. Change the service to utilize the telephony group
Now that the group has been created, we can change the service itself to point to the new svchost group. In the registry editor, navigate to HKLM\System\CurrentControlSet\Services\TapiSrv and edit the ImagePath value. Change it from
%SystemRoot%\System32\svchost.exe -k netsvcs
to
%SystemRoot%\System32\svchost.exe -k telephony
Note that we are changing the value that is passed through the -k parameter to reflect the name of the svchost group that we created earlier.
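For anyone who would rather script these two steps than click through the registry editor, here is a rough Python sketch. It rests on an assumption that the text above does not spell out: that svchost groups are defined as REG_MULTI_SZ values under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost, which is where Windows normally keeps them. The function names are made up for this example, isolate_service only works on Windows with administrative rights, and it has not been run against the machine described here, so treat it as a starting point rather than a finished tool.

```python
import re

# Assumption: the standard location of svchost group definitions.
SVCHOST_GROUPS_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost"
SERVICES_KEY = r"SYSTEM\CurrentControlSet\Services"


def retarget_image_path(image_path: str, new_group: str) -> str:
    """Return image_path with the group after svchost's -k switch replaced."""
    updated, count = re.subn(
        r"(?i)(-k\s+)\S+",
        lambda match: match.group(1) + new_group,
        image_path,
        count=1,
    )
    if count == 0:
        raise ValueError("ImagePath has no -k argument: %r" % image_path)
    return updated


def isolate_service(service: str, group: str) -> None:
    """Create a svchost group holding only `service` and point the service at it."""
    import winreg  # Windows-only module, imported lazily on purpose

    hklm = winreg.HKEY_LOCAL_MACHINE

    # Step 1: define the new group as a multi-string value listing its services.
    with winreg.OpenKey(hklm, SVCHOST_GROUPS_KEY, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, group, 0, winreg.REG_MULTI_SZ, [service])

    # Step 2: rewrite the service's ImagePath, keeping its registry value type.
    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.OpenKey(hklm, SERVICES_KEY + "\\" + service, 0, access) as key:
        image_path, value_type = winreg.QueryValueEx(key, "ImagePath")
        winreg.SetValueEx(
            key, "ImagePath", 0, value_type, retarget_image_path(image_path, group)
        )
```

Under these assumptions, isolate_service("TapiSrv", "telephony") would perform both steps in one go. The string rewriting lives in its own function so that it can be checked without touching a registry at all.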
I rebooted the computer after the change, though this is probably not even necessary. Voila, the telephony service now runs in its own svchost.exe process.
Image Name                   PID Services
========================= ====== =============================================
svchost.exe                  916 DcomLaunch, TermService
svchost.exe                 1000 RpcSs
svchost.exe                 1092 AudioSrv, BITS, Browser, CryptSvc, Dhcp,
                                 dmserver, ERSvc, EventSystem, helpsvc,
                                 LanmanServer, lanmanworkstation, Netman,
                                 Nla, Schedule, seclogon, SENS, SharedAccess,
                                 ShellHWDetection, srservice, Themes, TrkWks,
                                 W32Time, winmgmt, wuauserv, WZCSVC
svchost.exe                 1180 Dnscache
svchost.exe                 1292 LmHosts, RemoteRegistry, SSDPSRV, WebClient
svchost.exe                 2412 TapiSrv
I wouldn’t recommend making too many changes to these built-in groupings unless you have a particular problem to solve, or want to ensure that potentially unstable or vulnerable services are isolated.
Well, thanks to EventSentry we got critical errors emailed to us, and were able to review the event logs even when those computers were unreachable, which sped up the troubleshooting process significantly. And, with a little research, I learned a bit more about the svchost.exe process and how to tweak the default Windows setup in that regard.
Hope this was helpful,
Ingmar.