How to programmatically monitor system availability

System availability overview

The overall stability and availability of a machine over a given time range is usually called system uptime. It is expressed as a period (or as a percentage of that period) during which the system runs without issues and without unplanned reboots; restarts for maintenance and administrative purposes are excepted. The opposite, system downtime, is a period when the machine is turned off on purpose or encounters problems that make it unavailable to users and processes. Together, these two measurements describe system availability, which can be identified and tracked with the Windows Event Viewer (System log).

A crucial question in system availability monitoring is how long the system has been running and, if it stopped, why it stopped at that specific moment. The bigger and more complex the system is, the more important it is to track the uptime/downtime ratio thoroughly. Uptime targets typically range from 95% to 99.999%, depending on the expected availability of the machine.
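
To make the percentages above more tangible, here is a minimal sketch that converts an availability target into an allowed downtime budget and computes a measured availability from observed uptime and downtime. The function names Get-DowntimeBudget and Get-AvailabilityPercent are hypothetical helpers introduced for illustration, not built-in cmdlets, and the default period of 365 days is an assumption.

```powershell
# Sketch: convert an availability target (percent) into an allowed downtime budget.
# Get-DowntimeBudget is a hypothetical helper name, not a built-in cmdlet.
function Get-DowntimeBudget {
    param(
        [Parameter(Mandatory)][double]$AvailabilityPercent,  # e.g. 99.9
        [double]$PeriodDays = 365                            # assumed length of the measured period
    )
    if ($AvailabilityPercent -lt 0 -or $AvailabilityPercent -gt 100) {
        throw "AvailabilityPercent must be between 0 and 100."
    }
    $periodSeconds   = $PeriodDays * 24 * 60 * 60
    $downtimeSeconds = $periodSeconds * (100 - $AvailabilityPercent) / 100
    [TimeSpan]::FromSeconds($downtimeSeconds)
}

# Sketch: measured availability (percent) from observed uptime and downtime.
function Get-AvailabilityPercent {
    param(
        [Parameter(Mandatory)][TimeSpan]$Uptime,
        [Parameter(Mandatory)][TimeSpan]$Downtime
    )
    $total = $Uptime.TotalSeconds + $Downtime.TotalSeconds
    if ($total -le 0) { throw "Uptime and downtime must add up to a positive period." }
    [math]::Round(100 * $Uptime.TotalSeconds / $total, 3)
}
```

For example, Get-DowntimeBudget -AvailabilityPercent 99.9 returns a time span of roughly 8.76 hours per year, which shows how quickly the budget shrinks as the target approaches 99.999%.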

Methods for checking and monitoring the system availability status

There are several approaches to determine system uptime/downtime, manually and programmatically.

Checking uptime visually with Windows Task Manager

The current uptime (the time elapsed since the last system boot) is shown on the Performance tab of Windows Task Manager, in the CPU section:

(Note that the screenshot above is from Windows 10. The position of the uptime information may vary slightly depending on the Windows version, but the format is always the same: D:HH:MM:SS)

Checking uptime programmatically with PowerShell

This script shows the uptime since the last boot and writes it to a text file:

(Get-Date) - (Get-CimInstance Win32_OperatingSystem).LastBootUpTime | Out-File "d:\LengthUptime.txt"

The results will appear like this:

The time of the last system boot can be fetched with this script:

((Get-WmiObject Win32_OperatingSystem).ConvertToDateTime((Get-WmiObject Win32_OperatingSystem).LastBootUpTime)) | Out-File "d:\LastBootUpTime.txt"

Pinging the remote machine as a pre-check for availability

To check and monitor the availability of a remote machine, ping it and write the results to a text file:

Test-Connection "<name_of_the_machine>" | Out-File "d:\RemoteAvailability.txt"

Pinging the remote machine doesn't provide reliable information on real system availability: a reply only shows that the machine answers on the network, while the actual state of the system (how long it has been running, or whether it restarted recently) remains unknown. To see the exact time of the remote machine's last boot, use this script:

$LastBootUpTime = Get-WmiObject Win32_OperatingSystem -ComputerName "<name_of_the_remote_machine>" | Select-Object -ExpandProperty LastBootUpTime
[System.Management.ManagementDateTimeConverter]::ToDateTime($LastBootUpTime) | Out-File "d:\RemoteLastBootUpTime.txt"

Finally, use this script to get the uptime of the remote machine, formatted as the number of days, hours, minutes and seconds the system has been running:

((Get-Date) - ([wmi]'').ConvertToDateTime((Get-WmiObject Win32_OperatingSystem -ComputerName "<machine_name>").LastBootUpTime)).ToString("dd\-hh\:mm\:ss") | Out-File "d:\RemoteLengthUptime.txt"
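
The uptime arithmetic used in the scripts above can also be separated from the WMI query, which makes it easier to reuse and to check. The following is a minimal sketch under that assumption; Get-UptimeSince and Format-Uptime are hypothetical helper names, and the output layout is an illustrative choice rather than a required format.

```powershell
# Sketch: compute uptime from a known last boot time.
# Get-UptimeSince and Format-Uptime are hypothetical helpers, not built-in cmdlets.
function Get-UptimeSince {
    param(
        [Parameter(Mandatory)][datetime]$LastBootUpTime,
        [datetime]$Now = (Get-Date)      # overridable so the calculation can be checked
    )
    if ($LastBootUpTime -gt $Now) { throw "LastBootUpTime is in the future." }
    $Now - $LastBootUpTime               # subtracting two dates yields a TimeSpan
}

# Sketch: render a TimeSpan as "<days> days, hh:mm:ss".
function Format-Uptime {
    param([Parameter(Mandatory)][TimeSpan]$Uptime)
    '{0} days, {1:00}:{2:00}:{3:00}' -f $Uptime.Days, $Uptime.Hours, $Uptime.Minutes, $Uptime.Seconds
}

# Usage on Windows (commented out; the machine name is a placeholder):
# $boot = (Get-CimInstance Win32_OperatingSystem -ComputerName '<machine_name>').LastBootUpTime
# Format-Uptime (Get-UptimeSince -LastBootUpTime $boot)
```

Get-CimInstance already returns LastBootUpTime as a date, so no ConvertToDateTime call is needed in this variant.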

Parsing the Event Viewer log for uptime/downtime errors

The previous examples only determine the last boot time and uptime of a machine. Although many events relate to uptime and downtime, this article focuses on the exact moments when a user logs in and logs off. For more accurate information on these events, analyze the System log in Windows Event Viewer.

Each of these events has a unique ID and records the level (type of information), the date and time it occurred, the source (the event handler), and the actual reason for the particular system availability behavior. We'll focus on the ones listed in the grid below:

ID    Source    Level of severity*  Description
7001  Winlogon  Information         This event represents the moment when a user logs in.
7002  Winlogon  Information         This event represents the moment when a user logs off.

* In the Level of severity column, the information in parentheses is the System log's own name for the level of severity (Error, Information, etc.).

To examine the System log further, run eventvwr.msc and choose Windows Logs -> System:

To filter down to only the events with particular Event IDs, click Create Custom View, as shown above. In the dialog that appears, select and enter the values shown below:

The filter for the custom view can also be saved (in this case, it is titled Uptime-Downtime Log):

Sort the events by Date and Time and, if needed, examine a particular event in the General and Details tabs:

As can be seen, the details of the events mentioned above are fully present in the System log, which can be exported and used for further analysis.
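
The same events can also be read programmatically instead of through a custom view. The sketch below pairs each logon event (7001) with the next logoff event (7002) to get session durations. ConvertTo-LogonSession is a hypothetical helper name, and the pairing logic assumes a single interactive user whose logons and logoffs alternate; interleaved sessions of several users would need the user identity from the event data as well.

```powershell
# Sketch: pair each logon event (7001) with the next logoff event (7002).
# ConvertTo-LogonSession is a hypothetical helper; it only needs objects that
# expose Id and TimeCreated, as the records returned by Get-WinEvent do.
function ConvertTo-LogonSession {
    param([Parameter(Mandatory)][object[]]$Events)

    $logon = $null
    foreach ($e in ($Events | Sort-Object TimeCreated)) {
        if ($e.Id -eq 7001) {
            $logon = $e.TimeCreated            # remember the most recent logon
        }
        elseif ($e.Id -eq 7002 -and $null -ne $logon) {
            [pscustomobject]@{
                LogonTime  = $logon
                LogoffTime = $e.TimeCreated
                Duration   = $e.TimeCreated - $logon
            }
            $logon = $null                     # session closed
        }
    }
}

# Usage on Windows (commented out). The provider filter is an assumption that keeps
# out events from other sources that reuse the same IDs in the System log:
# $events = Get-WinEvent -FilterHashtable @{
#     LogName = 'System'; ProviderName = 'Microsoft-Windows-Winlogon'; Id = 7001, 7002
# }
# ConvertTo-LogonSession -Events $events | Format-Table
```

A logon that is never followed by a logoff (for example, after a crash) is simply replaced by the next logon in this sketch, so it does not produce a session row.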

Automatically ping the remote machine and check availability

The previously mentioned script for pinging the remote machine can be used to log the current availability of the machine if it is included in a custom command alert action:

powershell.exe "Test-Connection <name_of_the_machine> | Out-File d:\RemoteAvailability.txt"
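
When the ping runs repeatedly, a compact timestamped line per check is often easier to review than the full Test-Connection output. The following is a minimal sketch of that idea; New-AvailabilityLogLine is a hypothetical helper name, and the log file name and line layout are assumptions chosen for illustration.

```powershell
# Sketch: build one log line from a reachability result.
# New-AvailabilityLogLine is a hypothetical helper, not a built-in cmdlet.
function New-AvailabilityLogLine {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][bool]$IsReachable,
        [datetime]$Timestamp = (Get-Date)
    )
    $status = if ($IsReachable) { 'UP' } else { 'DOWN' }
    # Invariant culture keeps the timestamp layout identical on every machine.
    $stamp = $Timestamp.ToString('yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
    '{0}  {1}  {2}' -f $stamp, $ComputerName, $status
}

# Usage (commented out; the machine name and log path are placeholders):
# $name = '<name_of_the_machine>'
# $up   = Test-Connection -ComputerName $name -Count 2 -Quiet   # returns $true or $false
# New-AvailabilityLogLine -ComputerName $name -IsReachable $up |
#     Out-File 'd:\RemoteAvailabilityHistory.txt' -Append
```

Appending to the file keeps a history of checks, whereas the one-liner above overwrites the previous result on each run.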

Extract other uptime/downtime events with specific IDs to extend the investigation of system availability

Use this PowerShell script to generate the Extra Uptime Downtime Events Log for a custom date range (follow the date format shown in the input), and include it in a custom command alert action:

powershell.exe "Get-WinEvent -FilterHashtable @{logname='system';id=6008,6009;StartTime='MM/DD/YY';EndTime='MM/DD/YY'} -ErrorAction SilentlyContinue | Out-File d:\ExtraUptimeDowntimeEventsLog.txt -Append -Force"

The results should look like this:

The IDs used above are explained in the grid below:

ID    Source    Level of severity   Description
6008  EventLog  High (Error)        This event is logged when the system starts after an unexpected shutdown.
6009  EventLog  None (Information)  This event is logged at system startup and records the operating system version and build detected at boot.
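
Once these events are collected, a simple count per month can show whether unexpected shutdowns are becoming more frequent. The sketch below is one possible way to do that; Get-UnexpectedShutdownSummary is a hypothetical helper name, and grouping by calendar month is an assumption made for illustration.

```powershell
# Sketch: count unexpected-shutdown events (ID 6008) per calendar month.
# Get-UnexpectedShutdownSummary is a hypothetical helper; it only needs objects
# that expose Id and TimeCreated, as the records returned by Get-WinEvent do.
function Get-UnexpectedShutdownSummary {
    param([Parameter(Mandatory)][object[]]$Events)

    $Events |
        Where-Object { $_.Id -eq 6008 } |
        Group-Object { $_.TimeCreated.ToString('yyyy-MM', [cultureinfo]::InvariantCulture) } |
        Sort-Object Name |
        ForEach-Object {
            [pscustomobject]@{ Month = $_.Name; UnexpectedShutdowns = $_.Count }
        }
}

# Usage on Windows (commented out):
# $events = Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 6008 } -ErrorAction SilentlyContinue
# if ($events) { Get-UnexpectedShutdownSummary -Events $events | Format-Table }
```

The guard around the call covers the case where no matching events exist, because the Events parameter is mandatory and does not accept an empty result.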


December 23, 2016