vSphere Performance Troubleshooting using PAL

A while ago I was asked to help with some performance issues in a customer’s vSphere environment. This article is about troubleshooting your own environment using esxtop in combination with a tool called PAL (Performance Analysis of Logs), which can generate a report including alerts and graphs.

Approach

First of all, you need to decide how much data to capture from your ESXi host(s). In my case, the environment got really slow between 8 and 10 AM, so I suggested capturing data from 7 until 11 AM to make both the healthy and the unhealthy state easy to identify.

You capture this data with esxtop, which can run in batch mode and write all counters to a CSV file. The output can grow very large (files of about 200 MB are not unusual), so you may want to compress it. How to set up and run esxtop correctly is described in Duncan Epping’s article about esxtop.

That article also lists the important counters you should pay attention to while troubleshooting.
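As a rough sketch of how to size such a capture: esxtop's batch mode takes a sampling delay in seconds (-d) and a number of iterations (-n), so the iteration count follows from the length of the capture window. The 15-second delay, the output file name and the helper names below are assumptions for illustration, not the values used in the original capture.

```python
import math


def samples_for_window(window_seconds: int, delay_seconds: int) -> int:
    """Return how many esxtop iterations cover the window at the given delay."""
    if window_seconds <= 0 or delay_seconds <= 0:
        raise ValueError("window and delay must be positive")
    return math.ceil(window_seconds / delay_seconds)


def esxtop_batch_command(window_seconds: int, delay_seconds: int, output: str) -> str:
    """Build an esxtop batch-mode command line (-b batch, -d delay, -n iterations)."""
    iterations = samples_for_window(window_seconds, delay_seconds)
    return f"esxtop -b -d {delay_seconds} -n {iterations} > {output}"


if __name__ == "__main__":
    # 7 AM to 11 AM is a four-hour window; the 15 s delay is an assumed value.
    print(esxtop_batch_command(4 * 3600, 15, "esxtop-capture.csv"))
```

A shorter delay gives finer-grained graphs but a proportionally larger file, so it is worth checking the expected number of samples before starting a multi-hour capture.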

Performance Analysis of Logs (PAL)

After you have captured your data, you can replay it using various tools (also described in Duncan’s article), but PAL is not mentioned there. PAL was suggested to me by a colleague who uses it to create Microsoft Exchange, Active Directory and IIS health reports. He blogs together with other colleagues at uccexperts.com, if you’re interested in articles about Microsoft products.

Using it is easy: launch the application, browse to your log file(s) and apply a threshold file (more about that further below).

[Screenshot: PAL input]

PAL is free and available for download from its CodePlex project page.

Threshold files

PAL can read your exported CSV file directly and apply so-called threshold files to filter out unnecessary data. There are already over 60 built-in threshold files for various products such as Exchange, SharePoint, Lync and SQL.

As there isn’t a VMware vSphere threshold file yet, I used the metrics and thresholds described in Duncan’s article. I was able to put in the following counters:

  • CPU – CoStop
  • CPU – Max Limited
  • CPU – Ready
  • CPU – Swap Wait
  • CPU – System
  • DISK – Average Driver MilliSec/Command
  • DISK – Average Guest MilliSec/Command
  • DISK – Average Kernel MilliSec/Command
  • DISK – Average Queue MilliSec/Command
  • MEM – Memctl Current MBytes
  • MEM – Swap MBytes Read/sec
  • MEM – Swap MBytes Write/sec
  • MEM – Swap Used MBytes
  • MEM – Total Compress MBytes
  • NETWORK – Outbound Packets Dropped
  • NETWORK – Received Packets Dropped

For some reason, I wasn’t able to add MEM – N%L, DISK – ABRTS/s, DISK – RESETS/s and DISK – CONS/s, possibly because the esxtop export I was using didn’t contain these counters.

To save you some time, version 1.1 of the VMware vSphere (snowvm.com) threshold file is publicly available via the links below:
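One way to check whether an export contains a given counter is to look at its header row. The sketch below assumes the perfmon-style layout in which every column after the timestamp is named `\\host\Group(instance)\Counter`. The sample header, the host name `esx01`, the counter names being searched for and the function name are made up for illustration.

```python
import csv
import io


def counter_names(header_line: str) -> set[str]:
    """Return the distinct counter names (the text after the last backslash)."""
    columns = next(csv.reader(io.StringIO(header_line)))
    names = set()
    for column in columns[1:]:  # the first column holds the timestamp
        names.add(column.rsplit("\\", 1)[-1])
    return names


if __name__ == "__main__":
    # Hypothetical header; a real export has thousands of columns.
    sample = (
        '"(PDH-CSV 4.0) (UTC)(0)",'
        '"\\\\esx01\\Vcpu(1:idle:32769:idle1)\\% Ready",'
        '"\\\\esx01\\Physical Disk(vmhba0)\\Commands/sec"'
    )
    present = counter_names(sample)
    # The names looked up here are illustrative, not a verified list.
    for wanted in ("% Ready", "Aborts/sec"):
        print(wanted, "present" if wanted in present else "missing")
```

For a real capture, pass the first line of the CSV file to `counter_names` instead of the sample string.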

VMware vSphere Threshold file (version 1.1, 2nd of March 2015)
VMware vSphere Threshold file (version 1.0, 20th of February 2015)

After downloading the threshold file, place it in the installation folder of PAL (default path is C:\Program Files\PAL\PAL). Be sure to remove any old versions of the threshold file.

Give PAL a spin afterwards and the new threshold file should appear in the list, as seen in the screenshot below. Following the PAL wizard with this threshold file selected will give you a readable report, including alerts based on proven thresholds.

[Screenshot: Threshold v1.1]

Oh, before I forget: I included some exclusions for idle CPU counters to filter out unnecessary data. Because each ESXi process has its own unique ID, the exclusion is not applied 100% correctly. Therefore, you should edit the PAL.ps1 file using the instructions on this page, or simply paste in these lines of code:

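# Skip every counter instance whose name matches an EXCLUDE entry.
ForEach ($XmlExcludeNode in $XmlDataSource.SelectNodes('./EXCLUDE'))
{
    # -match is a regular-expression comparison, so one entry can cover every idle world ID.
    If ($XmlCounterInstanceNode.NAME -match $XmlExcludeNode.INSTANCE)
    {
        $IsCounterInstanceMatch = $False
    }
}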
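To see why a pattern match helps with the exclusion, compare it with an exact comparison. The sketch below is in Python rather than PowerShell: PowerShell's -match performs a case-insensitive, unanchored regular-expression match, which `re.search` with `re.IGNORECASE` approximates. The `:idle:` pattern and the second and third instance names are assumed examples, not necessarily what the threshold file contains.

```python
import re


def excluded_by_equality(instance: str, exclude: str) -> bool:
    """Roughly mimic an exact, case-insensitive comparison (-eq)."""
    return instance.casefold() == exclude.casefold()


def excluded_by_match(instance: str, exclude_pattern: str) -> bool:
    """Roughly mimic -match: true when the regex matches anywhere in the name."""
    return re.search(exclude_pattern, instance, re.IGNORECASE) is not None


if __name__ == "__main__":
    instances = [
        "1:idle:32769:idle1",         # instance name quoted in the comments
        "1:idle:32770:idle2",         # hypothetical second idle world
        "5120:vm01:5125:vmx-vcpu-0",  # hypothetical VM world
    ]
    for name in instances:
        print(name, excluded_by_equality(name, "idle"), excluded_by_match(name, ":idle:"))
```

Because every idle world carries a different ID in its name, an exact comparison would need one exclusion per instance, whereas a single pattern covers them all.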

The report

To give you an idea of how those reports are presented, be sure to check out the screenshots below. The times shown in the report are in UTC, as recorded by the esxtop batch export; keep that in mind when analyzing the report.
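If you want to line the report up with the hours users actually complained about, shifting a UTC timestamp to local time is a one-liner. This is a minimal sketch that assumes the export's timestamps look like `MM/DD/YYYY HH:MM:SS` and uses a fixed UTC offset; the function name and the UTC+1 offset are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local(stamp: str, utc_offset_hours: float) -> datetime:
    """Parse a UTC timestamp and shift it to a fixed UTC offset."""
    parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone(timedelta(hours=utc_offset_hours)))


if __name__ == "__main__":
    # Assumed example: a site whose local time is UTC+1.
    print(utc_to_local("03/02/2015 07:00:00", 1).strftime("%Y-%m-%d %H:%M %z"))
```

A fixed offset ignores daylight saving time, which is fine for a single capture window but not for comparing captures across seasons.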

Conclusion

If you ever need to parse a lot of performance data, be sure to check this tool out! Got feedback? Please leave it below in the comments.

2 thoughts on “vSphere Performance Troubleshooting using PAL”

  1. Hi René

    I used your VMware vSphere threshold file with PAL. It worked very well.
    Were you able to add the counters MEM – N%L, DISK – ABRTS/s, DISK – RESETS/s and DISK – CONS/s?

    I think I need to exclude some CPU counters. I get lots of these:

    Vcpu(1:idle:32769:idle1)\% Ready

    They show high values, and I am having difficulty excluding them. Where in the PAL.ps1 file do I need to add the lines you posted above?

    Thanks and regards

    Patrick

  2. Hi Patrick, depending on what version you are using, you should try to find these lines:

    ForEach ($XmlExcludeNode in $XmlDataSource.SelectNodes('./EXCLUDE'))
    {
        If ($XmlExcludeNode.INSTANCE -eq $XmlCounterInstanceNode.NAME)
        {
            $IsCounterInstanceMatch = $False
        }
    }

    In the latest version, I spotted them by searching for XmlExcludeNode in the PAL.ps1 file.

    Please let me know if it worked out!
