Dealing with Windows Performance with Splunk

It has been a busy few weeks for me. I’ve just left Splunk to strike out on my own, so expect me to be blogging about that at some point. In the meantime, I’ve moved my musings on Splunk stuff to this blog as well.

Let’s start with Windows performance data in Splunk. When you add a Windows performance monitor input, you will get events like this:

02/13/2015 15:09:59.044 -0800
counter="C3 Transitions/sec"

Hmm – 125 bytes for one counter. The Processor object has 15 counters in it, so every time you record all 15 of them that’s a minimum of 1,875 bytes and 15 separate writes, before you even account for the overhead of indexing. Of course, the data is gzipped on disk, but it’s the raw size that counts against your indexing volume.

With performance data, I think there are two basic things you want to do. First, you want to collect a list of all the objects, counters and instances on a host so you can push it into a CSV or KV Store lookup for use in drop-downs on your dashboards. Second, you want to draw a graph of a performance counter over some period of time. Both of these are very easy with this format. To get the list:

eventtype=win-performance sourcetype=Perfmon:* 
| fillnull value=NULL instance
| stats count by host,collection,object,counter,instance
| table host,collection,object,counter,instance
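
That list is exactly what you want to push into a CSV or KV Store lookup for the dashboard drop-downs. A minimal way to do it – the lookup name win_perf_counters.csv here is just an example – is to append an outputlookup to the same search:

eventtype=win-performance sourcetype=Perfmon:*
| fillnull value=NULL instance
| stats count by host,collection,object,counter,instance
| table host,collection,object,counter,instance
| outputlookup win_perf_counters.csv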

To get the chart of a particular counter:

eventtype=win-performance host=MYHOST sourcetype=Perfmon:Memory 
    object=Memory counter="% Committed Bytes In Use" instance=0 
| timechart max(Value) as Value
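
If you drive the chart from those drop-downs, the host, object, counter and instance can all come in as dashboard tokens. Something along these lines, where the token names are just examples:

eventtype=win-performance host=$host$ sourcetype=Perfmon:*
    object=$object$ counter="$counter$" instance="$instance$"
| timechart max(Value) as Value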

However, there is another method of storing all these counters – table mode. To enable it, you edit the inputs.conf stanza where you have defined your performance counter collection and add the following line:

mode = multikv

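For reference, a complete performance input stanza with table mode enabled might look something like this – the collection name, counter list, interval and index are illustrative, and mode is the setting that matters:

[perfmon://Memory]
object = Memory
counters = *
instances = *
interval = 10
mode = multikv
index = winperf
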
When you see the events now, they look something like this:

C:	51.904982261298784	16825	0	24.377964337190882	0.24377964337190883	0	0	24.377964337190882	0.24377964337190883	0.012570000000000001	0	0.012570000000000001	19.69922436273994	0	19.69922436273994	88756.825288761072	0	88756.825288761072	4505.6000000000004	0	4505.6000000000004	82.074419311847109	0

Just one line – no time and date stamp in the raw event – 23 fields in 296 bytes. That’s a saving of more than 85% in disk storage and license usage. More importantly, it’s one I/O operation instead of 15, which takes a significant load off your indexing tier. However, it’s not the friendliest of events to work with. Let’s take a look at our two featured searches again, starting with the time chart.

Getting the time chart is the easier of the two – you just have to remember that the counter name is now the field name. Check the fields sidebar in the search UI and you will see them all there. You can do the same search as before like this:

eventtype=win-performance host=MYHOST sourcetype=PerfmonMk:Memory
    object=Memory instance=0
| timechart max(%_Committed_Bytes_In_Use) as Value
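
A nice side effect of table mode is that every counter in the object is a field on the same event, so you can chart several counters in a single search. For example (the second counter name is just another one from the Memory object):

eventtype=win-performance host=MYHOST sourcetype=PerfmonMk:Memory
    object=Memory instance=0
| timechart max(%_Committed_Bytes_In_Use) as "% Committed Bytes In Use"
    max(Available_MBytes) as "Available MBytes"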

Note that the spaces in the counter are replaced by underscores in the field name. Now, about that list of objects, counters and instances for the drop-downs. That was a little more difficult…

eventtype=win-performance sourcetype=PerfmonMk:*
| stats count(*) by host,collection,object,instance 
| eval f = "" 
| foreach count* [ eval f=if('<<FIELD>>' > 0, f + "|<<FIELD>>", f) ] 
| eval f=replace(f,"^\|count\(","") | eval f=replace(f, "\)$","") 
| eval fmv = split(f, ")|count(") 
| fields host,collection,object,instance,fmv 
| mvexpand fmv 
| search fmv!=source fmv!=sourcetype fmv!=host fmv!=index fmv!=eventtype fmv!=linecount fmv!=punct fmv!=splunk_server* fmv!=timestamp fmv!=date_* 
| eval fmv=replace(fmv, "_"," ") 
| rename fmv AS counter
| table host,collection,object,counter,instance

This may need some dissecting. First off, we get every multikv performance record. We then do a stats and include every field not in the by clause; this gives us a bunch of fields with names like count(%_Committed_Bytes_In_Use). The eval and foreach lines then build a new field called f, which is the concatenation of all of those field names, so f ends up looking like |count(A)|count(B)|count(C). We can use those wrappers as separators to create a multi-valued field, but first we have to strip the leading |count( and the trailing ), which is what the two replace calls do. The split on ")|count(" then creates the multi-valued field, and the mvexpand turns it into one row per counter. There are also some fields we are not interested in – the ones that Splunk puts in by default, such as source, sourcetype and punct – which can be filtered out before or after the foreach; I chose after. Finally, remember that the field names have underscores where the counters have spaces, so the last replace fixes that before the field is renamed to counter.
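
Once that list has been written out with outputlookup, as with the simpler format earlier, populating a dashboard drop-down is just a query against the lookup. Something along these lines, where the lookup name and the $host$ and $object$ tokens are examples:

| inputlookup win_perf_counters.csv
| search host=$host$ object=$object$
| dedup counter
| fields counter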

You will probably have to create the eventtype=win-performance for yourself – there is nothing magical about it. Mine reads:

search = index=winperf (sourcetype=Perfmon:* OR sourcetype=PerfmonMk:*)
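
If you prefer to manage it in configuration rather than through the UI, the equivalent eventtypes.conf stanza – with the index adjusted to wherever your perfmon data lives – is simply:

[win-performance]
search = index=winperf (sourcetype=Perfmon:* OR sourcetype=PerfmonMk:*)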

So there you have it – you can now keep your indexing performance and license usage at a tolerable level and still get all the information you need.