Logging to Splunk with Winston

I have to admit, I’ve still got a soft spot in my heart for Splunk. I spent several years developing apps there, and it is still my go-to logging platform. Recently, I’ve been playing with ExpressJS and using Winston as my logger of choice, together with express-winston to hook the two pieces up. My projects inevitably start with this:

var express = require('express'),
    winston = require('winston'),
    expressLogger = require('express-winston');

var app = express();

app.use(expressLogger.logger({
    transports: [
        new winston.transports.Console()
    ],
    level: 'debug'
}));

// Do other setup here
app.listen(process.env.PORT || 3000);

This is all well and good, but what about Splunk? My prior version of this wrote the log to a file and then Splunk would consume the file. However, I’m operating inside of Azure App Service these days and my Splunk instance is running on a different machine – a virtual machine inside of Azure. So what am I to do?

Splunk recognized this need and produced a high-performance HTTP Event Collector. This is a REST endpoint that allows you to submit data as long as you have a token. I’m not going to explain how to get a token (Splunk does a really good job of that). However, I need to handle the other side of things – the Winston transport.
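Before we get to the transport, you can sanity-check your token by poking the collector directly with curl. This is a sketch – the host and token are placeholders, and I’m assuming the default collector port of 8088 and a self-signed certificate (hence the -k):

curl -k https://my-splunk-host:8088/services/collector/event \
    -H 'Authorization: Splunk MY-DATAINPUT-TOKEN' \
    -d '{"event": {"message": "hello from curl"}}'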

Fortunately, Winston has a highly extensible transport system, so I wrote a transport for the HTTP Event Collector. You can download the module from npmjs.org or get it from my GitHub repository.
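Installation is the usual npm affair (the package name here is taken from the require() in the snippet below):

npm install --save winston-splunk-httplogger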

So, how do you use it? It doesn’t require Express, but I’m going to alter my prior example to show how easy it is. The new lines are called out below the code:

var express = require('express'),
    winston = require('winston'),
    SplunkStreamEvent = require('winston-splunk-httplogger'),
    expressLogger = require('express-winston');

var app = express();

var splunkSettings = {
    host: 'my-splunk-host',
    token: 'MY-DATAINPUT-TOKEN'
};

app.use(expressLogger.logger({
    transports: [
        new winston.transports.Console(),
        new SplunkStreamEvent({ splunk: splunkSettings })
    ],
    level: 'debug'
}));

// Do other setup here
app.listen(process.env.PORT || 3000);

Block-by-block:

  1. Line 3 brings in the library – standard Node module management here
  2. Lines 8-11 define the options for the splunk-logging library
  3. Line 16 adds the transport to winston for logging

It’s as simple as that. There is one proviso: underneath, it uses the excellent splunk-logging library. Winston expects you to send off each event individually – it doesn’t really stream events. As a result, setting the options in such a way that batching occurs will result in weird errors, because Winston expects a callback for each event and the splunk-logging library doesn’t call the callback unless it actually writes to the channel. I haven’t done any high-capacity tests to see what happens when batching does occur, so I’d avoid it for now.
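If you want to be explicit about keeping batching off, the underlying splunk-logging library exposes its batching knobs through the same settings object. This is a sketch based on splunk-logging’s documented options – double-check its README against the version you install:

var splunkSettings = {
    host: 'my-splunk-host',
    token: 'MY-DATAINPUT-TOKEN',
    maxBatchCount: 1, // send each event on its own, so Winston gets its callback
    batchInterval: 0  // no timed batching
};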

If you find any bugs or wish to ask a question, please let me know through the GitHub Repository.

Installing Apache Solr on Linux as a Container

Have you ever needed to build a recommendation engine (something akin to Amazon’s) for a website? Or rank search results? Search at scale is hard. Scale can mean performance (how long do you really want to wait for a result back from a website?), size (the Amazon search database is a huge database, I’m sure) or velocity (have you ever seen the rate at which a domain controller produces data!). Now, I’m not saying “Amazon uses Solr” here – I’m saying you can do it too, but you have to have the right tool.

Search is hard

I’m going to throw it out there. Search is hard. At least, it’s hard when you need to do it at scale. Many companies exist to ensure you don’t need to know how hard it is. Just think what happens when you type a search keyword into the Google search box. It’s just plain hard.

You can simplify the problem by understanding the data you want to search, how you want the results to be presented and how you intend to get there; and then utilizing the appropriate tool for the job.

Choose by Understanding the data

Are you searching structured data? Does it look like an Excel table with headers at the top? Do you know what sort of data is in each column? If so, you’ve got a great case for a relational database running with SQL. If you need Enterprise features, I’d suggest SQL Server or Oracle. If not, then try MySQL or PostgreSQL.

Are you searching small messages, like events, in a time-series manner? Do you want searches that start with “tell me what happened between these two points in time”? Then you want something like ELK or Splunk.

Smaller blobs of data? Well, if those blobs are key-value pairs, then try a NoSQL solution like MongoDB. If they are JSON, try Firebase.

How about bigger bits of data, like full documents? Then you want to go to a specific Document-based search system, like Apache Solr.

Sure, advocates of each of these environments will tell you that you can store all data in all of these databases. But each database is tuned for a specific use – SQL databases for structured data, Splunk for time-series data, Apache Solr for document data.

Installing Apache Solr

My favorite method of trying out these technologies right now is to use Docker. I can spin up a new image quickly, try it out and then shut it down. That’s not to say that I would want to run a container in production. I’m using Docker as a solution to stamp out an example environment quickly.

To install and start running Apache Solr quickly, use the following:

docker run -d -p 8983:8983 -t makuk66/docker-solr

Apparently, there are companies out there that run clusters of Apache Solr in the hundreds of machines. If that is the case, I’m not worried at this point about scaling (although I do wonder what they could be searching!)

Before I can use Apache Solr, I need to create a collection. First of all, I need to know what the name of the container is. I use docker ps to find that out:

[Screenshot: docker ps output showing one running makuk66/docker-solr container with ID f4dc01217dd3 and the auto-generated name jolly_lumier]

In my case, the name is jolly_lumier. Don’t like the name? Well, let’s stop that instance and re-run it with a new name:

docker stop f4dc01217dd3
docker rm f4dc01217dd3
docker run -d -p 8983:8983 --name solr1 -t makuk66/docker-solr

Now I can reference the container by my name. To create a collection:

[Screenshot: creating a collection inside the solr1 container]
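The command in that screenshot was along these lines – I’m assuming the image ships the standard bin/solr script, so adjust the path if your image differs:

docker exec -it solr1 /opt/solr/bin/solr create -c collection1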

Note that I am referencing the container by name with the -t argument.

But what are collections and shards?

Great question. Terminology tends to bite me a lot. Every single technology has its own terminology and usually it isn’t defined well. There is an assumption that you already know what they are talking about. Fortunately, Apache Solr has a Getting Started document that defines some things.

A collection is a set of documents that have been indexed together. It’s the raw data plus the index together. Collections implement a scaling technique called sharding in which the collection is split into multiple shards in order to scale up the number of documents in a collection beyond what could physically fit on a single server. Those shards can exist on one or more servers.

If you are familiar with map-reduce, then this will sound familiar. Incoming search queries are distributed to every shard in the collection. The shards all respond and then the results are merged. Check out the Apache Solr Wiki for more information on this. For right now, it’s enough to know that a collection is a set of data you want to search and shards are where that data is actually stored.

The Admin Interface

Apache Solr has a web-based admin interface. In my case, I’ve forwarded the local port 8983 to the port 8983 on the container. I can access the web-based admin interface at http://localhost:8983/solr. You will note that I could have created a collection (called a Core in the non-cloud version of Apache Solr) from the web interface.

Sending Documents to Solr

Solr provides a Linux script called bin/post to post data. It’s a wrapped Java function, which means you need Java on your system in order to use it. Want to index the entire Project Gutenberg corpus? You can do it, but there are some extra steps – most notably installing Solr and Java on your client machine.

For my first test, I wanted to index some data I already had: a Dungeons and Dragons spell list with all the statistics of the individual spells, stored in a single CSV file. To index it, I can do the following:

curl 'http://localhost:8983/solr/collection1/update?commit=true' --data-binary @Data/spells.csv -H 'Content-type:application/csv'

You will get something like this back:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">902</int></lst>
</response>

According to the manual, that means success (since the status is 0). Non-zero status means a failure of some description.

Searching Solr

Now, let’s take a look at the data. We can do that by using the web UI at http://localhost:8983/solr/collection1/browse – it’s fairly basic and can search for all sorts of things. I can not only search for things like “Fire” (and my favorite Fireball spell), but also for things like “Druid=Yes” to find all the Druid spells.

My keen interest is in using this programmatically, however. I don’t want my users to even be aware of the search capabilities of the Solr system. I’d prefer them not to know what I was running. After all, do you think “that’s a nice Solr implementation” when browsing your favorite web shop?

If I want to look for the Fireball spell, I do the following:

curl http://localhost:8983/solr/collection1/select?q=Fireball

The syntax and options for a query are extensive. You can read all about them on the wiki. The response is an XML document. If I want it in another format, I use the wt parameter:

curl 'http://localhost:8983/solr/collection1/select?q=Fireball&wt=json'

It’s good practice to use quotes around your URL so that you don’t end up with the shell preempting your meaning with special characters (like the ampersand, which puts a process in the background).
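The same quoting matters as soon as you combine fielded queries with other parameters. The field name here comes from my CSV headers, so treat it as an example rather than gospel:

curl 'http://localhost:8983/solr/collection1/select?q=Druid:Yes&wt=json'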

What else can Solr do?

Turns out – lots of things. Here are a bunch of my favorite things:

  1. Faceting – when you search for something and you get a table that says “keywords (12)” – that’s faceting. It groups things together to allow for better drill-down navigation (there’s an example query right after this list).
  2. Geo-spatial – location-based search (find something “near here”)
  3. Query Suggestions – that drop-down from Google that suggests searches? Yep – you can do that too
  4. Clustering – automatically discover groups of related search hits
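Faceting, for instance, is just a couple of extra query parameters. This sketch assumes my spell data has a School field – substitute whatever field you want to group by:

curl 'http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=School&wt=json'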

What’s Missing

There is a lot missing from a base Apache Solr deployment. I’ll try to list some of the more important gaps here, but there is a solution – check out LucidWorks. LucidWorks was founded by the guys who wrote Solr, and their Fusion product adds a lot of the enterprise features that you will want.

  1. Authentication – talking of enterprise features, top of the list is authentication. That’s right – Solr has no authentication – not even an encrypted channel. That means anyone (out of the box) can just submit a document to your Solr instance if they have a route to the port that it’s running on. It relies on the web container (Jetty, Tomcat or JBoss for example) to do the authentication. This isn’t really a big problem as authentication is pretty well documented. Incidentally, the Docker image uses Jetty for the web container.
  2. Getting Data In – I was going to call this crawling. However, it is more than that. If you have a fairly static set of data, then maybe the APIs and command line tools are good enough. What if you want to index the data in your SharePoint application? How about all the emails flowing through your Exchange server? You will need to write (quite complex) code for this purpose.
  3. Monitoring – if you are running a large Solr deployment, then you will want to monitor those instances. Solr exposes this stuff via JMX – not exactly the friendliest approach.
  4. Orchestration – this is only important if you have gone into production with a nice resilient cluster. How do you bring up additional nodes when the load gets high, and how do you run multi-node systems? The answer is ZooKeeper, and it’s not pretty to set up and has several issues of its own.

What’s Solr not good at

Solr isn’t good at time-series data. OK – it’s not that hard, but it’s still not the best tool for the job. Similarly, if you are doing per-row or per-field updates to records, then perhaps you should be using a relational database instead.

If you are indexing documents, however, then this is the tool to use. It’s easy to set up and get started. It has a REST interface for programmatic access and it likely does all the search and analytics related stuff you want.

Go ahead, explore!

Dealing with Windows Performance with Splunk

It has been a busy few weeks for me. I’ve just left Splunk to strike out on my own, so expect me to be blogging about that at some point. In the meantime, I’ve moved my musings on Splunk stuff to this blog as well.

Let’s start with dealing with Windows Performance Data in Splunk. When you add a Windows performance monitor, you will get events like this:

02/13/2015 15:09:59.044 -0800
collection=Processor
object=Processor
counter="C3 Transitions/sec"
instance=_Total
Value=0

Hmm – 125 bytes for one counter. The Processor object has 15 counters in it, so every time you want to record all 15 counters, that’s a minimum of 1,875 bytes and 15 IOPS, without the overhead of indexing and so on. Of course, this is gzipped on disk, but that counts against your indexing volume.

With performance data, I think there are two basic things you want to do. First, collect a list of all the objects, counters and instances on a host, so you can push it into a CSV or KV store table for use in drop-downs on your dashboards. Second, draw a graph of a performance counter over some period of time. Both of these are very easy with this format. To get the list:

eventtype=win-performance sourcetype=Perfmon:* 
| fillnull value=NULL instance
| stats count by host,collection,object,counter,instance
| table host,collection,object,counter,instance

To get the chart of a particular counter:

eventtype=win-performance host=MYHOST sourcetype=Perfmon:Memory 
    object=Memory counter="% Committed Bytes In Use" instance=0 
| timechart max(Value) as Value

However, there is another method of storing all these counters – table mode. To enable it, you edit the inputs.conf where you have defined your performance counter and you add the following:

mode=multikv
showZeroValue=1
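For context, a complete stanza with those settings might look like this – the object, counters and interval are placeholders for whatever you are already collecting:

[perfmon://Memory]
object = Memory
counters = *
instances = *
interval = 10
mode = multikv
showZeroValue = 1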

When you see the events now, they look something like this:

C:	51.904982261298784	16825	0	24.377964337190882	0.24377964337190883	0	0	24.377964337190882	0.24377964337190883	0.012570000000000001	0	0.012570000000000001	19.69922436273994	0	19.69922436273994	88756.825288761072	0	88756.825288761072	4505.6000000000004	0	4505.6000000000004	82.074419311847109	0

Just one line – no time and date stamp – 23 fields in 296 bytes. That’s roughly an 85% savings in disk storage and license usage. More importantly, it’s 1 IOP instead of 15, saving significant performance on your indexing tier. However, it’s not the most friendly of events to work with. Let’s take a look at our two featured searches again, starting with getting a time chart.

Getting the time chart is the easier of the two – you just have to remember that the counter is the field name. Just check out the sidebar to see that. You can do the same search as before like this:

eventtype=win-performance host=MYHOST sourcetype=PerfmonMk:Memory
    object=Memory instance=0
| timechart max(%_Committed_Bytes_In_Use) as Value

Note that the spaces in the counter are replaced by underscores in the field name. Now, about that list of objects, counters and instances for the drop-downs. That was a little more difficult…

eventtype=win-performance sourcetype=PerfmonMk:*
| stats count(*) by host,collection,object,instance 
| eval f = "" 
| foreach count* [ eval f=if('<<FIELD>>' > 0, f + "|<<FIELD>>", f) ] 
| eval f=replace(f,"^\|count\(","") | eval f=replace(f, "\)$","") 
| eval fmv = split(f, ")|count(") 
| fields host,collection,object,instance,fmv 
| mvexpand fmv 
| search 
	fmv!="category" 
	fmv!="date_*" 
	fmv!="index" 
	fmv!="linecount" 
	fmv!="punct" 
	fmv!="source" 
	fmv!="sourcetype" 
	fmv!="splunk*" 
	fmv!="timeendpos" 
	fmv!="timestartpos" 
| eval fmv=replace(fmv, "_"," ") 
| rename fmv AS counter
| table host,collection,object,counter,instance

This may need some dissecting. First off, we get the events for every single multikv performance record. We then do a stats count of every field not in the by clause, which gives us a bunch of fields with names like count(%_Committed_Bytes_In_Use). The next two lines use a foreach loop to construct a new variable called f that is the concatenation of all those field names. The resulting f variable starts with |count( and ends with ), and the elements are separated by )|count(. We can use those separators to create a multi-valued field, but first we have to strip off the start and end, which is what the replace statements do.

The split then creates the multi-valued field, and the mvexpand turns the multi-valued field into lots of rows. There are some fields we are not interested in – the ones that Splunk puts in by default. We could filter those before or after; I chose after. Finally, remember that the field names have underscores but the counters have spaces, so I fix that as well.

You will probably have to create the eventtype=win-performance for yourself – there is nothing magical about it. Mine reads:

[win-performance]
search = index=winperf (sourcetype=Perfmon:* OR sourcetype=PerfmonMk:*)

So there you have it – now you can have your indexing performance and licensing usage at a tolerable level and get all the information you need as well.

Adding SQL Services to an AutomatedLab Configuration

I switched my efforts back to AutomatedLab recently, mostly because education never stops. My main area of focus is learning a little SQL Server, so I wanted a SQL Server lab. AutomatedLab – my preferred mechanism for creating labs on Hyper-V – has a nice role that installs SQL Server 2012 and 2014, but it isn’t exactly adjustable, and I wanted mine adjustable. I wanted to put the data on a separate drive, run the server as a domain service account, and install only the minimum required.

My first step was to add a SQL Server machine to my configuration. Since I’m also doing a little Splunk work (I can’t really get away from it right now), I did the following:

$SQLAct = @()
$SQLAct += Get-LabPostInstallationActivity -ScriptFileName InstallSplunkUF.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunkUF"
Add-LabMachineDefinition -Name SQL1 `
  -Processors 2 -MemoryInMb 1024 -DiskSizeInGb 60 `
    -IPAddress "$Network.6" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" `
    -Network $LabName -DomainName $LabDNS -IsDomainJoined `
    -InstallationUserCredential $Creds -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' `
    -UserLocale en-US -TimeZone 'Pacific Standard Time' `
    -PostInstallationActivity $SQLAct -ToolsPath $Tools

This is very similar to our other machines – just no roles. We don’t want to use the SQL Server 2012 role. Down that path leads a completely standard SQL Server deployment that is good for development purposes, but no good for what I want. I need another hard drive before I get to installing SQL Server. I do that right before the Install-Lab -StartRemainingMachines so that I can add the hard drive after definition but before power-on:

# Add a fresh 100GB data disk to every SQL machine before power-on
Get-VM | Where Name -Like "SQL*" | Foreach-Object {
  $Name = $_.Name
  $VPath = "$LabPath\Data-$Name.vhdx"
  if (Test-Path $VPath) { Remove-Item -Force -Path $VPath }
  New-VHD -Path $VPath -SizeBytes 100GB -Dynamic
  Add-VMHardDiskDrive -VMName $Name -Path $VPath
}

This adds a 100GB drive to any machine whose name starts with SQL (i.e. all my SQL machines, if they are so defined). I use the Invoke-LabPostInstallActivity cmdlet for the majority of the work, which includes:

  • Install any pre-requisites, like .NET Framework v3.5
  • Format the new drive as the I: drive
  • Create an I:\SQL directory
  • Mount the SQL Server ISO
  • Run the SQL Server unattended setup

# Handle each of the SQL Machines
$SQLMachines = Get-LabMachine -All | Where Name -like "SQL*"
foreach ($Machine in $SQLMachines)
{
    # Install .NET Framework 3.5
    Install-LabWindowsFeature -Name $Machine.Name -FeatureName Net-Framework-Core

    # Mount the DVD Drive with the ISO 
    Set-VMDvdDrive -VMName $Machine.Name -Path $SqlServer
    
    # Initialize the Data Disk
    Invoke-LabPostInstallActivity -ActivityName 'FormatDataDisk' -ComputerName $Machine.Name -UseCredSsp -ScriptBlock {
        Get-Disk | 
            Where PartitionStyle -eq 'raw' | 
            Initialize-Disk -PartitionStyle MBR -PassThru | 
            New-Partition -DriveLetter 'I' -UseMaximumSize | 
            Format-Volume -Verbose -FileSystem NTFS -NewFileSystemLabel "SQLData" -Confirm:$false
    }  

    # Turn off the Firewall
    Invoke-LabPostInstallActivity -ActivityName 'DisableFirewall' -ComputerName $Machine.Name -UseCredSsp -ScriptBlock {
        Set-NetFirewallProfile -Verbose -Name domain -Enabled false
    }

    # Install SQL Server
    Invoke-LabPostInstallActivity -ActivityName 'InstallSQLServer' -ComputerName $Machine.Name -UseCredSsp -ScriptBlock {
        New-Item -ItemType Directory -Path I:\SQL

        Set-Location -Path (Get-WmiObject -Class Win32_CDRomDrive).Drive
        .\Setup.exe /Q /Action=Install /ENU /IAcceptSQLServerLicenseTerms `
                /UpdateEnabled=0 /ErrorReporting=0 /IndicateProgress `
                /Features="SQLEngine,Tools" /InstanceDir=I:\SQL /InstanceName=MSSQLSERVER `
                /AgtSvcStartupType=Disabled /BrowserSvcStartupType=Disabled `
                /SQLSvcAccount=LAB\svc_sqlserver /SQLSvcPassword=P@ssw0rd `
                /SQLSvcStartupType=Automatic /SQLSysAdminAccounts=LAB\SQLAdmins `
                /SQMReporting=0 /FilestreamLevel=0 /TCPEnabled=1
    }
}

The Setup.exe command line contains all the parameters needed to install SQL Server successfully. For this, I started with the TechNet page and then iterated on a pre-built machine until I got it right; then I just cut and pasted the result into the script. Note that I am specifying a service account on the domain and a domain group. I use an adjustment to my CreateDomain.ps1 script to create these:

$sqlsvc = New-ADUser -Name svc_sqlserver -AccountPassword $SecurePassword -Path $ServicesOU -Enabled $true -PassThru
New-ADGroup -Name 'SQLAdmins' -SamAccountName SQLAdmins -Path $GroupsOU -GroupScope Domain -GroupCategory Security -DisplayName 'SQL Admins' 
Get-ADGroup -Identity 'SQLAdmins' | Add-ADGroupMember -Members @($devadmin,$sqlsvc) -PassThru
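One more aside on the unattended setup: if that argument soup gets unwieldy, Setup.exe can also read its options from an INI file passed with /ConfigurationFile. Here is a hand-written sketch mirroring the settings above – I’d diff it against a ConfigurationFile.ini generated by the setup wizard before trusting it, and the password deliberately stays on the command line:

; ConfigurationFile.ini - run as: Setup.exe /Q /IAcceptSQLServerLicenseTerms /SQLSvcPassword=... /ConfigurationFile=ConfigurationFile.ini
[OPTIONS]
ACTION="Install"
ENU="True"
UPDATEENABLED="0"
FEATURES=SQLENGINE,Tools
INSTANCENAME="MSSQLSERVER"
INSTANCEDIR="I:\SQL"
AGTSVCSTARTUPTYPE="Disabled"
BROWSERSVCSTARTUPTYPE="Disabled"
SQLSVCACCOUNT="LAB\svc_sqlserver"
SQLSVCSTARTUPTYPE="Automatic"
SQLSYSADMINACCOUNTS="LAB\SQLAdmins"
SQMREPORTING="0"
FILESTREAMLEVEL="0"
TCPENABLED="1"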

With this method (rather than the role method), you can install any type of SQL Server and also install differing types of SQL Server with differing settings as you see fit. This was exactly what I needed to move my work along.

Creating a Splunk Lab with AutomatedLab

It’s been a while since I have posted, but it’s also been a while since I’ve worked with AutomatedLab. I’ve been busy moving to Seattle over the past couple of months. I’m back now and I’ve got two labs to create – the first is for the Splunk Windows Infrastructure app and the second is for the Splunk Exchange app. Let’s start with the easier one – Windows Infrastructure.

The header for this script is similar to all the other AutomatedLab scripts – it’s kind of a recipe at this point:

###
### Lab Creation Script
###
$Start = Get-Date

$LabName = "winfra"
$LabDNS = "winfra.local"
$Network = "172.16.50"
$AdminPW = "P@ssw0rd"

# This is where you want the Lab virtual machines to be installed
$VMPath = "D:\Hyper-V"

# This is where the ISO images are located
$ISOPath = "D:\ISO"

# This is installed with AutomatedLab and you shouldn't have to change it
$LabSources = "C:\LabSources"

## DO NOT CHANGE ANYTHING BELOW HERE

$LabIndicator = $LabName.ToString().ToUpper();
$LabPath = "$VMPath\$LabName"
$ToolsPath = "$LabSources\Tools"

if (-not (Test-Path $LabPath)) {
  Write-Host "Creating Lab Directory $LabPath"
  New-Item $LabPath -ItemType Directory
}

Write-Host "${LabIndicator}: Creating Lab Definition $LabName"
New-LabDefinition -Path $LabPath -VmPath $LabPath -Name $LabName -ReferenceDiskSizeInGB 60

Write-Host "${LabIndicator}: Creating Network Definition with $Network.1/24"
Add-LabVirtualNetworkDefinition -Name $LabName -IpAddress "$Network.1" -PrefixLength 24

Write-Host "${LabIndicator}: Adding OS Images"
Add-LabIsoImageDefinition -Name Win2012R2 -Path "$ISOPath\en_windows_server_2012_r2_with_update_x64_dvd_4065220.iso" -IsOperatingSystem

Write-Host "${LabIndicator}: Administrator Credentials are Administrator / $AdminPW"
$Creds = New-Object PSCredential('Administrator', ($AdminPW | ConvertTo-SecureString -AsPlainText -Force))

Write-Host "${LabIndicator}: Adding Domain Definition for domain $LabDNS = NetBIOS $LabName"
Add-LabDomainDefinition -Name $LabDNS -AdminUser 'Administrator' -AdminPassword $AdminPW

I need five machines for this lab. First, I need my AD environment – two domain controllers running Windows Server 2012 R2. I also need a manager machine and two Splunk machines – one will act purely as an indexer and the other as a search head, license master and deployment server. Creating servers is relatively easy. I’m using the root domain prep scripts with slight modifications to create lots of users.

###
$DC1Activities = @()
$DC1Activities += Get-LabPostInstallationActivity -ScriptFileName PrepareRootDomain.ps1 -DependencyFolder "$PSScriptRoot\PrepareRootDomain"
$DC1Activities += Get-LabPostInstallationActivity -ScriptFileName New-ADLabAccounts.ps1 -DependencyFolder "$PSScriptRoot\AddDomainUsers"
$DC1Activities += Get-LabPostInstallationActivity -ScriptFileName InstallSplunkUF.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunkUF"
$DC1Roles = Get-LabMachineRoleDefinition -Role RootDC @{ DomainFunctionalLevel = 'Win2012R2'; ForestFunctionalLevel = 'Win2012R2' }
Write-Host "$LabIndicator>>> Adding WS2012 Standard (Core) Definition for DC1"
Add-LabMachineDefinition -Name DC1 -Processors 2 -MemoryInMb 512 -DiskSizeInGb 60 -Network $LabName `
    -IPAddress "$Network.2" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" -DomainName $LabDNS `
    -Roles $DC1Roles -IsDomainJoined -PostInstallationActivity $DC1Activities `
    -InstallationUserCredential $Creds -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' `
    -ToolsPath $ToolsPath -UserLocale en-US -TimeZone 'Pacific Standard Time'

$DC2Activities = @()
$DC2Activities += Get-LabPostInstallationActivity -ScriptFileName InstallSplunkUF.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunkUF"
$DC2Roles = Get-LabMachineRoleDefinition -Role DC
Write-Host "$LabIndicator>>> Adding WS2012 Standard (Core) Definition for DC2"
Add-LabMachineDefinition -Name DC2 -Processors 2 -MemoryInMb 512 -DiskSizeInGb 60 -Network $LabName `
    -IPAddress "$Network.3" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" -DomainName $LabDNS `
    -Roles $DC2Roles -IsDomainJoined -InstallationUserCredential $Creds `
    -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' `
    -ToolsPath $ToolsPath -UserLocale en-US -TimeZone 'Pacific Standard Time' `
    -PostInstallationActivity $DC2Activities

$SPLUNKIDXActivities = @()
$SPLUNKIDXActivities += Get-LabPostInstallationActivity -KeepFolder -ScriptFileName InstallSplunk.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunk"
Write-Host "$LabIndicator>>> Adding WS2012 Standard (Core) Definition for SPLUNKIDX"
Add-LabMachineDefinition -Name SPLUNKIDX -Processors 4 -MemoryInMb 512 -DiskSizeInGb 60 -Network $LabName `
    -IPAddress "$Network.4" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" -DomainName $LabDNS `
    -IsDomainJoined -InstallationUserCredential $Creds -ToolsPath $ToolsPath -UserLocale en-US `
    -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' -TimeZone 'Pacific Standard Time' `
    -PostInstallationActivity $SPLUNKIDXActivities

$SPLUNKActivities = @()
$SPLUNKActivities += Get-LabPostInstallationActivity -KeepFolder -ScriptFileName InstallSplunk.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunk"
$SPLUNKActivities += Get-LabPostInstallationActivity -ScriptFileName InstallSplunkDS.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunkDS"
Write-Host "$LabIndicator>>> Adding WS2012 Standard (Core) Definition for SPLUNK"
Add-LabMachineDefinition -Name SPLUNK -Processors 4 -MemoryInMb 512 -DiskSizeInGb 60 -Network $LabName `
    -IPAddress "$Network.5" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" -DomainName $LabDNS `
    -IsDomainJoined -InstallationUserCredential $Creds -ToolsPath $ToolsPath -UserLocale en-US `
    -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' -TimeZone 'Pacific Standard Time' `
    -PostInstallationActivity $SPLUNKActivities

$MANAGERActivities = @()
$MANAGERActivities += Get-LabPostInstallationActivity -ScriptFileName InstallSplunkUF.ps1 -DependencyFolder "$PSScriptRoot\InstallSplunkUF"
Write-Host "$LabIndicator>>> Adding Windows 2012R2 Standard Definition for MANAGER"
Add-LabMachineDefinition -Name MANAGER -Processors 2 -MemoryInMb 512 -DiskSizeInGb 60 -Network $LabName `
    -IPAddress "$Network.6" -DnsServer1 "$Network.2" -DnsServer2 "$Network.3" -DomainName $LabDNS `
    -IsDomainJoined -InstallationUserCredential $Creds -ToolsPath $ToolsPath -UserLocale en-US `
    -OperatingSystem 'Windows Server 2012 R2 SERVERSTANDARD' -TimeZone 'Pacific Standard Time' `
    -PostInstallationActivity $MANAGERActivities

You will note that I have specific activities for InstallSplunk and InstallSplunkUF. Let’s take a sidebar right now and discuss those. The Universal Forwarder (InstallSplunkUF.ps1) is the easier one. I just place the splunkforwarder.msi for the version I want to install in the InstallSplunkUF directory, together with an appropriate script:

###
### Install the Splunk Universal Forwarder from the accompanying splunkforwarder.msi
###
$MSIFile = "$PSScriptRoot\splunkforwarder.msi"

Write-Host "Starting quiet install of $MSIFile"
$st = Get-Date
Start-Process -FilePath $MSIFile -Wait -Verbose `
    -ArgumentList @( "AGREETOLICENSE=Yes /quiet /Liwem! C:\splunk-install.log" )

### Don't forget to set the deploy-poll and add forward-server in your main lab setup

$en = Get-Date
Write-Host "Installation of $MSIFile took $($en-$st)"

The InstallSplunk post-install activity is similar.  However, I have an additional aim – I want to pre-set the admin password and then ensure that the user is not prompted to change the admin password on login.  This requires an additional file – user-seed.conf – that looks like this:

[user_info]
USERNAME = admin
PASSWORD = P@ssw0rd

Also, the script is slightly different.

###
### Install Splunk from the accompanying splunk.msi
###
$MSIFile = "$PSScriptRoot\splunk.msi"

Write-Host "Starting quiet install of splunk.msi"
$st = Get-Date

Start-Process -FilePath $MSIFile -Wait -Verbose -ArgumentList "AGREETOLICENSE=Yes LAUNCHSPLUNK=0 /quiet /Liwem! C:\splunk-install.log"
Copy-Item -Path "$PSScriptRoot\user-seed.conf" -Destination "C:\Program Files\Splunk\etc\system\local"

# Hack to get rid of the UI Login Change Password prompt
New-Item -ItemType File -Path "C:\Program Files\Splunk\etc\.ui_login"

$en = Get-Date
Write-Host "Installation of splunk.msi took $($en-$st)"

I don’t do a lot within these scripts because there are many configuration options and I like to re-use the scripts. I’d rather do the configuration in my main script – more on that later. I also have another script called InstallSplunkDS – this adds a bunch of files to the Splunk instance I am using for the deployment server and license master – it’s a basic copy operation.
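For completeness, InstallSplunkDS.ps1 boils down to a couple of copy operations. This is a sketch rather than the real thing – the staged folder and file names are hypothetical, mirroring whatever you put in the InstallSplunkDS dependency folder:

###
### InstallSplunkDS.ps1 - copy staged deployment server and license files into Splunk
###
# NOTE: the deployment-apps and *.lic staging names are hypothetical examples
Copy-Item -Recurse -Force -Path "$PSScriptRoot\deployment-apps\*" -Destination "C:\Program Files\Splunk\etc\deployment-apps"
Copy-Item -Force -Path "$PSScriptRoot\*.lic" -Destination "C:\Program Files\Splunk\etc\licenses\enterprise"

Now, back to our main script. Our next step is to do the normal AutomatedLab machine creation. Since my Splunk indexer is going to store a bunch of data, let’s create a new virtual disk for it – we’ll configure it later. I also want to set up our MANAGER machine with the remote system tools.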

Write-Host "$LabIndicator>>> Exporting Lab Definition"
Export-LabDefinition -Force -ExportDefaultUnattendedXml
Write-Host "$LabIndicator>>> Setting up Remoting"
Set-LabHostRemoting
Write-Host "$LabIndicator>>> Importing Lab Definition"
Import-Lab -Path (Get-LabDefinition).LabFilePath

Write-Host "$LabIndicator>>> Basic Lab Configuration"
Install-Lab -NetworkSwitches -BaseImages -VMs
Write-Host "$LabIndicator>>> Creating Lab - Domains"
Install-Lab -Domains

if (Test-Path "$LabPath\SPLUNKIDX-Data.vhdx") {
    Remove-Item -Force "$LabPath\SPLUNKIDX-Data.vhdx"
}
Write-Host "$LabIndicator>>> Preparing Splunk Data Disk on SPLUNKIDX"
Write-Host "$LabIndicator>>> Creating new 100GB VHD for SPLUNKIDX"
New-VHD -Path "$LabPath\SPLUNKIDX-Data.vhdx" -SizeBytes 100GB -Dynamic
Write-Host "$LabIndicator>>> Connecting New VHD to SPLUNKIDX VM"
Add-VMHardDiskDrive -VMName SPLUNKIDX -Path "$LabPath\SPLUNKIDX-Data.vhdx"

Write-Host "$LabIndicator>>> Creating Lab - Other Machines"
Install-Lab -StartRemainingMachines
Write-Host "$LabIndicator>>> Sleeping for 3 minutes, just because it is required"
Start-Sleep -Seconds 180

Write-Host "$LabIndicator>>> Adding Features to MANAGER"
Install-LabWindowsFeature -Name MANAGER -FeatureName RSAT, GPMC, RSAT-DNS-Server

Write-Host "$LabIndicator>>> Doing Post Installs"
Install-Lab -PostInstallations

Finally, we are onto Splunk configuration. This is where the new features of AutomatedLab v2.5 come in – we can invoke PowerShell directly within our script. This gives us two ways to configure things: one is with a script in a directory, which is useful when you need to transfer other things (like the MSI files) to the remote server; the second is to run PowerShell directly, which lets us configure each thing separately. My first task is to format and mount the additional drive on SPLUNKIDX. But there are additional things I want to do – I need to adjust where the databases are stored, start Splunk (auto-accepting the license agreement), set up a listener, pre-create indices and disable the Windows firewall.

Write-Host "$LabIndicator>>> Format and Mount Data Drive for SPLUNKIDX"
Invoke-LabPostInstallActivity -ComputerName SPLUNKIDX -UseCredSsp -ScriptBlock {
    Get-Disk |
       Where PartitionStyle -eq 'raw' |
       Initialize-Disk -PartitionStyle MBR -PassThru |
       New-Partition -DriveLetter I -UseMaximumSize |
       Format-Volume -FileSystem NTFS -NewFileSystemLabel "Disk2" -Confirm:$false

    # Adjust the DB location to be I:\SPLUNK
    Write-Host "SPLUNKIDX>>> Creating Database Path"
    New-Item -Type Directory -Path "I:\SPLUNK"
    ("`nSPLUNK_DB=I:\SPLUNK") | Out-File -Encoding ascii -Append -FilePath "C:\Program Files\Splunk\etc\splunk-launch.conf"

    # Start Splunk with accept-eula
    Write-Host "SPLUNKIDX>>> Starting Splunk Instance"
    Set-Alias splunk "C:\Program Files\Splunk\bin\splunk.exe"
    splunk start --accept-license --answer-yes --no-prompt

    # Enable receiving on port 9997
    Write-Host "SPLUNKIDX>>> Enabling Listener on port 9997"
    splunk enable listen 9997 -auth admin:P@ssw0rd

    # Add indices
    @( "wineventlog", "windows", "perfmon" ) | Foreach-Object {
        Write-Host "SPLUNKIDX>>> Adding Index $_"
        splunk add index $_ -homePath "I:\SPLUNK\$_" -coldPath "I:\SPLUNK\cold\$_" -thawedPath "I:\SPLUNK\thawed\$_" -auth admin:P@ssw0rd
    }

    # Disable the firewall
    Write-Host "SPLUNKIDX>>> Disabling the firewall"
    Set-NetFirewallProfile -Name domain -Enabled False
}

The setup for the SPLUNK machine is simpler. We have placed the deployment server and license files in the InstallSplunkDS area, so we don’t need to do anything there. We do need to start Splunk and add a forward-server, which requires us to restart Splunk. We also need to turn off the Windows firewall.

Write-Host "$LabIndicator>>> Configuring SPLUNK"
Invoke-LabPostInstallActivity -ComputerName SPLUNK -UseCredSsp -ScriptBlock {
    # Start Splunk with accept-eula
    Write-Host "SPLUNK>>> Starting Splunk Instance"
    Set-Alias splunk "C:\Program Files\Splunk\bin\splunk.exe"
    splunk start --accept-license --answer-yes --no-prompt

    # Configure Forwarding on this box
    Write-Host "SPLUNK>>> Forwarding all events to SPLUNKIDX:9997"
    splunk add forward-server SPLUNKIDX:9997 -auth admin:P@ssw0rd

    # Restart splunk
    Write-Host "SPLUNK>>> Restarting"
    splunk restart

    # Disable the firewall
    Write-Host "SPLUNK>>> Disabling the firewall"
    Set-NetFirewallProfile -Name domain -Enabled False
}

A quirk of Splunk is that certain things need to be done in a certain order. In this case, we need the Splunk indexer up before we can add the forward-server; we need the Splunk license master up before we can point the indexer at it; and we need the indexer to be licensed before we can set up distributed search – so we have to do two service restarts. Here is the rest of the configuration.

Write-Host "$LabIndicator>>> Configuring SPLUNKIDX Licensing"
Invoke-LabPostInstallActivity -ComputerName SPLUNKIDX -UseCredSsp -ScriptBlock {
    # Set up license master on splunk (needs to be done after SPLUNK is set up)
    Write-Host "SPLUNKIDX>>> Setting up License Slave"
    Set-Alias splunk "C:\Program Files\Splunk\bin\splunk.exe"
    splunk edit licenser-localslave -master_uri https://SPLUNK:8089 -auth admin:P@ssw0rd

    # Restart splunk to install license
    Write-Host "SPLUNKIDX>>> Restarting"
    splunk restart
}

Write-Host "$LabIndicator>>> Configuring SPLUNK Distributed Search"
Invoke-LabPostInstallActivity -ComputerName SPLUNK -UseCredSsp -ScriptBlock {
    # Set up the distributed search peer
    Write-Host "SPLUNK>>> Setting up distributed search"
    Set-Alias splunk "C:\Program Files\Splunk\bin\splunk.exe"
    splunk add search-server -host SPLUNKIDX:8089 -auth admin:P@ssw0rd -remoteUsername admin -remotePassword P@ssw0rd
}

Our final step is to configure the Universal Forwarders. I could have set the deployment server up in InstallSplunkUF.ps1, but I wanted to keep those scripts very basic so I could do the configuration remotely. Now that the Splunk instance configuration is complete, I can continue on to the universal forwarder configuration. All the other machines are universal forwarders, so I can handle them in bulk:

# Configure the deployment client and forward server on all Universal Forwarders
# The Where-Object explicitly removes SPLUNK and SPLUNKIDX from this!
Get-LabMachine -All | Where-Object { $_.Name -notlike "SPL*" } | Foreach-Object {
    Write-Host "$($_.Name)>>> Configuring Forwarding and Deployment"
    Invoke-LabPostInstallActivity -ComputerName $_.Name -UseCredSsp -ScriptBlock {
        Set-Alias splunk "C:\Program Files\SplunkUniversalForwarder\bin\splunk.exe"
        Write-Host "Configuring Forwarder"
        splunk add forward-server SPLUNKIDX:9997 -auth admin:changeme

        Write-Host "Configuring Deployment Server"
        splunk set deploy-poll SPLUNK:8089 -auth admin:changeme

        Write-Host "Restarting Splunk UF"
        Restart-Service SplunkForwarder
    }
}

Wrapping up our script, let’s do a snapshot of the entire system:

Write-Host "$LabIndicator>>> Creating Snapshots"
Checkpoint-LabVM -All -SnapshotName "SetupComplete"

$end = Get-Date
Write-Host "$LabIndicator>>> Setting up the lab took $($end - $start)"

Of course, there is setup still to do for a fully functional Windows Infrastructure Lab.  However, this has the basics of everything installed for you.