I use Splunk's SPL (Search Processing Language) nearly daily in my job and thought I'd provide a quick summary of the commands, functions, & features I use regularly to perform my searches. For full documentation/reference of Splunk search commands, visit this page.
First things first. The most useful and empathetic language feature is the comment. In Splunk, as of this post's publication date, you designate the start and end of a comment with three backticks. Below, you can find the comment, "this will only output eight attributes from the lookup," enclosed in backticks.
``` this will only output eight attributes from the lookup ```
| inputlookup some_predefined_lookup
| table attribute1 attribute2 attribute3 attribute4 attribute5 attribute6 attribute7 attribute8
Before I discovered this trick, I had run into a problem where I had created a very long query and wanted to reuse its results a second time to append the secondary results to the first set. I was essentially looking for a command that would perform a "jump" (similar to how a jump statement works in assembly) back to the start of the long query. I also didn't want to copy over the long query because doing so would decrease legibility.
The savedsearch command allows you to re-run a saved report in a new search. To solve my aforementioned issue, I found that saving the original (long) query as something like a base report and leveraging it in my "child" searches/reports allowed for more elegant re-use. Below is a snippet of how it is used.
| savedsearch base_report
| table attribute1 attribute2 attribute3 attribute4 attribute5 attribute6 attribute7 attribute8
The loadjob command performs similarly to savedsearch. The difference is that loadjob does not re-run the report; instead, it loads the cached results from a previously scheduled run of the report, which makes it ideal for expensive searches. Note that loadjob references a saved search as <user>:<app>:<savedsearchname>. Below is a snippet of how it is used.
| loadjob savedsearch="user:search:base_report"
| table attribute1 attribute2 attribute3 attribute4 attribute5 attribute6 attribute7 attribute8
When dealing with repetitive searches, it may be better to utilize macros. In Splunk, macros behave like modular functions. You can even specify whether the macro requires inputs (arguments).
To define a macro, you can navigate to Settings > Advanced search > Search macros, and click New Search Macro. Note that if your macro takes arguments, its name must include the argument count in parentheses (e.g., mymacro(1) for one argument). When defining your macro, you can set additional behaviors if applicable (i.e. validation expressions, error messages, etc.). To use the macro in your searches, call the name of the macro enclosed with a single backtick on either side of it. Below is a snippet of how it is used (for this example, assume the macro takes one argument, "attribute9," as input).
index=myindex sourcetype=dasmytype
| `mymacro(attribute9)`
| table attribute1 attribute2 attribute3 attribute4 attribute5 attribute6 attribute7 attribute8
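For reference, the definition behind such a macro might look something like the following (a hypothetical sketch - the macro name, argument name, and definition are made up for illustration; inside the definition, an argument is referenced by wrapping its name in dollar signs):
Name: mymacro(1)
Arguments: input_attribute
Definition: where isnotnull($input_attribute$)
With this definition, `mymacro(attribute9)` expands to `where isnotnull(attribute9)` before the search runs.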
With large datasets, succinctly counting values against a condition can be a useful trick. When outputting data with the stats command, instead of using two stages of SPL to enumerate data (an eval piped to a count), you can combine them by nesting an eval() expression inside the count() function.
Take for example a lookup file, myfoods, containing an inventory of foods in my refrigerator. Assume the inventory is actually a large dataset and contains many attributes, but for the sake of this example, we are only interested in these attributes: food_name, food_type (our only values are "FRUIT" and "VEGETABLE"), and food_freshness (our only values are "VERY FRESH," "FRESH," and "SPOILED").
Now we want to organize the dataset to get simple statistics about these three attributes. In SPL, the top search can be rewritten as the bottom search to get your desired results more succinctly (count() only counts non-null values, which is why the eval approach below assigns null() to non-matching rows):
| inputlookup myfoods
| eval very_fresh_fruits = if(food_freshness = "VERY FRESH" AND food_type = "FRUIT", 1, null())
| eval fresh_fruits = if(food_freshness = "FRESH" AND food_type = "FRUIT", 1, null())
| eval spoiled_fruits = if(food_freshness = "SPOILED" AND food_type = "FRUIT", 1, null())
| stats count(very_fresh_fruits) as very_fresh_fruits_count, count(fresh_fruits) as fresh_fruits_count, count(spoiled_fruits) as spoiled_fruits_count
| table very_fresh_fruits_count fresh_fruits_count spoiled_fruits_count
| inputlookup myfoods
| stats count(eval(food_freshness = "VERY FRESH" AND food_type = "FRUIT")) as very_fresh_fruits_count, count(eval(food_freshness = "FRESH" AND food_type = "FRUIT")) as fresh_fruits_count, count(eval(food_freshness = "SPOILED" AND food_type = "FRUIT")) as spoiled_fruits_count
| table very_fresh_fruits_count fresh_fruits_count spoiled_fruits_count
It's not uncommon when examining large datasets of logs to encounter many timestamps. Sometimes I deal with 5-10+ timestamp fields that are subject to the same data massaging techniques. Instead of applying such techniques one by one, using for-loops quickly takes care of each one without writing repetitive steps.
Take for example a lookup file, mylogofmanytimestamps, containing 20 timestamp fields (for whatever reason). We want to change the format of the timestamps from one ("%d-%m-%Y") to another ("%m/%d/%Y"). For this to work, assume that each timestamp field is named so that its suffix is "_timestamp" - so three timestamp fields, for example, are created_timestamp, sleep_timestamp, and destroyed_timestamp.
The for-loop to reformat the timestamps collectively can be written as follows:
| inputlookup mylogofmanytimestamps
| foreach *_timestamp
    [ eval <<FIELD>> = strftime(strptime(<<FIELD>>, "%d-%m-%Y"), "%m/%d/%Y") ]
| table *
In this query, the asterisk is used as a wildcard to capture any sequence of characters, and <<FIELD>> is a placeholder that expands to each matching field name in turn. Note the two-step conversion: strptime() parses the original "%d-%m-%Y" string into epoch time, and strftime() renders that epoch time back out in the new "%m/%d/%Y" format.
Sometimes in my reports, I like to combine multiple small searches as one. To accomplish this, I usually use the appendcols command.
Take for example two reports, report1 and report2. Each of these reports contains fields that the other does not - report1 has attribute1 and attribute2, while report2 has attribute3 and attribute4 - and I want to combine the datasets as one to enumerate attribute1, attribute2, attribute3, and attribute4. Below is a snippet of how I can accomplish this using appendcols.
| savedsearch report1
| appendcols [| savedsearch report2 ]
| table *
As you can imagine, you do not have to use saved searches/reports to use appendcols. Searches and subsearches can be used; I just used reports to demonstrate the command. The append command behaves similarly, except the results are appended as rows and not columns, so they will appear below the parent search's results. The only warning with append is that Splunk will create new columns if the parent and child searches do not already share the same attributes, and stylistically, it just does not look very nice.
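For contrast, here is a minimal sketch of append using the same hypothetical reports as above. Since report1 and report2 share no attributes, the output has four columns, with report1's rows blank under attribute3/attribute4 and report2's rows blank under attribute1/attribute2:
| savedsearch report1
| append [| savedsearch report2 ]
| table *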
To be continued because I'm lazy right now, and I want to watch Dune, but the function is mvindex()... read more in the official documentation, brb.
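(In the meantime, here is a minimal sketch of mvindex() against a made-up multivalue field - indexing is zero-based, and negative indices count from the end:)
| makeresults
| eval colors = split("red,green,blue", ",")
| eval first_color = mvindex(colors, 0), last_color = mvindex(colors, -1)
| table colors first_color last_color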
Also to be continued because this could be a big ass section and I want to watch Dune, but read more in the official documentation... brb.
Sometimes when I use append to bring two datasets together, I want a visual line break between them, so I do the following, using makeresults to generate a single placeholder row (there's probably a more elegant way to utilize line breaks, but I'm not pressed to really discover it right now):
| savedsearch samplereport1
| table attribute1 attribute2 attribute3
| append [| makeresults | eval attribute1="------------------------------ lol ------------------------------"]
| table attribute1 attribute2 attribute3
| append [|savedsearch samplereport2 | table attribute1 attribute2 attribute3 ]
There was a rare time I had to show the same attribute twice (verbatim) in a single report, and Splunk does not allow that for obvious reasons (you cannot have two columns with the exact same field name). To accomplish it, I duplicated the field and renamed the copies using trailing whitespace - one trailing space on the first copy and two on the second - then reassigned the value back to each new field. It's a very weird scenario, but anyway, here is what I did:
index=myindex sourcetype=dasmytype
| eval attribute1_ = attribute1, attribute1__ = attribute1
| rename attribute1_ as "attribute1 ", attribute1__ as "attribute1  "
| table attribute1 "attribute1 " "attribute1  "