Splunk to analyse Java logs and other machine data
Q. What is Splunk and where will you use it?
A. Splunk is an enterprise-grade software tool for collecting and analyzing “machine data” such as log files, feed files, and other big data running into terabytes. You can upload logs from your websites, let Splunk index them, and produce reports with graphs for analysis. This is very useful for capturing start and finish times from asynchronous processes to calculate elapsed times. For example, here are the basic steps required.
Step 1: Use log4j MDC logging to output context-based logs.
Step 2: Upload those logs into Splunk to index them and produce elapsed times for monitoring application performance.
Q. What are the different ways to get data into Splunk?
- Uploading a log file via Splunk's web interface.
- Getting Splunk to monitor a local directory or file.
- Splunk can index data from any network port. For example, Splunk can index remote data from syslog-ng or any other application that transmits via TCP. Splunk can also receive and index SNMP events.
- Splunk also supports other kinds of data sources like FIFO queues and scripted inputs to get data from APIs, other remote data interfaces, and message queues. For example, here is the general form of a scripted input stanza in the inputs.conf file.
[script://$SCRIPT]
<attribute1> = <val1>
<attribute2> = <val2>
...
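For the network port option in the list above, the sending side can be as small as a socket write. The following is a minimal sketch, assuming a TCP input has already been configured in Splunk; the class name TcpEventSender, the host 127.0.0.1, and the port 9999 are hypothetical placeholders and not values taken from any Splunk default.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes one newline-terminated event to a TCP port.
class TcpEventSender {

    static void send(String host, int port, String event) throws IOException {
        // try-with-resources closes the writer and the socket even if the write fails
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(event);
            out.write('\n'); // one event per line
            out.flush();
        }
    }

    public static void main(String[] args) {
        try {
            // Assumed host and port; replace with the TCP input configured in your Splunk instance
            send("127.0.0.1", 9999, "2013-05-31 19:36:03,617 INFO sample event sent over TCP");
        } catch (IOException e) {
            System.err.println("Could not send event: " + e.getMessage());
        }
    }
}
```

Opening a new connection per event keeps the sketch short; a real sender would normally keep the connection open and reuse it.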
Here is an example of using Splunk to write queries against log files to monitor performance.
Step 1: Configure the Java application to output log statements as shown below, using the MDC (Mapped Diagnostic Context) feature offered by Logback or log4j.
2013-05-31 19:36:03,617 INFO [Camel (MyApp) thread #24 - Multicast] c.j.w.a.c.s.i.MyAppForecastServiceImpl - [groupId=48937,jobType=CASH_FEED,ValuationDate=2013-01-04] - Total Time spent on group level cashforecast feed - 225 ms
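To make the idea behind Step 1 concrete, here is a dependency-free sketch of what an MDC does: it keeps a per-thread map of key/value pairs and attaches them to every log message. The class MdcSketch and its methods are hypothetical stand-ins written for illustration only. In a real application you would call the logging framework's own MDC class (for example org.slf4j.MDC.put(key, value)) and print the values through a %X{key} entry in the layout pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a logging framework's MDC, for illustration only.
class MdcSketch {

    // One context map per thread, so concurrent jobs do not overwrite each other's values.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(LinkedHashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void clear() {
        CONTEXT.remove();
    }

    // Renders the current thread's context as [k1=v1,k2=v2] followed by the message.
    static String format(String message) {
        String context = CONTEXT.get().entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(",", "[", "]"));
        return context + " - " + message;
    }

    public static void main(String[] args) {
        put("groupId", "48937");
        put("jobType", "CASH_FEED");
        put("ValuationDate", "2013-01-04");
        try {
            long start = System.currentTimeMillis();
            // ... the work being timed would run here ...
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(format(
                    "Total Time spent on group level cashforecast feed - " + elapsed + " ms"));
        } finally {
            clear(); // always clear the context, because pooled threads are reused
        }
    }
}
```

Because the context travels with the thread, every statement logged while the job runs carries the same groupId and jobType, which is what lets Splunk correlate start and finish entries later.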
Step 2: Upload the test-myapp.log file via the Splunk interface.
Step 3: Once the file is uploaded, you can write search queries to suit your requirements.
For example, a basic query is
source="test4.aes.log" | search "Total Time" | search "group level cashforecast"
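The Splunk search language is very powerful. Here is an extended query that extracts the elapsed time from each matching event and reports the average and maximum duration per time bucket.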
source="test.aes.log" | search "Total Time" | search "group level cashforecast" | rex field=_raw "feed - (?<timeTaken>\d+)" | bucket _time span=5d | stats avg(timeTaken) as AvgDur, max(timeTaken) as MaxDur by _time
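To see what the rex and stats steps of the query above compute, the same extraction can be sketched in plain Java. This is only an illustration of the logic, not how Splunk implements it: the class name ElapsedTimeStats is hypothetical, the second sample line is made up for the example, and the grouping into 5-day buckets is left out for brevity.

```java
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the search, rex, and stats steps of the query.
class ElapsedTimeStats {

    // Same idea as the rex step: capture the digits after "feed - " into a group named timeTaken.
    private static final Pattern TIME_TAKEN = Pattern.compile("feed - (?<timeTaken>\\d+)");

    static LongSummaryStatistics summarize(List<String> lines) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (String line : lines) {
            // Mirrors the two search filters. Splunk matches search terms case-insensitively;
            // this sketch is case-sensitive to stay short.
            if (!line.contains("Total Time") || !line.contains("group level cashforecast")) {
                continue;
            }
            Matcher m = TIME_TAKEN.matcher(line);
            if (m.find()) {
                stats.accept(Long.parseLong(m.group("timeTaken")));
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                // Tail of the sample log line from Step 1
                "[groupId=48937,jobType=CASH_FEED,ValuationDate=2013-01-04] - "
                        + "Total Time spent on group level cashforecast feed - 225 ms",
                // Made-up second event, for illustration only
                "[groupId=48938,jobType=CASH_FEED,ValuationDate=2013-01-04] - "
                        + "Total Time spent on group level cashforecast feed - 175 ms");
        LongSummaryStatistics stats = summarize(lines);
        System.out.println("AvgDur=" + stats.getAverage() + " MaxDur=" + stats.getMax());
    }
}
```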