
Splunk Interview Questions and Answers


What is Splunk? 

Splunk is basically a software platform that provides users with the ability to access, analyze, and visualize machine-generated data from multiple sources, including hardware devices, networks, servers, IoT devices, and other sources. The machine data is analyzed and processed, and subsequently transformed into powerful operational intelligence that offers some real-time insight. It is widely used to search, visualize, monitor, and report enterprise data. Among Splunk’s many capabilities are application management, security and compliance, business and web analytics, etc.

Splunk relies on its indexes to store all of its data, so it does not need any separate database. Splunk gathers all relevant information into one central index, making it easy to search for specific data within a massive volume of data. Moreover, machine data is extremely important for monitoring, understanding, and optimizing machine performance. 

In this article, we have created a list of Splunk interview questions and answers for Freshers and Experienced candidates so you can prepare for your job interview and move closer to achieving your dream career. 

What is Splunk used for?

In general, machine data is difficult to understand and has an unstructured format (not arranged as per pre-defined data model), making it unsuitable for analysis and visualization of data. Splunk is the perfect tool for tackling such problems. Splunk is used to analyze machine data for several reasons:   

  • Provides business insights: The Splunk platform analyzes machine data for patterns and trends, providing operational insights that assist businesses in making smarter decisions for the profitability of the organization.
  • Enhances operational visibility: Splunk obtains a comprehensive view of overall operations based on machine data and then aggregates it across the entire infrastructure.
  • Ensures proactive monitoring: Splunk employs a real-time analysis of machine data to discover system errors and vulnerabilities (external/internal breaches and intrusions).
  • Search and Investigation: Splunk uses machine data to pinpoint and fix problems by correlating events across numerous data sources and detecting patterns in large datasets.

Can you explain how Splunk works?

In order to use Splunk in your infrastructure, you must understand how Splunk performs on the internal level. In general, Splunk processes data in three stages:

  • Data Input Stage: This stage involves Splunk consuming raw data not from a single, but from many sources, breaking it up into 64K blocks, and annotating each block with metadata keys. A metadata key includes the hostname, source, and source type of the data.
  • Data Storage Stage: In this stage, two different phases are performed, Parsing and Indexing.
    • In the Parsing phase, Splunk analyzes the data, transforms it, and extracts only the relevant information. This is also called “event processing,” since it breaks down the data sets into different events.
    • During the indexing phase, Splunk software writes the parsed events into the index queue. One of the main benefits of using this is to make sure the data is easily accessible for anyone during the search.
  • Data Searching Stage: This stage usually controls how the index data is accessed, viewed, and used by the user. Reports, event types, dashboards, visualization, alerts, and other knowledge objects can be created based on the user’s reporting requirements.

What are the main components of Splunk Architecture?

As shown below, Splunk Architecture is composed of three main components:

  • Splunk Forwarder: This is the component used to collect machine data/logs. It is responsible for gathering and forwarding real-time data to the indexer while using little processing power. Depending on the type of forwarder used (universal or heavy), it may also perform cleansing of the data.
  • Splunk Indexer: The indexer transforms the raw data coming from the forwarder into events and stores it in indexes, i.e., it indexes the data. Incoming data is processed by the indexer in real time, and storing events in indexes enables search operations to be performed efficiently.
  • Search Head: This component is used to interact with Splunk. It lets users perform various operations like performing queries, analysis, etc., on stored data through a graphical user interface. Users can perform searches, analyze data, and report results.
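To make this flow concrete, below is a minimal sketch of the input configuration you might place on a forwarder to monitor a log file; the monitored path, sourcetype, and index names are illustrative assumptions, not fixed values:

# inputs.conf on the forwarder (hypothetical path, sourcetype, and index)
[monitor:///var/log/myapp/app.log]
sourcetype = myapp_logs
index = main

The forwarder then ships these events to an indexer, where they are indexed and made available to the search head.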

Write down the different types of Splunk forwarders.

A forwarder is a Splunk instance or agent you deploy on IT systems, which collects machine logs and sends them to the indexer. You can choose between two types of forwarders:  

  • Universal Forwarder: A universal forwarder is ideal for sending raw data collected at the source to an indexer without any prior processing. Basically, it’s a component that performs minimal processing before forwarding incoming data streams to an indexer. Although it is faster, it also results in a lot of unnecessary information being forwarded to the indexer, which will result in higher performance overhead for the indexer.
  • Heavy Forwarder: You can eliminate half of your problems using a heavy forwarder since one level of data processing happens at the source before the data is forwarded to the indexer. Parsing takes place on the source machine, and only the parsed events are sent to the indexer.

What are the advantages of getting data into a Splunk instance through Forwarders?

Data entering into Splunk instances via forwarders has many advantages including bandwidth throttling, a TCP connection, and an encrypted SSL connection between the forwarder and indexer. By default, data forwarded to the indexers are also load-balanced, and if one indexer goes down for any reason, that data can always be routed to another indexer instance in a very short amount of time. Furthermore, the forwarder stores the data events locally before forwarding them, creating a temporary backup of the data.
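As a hedged illustration of these advantages, a forwarder's outputs.conf might look like the sketch below; the indexer addresses are made-up examples, and SSL would be enabled with additional certificate settings in the same file:

# outputs.conf on the forwarder (hypothetical indexer addresses)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# data is automatically load-balanced across the listed indexers
server = 10.0.0.5:9997,10.0.0.6:9997
# indexer acknowledgment: unacknowledged data is re-sent to another indexer
useACK = true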

What do you mean by Splunk Dashboards and write its type?

In a dashboard, tables, charts, event lists, etc., are used to represent data visualizations, and they do so by using panels. Dashboard panels present or display chart data, table data, or summarized data visually in a pleasing manner. On the same dashboard, we can add multiple panels, and therefore multiple reports and charts. Splunk is a popular data platform with lots of customization options and dashboard options. 

There are three kinds of dashboards you can create with Splunk: 

  • Dynamic form-based dashboards: They allow Splunk users to change the dashboard data based on values entered in input fields without leaving the page. A dashboard can be customized by adding input fields (such as time, radio buttons, text boxes, checkboxes, dropdowns, and so on) that change the data, depending on the selection made. Dashboards of this type are useful for troubleshooting issues and analyzing data.
  • Static Real-time Dashboards: They are often displayed on a large screen for constant viewing. It also provides alerts and indicators to prompt quick responses from relevant personnel.
  • Scheduled Dashboards: These dashboards can be downloaded as PDF files and shared with team members at predetermined intervals. There are times when live dashboards should be restricted to certain viewers/users.

Some of the Splunk dashboard examples include security analytics dashboard, patient treatment flow dashboard, eCommerce website monitoring dashboard, exercise tracking dashboard, runner data dashboard, etc. 

Explain Splunk Query.

Splunk queries allow specific operations to be run on machine-generated data. Splunk queries communicate with a database or source of data by using SPL (Search Processing Language). This language contains many functions, arguments, commands, etc., that can be used to extract desired information from machine-generated data. This makes it possible for users to analyze their data by running queries. Similar to SQL, it allows users to update, query, and change data in databases.

It is primarily used to analyze log files and extract reference information from machine-generated data. In particular, it is beneficial to companies that have a variety of data sources and need to process and analyze them simultaneously in order to produce real-time results. 
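For example, a simple SPL query might count server errors in web access logs by host; the index and sourcetype names here are assumptions for illustration:

index=web sourcetype=access_combined status>=500 | stats count AS errors BY host | sort - errors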

What are different types of Splunk License?

A license is required for each Splunk instance. With Splunk, you receive a license that specifies which features you can use and how much data can be indexed. Various Splunk License types include:

  • The Splunk Enterprise license: Among all Splunk license types, Enterprise licenses are the most popular. These licenses give users access to all the features of Splunk Enterprise within a specified limit of indexed data or vCPU usage per day. These licenses include enterprise features such as authentication and distributed search. Several types of Splunk Enterprise licenses are available, including the Splunk for Industrial IoT license and Splunk Enterprise Trial license.
  • The Free license: Under the Free license, Splunk Enterprise is completely free to use with limited functionality. Some features are not available under this license, such as authentication. Only a limited amount of data can be indexed.
  • The Forwarder license: A Forwarder license enables unlimited forwarding of data, as well as a subset of the Splunk Enterprise features that are required for authentication, configuration management, and sending data.
  • The Beta license: Each Splunk beta release requires a separate beta license, which cannot be used with other Splunk releases. With a beta license, Splunk Enterprise features are enabled for a specific beta release period.

What is the importance of License Master in Splunk? If the License Master is not reachable, what will happen?

It is the responsibility of the license master in Splunk to ensure that the limited amount of data is indexed. Since each Splunk license is based on the amount of data that is coming into the platform in 24 hours, it is essential to keep the environment within the limits of its purchased volume.  

Whenever the license master becomes unavailable, it is simply impossible to search the data. Therefore, only searching remains halted while the indexing of data continues. Data entering the Indexer won’t be impacted. Your Splunk deployment will continue to receive data, and the Indexers will continue to index the data as usual. However, upon exceeding the indexing volume, you will receive a warning message on top of your Search head or web interface so that you can either reduce your data intake or purchase a larger capacity license. 

Explain License violation. How will you handle or troubleshoot a license violation warning?

License violations occur after a series of license warnings, and license warnings occur when your daily indexing volume exceeds the license’s limit. Getting multiple license warnings and exceeding the maximum warning limit for your license will result in a license violation. With a Splunk commercial license, users can receive five warnings within a 30-day period before search results and reports stop being triggered. Users of the free version, however, will only receive three warnings. 

Avoid License Warning:

  • Monitor your license usage over time and ensure that you have enough license volume to meet your daily needs.
  • Viewing the license usage report in the license master can help troubleshoot index volume.
  • In the monitoring console, set up an alert to track daily license usage.

Troubleshoot License Violation Warning:

  • Determine which index/source type recently received more data than usual.
  • The license master’s pool-wise quotas can be checked to identify the pool for which the violation occurred.
  • Once we know which pool is receiving more data, then we need to determine which source type is likely to be receiving more than normal data.
  • Having identified the source type, the next step is to find out which machine is sending so many logs and the reason behind it.
  • We can then troubleshoot the problem accordingly.
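As a sketch of what such an investigation can look like in SPL (run on the license master, whose license_usage.log records usage events with fields such as b for bytes indexed and st for source type):

index=_internal source=*license_usage.log* type="Usage" | eval GB=b/1024/1024/1024 | stats sum(GB) AS indexed_GB BY st | sort - indexed_GB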

Write down some common Splunk ports.

The following are common ports used by Splunk: 

  • Web Port: 8000 
  • Management Port: 8089 
  • Network port: 514 
  • Index Replication Port: 8080 
  • Indexing Port: 9997 
  • KV store: 8191

Explain Splunk Database (DB) Connect.

Splunk Database (DB) Connect is a general-purpose SQL (Structured Query Language) database extension/plugin for Splunk that permits easy integration between database information and Splunk queries/reports. Splunk DB Connect is effectively used to combine structured data from databases with unstructured machine data, and Splunk Enterprise can then be used to uncover insights from the combined data. 

Some of the benefits of using Splunk DB Connect are as follows:   

  • By using Splunk DB Connect, you are adding new data inputs for Splunk Enterprise, i.e., adding additional sources of data to Splunk Enterprise. Splunk DB Connect lets you import your database tables, rows, and columns directly into Splunk Enterprise, which then indexes them. Once that relational data is within Splunk Enterprise, you can analyze and visualize it the same way you would any other Splunk Enterprise data.
  • In addition, Splunk DB Connect enables you to write your Splunk Enterprise data back to your relational databases.
  • With DB Connect, you can reference fields from an external database that match fields in your event data, using the Database Lookup feature. This way, you can enrich your event data with more meaningful information.
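For instance, with DB Connect installed, the dbxquery command can run SQL against a configured database connection from within a Splunk search; the connection name and table below are hypothetical:

| dbxquery connection="my_db_connection" query="SELECT customer_id, status FROM orders"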

What are different versions of the Splunk product?

Splunk products come in three different versions as follows:  

  • Splunk Enterprise: A number of IT companies use Splunk Enterprise. This software analyzes data from diverse websites, applications, devices, sensors, etc. Data from your IT or business infrastructure can be searched, analyzed, and visualized using this program.
  • Splunk Cloud: It is basically a SaaS (Software as a Service) offering many of the same features as enterprise versions, including APIs, SDKs, etc. User logins, lost passwords, failed login attempts, and server restarts can all be tracked and sorted.
  • Splunk Light: This is a free version of Splunk which allows you to view, search, and edit your log data. This version has fewer capabilities and features than other versions.

Name some of the features that are not available in the Splunk free version.

The free version of Splunk lacks the following features:  

  • Distributed searching
  • Forwarding of data through HTTP or TCP (to non-Splunk)
  • Agile reporting and statistics based on a real-time architecture
  • Scheduled searches/alerts and authentication
  • Managing deployments.

Explain Splunk alerts and write about different options available while setting up alerts.

Splunk alerts are actions that get triggered when a specific criterion is met; these conditions are defined by the user. You can use Splunk Alerts to be notified whenever anything goes awry with your system. For instance, the user can set up Alerts so that an email notification will be sent to the admin when three unsuccessful login attempts are made within 24 hours. 

The following options are available when setting up alerts:  

  • A webhook can be created to send messages to external applications such as HipChat or GitHub. An email notification can also be sent to a group of recipients with a subject, priority, and message body.
  • Results can be attached as .csv files, pdf files, or inline with the message body to ensure the recipient understands what alerts have been fired, at what conditions, and what actions have been taken.
  • You can also create tickets and control alerts based on conditions such as an IP address or machine name. As an example, if a virus outbreak occurs, you do not want every alert to be triggered as it will create a lot of tickets in your system, which will be overwhelming. Such alerts can be controlled from the alert window.
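Alerts configured in Splunk Web are stored in savedsearches.conf. The sketch below shows roughly what the failed-login example above could look like; the search string, schedule, threshold, and email address are illustrative assumptions:

[Failed Login Alert]
search = index=security sourcetype=auth action=failure | stats count BY user
dispatch.earliest_time = -24h
dispatch.latest_time = now
enableSched = 1
cron_schedule = 0 * * * *
# trigger when more than 3 matching results are found
alert_type = number of events
alert_comparator = greater than
alert_threshold = 3
action.email = 1
action.email.to = admin@example.com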

What do you mean by Summary Index in Splunk?

Summary indexes store analyses, reports, and summaries computed by Splunk. This is an inexpensive and fast way to run a query over a long period of time. The index named “summary” is also the default summary index that Splunk Enterprise uses if the user does not specify another one. Among the key features of the summary index is that you can retain the analytics and reports even after the underlying data has aged out.
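A typical pattern is a scheduled search that writes its results into a summary index using the collect command; the summary index name below is a hypothetical one that must already exist:

index=web earliest=-1d@d latest=@d | stats count BY status | collect index=my_summary

Later searches can then run against index=my_summary instead of rescanning the raw data, which is what makes long-range reporting fast and inexpensive.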

What is the way to exclude certain events from being indexed by Splunk?

In the case where you do not wish to index all of your events in Splunk, what can you do to prevent the entry of those events into Splunk? Debug messages are a good example of this in your application development cycle.  

Such debug messages can be excluded by routing them to the null queue. This is achieved by specifying a regex that matches the relevant events and directing them to the nullQueue. Null queues are defined at the forwarder level in transforms.conf. The example below drops all events from the source except those containing “debugmessage,” which are routed on to the index queue; to drop only the debug messages instead, the setnull transform would match “debugmessage.” 

In props.conf  

[source::/var/log/foo] 

#By applying transforms in this order  

#events will be dropped to the floor  

#before being routed to the index processor 

TRANSFORMS-set = setnull, setparsing 

In transforms.conf  

[setnull] 

REGEX = . 

DEST_KEY = queue 

FORMAT = nullQueue 

[setparsing] 

REGEX = debugmessage 

DEST_KEY = queue 

FORMAT = indexQueue

Write the commands used to start/stop the Splunk service.

The following commands can be used to start and stop Splunk services: 

  • Start Splunk service ./splunk start
  • Stop Splunk service ./splunk stop
  • Restart Splunk service ./splunk restart

What is the importance of time zone property in Splunk?

A time zone is a crucial factor to consider when searching for events from a fraud or security perspective. This is because Splunk uses the time zone defined by your browser. Your browser then picks up the time zone associated with the machine/computer system you’re working on. So, you will not be able to find your desired event if you search for it in the wrong time zone. The timezone is picked up by Splunk when data is entered, and it is particularly important when you are searching and comparing data from different sources. You can, for instance, look for events coming in at 4:00 PM IST, for your London data centre, or for your Singapore data centre, etc. The timezone property is therefore vital when correlating such events. 
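Time zones can also be assigned explicitly at index time in props.conf; the host pattern below is an illustrative assumption:

# props.conf: interpret timestamps from matching hosts as US Eastern time
[host::nyc-*]
TZ = America/New_York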

State difference between Splunk app and add-on.

Generally, Splunk applications and add-ons are separate entities, but both are packaged with the same file extension, i.e., .spl files. 

  • Splunk Apps: A Splunk app extends Splunk functionality with its own built-in user interface. Each of these apps is separate and serves a specific purpose. Each Splunk app consists of a collection of Splunk knowledge objects (lookups, tags, saved searches, event types, etc.). They can also make use of other Splunk apps or add-ons. Multiple apps can be run simultaneously in Splunk. Several apps offer the option of restricting or limiting the amount of information a user can access. By controlling access levels, users see only the information that is necessary for them and not the rest. You can open apps from the Splunk Enterprise homepage, through the App menu, or in the Apps section of the Settings page. 
    Example: Splunk Enterprise Security App, etc.
  • Splunk Add-on: These are applications built on top of the Splunk platform that add features and functionality to other apps, such as allowing users to import data, map data, and save searches and macros. Add-ons typically do not run as standalone apps; rather, they are reusable components that support other apps in different scenarios. Most of the time, an add-on is used as a framework: a team leverages its functionality to some extent and creates something new on top of it. As a rule, add-ons do not have navigable user interfaces, and you cannot open an add-on from the Splunk Enterprise homepage or App menu. 
    Examples: Splunk Add-on for Checkpoint OPSEC LEA, Splunk Add-on for EMC VNX or the Splunk Common Information Model Add-on.

Mention some important configuration files in Splunk.

Configuration files that are of the utmost importance in Splunk are: 

  • Props.conf: It configures indexing properties, such as timezone offset, pattern collision priority, custom source type rules, etc.
  • Indexes.conf: It configures and manages index settings.
  • Inputs.conf: It is used to set up data inputs.
  • Transforms.conf: It can be used to configure regex transformations to be performed on data inputs.
  • Server.conf: There are a variety of settings available for configuring the overall state of the Splunk Enterprise instance.

What are Splunk commands and list out some of the basic Splunk commands?

Many Splunk commands are available, including those related to searching, correlation, data or indexing, and identifying specific fields. Following are some of the basic Splunk commands: 

  • Accum: Maintains a running total of a numeric field.
  • Bucketdir: Replaces a field value with a higher-level grouping, just like replacing filenames with directories.
  • Chart: Provides results in a tabular format for charting.
  • Timechart: Creates a time series chart and the corresponding statistics table.
  • Rare: Displays the values that are least common in a field.
  • Cluster: Groups/clusters similar events together.
  • Delta: Calculates the difference between two search results.
  • Eval: Calculates the expression and stores the result in a field.
  • Gauge: Converts the output result into a format compatible with gauge chart types.
  • K-means: Performs K-means clustering for selected fields.
  • Top: Shows/displays the most common values of a field that are mostly used.

Name a few important Splunk search commands

Splunk provides the following search commands: 

  • Abstract: It provides a brief summary of the text of the search results. It replaces the original text with the summary.
  • Addtotals: It sums up all the numerical fields for each result.  You can see the results under the Statistics tab. Rather than calculating every numeric field, you can specify a list of fields whose sum you want to compute.
  • Accum: It calculates a running total of a numeric field. This accumulated sum can be returned to the same field, or to a new field specified by you.
  • Filldown: It will generally replace NULL values with the last non-NULL value for the field or set of fields. Filldown will be applied to all fields if there is no list of fields given.
  • Typer: It basically calculates the eventtype field for search results matching a specific/known event type.
  • Rename: It renames the specified field. Multiple fields can be specified using wildcards.
  • Anomalies: It computes the “unexpectedness” score for a given event. The anomalies command can be used to identify events or field values that are unusual or unexpected.

State difference between stats vs eventstats command.

  • Stats: The Stats command in Splunk calculates statistics for every field present in your events (search results) and stores these values in newly created fields.
  • Eventstats: Similar to the stats command, this calculates a statistical result. While the Eventstats command is similar to the Stats command, it adds the aggregate results inline to each event (if only the aggregate is relevant to that event).
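As a quick illustration of the difference, using a hypothetical web index: stats collapses the events into a summary table, while eventstats keeps every event and adds the aggregate as a new field, so each event can be compared against it:

index=web | stats avg(bytes) AS avg_bytes BY host

index=web | eventstats avg(bytes) AS avg_bytes BY host | where bytes > avg_bytes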

Name the commands included in the “filtering results” category.

Below are the commands included in the “filtering results” category:

  • Search: This command retrieves events from indexes or filters the results of the previous search command. Events can be retrieved from your indexes by using keywords, wildcards, quoted phrases, and key/value expressions.
  • Sort: The search results are sorted based on the fields that are specified. The results can be sorted in reverse, ascending, or descending order. When sorting, the results can also be limited.
  • Where: The ‘where’ command filters search results using ‘eval’ expressions. While the ‘search’ command retains only those results for which an evaluation was successful, the ‘where’ command enables a deeper investigation of those results. For example, a ‘search’ command can determine the number of active nodes, but the ‘where’ command can find a matching condition of an active node that is running a specific application.
  • Rex: You can extract specific fields or data from your events using the ‘rex’ command. For instance, when you want to pick apart an email ID like scaler@Techclick.co, you can use the ‘rex’ command. It will distinguish scaler as the user ID, Techclick.co as the domain, and Techclick as the company. Rex allows you to slice, split, and break down your events however you like.
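The sketch below chains several of these commands together; the index, field names, and URI pattern are assumptions for illustration:

index=web status>=400 | rex field=uri "/checkout/(?<step>\w+)" | where isnotnull(step) | sort - _time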

What do you mean by the Lookup command? State difference between Inputlookup and Outputlookup commands.

Splunk lookup commands can be used to retrieve specific fields from an external file (e.g., Python script, CSV file, etc.) to get the value of an event. 

  • Inputlookup: Inputlookup can be used to search the contents of a lookup table (CSV lookup or a KV store lookup). It is used to take input. This command, for instance, could take the product price or product name as input and match it with an internal field like the product ID.
  • Outputlookup: Conversely, the outputlookup command outputs search results to a specified lookup table, i.e., it places a search result into a specific lookup table.
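For example (the lookup file names here are hypothetical):

| inputlookup products.csv | search category="electronics"

index=sales | stats sum(amount) AS revenue BY product_id | outputlookup product_revenue.csv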

Explain what Splunk Btool is.

The btool command-line tool can be used to figure out what settings are set on a Splunk Enterprise instance, as well as to see where those settings are configured. Using the Btool command, we can troubleshoot configuration file issues. 

Conf files, also called Splunk software configuration files, are loaded and merged together to create a functional set of configurations that can be used by Splunk software when executing tasks. Conf files can be placed/found in many different folders under the Splunk installation. Using the on-disk conf files, Btool simulates the merging process and creates a report displaying the merged settings.
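For example, the following commands, run from the command line on the Splunk server, list the merged inputs configuration along with the file each setting came from, and check the conf files for syntax problems:

$SPLUNK_HOME/bin/splunk btool inputs list --debug
$SPLUNK_HOME/bin/splunk btool check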

What do you mean by File precedence in Splunk?

A developer, administrator, and architect all have to consider file precedence when troubleshooting Splunk. All Splunk configurations are saved in plain text .conf files. Almost every aspect of Splunk’s behaviour is determined by configuration files. There can be multiple copies of the same configuration file in a Splunk platform deployment. In most cases, these file copies are layered in directories that might affect users, applications, or the overall system. If you want to modify configuration files, you must know how the Splunk software evaluates those files and which ones have precedence when the Splunk software runs or is restarted.

Splunk software considers the context of each configuration file when determining the order of directories to prioritize configuration files. Configuration files can either be operated in a global context or in the context of the current application/user. 

Directory priority descends as follows when the file context is global:   

  • System local directory — highest priority  ->
  • Application local directories  ->
  • Application default directories  ->
  • System default directory — lowest priority

Directory priority descends from user to application to system when file context is current application/user:

  • User directories for the current user — highest priority   ->
  • Application directories for the currently running application (local, followed by default)  ->
  • Application directories for all the other applications (local, followed by default) — for exported settings only ->
  • System directories (local, followed by default) — lowest priority

State difference between ELK and Splunk.

IT Operations professionals are familiar with Splunk and ELK (ElasticSearch, LogStash, and Kibana), two of the most widely used tools in the area of Operational Data Analytics. 

ELK vs Splunk:

  • Platform: ELK is a powerful open-source enterprise platform that combines ElasticSearch, LogStash, and Kibana for searching, visualizing, monitoring, and analyzing machine data. Splunk is a closed-source tool for searching, visualizing, monitoring, and analyzing machine data.
  • Integrations: Elasticsearch integrates with Logstash and Kibana to operate similarly to Splunk, and it can also be integrated with many other tools, such as Datadog, Amazon, Couchbase, Elasticsearch Service, and Contentful. Splunk, in turn, integrates with several other tools, including Google Anthos, OverOps, Wazuh, PagerDuty, Amazon GuardDuty, etc.
  • Users: Some of the largest companies worldwide use the Elastic Stack to store, analyze, search, and visualize data, including Uber, Stack Overflow, Udemy, Shopify, Instacart, and Slack. In contrast, Splunk is used by a range of companies, including Starbucks, Craftbase, Intuit, SendGrid, Yelp, Rent the Runway, and Blend.
  • Wizards and features: Elasticsearch is not pre-loaded with wizards and features, and it has no interactive user interface of its own, so users must install a plugin or use Kibana with it. Splunk comes preloaded with wizards and features that are easy and reliable to use and allow managers to manage resources efficiently.
  • Visualization: The ELK stack includes Kibana for visualization, which offers visualization features comparable to the Splunk Web UI, such as line charts and tables, that can be presented on a dashboard. The Splunk Web UI comes with flexible controls to edit, add, and remove dashboard components, and XML (Extensible Markup Language) can even be used to customize the application and visualization components on mobile devices.

Explain what the Dispatch Directory is.

A directory is included in the Dispatch Directory for each search that is running or has been completed. 

The Dispatch Directory is configured as follows: 

$SPLUNK_HOME/var/run/splunk/dispatch 

Take the example of a directory named 14333208943.348. This directory includes a CSV file of all search results, a search.log containing details about the search execution, as well as other pertinent information. With the default configuration, this directory is deleted 10 minutes after the search completes; if you have saved the search results, they are deleted after seven days. 

State the difference between Search head pooling and Search head clustering.

Splunk Enterprise instances, also called search heads, distribute search requests to other instances, called search peers, which perform the actual data searching and indexing. Results are merged and returned to the user by the search head. You can implement distributed search using Search head pooling or Search head clustering in your Splunk deployment. 

  • Search head pooling: Pooling refers to sharing resources in this context. It uses shared storage to configure multiple search heads to share user data and configuration. Quite simply, it allows you to have multiple search heads that share user data and configuration. Multiple search heads facilitate horizontal scaling when many users are searching across the same data.
  • Search head clustering: In Splunk Enterprise, a search head cluster is a collection of search heads that are used as a centralized resource for searching. All members of the cluster can access and run the same searches, dashboards, and search results.

What do you mean by SF (Search Factor) and RF (Replication Factor)?

SF (Search Factor) & RF (Replication Factor) are terms associated with Clustering techniques i.e., Search head clustering & Indexer clustering. 

  • Search Factor: It is only associated with indexer clustering. It determines how many searchable copies of data the indexing cluster maintains. By default, the value of the search factor is 2.
  • Replication Factor: It is associated with both Search head clustering & Indexer clustering. In the case of the indexer cluster, replication factor determines the number of copies of the data that an indexer cluster maintains, while in the case of the search head cluster, replication factor determines the minimum number of copies of the search artefacts that a search head cluster maintains. For the replication factor, the default value is 3.
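For indexer clustering, both values are set in server.conf on the cluster master node. A minimal sketch, using the default values mentioned above:

# server.conf on the cluster master (values are illustrative)
[clustering]
mode = master
replication_factor = 3
search_factor = 2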

Explain what is a fish bucket and fish bucket index.

Essentially, Splunk Fishbucket is a subdirectory within Splunk that is used to monitor and track the extent to which the content of a file has been indexed within Splunk. For this feature, there are two types of contents: seek pointers and CRCs (cyclic redundancy checks).  

The default location of the fish bucket subdirectory is: /opt/splunk/var/lib/splunk.

You can find it through the GUI (Graphical User Interface) by searching for: index=_thefishbucket.

What do you mean by buckets? Explain Splunk bucket lifecycle?

A bucket is a directory in which Splunk stores index data. Each bucket contains data events in a particular time frame. As data ages, buckets move through different stages as given below: 

  • Hot bucket: Newly indexed data is present in a hot bucket. Every index contains one or more hot buckets, and each hot bucket is open for writing.
  • Warm bucket: This bucket contains data that has been rolled or pulled out of the hot bucket. The warm buckets are numerous.
  • Cold bucket: This bucket contains data that has been rolled or pulled out of the warm bucket. The cold buckets are numerous.
  • Frozen bucket: This bucket contains data that has been rolled or pulled out of the cold bucket. By default, the indexer removes frozen data, but we can archive it.

Buckets are by default located in: $SPLUNK_HOME/var/lib/splunk/defaultdb/db.
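Bucket aging is controlled per index in indexes.conf. The sketch below shows some of the settings involved; the index name and values are illustrative assumptions:

[my_index]
homePath = $SPLUNK_HOME/var/lib/splunk/my_index/db
coldPath = $SPLUNK_HOME/var/lib/splunk/my_index/colddb
thawedPath = $SPLUNK_HOME/var/lib/splunk/my_index/thaweddb
# a hot bucket rolls to warm at this size
maxDataSize = auto
# the oldest warm bucket rolls to cold beyond this count
maxWarmDBCount = 300
# data rolls to frozen after 90 days (deleted by default, or archived)
frozenTimePeriodInSecs = 7776000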

Explain how will you set default search time in Splunk 6.

Using ‘ui-prefs.conf’ in Splunk 6, we can specify the default search time. If we set the value in $SPLUNK_HOME/etc/system/local/ui-prefs.conf, all users will see it as the default setting.

For example, if our $SPLUNK_HOME/etc/system/local/ui-prefs.conf file includes: 

[search]

dispatch.earliest_time = @d

dispatch.latest_time = now

The default time range that will appear to all users in the search app is today.

What is the best way to clear Splunk’s search history?

The following file on the Splunk server needs to be deleted in order to clear the Splunk search history: $SPLUNK_HOME/var/log/splunk/searches.log.

How to reset Splunk Admin (Administrator) password?

The way to reset the Admin password depends on your Splunk version. If you have Splunk 7.1 or a higher version, you need to follow these steps: 

  • Splunk Enterprise must be stopped first.
  • Find and rename ‘passwd’ file to ‘passwd.bk’.
  • In the below directory, create a file named ‘user-seed.conf’:

$SPLUNK_HOME/etc/system/local/

  • Enter the following content in the file (‘NEW_PASSWORD’ is to be replaced with your own new password):

[user_info] 

PASSWORD = NEW_PASSWORD

  • Restart Splunk Enterprise and log in with the new password again.

If you’re using a version prior to 7.1, you need to follow these steps: 

  • Splunk Enterprise must be stopped first.
  • Find and rename ‘passwd’ file to ‘passwd.bk’.
  • Use the default credentials of admin/changeme to log in to Splunk Enterprise.
  • If you’re asked to change your admin username and password, just follow the instructions.

Explain how Splunk avoids duplicate indexing of logs.

Essentially, Splunk Fishbucket is a subdirectory within Splunk that is used to monitor and track the extent to which the content of a file has been indexed within Splunk. 

The default location of the fish bucket subdirectory is: /opt/splunk/var/lib/splunk

It generally includes seeking pointers and CRCs (cyclic redundancy checks) for the files we are indexing so that Splunk knows whether it has already read them. 

Name the commands used to restart Splunk Web Server and Splunk Daemon.

In order to restart the Splunk Web Server, we need to use the following command: splunk start splunkweb.

In order to restart the Splunk Daemon, we need to use the following command: splunk start splunkd.

Name the commands used to enable and disable Splunk boot start.

In order to enable Splunk boot-start, we need to use the following command: $SPLUNK_HOME/bin/splunk enable boot-start

In order to disable Splunk boot-start, we need to use the following command: $SPLUNK_HOME/bin/splunk disable boot-start

Compare Splunk with Spark.

  • Deployment area: Splunk is deployed for collecting large amounts of machine-generated data; Spark is deployed for iterative applications and in-memory processing.
  • Nature of tool: Splunk is proprietary; Spark is open-source.
  • Working mode: Splunk works in streaming mode; Spark works in both streaming and batch modes.

What is Splunk?

Splunk is ‘Google’ for our machine-generated data. It’s a software/engine that can be used for searching, visualizing, monitoring, reporting, etc. of our enterprise data. Splunk takes valuable machine data and turns it into powerful operational intelligence by providing real-time insights into our data through charts, alerts, reports, etc.

What are the common port numbers used by Splunk?

Below are the common port numbers used by Splunk. However, we can change them if required.

  • Splunk Web port: 8000
  • Splunk Management port: 8089
  • Splunk Indexing port: 9997
  • Splunk Index Replication port: 8080
  • Splunk Network port: 514 (used to get data from the network port, i.e., UDP data)
  • KV Store: 8191

What are the components of Splunk? Explain Splunk architecture.

This is one of the most frequently asked Splunk interview questions. Below are the components of Splunk:

  • Search Head: Provides the GUI for searching
  • Indexer: Indexes the machine data
  • Forwarder: Forwards logs to the Indexer.
  • Deployment Server: Manages Splunk components in a distributed environment.

Which is the latest Splunk version in use?

Splunk 8.2.1 (as of June 21, 2021)

What is a Splunk indexer? What are the stages of Splunk indexing?

A Splunk indexer is the Splunk Enterprise component that creates and manages indexes. The primary functions of an indexer are mentioned below:

  • Indexing incoming data
  • Searching the indexed data

What is a Splunk forwarder? What are the types of Splunk forwarders?

There are two types of Splunk forwarders, which are mentioned below:

  • Universal Forwarder (UF): The Splunk agent installed on a non-Splunk system to gather data locally; it can’t parse or index data.
  • Heavyweight Forwarder (HWF): A full instance of Splunk with advanced functionalities. It generally works as a remote collector, intermediate forwarder, and possible data filter; because it parses data, it is not recommended for production systems.

Can you name a few of the most important configuration files in Splunk?

  • props.conf
  • indexes.conf
  • inputs.conf
  • transforms.conf
  • server.conf

What are the types of Splunk Licenses?

  • Enterprise license
  • Free license
  • Forwarder license
  • Beta license
  • Licenses for search heads (for distributed search)
  • Licenses for cluster members (for index replication)

What is the Splunk app?

The Splunk app is a container or directory of configurations, searches, dashboards, etc. in Splunk.

Where is the Splunk default configuration stored?

$SPLUNK_HOME/etc/system/default

What are the features not available in Splunk Free?

Splunk Free does not include below features:

  • Authentication and scheduled searches/alerting
  • Distributed search
  • Forwarding in TCP/HTTP (to non-Splunk)
  • Deployment management

What happens if the license master is unreachable?

If the license master is not available, the license slave will start a 24-hour timer, after which the search will be blocked on the license slave (though indexing continues). However, users will not be able to search for data in that slave until it can reach the license master again.

What is a summary index in Splunk?

A summary index is the default Splunk index (the index that Splunk Enterprise uses if we do not indicate another one).

If we plan to run a variety of summary index reports, we may need to create additional summary indexes.

What is Splunk DB Connect?

Splunk DB Connect is a generic SQL database plugin for Splunk that allows us to easily integrate database information with Splunk queries and reports.


Intermediate Interview Questions

Can you write down a general regular expression for extracting the IP address from logs?

There are multiple ways in which we can extract the IP address from logs. Below are a few examples:

By using a regular expression:

rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

OR

rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"

Explain Stats vs Transaction commands.

This is another frequently asked interview question on Splunk that will test the developer’s or engineer’s knowledge. The transaction command is most useful in the following cases:

  • When the unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused, for example, in web sessions identified by a cookie or client IP. In this case, the time span or pauses are also used to segment the data into transactions.
  • When an identifier is reused, say, in DHCP logs, a particular message identifies the beginning or end of a transaction.
  • When it is desirable to see the raw text of events combined rather than an analysis of the constituent fields of the events.

In other cases, it’s usually better to use stats:

  • The performance of the stats command is higher, so it should be preferred, especially in a distributed search environment.
  • If there is a unique ID, the stats command can be used.

How do I troubleshoot Splunk performance issues?

The answer to this question would be very wide, but, mostly, an interviewer would be looking for the following keywords:

  • Check splunkd.log for errors
  • Check server performance issues, i.e., CPU, memory usage, disk I/O, etc.
  • Install the SOS (Splunk on Splunk) app and check for warnings and errors in its dashboard
  • Check the number of saved searches currently running and their consumption of system resources
  • Install and enable Firebug, which is a Firefox extension. Log into Splunk (using Firefox) and open Firebug’s panels. Then, switch to the ‘Net’ panel, which we will have to enable. The Net panel will show us the HTTP requests and responses, along with the time spent on each. This will give us a lot of information quickly, such as which requests are hanging Splunk, which requests are blameless, etc.

What are buckets? Explain the Splunk bucket lifecycle.

Splunk places indexed data in directories, which are called ‘buckets.’ It is physically a directory containing events from a certain period.

A bucket moves through several stages as it ages. Below are the various stages it goes through:

  • Hot: A hot bucket contains newly indexed data. It is open for writing. There can be one or more hot buckets for each index.
  • Warm: A warm bucket consists of data rolled out from a hot bucket. There are many warm buckets.
  • Cold: A cold bucket has data that is rolled out from a warm bucket. There are many cold buckets.
  • Frozen: A frozen bucket is comprised of data rolled out from a cold bucket. The indexer deletes frozen data by default, but we can archive it. Archived data can later be thawed (data in a frozen bucket is not searchable).

By default, the buckets are located in the following location:

$SPLUNK_HOME/var/lib/splunk/defaultdb/db

We should see the hot-db there and any warm buckets we have. By default, Splunk sets the bucket size to 10 GB for 64-bit systems and 750 MB for 32-bit systems.

What is the difference between stats and eventstats commands?

  • The stats command generates summary statistics of all the existing fields in the search results and saves them as values in new fields.
  • Eventstats is similar to the stats command, except that the aggregation results are added inline to each event and only if the aggregation is pertinent to that event. The eventstats command computes the requested statistics, much like stats does, but aggregates them into the original raw data.

Who are the top direct competitors to Splunk?

Logstash, Loggly, LogLogic, Sumo Logic, etc. are some of the top direct competitors to Splunk.

What do Splunk licenses specify?

Splunk licenses specify how much data we can index per calendar day.

How does Splunk determine 1 day, from a licensing perspective?

In terms of licensing, for Splunk, one day is from midnight to midnight on the clock of the license master.

How are forwarder licenses purchased?

They are included in Splunk. Therefore, there is no need to purchase them separately.

What is the command for restarting the Splunk web server?

This is another frequently asked Splunk commands interview question. We can restart the Splunk web server by using the following command:

splunk start splunkweb

What is the command for restarting the Splunk Daemon?

The Splunk Daemon can be restarted with the below command:

splunk start splunkd

What is the command used to check the running Splunk processes on Unix/Linux?

If we want to check the running Splunk Enterprise processes on Unix/Linux, we can make use of the following command:

ps aux | grep splunk

What is the command used for enabling Splunk to boot start?

To boot start Splunk, we have to use the following command:

$SPLUNK_HOME/bin/splunk enable boot-start

How to disable Splunk boot-start?

In order to disable Splunk boot-start, we can use the following:

$SPLUNK_HOME/bin/splunk disable boot-start

What is a source type in Splunk?

The source type is Splunk’s way of identifying data.

Advanced Interview Questions

How to reset the Splunk admin password?

Resetting the Splunk admin password depends on the version of Splunk. If we are using Splunk 7.1 and above, then we have to follow the below steps:

  • First, we have to stop our Splunk Enterprise
  • Now, we need to find the ‘passwd’ file and rename it to ‘passwd.bk’
  • Then, we have to create a file named ‘user-seed.conf’ in the below directory:

$SPLUNK_HOME/etc/system/local/

In the file, we will have to use the following command (here, in place of ‘NEW_PASSWORD’, we will add our own new password):

[user_info]

PASSWORD = NEW_PASSWORD

  • After that, we can just restart the Splunk Enterprise and use the new password to log in

Now, if we are using versions prior to 7.1, we will follow the below steps:

  • First, stop the Splunk Enterprise
  • Find the passwd file and rename it to ‘passwd.bk.’
  • Start Splunk Enterprise and log in using the default credentials of admin/changeme.
  • Here, when asked to enter a new password for our admin account, we will follow the instructions

Note: In case we have created other users earlier and know their login details, copy and paste their credentials from the passwd.bk file into the passwd file and restart Splunk.

How to disable the Splunk launch message?

Set the value OFFENSIVE=Less in splunk-launch.conf.

How to clear the Splunk search history?

We can clear the Splunk search history by deleting the following file from the Splunk server:

$SPLUNK_HOME/var/log/splunk/searches.log

What is Btool? How will you troubleshoot Splunk configuration files?

Splunk Btool is a command-line tool that helps us troubleshoot configuration file issues or just see what values are being used by our Splunk Enterprise installation in the existing environment.

What is the difference between the Splunk app and Splunk add-ons?

In fact, both contain preconfigured configurations, reports, etc., but a Splunk add-on does not have a visual app, whereas a Splunk app has a preconfigured visual app.

What is the ‘.conf’ file’s precedence in Splunk?

File precedence is as follows:

  • System local directory — highest priority
  • App local directories
  • App default directories
  • System default directory — lowest priority

What is a fishbucket? What is a fishbucket index?

Fishbucket is a directory or index at the default location:

/opt/splunk/var/lib/splunk

It contains seek pointers and CRCs for the files we are indexing, so ‘splunkd’ can tell us if it has read them already. We can access it through the GUI by searching for:

index=_thefishbucket

How do I exclude some events from being indexed by Splunk?

This can be done by defining a regex to match the necessary event(s) and sending everything else to NullQueue. Here is a basic example that will drop everything except events that contain the string login:
In props.conf:

[source::/var/log/foo]

# Transforms must be applied in this order
# to make sure events are dropped on the
# floor prior to making their way to the
# index processor

TRANSFORMS-set = setnull, setparsing

In transforms.conf:

[setnull]

REGEX = .

DEST_KEY = queue

FORMAT = nullQueue

[setparsing]

REGEX = login

DEST_KEY = queue

FORMAT = indexQueue

How can I understand when Splunk has finished indexing a log file?

We can figure this out by watching data from Splunk’s metrics log in real time:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)

By watching everything split by source type:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series

If we are having trouble with data input and we want a way to troubleshoot it, particularly if our whitelist/blacklist rules are not working the way we expected, we will go to the following URL:

How to set the default search time in Splunk 6?

To do this in Splunk Enterprise 6.0, we have to use ‘ui-prefs.conf’. If we set the value in the following, all our users would see it as the default setting:

$SPLUNK_HOME/etc/system/local

For example, if our

$SPLUNK_HOME/etc/system/local/ui-prefs.conf file

includes:

[search]

dispatch.earliest_time = @d

dispatch.latest_time = now

The default time range that all users will see in the search app will be today.

The configuration file reference for ui-prefs.conf is here:

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Ui-prefsconf

What is a dispatch directory?

$SPLUNK_HOME/var/run/splunk/dispatch

contains a directory for each search that is running or has completed. For example, a directory named 1434308943.358 will contain a CSV file of its search results, a search.log with details about the search execution, and other stuff. Using the defaults (which we can override in limits.conf), these directories will be deleted 10 minutes after the search completes—unless the user saves the search results, in which case the results will be deleted after 7 days.

What is the difference between search head pooling and search head clustering?

Both are features provided by Splunk for the high availability of Splunk search head in case any search head goes down. However, the search head cluster feature has only recently been introduced, while the search head pooling feature will be removed in the next few versions.

The search head cluster is managed by a captain, and the captain controls its slaves. The search head cluster is more reliable and efficient than the search head pooling.

If I want to add folder access logs from a windows machine to Splunk, how do I do it?

Below are the steps to add folder access logs to Splunk:

  1. Enable Object Access Audit through the group policy on the Windows machine on which the folder is located.
  2. Enable auditing on the specific folder for which we want to monitor logs.
  3. Install the Splunk universal forwarder on the Windows machine.
  4. Configure the universal forwarder to send security logs to the Splunk indexer.

How would you handle/troubleshoot a Splunk license violation warning?

A license violation warning implies that Splunk has indexed more data than our purchased license quota. We have to identify which index/source type has received more data recently than the usual daily data volume. We can check the Splunk license master pool-wise available quota and identify the pool in which the violation has occurred. Once we identify the pool that is receiving more data, we have to identify the top source type that is receiving more data than usual. Once the source type is also identified, we find the source machine that is sending the huge number of logs and, in turn, the root cause for the same, and troubleshoot it accordingly.

What is the MapReduce algorithm?

MapReduce algorithm is the secret behind Splunk’s faster data searching. It’s an algorithm typically used for batch-based large-scale parallelization. It’s inspired by functional programming’s map() and reduce() functions.

How does Splunk avoid the duplicate indexing of logs?

At the indexer, Splunk keeps track of the indexed events in a directory called Fishbucket with the following default location:

/opt/splunk/var/lib/splunk

It contains seek pointers and CRCs for the files we are indexing, so splunkd can tell us if it has read them already.

See more at:

http://www.learnsplunk.com/splunk-indexer-configuration.html

What is the difference between the Splunk SDK and the Splunk Framework?

Splunk SDKs are designed to allow us to develop applications from scratch; they do not require Splunk Web or any components from the Splunk App Framework. These are separately licensed from Splunk, and they do not alter the Splunk software.

Splunk App Framework resides within the Splunk web server and permits us to customize the Splunk Web UI that comes with the product and develop Splunk apps using the Splunk web server. It is an important part of the features and functionalities of Splunk, which does not license users to modify anything in Splunk.

For what purpose inputlookup and outputlookup are used in Splunk Search?

The inputlookup command is used to search the contents of a Splunk lookup table. The lookup table can be a CSV lookup or a KV store lookup. The inputlookup command is considered an event-generating command, i.e., one that generates events or reports from one or more indexes without transforming them. Numerous commands fall under event-generating commands, including metadata, loadjob, inputcsv, etc.

Syntax:

| inputlookup [append=<bool>] [start=<int>] [max=<int>] (<filename> | <tablename>) [WHERE <search-query>]

Now coming to the outputlookup command: it writes the search results to a static lookup table, or KV store collection, that we specify. The outputlookup command cannot be used with external lookups.

Syntax:

outputlookup [append=<bool>] [create_empty=<bool>] [max=<int>] [key_field=<field_name>] [createinapp=<bool>] [override_if_empty=<bool>] (<filename> | <tablename>)

Explain how Splunk works.

We can divide the working of Splunk into three main parts:

  • Forwarder: You can see it as a dumb agent whose main task is to collect the data from various sources like remote machines and transfer it to the indexer.
  • Indexer: The indexer processes the data in real time and stores and indexes it on the localhost or cloud server.
  • Search Head: It allows the end-user to interact with the data and perform various operations like searching, analyzing, and visualizing the information.

How to add the colors in Splunk UI based on the field names?

Splunk UI has a number of features that allow the administrator to make the reports more presentable. One such feature that proves to be very useful for presenting distinguished results is the custom colors. For example, if the sales of a product drop below a threshold value, then as an administrator you can set the chart to display the values in red color.

The administrator can also change chart colors in the Splunk Web UI by editing the panels from the panel settings above the dashboard. Moreover, you can edit the code and use hexadecimal values to choose a color from the palette.

How the Data Ages in Splunk?

The data that enters an indexer gets sorted into directories, which are also known as buckets. Over a period of time, these buckets roll through different stages: from hot to warm, warm to cold, cold to frozen, and finally thawed. The indexer runs the data through a pipeline where event processing takes place in two stages: parsing breaks the data into individual events, and indexing writes these events into the index.

This is what happens to the data at each stage of the indexing pipeline:

  • As soon as the data enters the pipeline, it goes to the hot bucket. There can be multiple hot buckets at any point in time, which you can both search and write to.
  • If any problem like the Splunk getting restarted or the hot bucket has reached a certain threshold value/size, then a new bucket will be created in its place and the existing ones roll to become a warm bucket. These warm buckets are searchable, but you cannot write anything in them.
  • Further, if the indexer reaches its maximum capacity, the warm bucket will be rolled to become a cold one. Splunk will automatically execute the process by selecting the oldest warm bucket from the pipeline. However, it doesn’t rename the bucket. All the above buckets will be stored in the default location ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/db/*’.
  • After a certain period of time, the cold bucket rolls to become the frozen bucket. These buckets don’t have the same location as the previous buckets and are non-searchable. These buckets can either be archived or deleted based on the priorities.
  • You can’t do anything if the bucket is deleted, but you can retrieve a frozen bucket if it has been archived. The process of retrieving an archived bucket is known as thawing. Once a bucket is thawed, it becomes searchable and is stored in a new location: ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/’.

What are pivots and data models in Splunk?

Data models in Splunk are used when you have to process huge amounts of unstructured data and create a hierarchical model without executing complex search queries on the data. Data models are widely used for creating sales reports, adding access levels, and creating a structure of authentication for various applications.

Pivots, on the other hand, give you the flexibility to create multiple views and see the results as per the requirements. With pivots, even the managers of stakeholders from non-technical backgrounds can create views and get more details about their departments.

Explain workflow actions.

This topic will be present in any set of Splunk interview questions and answers. Workflow actions in Splunk are highly configurable knowledge objects that enable you to interact with web resources and other fields. Splunk workflow actions can be used to create HTML links and use them to search field values, send HTTP POST requests to specific URLs, and run secondary searches for selected events.

How many types of dashboards are available in Splunk?

There are three types of dashboards available in Splunk:

  • Real-time dashboards
  • Dynamic form-based dashboards
  • Dashboards for scheduled reports

What are the types of alerts available in Splunk?

Alerts are actions triggered by a saved search result after a certain period of time. Once an alert has occurred, subsequent actions like sending an email or a message will also be triggered. There are two types of alerts available in Splunk, as mentioned below:

  • Real-Time Alerts: Real-time alerts can be divided into two kinds: pre-result and rolling-window alerts. A pre-result alert gets triggered with every search, while rolling-window alerts are triggered when a specific criterion is met by the search.
  • Scheduled Alerts: As the name suggests, scheduled alerts can be initialized to trigger multiple alerts based on the set criteria.

Define the terms ‘search factor’ and ‘replication factor.’

Search factor: The search factor (SF) decides the number of searchable copies of the data/buckets that an indexer cluster maintains. For example, a search factor value of 3 means the cluster maintains up to 3 searchable copies of each bucket.

Replication factor: The replication factor (RF) determines the number of copies of the data/buckets that the indexer cluster maintains. The search factor should not be greater than the replication factor.

How to stop/start the Splunk service?

The command for starting Splunk service:

./splunk start

The command for stopping Splunk service:

./splunk stop

What is the use of a ‘time zone’ property in Splunk?

Time zone is an important property that helps you search for events when any fraud or security issue occurs. The default time zone is taken from the browser settings or from the machine you are using. Apart from event searching, it also matters when data pours in from multiple sources and must be aligned across their different time zones.

What are the important Search commands in Splunk?

Below are some of the important search commands in Splunk:

  • Erex
  • Abstract
  • Typer
  • Rename
  • Anomalies
  • Filldown
  • Accum
  • Addtotals

How many types of search modes are there in Splunk?

There are three types of search modes in Splunk:

  • Fast mode: Speeds up the search by limiting the types of data returned.
  • Verbose mode: Slower as compared to the fast mode, but returns the information for as many events as possible.
  • Smart mode: It toggles between different modes and search behaviors to provide maximum results in the shortest period of time.
