Search commands for machine learning
The AI Toolkit includes several custom machine learning search commands. You can use these ML-SPL commands on any Splunk platform instance where the AI Toolkit is installed.
- About the ai command: The `ai` command is a new ML-SPL search command. It lets you connect to externally hosted large language models (LLMs) from third-party providers, including OpenAI, Gemini, Bedrock, Groq, and Ollama.
- About the fit and apply commands: The `fit` and `apply` commands are 2 of the custom machine learning (ML) Splunk Search Processing Language (SPL) commands included in the AI Toolkit. These ML-SPL commands implement classic machine learning and statistical learning tasks.
- Permissions for machine learning commands: Access to the ML-SPL commands `fit`, `apply`, and `ai` is permission based. You can configure permissions to restrict access to these search commands.
- Search commands for machine learning safeguards: The Splunk platform contains search processing language (SPL) safeguards to warn you when you might unknowingly run a search in Splunk Web that has commands that might be either a security or a performance risk. If a search command that…
- Search macros in the AI Toolkit: The AI Toolkit includes 3 search macros. Search macros are reusable blocks of Splunk Search Processing Language (SPL) that you can insert into other searches. Search macros can be any part of a search, such as an `eval` statement or sear…
ML-SPL commands implement the following classic machine learning and statistical learning tasks:
| ML-SPL command name | Description |
|---|---|
| ai | Send Splunk platform data through externally hosted large language models (LLMs) and have the response presented back in the Splunk search pipeline. |
| fit | Fit and apply a machine learning model to search results. |
| apply | Apply a machine learning model that was learned using the fit command. |
| summary | Return a summary of a machine learning model that was learned using the fit command. |
| listmodels | Return a list of machine learning models that were learned using the fit command. |
| deletemodel | Delete a machine learning model that was learned using the fit command. |
| sample | Randomly sample or partition events. |
| score | Run statistical tests to validate model outcomes. |
ML-SPL commands follow the same syntax as other SPL commands in the Splunk platform. For more details on this syntax, see Understanding SPL syntax in the Search Reference manual.
You can also configure the performance costs of the fit and apply commands. For details, see Configure algorithm performance costs.
Note: The fit and apply commands work on searches with relative time ranges, but do not complete on real-time searches.
A quick reference guide to the ML-SPL commands and the available AI Toolkit algorithms is available as a PDF. You can download the AI Toolkit Quick Reference Guide in English or Japanese:
- Splunk AI Toolkit Quick Reference Guide, English
- Splunk AI Toolkit Quick Reference Guide, Japanese
ai command
The ai command was introduced in version 5.6.0. For information on this new command, see About the ai command.
Note: The ai command modifies the model. The command is considered risky because running it can cause performance issues. As a result, this command triggers SPL safeguards. To learn more, see Search commands for machine learning safeguards.
fit command
Use the fit command to fit and apply a machine learning model to search results. The syntax is the same for supervised (labeled data) and unsupervised (unlabeled data) learning.
Note: The fit command modifies the model. The command is considered risky because running it can cause performance issues. As a result, this command triggers SPL safeguards. To learn more, see Search commands for machine learning safeguards.
Syntax
The first argument, the algorithm, is required. The options that follow the algorithm vary depending on the algorithm chosen:
fit <algorithm> [option_name]=[option_value]... [into <model_name>]
Some algorithms require a response-field:
fit <algorithm> [option_name]=[option_value]... <response-field> [into <model_name>]
Some algorithms require an explanatory-field:
fit <algorithm> [option_name]=[option_value]... <explanatory-field> [into <model_name>]
Some algorithms require both a response-field and an explanatory-field. The from keyword is required only when both a response-field and an explanatory-field are present:
fit <algorithm> [option_name]=[option_value]... <response-field> from <explanatory-field> [into <model_name>]
Use the into keyword to store the learned model in an artifact that can later be applied to new search results with the apply command.
Note: Not all algorithms support saved models. For details on all the algorithms that ship with the AI Toolkit, see Algorithms in the Splunk Machine Learning Toolkit.
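To make the syntax rules above concrete, the following Python sketch assembles a fit clause from its parts. It is a hypothetical helper, not part of the AI Toolkit: `build_fit_clause` and its parameters are illustrative names, and it does not check whether a given algorithm actually requires a response-field, an explanatory-field, or both.

```python
def build_fit_clause(algorithm, options=None, response=None,
                     explanatory=None, model_name=None):
    """Assemble a fit clause following the syntax patterns above.

    Hypothetical helper for illustration only; it is not part of the
    AI Toolkit and does not validate algorithm-specific options.
    explanatory is a list of field names.
    """
    if not algorithm:
        raise ValueError("the algorithm argument is required")
    parts = ["fit", algorithm]
    # Options follow the algorithm as option_name=option_value pairs.
    for name, value in (options or {}).items():
        parts.append(f"{name}={value}")
    if response and explanatory:
        # Both kinds of field are present, so the from keyword separates them.
        parts += [response, "from", *explanatory]
    elif response:
        parts.append(response)
    elif explanatory:
        parts += list(explanatory)
    if model_name:
        # into stores the learned model so that apply can reuse it later.
        parts += ["into", model_name]
    return " ".join(parts)
```

For example, `build_fit_clause("LogisticRegression", response="species", explanatory=["petal_length", "petal_width"])` returns `fit LogisticRegression species from petal_length petal_width`.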
Examples
The following example fits a LinearRegression model to predict errors using _time:
... | fit LinearRegression errors from _time
The following example fits a LinearRegression model to predict errors using _time and saves it into a model named errors_over_time:
... | fit LinearRegression errors from _time into errors_over_time
The following example fits a LogisticRegression model to predict a categorical response from numerical measurements:
... | fit LogisticRegression species from petal_length petal_width sepal_length sepal_width
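As a rough mental model of the first two examples, the following Python sketch computes an ordinary least-squares line for a single explanatory field and stores the result under a model name, as into does. It is an illustration under stated assumptions, not the AI Toolkit's LinearRegression implementation; `fit_linear_regression` and the `MODELS` dictionary are hypothetical.

```python
def fit_linear_regression(rows, response, explanatory):
    """Ordinary least squares with one explanatory field: y = intercept + slope * x."""
    xs = [float(r[explanatory]) for r in rows]
    ys = [float(r[response]) for r in rows]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two rows to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the explanatory field has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep the fit arguments alongside the learned values.
    return {"algorithm": "LinearRegression", "response": response,
            "explanatory": [explanatory], "intercept": intercept, "slope": slope}


# Hypothetical stand-in for saved model storage (what "into <model_name>" targets).
MODELS = {}

rows = [{"_time": 0, "errors": 1}, {"_time": 1, "errors": 3}, {"_time": 2, "errors": 5}]
MODELS["errors_over_time"] = fit_linear_regression(rows, "errors", "_time")
```

With these three rows, the stored model has a slope of 2.0 and an intercept of 1.0.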
apply command
Use the apply command to compute predictions for the current search results based on a model that was learned using the fit command. The apply command can be used on different search results than those used when fitting the model, but the results should have an identical list of fields.
Syntax
apply <model_name> [as <output_field>]
Use the as keyword to rename the field added to search results by the model.
Examples
The following example applies a learned LinearRegression model named errors_over_time:
... | apply errors_over_time
The following example renames the output of the model to predicted_errors:
... | apply errors_over_time as predicted_errors
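The following Python sketch illustrates the behavior described above for a stored single-field linear model. It checks that the fields the model needs are present, adds a prediction field, and renames that field when an as name is given. The default name predicted(<response-field>) mirrors the predicted(species) field used in the score example later on this page; `apply_model` and the model dictionary layout are hypothetical.

```python
def apply_model(model, rows, as_field=None):
    """Add a prediction field to each row using a stored single-field linear model."""
    out_field = as_field or f"predicted({model['response']})"
    results = []
    for row in rows:
        # The new search results must contain the fields the model was fit on.
        missing = [f for f in model["explanatory"] if f not in row]
        if missing:
            raise KeyError(f"missing fields required by the model: {missing}")
        x = float(row[model["explanatory"][0]])
        new_row = dict(row)  # leave the input row unchanged
        new_row[out_field] = model["intercept"] + model["slope"] * x
        results.append(new_row)
    return results


# Hypothetical model equivalent to a fitted "errors from _time" line.
errors_over_time = {"response": "errors", "explanatory": ["_time"],
                    "intercept": 1.0, "slope": 2.0}
```

Calling `apply_model(errors_over_time, [{"_time": 3}], as_field="predicted_errors")` adds a predicted_errors field with the value 7.0.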
summary command
Use the summary command to return a summary of a machine learning model that was learned using the fit command. The summary is algorithm-specific. For example, the summary for the LinearRegression algorithm is a list of coefficients, and the summary for the LogisticRegression algorithm is a list of coefficients for each class.
Syntax
summary <model_name>
Examples
The following example inspects a learned LinearRegression model named errors_over_time:
| summary errors_over_time
listmodels command
Use the listmodels command to return a list of machine learning models that were learned using the fit command. The algorithm and arguments given when fit was invoked are displayed for each model.
Syntax
listmodels
Example
The following example lists all models:
| listmodels
deletemodel command
Use the deletemodel command to delete a machine learning model learned using the fit command.
Note: The deletemodel command modifies the model. The command is considered risky because running it can cause performance issues. As a result, this command triggers SPL safeguards. To learn more, see Search commands for machine learning safeguards.
Syntax
deletemodel <model_name>
Example
The following example deletes the model named errors_over_time:
| deletemodel errors_over_time
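The summary, listmodels, and deletemodel commands all operate on models saved with fit ... into. The following Python sketch is a toy registry that mimics those three operations. The `ModelRegistry` class and its storage layout are hypothetical and say nothing about how the AI Toolkit actually stores models.

```python
class ModelRegistry:
    """Toy stand-in for saved-model storage (hypothetical, not the toolkit's)."""

    def __init__(self):
        self._models = {}

    def save(self, name, algorithm, args, params):
        # Roughly what "fit ... into <name>" persists: algorithm, arguments, learned values.
        self._models[name] = {"algorithm": algorithm, "args": args, "params": params}

    def listmodels(self):
        # One row per model, showing the algorithm and the arguments given to fit.
        return [{"name": n, "algorithm": m["algorithm"], "args": m["args"]}
                for n, m in sorted(self._models.items())]

    def summary(self, name):
        # Algorithm-specific details; in this sketch, the learned parameters.
        return dict(self._models[name]["params"])

    def deletemodel(self, name):
        if name not in self._models:
            raise KeyError(f"no model named {name!r}")
        del self._models[name]
```

After `deletemodel("errors_over_time")`, that model no longer appears in `listmodels()`, and a later `summary` call for it raises `KeyError`.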
sample command
Use the sample command to randomly sample or partition events. The command samples in one of the following three modes:
- ratio: Returns each event with the given probability.
- count: Returns exactly the given number of events.
- proportional: Samples each event with a probability specified by a field value.
A fourth mode, partitioning, randomly divides events into a given number of partitions.
Refer to the following table for more details on the sample command modes and additional options:
| Mode or option | Name | Description |
|---|---|---|
| Sampling mode | ratio | A float between 0 and 1 indicating the probability that each event has of being included in the result set. For example, a ratio of 0.01 means that each event has a 1% probability of being included in the results. Use ratio when you want an approximation. |
| Sampling mode | count | A number that indicates the exact number of randomly chosen events to return. If the sample count exceeds the total number of events in the search, all events are returned. |
| Sampling mode | proportional | The name of a numeric field to use to determine the sampling probability of each event, which yields a biased sample. Each event is sampled with a probability specified by this field value. |
| Partitioning mode | partitions | The number of partitions into which to randomly divide events. Partitions are approximately equal in size. Use partitions when you want to divide your results into groups for different purposes, such as using results for testing and training. |
| Additional option | seed | A number that specifies a random seed. Using seed ensures reproducible results. If unspecified, a pseudorandom value is used. |
| Additional option | count by <field> | Specifies a field by which to split events, returning the count number of events for each value of the specified field. If a value has fewer events than count, all events for that value are included in the results. |
| Additional option | inverse | Use with proportional sampling. Inverts the probability, sampling each event with one minus the probability specified in the proportional field. |
| Additional option | fieldname | Use with partitions. The name of the field in which to store the partition number. Defaults to partition_number. |
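The following Python sketch shows the semantics of the three sampling modes and the inverse and seed options as described in the table above. It is an illustration under stated assumptions, not the sample command's implementation: the function names are hypothetical, and proportional field values are assumed to already be probabilities between 0 and 1.

```python
import random


def sample_ratio(events, ratio, seed=None):
    """Keep each event independently with probability `ratio` (approximate size)."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < ratio]


def sample_count(events, count, seed=None):
    """Return exactly `count` random events, or all events if there are fewer."""
    rng = random.Random(seed)
    if count >= len(events):
        return list(events)
    return rng.sample(events, count)


def sample_proportional(events, field, inverse=False, seed=None):
    """Keep each event with probability event[field], or 1 - event[field] if inverse."""
    rng = random.Random(seed)
    kept = []
    for e in events:
        p = float(e[field])  # assumed to be between 0 and 1
        if inverse:
            p = 1.0 - p
        if rng.random() < p:
            kept.append(e)
    return kept
```

Because each function creates its own `random.Random(seed)`, passing the same seed returns the same sample, which mirrors the reproducibility that the seed option provides. Note that ratio only approximates the result size, while count always returns an exact number of events.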
The sample command is not identical to the sampling options in the Event Sampling menu on the Search page in Splunk Web:
- Options from the Event Sampling menu perform sampling before the data is collected from indexes, at the beginning of the search pipeline.
- The sample command is applied after data is collected, and it operates on everything in the search pipeline.
Using the Event Sampling menu option is faster, but the sample command is usable anywhere in the search pipeline and provides several modes that are not available in the Event Sampling feature. For example, the sample command supports partitioning, biased sampling, and the ability to retrieve an exact number of results.
Syntax
sample [ratio=<float between 0 and 1>] [count=<positive integer>] [proportional=<name of numeric field> [inverse]] [partitions=<natural number greater than 1> [fieldname=<string>]] [seed=<number>] [by <split_by_field>]
Examples
The following example uses the ratio keyword and retrieves approximately 1% of all events at random:
... | sample ratio=0.01
The following example uses the count keyword and retrieves exactly 20 events at random:
... | sample count=20
The following example uses the count keyword and retrieves exactly 20 events at random from each host:
... | sample count=20 by host
The following example uses the proportional keyword and returns each event with a probability determined by the value of some_field:
... | sample proportional="some_field"
The following example partitions events into seven groups and stores each event's group number in a field called partition_number:
... | sample partitions=7 fieldname="partition_number"
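The following Python sketch illustrates the count by and partitions examples above: it keeps up to count random events for each value of a split field, and it tags every event with a random partition number. The function names are hypothetical, and this sketch numbers partitions from 0; the numbering that the sample command uses may differ.

```python
import random
from collections import defaultdict


def sample_count_by(events, count, by, seed=None):
    """Return up to `count` random events for each value of the `by` field."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for e in events:
        groups[e[by]].append(e)
    result = []
    for group in groups.values():
        # Values with fewer events than count keep all of their events.
        result.extend(group if len(group) <= count else rng.sample(group, count))
    return result


def sample_partitions(events, partitions, fieldname="partition_number", seed=None):
    """Tag each event with a random partition number from 0 to partitions - 1."""
    if partitions < 2:
        raise ValueError("partitions must be greater than 1")
    rng = random.Random(seed)
    tagged = []
    for e in events:
        row = dict(e)
        # Independent random assignment, so partition sizes are only approximately equal.
        row[fieldname] = rng.randrange(partitions)
        tagged.append(row)
    return tagged
```

A common use of the partition field is to filter on one partition value for testing and use the remaining values for training.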
score command
The score command runs statistical tests to validate model outcomes. Use the score command to validate models and statistical tests for any use case. Choose the scoring method best suited to your data and to the problem you want to solve.
Syntax
The first argument for the scoring method is required. The options following the scoring method vary depending on the scoring method chosen.
Some scoring methods support pairwise comparisons between two sets of fields:
... | score <scoring-method-name> a_field_1 a_field_2 ... a_field_n against b_field_1 b_field_2 ... b_field_m
Some scoring methods support pairwise comparisons between two sets of arrays:
... | score <scoring-method-name> array_a against array_b [options]
Some scoring methods are specific to the evaluation of clustering models:
... | score <scoring-method-name> <label_field> against <feature_field_1> ... <feature_field_n> metric=<options>
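For the clustering form above, a widely used metric is the silhouette coefficient, which compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster. The following Python sketch computes it with Euclidean distance. It illustrates the idea of scoring a label field against feature fields; it is not the AI Toolkit's implementation, and `silhouette_score` here is a standalone hypothetical function.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    points: list of equal-length numeric tuples (the feature fields)
    labels: cluster label for each point (the label field)
    """
    n = len(points)
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i in range(n):
        same, other = [], {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if labels[j] == labels[i]:
                same.append(d)
            else:
                other.setdefault(labels[j], []).append(d)
        if not same:
            continue  # singleton cluster: its coefficient is taken as 0
        a = sum(same) / len(same)                            # mean distance within own cluster
        b = min(sum(ds) / len(ds) for ds in other.values())  # mean distance to nearest other cluster
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n
```

Scores near 1 indicate compact, well-separated clusters, and negative scores suggest that many points are closer to another cluster than to their own.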
Example
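The following example uses the score command on test data to compute a confusion matrix that compares the actual species values with the predicted(species) values: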
... | score confusion_matrix true="species" pred="predicted(species)"
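A confusion matrix counts how often each actual class is predicted as each class. The following Python sketch shows that computation for the species example above. `confusion_matrix` is a hypothetical standalone function, and the layout (rows are actual classes, columns are predicted classes, labels sorted alphabetically) is an assumption of this sketch.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; rows are actual, columns are predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return labels, [[counts[(a, p)] for p in labels] for a in labels]
```

Correct predictions land on the diagonal, so any nonzero off-diagonal cell shows which classes the model confuses with each other.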
The AI Toolkit includes the following classes of the score command, each with its own set of methods:
- Classification
- Clustering scoring
- Pairwise distances scoring
- Regression scoring
- Statistical functions (statsfunctions)
- Statistical testing (statstest)
Note: Score commands are not customizable within the AI Toolkit.
The AI Toolkit can also help you test for model overfitting using the K-fold scoring option. For more information, see K-fold scoring.
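The idea behind K-fold scoring can be sketched in Python: split the rows into k folds, fit on k-1 folds, score on the held-out fold, and compare the per-fold scores. Held-out scores that are consistently much worse than scores on the training data are a common sign of overfitting. The functions below are hypothetical, and for clarity the folds are contiguous rather than shuffled; see K-fold scoring for how the AI Toolkit actually exposes this option.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(rows, k, fit, score):
    """Fit on k-1 folds, score on the held-out fold, and return the per-fold scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in test_idx]))
    return scores
```

Here `fit` and `score` are caller-supplied functions, so the same loop works for any model: for example, a mean predictor scored by mean absolute error.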
Source: /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/ai-toolkit-commands-macros-and-visualizations/search-commands-for-machine-learning (upstream Splunk AITK 5.6.4 docs)