Algorithms
Imputer
The Imputer algorithm is a preprocessing step wherein missing data is replaced with substitute values. The substitute values can be estimated, or based on other statistics or values in the dataset. To use Imputer, the user passes in the names of the fields to impute, along with arguments specifying the imputation strategy, and the values representing missing data. Imputer then adds new imputed versions of those fields to the data, which are copies of the original fields, except that their missing values are replaced by values computed according to the imputation strategy.
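The behavior described above can be sketched in plain Python. This is a minimal illustration only, not the toolkit's implementation: the function name `add_imputed_field`, the use of `None` to mark missing values, and the list-of-dicts row format are assumptions made for the sketch. The `Imputed_` prefix mirrors the field name that appears in the example on this page.

```python
from statistics import mean


def add_imputed_field(rows, field, prefix="Imputed_"):
    """Return copies of rows with an extra <prefix><field> column.

    Hypothetical helper: missing values are assumed to be None, and the
    mean of the present values is used as the substitute.
    """
    present = [r[field] for r in rows if r.get(field) is not None]
    fill = mean(present)
    out = []
    for r in rows:
        new_row = dict(r)  # copy the row so the original field stays untouched
        value = r.get(field)
        new_row[prefix + field] = fill if value is None else value
        out.append(new_row)
    return out


rows = [{"ac_power": 10.0}, {"ac_power": None}, {"ac_power": 20.0}]
result = add_imputed_field(rows, "ac_power")
```

In this sketch the original `ac_power` values are left as they were, and only the new `Imputed_ac_power` column has the gap filled.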
Parameters
- Available imputation strategies include mean, median, most frequent, and field. The default strategy is mean.
- All strategies except field require numeric data. The field strategy accepts categorical data.
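As a rough illustration of what the three numeric strategies compute, the following Python sketch derives the substitute value from the non-missing entries of a field. The helper name `substitute_value` and the use of `None` for missing entries are assumptions for this sketch. The field strategy is not covered, and tie-breaking for most_frequent in the toolkit may differ from Python's `statistics.mode`.

```python
from statistics import mean, median, mode

# Map each numeric strategy name to the statistic it stands for.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": mode}


def substitute_value(values, strategy="mean"):
    """Compute the fill value from the non-missing entries (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy}")
    present = [v for v in values if v is not None]
    return STRATEGIES[strategy](present)


samples = [1.0, 2.0, 2.0, 3.0, None, 12.0]
fill_mean = substitute_value(samples)                    # default strategy
fill_median = substitute_value(samples, "median")
fill_mode = substitute_value(samples, "most_frequent")
```

With a skewed field like `samples`, the mean is pulled toward the outlier while the median and most frequent value are not, which is one reason to choose a non-default strategy.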
Syntax
... | fit Imputer <field>* [as <field prefix>] [missing_values=<"NaN"|integer>] [strategy=<mean|median|most_frequent>] [into <model name>]
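One way to read the missing_values argument is as the marker that tells Imputer which entries count as missing: either "NaN" or an integer sentinel. The Python sketch below illustrates that reading. The helper `is_missing`, the treatment of `None` as equivalent to NaN, and "NaN" as the default are assumptions, not confirmed toolkit behavior.

```python
import math


def is_missing(value, missing_values="NaN"):
    """Return True if value matches the missing-data marker (hypothetical helper)."""
    if missing_values == "NaN":
        # Assumption: both None and float NaN count as missing in this mode.
        return value is None or (isinstance(value, float) and math.isnan(value))
    # Otherwise the marker is an integer sentinel such as -1.
    return value == missing_values


readings = [3.0, float("nan"), -1, None, 7.5]
nan_mask = [is_missing(v) for v in readings]
sentinel_mask = [is_missing(v, missing_values=-1) for v in readings]
```

In this sketch, switching the marker to `-1` means that only the sentinel entry is treated as missing, and NaN entries are left alone.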
You can inspect the value (mean, median, or mode) that was substituted for missing values by Imputer with the summary command.
... | summary <imputer model name>
You can save Imputer models using the into keyword and apply them to new data later using the apply command.
... | apply <imputer model name>
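The fit, summary, and apply workflow can be pictured as a small stateful object: fitting learns the substitute value, the summary reports it, and applying reuses it on new data. The class `TinyImputerModel` below is hypothetical and only sketches that idea, assuming `None` marks missing values and that the saved model stores a single fill value per field.

```python
from statistics import median


class TinyImputerModel:
    """Hypothetical stand-in for a saved Imputer model (median strategy)."""

    def __init__(self):
        self.fill_value = None

    def fit(self, values):
        # Learn the substitute value from the non-missing training entries.
        present = [v for v in values if v is not None]
        self.fill_value = median(present)
        return self

    def summary(self):
        # Report the value that will be substituted for missing entries.
        return {"strategy": "median", "fill_value": self.fill_value}

    def apply(self, values):
        # Reuse the stored value; nothing is recomputed from the new data.
        return [self.fill_value if v is None else v for v in values]


model = TinyImputerModel().fit([10.0, None, 30.0, 50.0])
report = model.summary()
filled = model.apply([None, 100.0, None])
```

Note that in this sketch the new data does not change the fill value: the gaps are filled with the median learned during fitting.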
Example
The following example uses Imputer on a test set.
| inputlookup server_power.csv
| eval ac_power_missing=if(random() % 3 = 0, null(), ac_power)
| fields - ac_power
| fit Imputer ac_power_missing
| eval imputed=if(isnull(ac_power_missing), 1, 0)
| eval ac_power_imputed=round(Imputed_ac_power_missing, 1)
| fields - ac_power_missing, Imputed_ac_power_missing
Local availability
- Local class: Imputer
- Source file: Splunk_ML_Toolkit/bin/algos/Imputer.py
- algos.conf stanza: [Imputer]
- Class bases: TransformerMixin, BaseAlgo
Class docstring
Instance of Imputer algorithm to fill in missing values in data.
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: preprocessor).