Algorithms

Imputer

The Imputer algorithm is a preprocessing step that replaces missing data with substitute values. The substitute values can be estimated, or derived from other statistics or values in the dataset. To use Imputer, pass in the names of the fields to impute, along with arguments that specify the imputation strategy and the values that represent missing data. Imputer then adds imputed versions of those fields to the data. Each new field is a copy of the original field, except that its missing values are replaced by values computed according to the imputation strategy.
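The behavior described above can be sketched in plain Python. This is only an illustration of the idea, not the toolkit's implementation: the function name impute_mean and the row layout are hypothetical, and the Imputed_ prefix is taken from the field names used in the example on this page.

```python
import math
from statistics import mean


def impute_mean(rows, field, prefix="Imputed_"):
    """Return copies of rows with an added prefix+field column.

    The new column equals the original field, except that missing
    values (None or NaN) are replaced by the mean of observed values.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    observed = [r[field] for r in rows if not is_missing(r.get(field))]
    fill = mean(observed)

    result = []
    for r in rows:
        new_row = dict(r)  # the original field is left untouched
        value = r.get(field)
        new_row[prefix + field] = fill if is_missing(value) else value
        result.append(new_row)
    return result


rows = [{"ac_power": 10.0}, {"ac_power": None}, {"ac_power": 20.0}]
result = impute_mean(rows, "ac_power")
```

Note that the original field is kept as-is and the imputed values land in a separate field, which mirrors the description above.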

Parameters

  • Available imputation strategies are mean, median, most_frequent, and field. The default strategy is mean.
  • All strategies except field require numeric data. The field strategy accepts categorical data.
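As a rough illustration of how the three numeric strategies differ, the sketch below computes the fill value each one would produce for the same data. The helper fill_value and the STRATEGIES mapping are hypothetical names, the field strategy is not covered, and tie-breaking for most_frequent in the toolkit may differ from Python's statistics.mode.

```python
from statistics import mean, median, mode

# Hypothetical mapping from strategy name to the statistic it uses.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": mode}


def fill_value(values, strategy="mean"):
    """Compute the substitute value from the non-missing entries."""
    observed = [v for v in values if v is not None]
    return STRATEGIES[strategy](observed)


readings = [1, 2, 2, None, 7]
mean_fill = fill_value(readings)                    # default strategy is mean
median_fill = fill_value(readings, "median")
mode_fill = fill_value(readings, "most_frequent")
```

With skewed data such as this, the mean is pulled toward the outlier while the median and mode are not, which is one reason to pick a strategy other than the default.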

Syntax

... | fit Imputer <field>* [as <field prefix>] [missing_values=<"NaN"|integer>] [strategy=<mean|median|most_frequent>] [into <model name>]

Use the summary command to inspect the value (mean, median, or mode) that Imputer substituted for missing values.

... | summary <imputer model name>

You can save Imputer models using the into keyword and apply them to new data later using the apply command.

... | apply <imputer model name>
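Conceptually, saving a model and applying it later means the substitute value is learned once and then reused. The sketch below assumes that the saved value comes from the data seen at fit time rather than from the new data. The class SavedImputer and its methods are hypothetical stand-ins for the fit, summary, and apply commands, not the toolkit's API.

```python
from statistics import median


class SavedImputer:
    """Hypothetical stand-in for a saved Imputer model (median strategy)."""

    def fit(self, values):
        # Learn the substitute value from the non-missing training values.
        self.fill = median([v for v in values if v is not None])
        return self

    def summary(self):
        # Report the value that will be substituted for missing entries.
        return {"strategy": "median", "fill_value": self.fill}

    def apply(self, values):
        # Reuse the stored value; nothing is recomputed from the new data.
        return [self.fill if v is None else v for v in values]


model = SavedImputer().fit([4, None, 6, 10])
info = model.summary()
new_batch = model.apply([None, 3])
```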

Example

The following example uses Imputer on a test set.

| inputlookup server_power.csv
| eval ac_power_missing=if(random() % 3 = 0, null(), ac_power)
| fields - ac_power
| fit Imputer ac_power_missing
| eval imputed=if(isnull(ac_power_missing), 1, 0)
| eval ac_power_imputed=round(Imputed_ac_power_missing, 1)
| fields - ac_power_missing, Imputed_ac_power_missing

Local availability

  • Local class: Imputer
  • Source file: Splunk_ML_Toolkit/bin/algos/Imputer.py
  • algos.conf stanza: [Imputer]
  • Class bases: TransformerMixin, BaseAlgo

Class docstring

Instance of Imputer algorithm to fill in missing values in data.

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: preprocessor).
