AutoPrediction — AITK Info Portal

AutoPrediction automatically determines whether the target field is categorical or numeric, then invokes the RandomForestClassifier algorithm to carry out the prediction. For further details, see RandomForestClassifier. AutoPrediction also splits the data into training and testing sets during the fit process, so no separate command or macro is needed. The data type is determined by the rules listed under Parameters, and the split is performed with the train_test_split function from sklearn.

Parameters

Use the target_type parameter to specify the target field as numeric or categorical.
The target_type parameter default is auto. When auto is used, AutoPrediction automatically determines the target field type.
AutoPrediction treats the target field as categorical in the following cases:
- Data of type bool, str, or numpy.object
- Data of type int when the criterion option is specified
In all other cases, AutoPrediction treats the target field as numeric.
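The rules above for target_type=auto can be sketched in plain Python. This is only an illustration of the documented cases, not the toolkit's code: the function name infer_target_type is hypothetical, and a column of mixed or non-numeric Python values stands in for the numpy.object case.

```python
def infer_target_type(values, criterion=None):
    """Return 'categorical' or 'numeric' for a target column.

    Hypothetical helper that mirrors the documented auto-detection cases.
    """
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        # Integer targets count as categorical only when criterion is given.
        return "categorical" if criterion is not None else "numeric"
    if all(is_number(v) for v in values):
        return "numeric"
    # bool, str, and mixed (object-like) columns are categorical.
    return "categorical"


print(infer_target_type(["setosa", "virginica"]))
print(infer_target_type([0, 1, 2], criterion="gini"))
print(infer_target_type([5.1, 4.9, 6.3]))
```

Setting target_type explicitly to numeric or categorical bypasses this kind of inference.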
The test_split_ratio parameter specifies the fraction of the data held out for model validation; the remainder is used for model training. The value must be a float between 0 (inclusive) and 1 (exclusive).
The test_split_ratio default is 0. A value of 0 means all data points are used to train the model.
- A test_split_ratio value of 0.3, for example, means 30% of the data points are used for testing and 70% are used for training.
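The effect of test_split_ratio can be illustrated with a small standard-library sketch. It is a stand-in for the train_test_split call, not the toolkit's implementation: the function name split_rows is hypothetical, and both the rounding up of the test count and the skipping of the split when the ratio is 0 are assumptions of this sketch.

```python
import math
import random


def split_rows(rows, test_split_ratio=0.0, seed=None):
    """Shuffle rows and return (train, test) lists.

    Hypothetical helper; a ratio of 0 keeps every row for training.
    """
    if not 0 <= test_split_ratio < 1:
        raise ValueError("test_split_ratio must be in [0, 1)")
    rows = list(rows)
    if test_split_ratio == 0:
        return rows, []
    shuffled = rows[:]
    # A fixed seed plays the role of random_state: same seed, same split.
    random.Random(seed).shuffle(shuffled)
    # Assumption: the test count is rounded up.
    n_test = math.ceil(test_split_ratio * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_rows(range(8), test_split_ratio=0.25, seed=42)
print(len(train), len(test))
```

Reusing the same seed reproduces the same split, which is why the examples on this page set random_state.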
Use n_estimators to optionally specify the number of trees in the forest.
Use max_depth to optionally set the maximum depth of each tree.
Specify the criterion value only for classification (categorical) scenarios.
The criterion value does not apply to regression (numeric) scenarios.

Syntax

fit AutoPrediction Target from Predictors* into PredictorModel target_type=<auto|numeric|categorical> test_split_ratio=<[0-1]> [n_estimators=<int>] [max_depth=<int>]
[criterion=<gini | entropy>] [random_state=<int>] [max_features=<str>] [min_samples_split=<int>] [max_leaf_nodes=<int>]

You can save AutoPrediction models using the into keyword and apply the saved model later to new data using the apply command.

... | apply PredictorModel

You can inspect the model learned by AutoPrediction with the summary command.

... | summary PredictorModel

Syntax constraints

AutoPrediction does not support partial_fit.
Classification performance output columns for accuracy, f1, precision, and recall only appear if the target_type is categorical.
Regression performance output columns for RMSE and rSquared only appear if the target_type is numeric.
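For readers unfamiliar with the classification columns, the sketch below computes accuracy, precision, recall, and f1 using their standard textbook definitions. The source does not state how the toolkit averages these scores across classes, so this sketch scores a single class chosen by the caller; the function name classification_scores and the positive argument are hypothetical.

```python
def classification_scores(actual, predicted, positive):
    """Score predictions for one class label using textbook definitions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = classification_scores(
    actual=["a", "a", "b", "b"],
    predicted=["a", "b", "b", "b"],
    positive="a",
)
print(scores)
```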

Example

The following example fits AutoPrediction to predict the species field from all other fields, holding out 30% of the data for testing.

| fit AutoPrediction species from * into auto_classify_model test_split_ratio=0.3 max_features=0.1 random_state=42

Also classified as: regressor

AutoPrediction automatically determines whether the target field is categorical or numeric, then invokes the RandomForestRegressor algorithm to carry out the prediction. For further details, see RandomForestRegressor. AutoPrediction also splits the data into training and testing sets during the fit process, so no separate command or macro is needed. The data type is determined by the rules listed under Parameters, and the split is performed with the train_test_split function from sklearn. You can use k-fold cross-validation with AutoPrediction. See K-fold cross-validation.
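As background on k-fold cross-validation, the following sketch partitions row indices into k consecutive folds, with each fold serving once as the test set. It illustrates the general technique only and says nothing about how the toolkit shuffles or assigns folds; the function name kfold_indices is hypothetical.

```python
def kfold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread the remainder so fold sizes differ by at most one row.
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


for train, test in kfold_indices(10, 3):
    print(len(train), test)
```

Every row lands in exactly one test fold, so each data point is used for validation once and for training in the remaining folds.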

Parameters

Use the target_type parameter to specify the target field as numeric or categorical.
The target_type parameter default is auto. When auto is used, AutoPrediction automatically determines the target field type.
AutoPrediction treats the target field as categorical in the following cases:
- Data of type bool, str, or numpy.object
- Data of type int when the criterion option is specified
In all other cases, AutoPrediction treats the target field as numeric.
The test_split_ratio parameter specifies the fraction of the data held out for model validation; the remainder is used for model training. The value must be a float between 0 (inclusive) and 1 (exclusive).
The test_split_ratio default is 0. A value of 0 means all data points are used to train the model.
- A test_split_ratio value of 0.3, for example, means 30% of the data points are used for testing and 70% are used for training.
Use n_estimators to optionally specify the number of trees in the forest.
Use max_depth to optionally set the maximum depth of each tree.
Specify the criterion value only for classification (categorical) scenarios.
The criterion value does not apply to regression (numeric) scenarios.

Syntax

fit AutoPrediction Target from Predictors* into PredictorModel target_type=<auto|numeric|categorical> test_split_ratio=<[0-1]> [n_estimators=<int>] [max_depth=<int>]
[criterion=<gini | entropy>] [random_state=<int>] [max_features=<str>] [min_samples_split=<int>] [max_leaf_nodes=<int>]

You can save AutoPrediction models using the into keyword and apply the saved model later to new data using the apply command.

... | apply PredictorModel

You can inspect the model learned by AutoPrediction with the summary command.

... | summary PredictorModel

Syntax constraints

AutoPrediction does not support partial_fit.
Regression performance output columns for RMSE and rSquared only appear if the target_type is numeric.
Classification performance output columns for accuracy, f1, precision, and recall only appear if the target_type is categorical.
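The regression columns can be illustrated with the standard definitions of root mean squared error and the coefficient of determination. The source does not give the formulas the toolkit uses, so treat this as a sketch under that assumption; the function name regression_scores is hypothetical.

```python
import math


def regression_scores(actual, predicted):
    """Return RMSE and R squared using their textbook definitions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: error left after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined for a constant target")
    return {"RMSE": math.sqrt(ss_res / n), "rSquared": 1 - ss_res / ss_tot}


print(regression_scores([1, 2, 3, 4], [1, 2, 3, 5]))
```

A lower RMSE and an rSquared closer to 1 both indicate predictions that track the held-out values more closely.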

Example

The following example fits AutoPrediction to predict the sepal_length field from all other fields, holding out 30% of the data for testing.

| fit AutoPrediction sepal_length from * into auto_regress_model test_split_ratio=0.3 random_state=42

Local availability

Local class: AutoPrediction
Source file: Splunk_ML_Toolkit/bin/algos/AutoPrediction.py
algos.conf stanza: [AutoPrediction]
Class bases: ClassifierMixin, RegressorMixin, BaseAlgo

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: classifier).
