Algorithms
AutoPrediction
AutoPrediction automatically determines whether the target field is categorical or numeric. For a categorical target, AutoPrediction then invokes the RandomForestClassifier algorithm to carry out the prediction. For further details, see RandomForestClassifier. AutoPrediction also performs the training and testing data split during the fit process, eliminating the need for a separate command or macro. AutoPrediction uses the rules described under Parameters to determine the target field type, and uses the train_test_split function from sklearn to perform the data split.
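To make the split step concrete, here is a minimal stdlib-only sketch, assuming the test_split_ratio behavior documented under Parameters. It approximates sklearn's train_test_split (which sizes the test set as the ceiling of test_split_ratio times the row count) rather than calling it, so the shuffle order differs from sklearn's. The helper name split_rows is hypothetical and is not part of the AutoPrediction source.

```python
import math
import random


def split_rows(rows, test_split_ratio=0.0, random_state=None):
    """Hypothetical helper: return (train, test) lists of rows.

    Approximates sklearn.model_selection.train_test_split: the test-set
    size is ceil(test_split_ratio * n). A ratio of 0 keeps every row for
    training, as the AutoPrediction documentation describes.
    """
    if not 0 <= test_split_ratio < 1:
        raise ValueError("test_split_ratio must be in [0, 1)")
    rows = list(rows)
    if test_split_ratio == 0:
        return rows, []
    n_test = math.ceil(test_split_ratio * len(rows))
    shuffled = rows[:]
    # Seeded shuffle so that a fixed random_state gives a repeatable split.
    random.Random(random_state).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_rows(range(10), test_split_ratio=0.3, random_state=42)
print(len(train), len(test))  # 7 3
```

With 10 rows and a ratio of 0.3, three rows are held out for testing and seven are used for training, matching the 30%/70% example under Parameters.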
Parameters
- Use the target_type parameter to specify the target field as numeric or categorical.
- The target_type parameter default is auto. When auto is used, AutoPrediction automatically determines the target field type.
- AutoPrediction treats the target field as categorical in the following cases:
  - Data of type bool, str, or numpy.object
  - Data of type int, when the criterion option is specified
- AutoPrediction treats the target field as numeric in all other cases.
- The test_split_ratio parameter specifies how the data is split between model training and model validation. The value must be a float between 0 (inclusive) and 1 (exclusive).
- The test_split_ratio default is 0. A value of 0 means all data points are used to train the model.
  - A test_split_ratio value of 0.3, for example, means 30% of the data points are used for testing and 70% are used for training.
- Use n_estimators to optionally specify the number of trees.
- Use max_depth to optionally set the maximum depth of each tree.
- Specify the criterion value for classification (categorical) scenarios.
- Ignore the criterion value for regression (numeric) scenarios.
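The categorical/numeric rules in the list above can be sketched as follows. This is a hedged approximation, not the AutoPrediction source: the function names are hypothetical, plain Python bool, int, float, and str values stand in for the numpy dtypes the toolkit inspects, and treating any non-numeric value as categorical is an assumption standing in for the numpy.object rule.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_target_type(values, criterion=None, target_type="auto"):
    """Hypothetical helper mirroring the documented target_type rules."""
    if target_type in ("numeric", "categorical"):
        return target_type  # An explicit setting skips detection.
    if target_type != "auto":
        raise ValueError("target_type must be auto, numeric, or categorical")
    values = list(values)
    if not values:
        raise ValueError("the target field has no values")
    if all(isinstance(v, bool) for v in values):
        return "categorical"  # bool data
    if not all(_is_number(v) for v in values):
        return "categorical"  # str or other object data (numpy.object)
    if criterion is not None and all(isinstance(v, int) for v in values):
        return "categorical"  # int data with criterion specified
    return "numeric"  # all other cases


print(infer_target_type(["setosa", "virginica"]))    # categorical
print(infer_target_type([5.1, 4.9, 6.3]))            # numeric
print(infer_target_type([0, 1, 2]))                  # numeric
print(infer_target_type([0, 1, 2], criterion="gini"))  # categorical
```

The last two calls show why the criterion option matters for integer targets: the same values are treated as numeric without it and as class labels with it.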
Syntax
fit AutoPrediction Target from Predictors* into PredictorModel target_type=<auto|numeric|categorical> test_split_ratio=<[0-1]> [n_estimators=<int>] [max_depth=<int>]
[criterion=<gini | entropy>] [random_state=<int>] [max_features=<str>] [min_samples_split=<int>] [max_leaf_nodes=<int>]
You can save AutoPrediction models using the into keyword and apply the saved model later to new data using the apply command.
... | apply PredictorModel
You can inspect the model learned by AutoPrediction with the summary command.
... | summary PredictorModel
Syntax constraints
- AutoPrediction does not support partial_fit.
- Classification performance output columns for accuracy, f1, precision, and recall only appear if the target_type is categorical.
- Regression performance output columns for RMSE and rSquared only appear if the target_type is numeric.
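The sketch below shows what the four classification columns measure, using their standard definitions and assuming a binary target with one chosen positive label. How AutoPrediction averages precision, recall, and f1 across more than two classes is not stated here, so the helper classification_scores is hypothetical and single-label only.

```python
def classification_scores(y_true, y_pred, positive):
    """Hypothetical helper: accuracy, precision, recall, f1 for one label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # f1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = classification_scores(["yes", "no", "yes", "yes"],
                               ["yes", "yes", "no", "yes"], positive="yes")
print({k: round(v, 3) for k, v in scores.items()})
# {'accuracy': 0.5, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```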
Example
The following example uses AutoPrediction to predict the species field, holding out 30% of the data as a test set.
| fit AutoPrediction species from * max_features=0.1 into auto_classify_model test_split_ratio=0.3 random_state=42
Also classified as: regressor
AutoPrediction automatically determines whether the target field is categorical or numeric. For a numeric target, AutoPrediction then invokes the RandomForestRegressor algorithm to carry out the prediction. For further details, see RandomForestRegressor. AutoPrediction also performs the training and testing data split during the fit process, eliminating the need for a separate command or macro. AutoPrediction uses the rules described under Parameters to determine the target field type, and uses the train_test_split function from sklearn to perform the data split. K-fold cross-validation can be used with AutoPrediction. See K-fold cross-validation.
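How to enable K-fold cross-validation for AutoPrediction is documented on the K-fold cross-validation page. As a generic illustration of the technique only, not of the toolkit's implementation, the hypothetical helper kfold_indices below splits row indices into k contiguous folds (the first n_rows % k folds get one extra row, as sklearn's KFold does without shuffling) and uses each fold once for validation while the remaining folds are used for training.

```python
def kfold_indices(n_rows, k):
    """Hypothetical helper: yield (training, validation) index lists."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        # Spread the remainder rows over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        training = list(range(0, start)) + list(range(stop, n_rows))
        yield training, validation
        start = stop


for training, validation in kfold_indices(10, 3):
    print(len(training), len(validation))
# 6 4
# 7 3
# 7 3
```

Unlike a single train/test split, every row is used for validation exactly once, so the resulting scores do not depend on one particular held-out subset.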
Parameters
- Use the target_type parameter to specify the target field as numeric or categorical.
- The target_type parameter default is auto. When auto is used, AutoPrediction automatically determines the target field type.
- AutoPrediction treats the target field as categorical in the following cases:
  - Data of type bool, str, or numpy.object
  - Data of type int, when the criterion option is specified
- AutoPrediction treats the target field as numeric in all other cases.
- The test_split_ratio parameter specifies how the data is split between model training and model validation. The value must be a float between 0 (inclusive) and 1 (exclusive).
- The test_split_ratio default is 0. A value of 0 means all data points are used to train the model.
  - A test_split_ratio value of 0.3, for example, means 30% of the data points are used for testing and 70% are used for training.
- Use n_estimators to optionally specify the number of trees.
- Use max_depth to optionally set the maximum depth of each tree.
- Specify the criterion value for classification (categorical) scenarios.
- Ignore the criterion value for regression (numeric) scenarios.
Syntax
fit AutoPrediction Target from Predictors* into PredictorModel target_type=<auto|numeric|categorical> test_split_ratio=<[0-1]> [n_estimators=<int>] [max_depth=<int>]
[criterion=<gini | entropy>] [random_state=<int>] [max_features=<str>] [min_samples_split=<int>] [max_leaf_nodes=<int>]
You can save AutoPrediction models using the into keyword and apply the saved model later to new data using the apply command.
... | apply PredictorModel
You can inspect the model learned by AutoPrediction with the summary command.
... | summary PredictorModel
Syntax constraints
- AutoPrediction does not support partial_fit.
- Regression performance output columns for RMSE and rSquared only appear if the target_type is numeric.
- Classification performance output columns for accuracy, f1, precision, and recall only appear if the target_type is categorical.
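For the regression columns, the sketch below computes RMSE and rSquared using their standard definitions: RMSE is the square root of the mean squared residual, and rSquared is 1 minus the residual sum of squares divided by the total sum of squares around the mean. The helper regression_scores is hypothetical and is not taken from the toolkit.

```python
import math


def regression_scores(y_true, y_pred):
    """Hypothetical helper: RMSE and rSquared for numeric predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # rSquared is undefined when every true value is identical.
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"RMSE": rmse, "rSquared": r_squared}


scores = regression_scores([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print({k: round(v, 3) for k, v in scores.items()})
# {'RMSE': 0.645, 'rSquared': 0.844}
```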
Example
The following example uses AutoPrediction to predict the sepal_length field, holding out 30% of the data as a test set.
| fit AutoPrediction sepal_length from * into auto_regress_model test_split_ratio=0.3 random_state=42
Local availability
- Local class: AutoPrediction
- Source file: Splunk_ML_Toolkit/bin/algos/AutoPrediction.py (in-repo path)
- algos.conf stanza: [AutoPrediction]
- Class bases: ClassifierMixin, RegressorMixin, BaseAlgo
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: classifier).