Algorithms
SGDRegressor
The SGDRegressor algorithm uses the scikit-learn SGDRegressor estimator to fit a model that predicts the value of numeric fields. The kfold cross-validation command can be used with SGDRegressor; see K-fold cross-validation. This algorithm supports incremental fit.
Parameters
- The partial_fit parameter controls whether an existing model should be incrementally updated or not. This allows you to update an existing model using only new data, without having to retrain it on the full training data set. The default is False.
- The fit_intercept=<true|false> parameter determines whether the intercept should be estimated or not. The default is True.
- The n_iter=<int> parameter is the number of passes over the training data, also known as epochs. The default is 5. The number of iterations is set to 1 if using partial_fit.
- The penalty=<l2|l1|elasticnet> parameter sets the penalty, or regularization term, to be used. The default is l2.
- The learning_rate=<constant|optimal|invscaling> parameter sets the learning rate schedule:
  - constant: eta = eta0
  - optimal: eta = 1.0 / (alpha * t)
  - invscaling: eta = eta0 / pow(t, power_t)
  The default is invscaling.
- The l1_ratio=<float> parameter is the Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. The default is 0.15. l1_ratio=0 corresponds to the L2 penalty, l1_ratio=1 to L1.
- The alpha=<float> parameter is the constant that multiplies the regularization term. The default is 0.0001. It is also used to compute the learning rate when learning_rate is set to optimal.
- The eta0=<float> parameter is the initial learning rate. The default is 0.01.
- The power_t=<float> parameter is the exponent for the inverse scaling learning rate. The default is 0.25.
- The random_state=<int> parameter is the seed of the pseudo-random number generator to use when shuffling the data.
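The three learning-rate schedules above can be written out directly. The following is an illustrative sketch in plain Python (not MLTK or scikit-learn code), using the documented defaults alpha=0.0001, eta0=0.01, and power_t=0.25:

```python
def eta_constant(t, eta0=0.01):
    # constant: eta = eta0, regardless of step t
    return eta0

def eta_optimal(t, alpha=0.0001):
    # optimal: eta = 1.0 / (alpha * t)
    return 1.0 / (alpha * t)

def eta_invscaling(t, eta0=0.01, power_t=0.25):
    # invscaling (the default): eta = eta0 / pow(t, power_t)
    return eta0 / pow(t, power_t)
```

Note how invscaling decays gently (at t=16, eta = 0.01 / 16**0.25 = 0.005), while optimal starts very large for small alpha and shrinks as 1/t.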
Syntax
fit SGDRegressor <field_to_predict> from <explanatory_fields>
[into <model name>] [partial_fit=<true|false>] [fit_intercept=<true|false>]
[random_state=<int>] [n_iter=<int>] [l1_ratio=<float>]
[alpha=<float>] [eta0=<float>] [power_t=<float>]
[penalty=<l1|l2|elasticnet>] [learning_rate=<constant|optimal|invscaling>]
You can save SGDRegressor models using the into keyword and apply them to new data later using the apply command.
... | apply temperature_model
You can inspect the coefficients learned by SGDRegressor with the summary command.
... | summary temperature_model
Syntax constraints
- If My_Incremental_Model does not exist, the command saves the model data under the model name My_Incremental_Model.
- If My_Incremental_Model exists and was trained using SGDRegressor, the command updates the existing model with the new input.
- If My_Incremental_Model exists but was not trained by SGDRegressor, an error message displays.
- Using partial_fit=true on an existing model ignores the newly supplied parameters; the parameters supplied at model creation are used instead.
- If partial_fit=false, or partial_fit is not specified, the specified model is created and replaces the pre-trained model if one exists.
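The constraint rules above can be sketched as plain control flow. This is an illustrative Python sketch, not MLTK source; the `models` dict stands in for the model store and `fit_model` is a hypothetical helper:

```python
def fit_model(models, name, algo="SGDRegressor", partial_fit=False):
    """Illustrative model-store semantics for fit ... into <name>."""
    existing = models.get(name)
    if existing is None:
        # Model does not exist: save new model data under `name`.
        models[name] = {"algo": algo, "updates": 1}
        return "created"
    if partial_fit:
        if existing["algo"] != algo:
            # Exists but was not trained by this algorithm: error.
            raise ValueError(f"{name} was not trained by {algo}")
        # Incremental update; newly supplied parameters are ignored.
        existing["updates"] += 1
        return "updated"
    # partial_fit=false (or unspecified): replace the pre-trained model.
    models[name] = {"algo": algo, "updates": 1}
    return "replaced"
```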
Examples
The following example uses SGDRegressor on a test set.
... | fit SGDRegressor temperature from date_month date_hour into temperature_model | ...
The following example includes the partial_fit parameter.
| inputlookup server_power.csv | fit SGDRegressor "ac_power" from "total-cpu-utilization" "total-disk-accesses" partial_fit=true into My_Incremental_Model
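To see why incremental fitting works, consider one pass of plain stochastic gradient descent on squared loss with a small L2 penalty: each new batch of rows simply applies more gradient steps to the same weights. This is an illustrative sketch in pure Python (not the MLTK implementation), with made-up data:

```python
def sgd_epoch(w, b, rows, targets, eta=0.01, alpha=0.0001):
    """One epoch of SGD for a linear model: predict w.x + b."""
    for x, y in zip(rows, targets):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        # Per-sample gradient of 0.5*err^2 plus an L2 penalty term on w.
        w = [wi - eta * (err * xi + alpha * wi) for wi, xi in zip(w, x)]
        b = b - eta * err
    return w, b

# Initial fit on a first batch, then a "partial fit" on new rows only:
w, b = [0.0, 0.0], 0.0
w, b = sgd_epoch(w, b, [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0])
w, b = sgd_epoch(w, b, [[1.0, 1.0]], [5.0])   # incremental update
```

Repeating `sgd_epoch` over new batches refines the same weight vector, which is the behavior partial_fit=true exposes at the search language level.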
Local availability
- Local class: SGDRegressor
- Source file: Splunk_ML_Toolkit/bin/algos/SGDRegressor.py
- algos.conf stanza: [SGDRegressor]
- Class bases: RegressorMixin, BaseAlgo
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: regressor).