Algorithms

SGDClassifier

The SGDClassifier algorithm uses the scikit-learn SGDClassifier estimator to fit a model to predict the value of categorical fields. This algorithm supports incremental fit.

Parameters

  • The partial_fit=<true|false> parameter controls whether an existing model is updated incrementally. Incremental fitting lets you update an existing model using only new data, without retraining it on the full training data set. The default is False.

  • n_iter=<int> is the number of passes over the training data, also known as epochs. The default is 5. When partial_fit is used, the number of iterations is set to 1.

  • The loss=<hinge|log|modified_huber|squared_hinge|perceptron> parameter is the loss function to be used.

    • hinge is the default and gives a linear SVM.
    • log gives logistic regression, a probabilistic classifier.
    • modified_huber is another smooth loss that brings tolerance to outliers as well as probability estimates.
    • squared_hinge is like hinge but is quadratically penalized.
    • perceptron is the linear loss used by the perceptron algorithm.

  • The fit_intercept=<true|false> parameter specifies whether the intercept should be estimated or not. The default is True.

  • penalty=<l2|l1|elasticnet> is the penalty, also known as regularization term, to be used. The default is l2.

  • learning_rate=<constant|optimal|invscaling> is the learning rate schedule.

    • constant: eta = eta0
    • optimal: eta = 1.0/(alpha * t)
    • invscaling: eta = eta0 / pow(t, power_t)
    • The default is invscaling.
  • l1_ratio=<float> is the Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. The default is 0.15.

    • l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
  • alpha=<float> is the constant that multiplies the regularization term. The default is 0.0001. It is also used to compute the learning rate when learning_rate is set to optimal.

  • eta0=<float> is the initial learning rate. The default is 0.01.

  • power_t=<float> is the exponent for inverse scaling learning rate. The default is 0.25.

  • random_state=<int> is the seed of the pseudo random number generator to use when shuffling the data.
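
The loss options and learning rate schedules listed above can be sketched in plain Python. This is only an illustration, not the toolkit's or scikit-learn's implementation: the function names loss_value and learning_rate are hypothetical, the loss formulas are the usual textbook definitions for a label y in {-1, +1} and a raw decision value p, and the schedule formulas are copied from the learning_rate bullets with the documented defaults for eta0, alpha, and power_t.

```python
import math


def loss_value(loss, y, p):
    """Loss for one sample with label y in {-1, +1} and raw decision value p.

    Hypothetical helper using the standard definitions of each loss.
    """
    z = y * p  # margin: positive when the prediction has the correct sign
    if loss == "hinge":
        return max(0.0, 1.0 - z)
    if loss == "log":
        # Logistic loss; a very negative margin can overflow math.exp in this sketch.
        return math.log1p(math.exp(-z))
    if loss == "modified_huber":
        if z >= 1.0:
            return 0.0
        if z >= -1.0:
            return (1.0 - z) ** 2
        return -4.0 * z  # linear tail gives tolerance to outliers
    if loss == "squared_hinge":
        return max(0.0, 1.0 - z) ** 2
    if loss == "perceptron":
        return max(0.0, -z)
    raise ValueError("unknown loss: %s" % loss)


def learning_rate(schedule, t, eta0=0.01, alpha=0.0001, power_t=0.25):
    """Learning rate eta at update step t (t >= 1), per the formulas above."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (alpha * t)
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unknown schedule: %s" % schedule)


if __name__ == "__main__":
    for name in ("hinge", "log", "modified_huber", "squared_hinge", "perceptron"):
        print(name, loss_value(name, y=1, p=0.5))
    for name in ("constant", "optimal", "invscaling"):
        print(name, learning_rate(name, t=16))
```

With the defaults, the invscaling rate shrinks slowly as t grows because power_t is 0.25, while the constant schedule stays at eta0 for every step.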

Syntax

fit SGDClassifier <field_to_predict> from <explanatory_fields>
[into <model name>] [partial_fit=<true|false>]
[loss=<hinge|log|modified_huber|squared_hinge|perceptron>]
[fit_intercept=<true|false>]
[random_state=<int>] [n_iter=<int>] [l1_ratio=<float>]
[alpha=<float>] [eta0=<float>] [power_t=<float>]
[penalty=<l1|l2|elasticnet>] [learning_rate=<constant|optimal|invscaling>]

You can save SGDClassifier models using the into keyword and apply the saved model later to new data using the apply command.

... | apply sla_model

You can inspect the model learned by SGDClassifier with the summary command.

... | summary sla_model

Syntax constraints

  • If My_Incremental_Model does not exist, the command saves the model data under the model name My_Incremental_Model.
  • If My_Incremental_Model exists and was trained using SGDClassifier, the command updates the existing model with the new input.
  • If My_Incremental_Model exists but was not trained by SGDClassifier, an error displays.
  • Using partial_fit=true on an existing model ignores the newly supplied parameters. The parameters supplied at model creation are used instead.
  • If partial_fit=false or partial_fit is not specified, the specified model is created and replaces the pre-trained model if one exists.
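
The constraints above amount to a small decision procedure, sketched here in Python. This is a reading of the rules as listed, not the toolkit's actual code: the function plan_fit, the saved_model dictionary with its "algo" and "params" keys, and the returned action names are all hypothetical.

```python
def plan_fit(partial_fit, saved_model, new_params):
    """Decide what a fit call does, following the constraints listed above.

    saved_model is None when no model with that name exists; otherwise it is
    a dict with the (hypothetical) keys "algo" and "params".
    Returns a tuple (action, params_used).
    """
    if not partial_fit:
        # partial_fit=false or unspecified: always train from scratch,
        # replacing a pre-trained model if one exists.
        action = "create" if saved_model is None else "replace"
        return action, new_params
    if saved_model is None:
        # Nothing to update yet, so the model is saved under the given name.
        return "create", new_params
    if saved_model["algo"] != "SGDClassifier":
        # Existing model was trained by a different algorithm.
        raise ValueError("existing model was not trained by SGDClassifier")
    # Incremental update: newly supplied parameters are ignored and the
    # parameters from model creation are used instead.
    return "update", saved_model["params"]


if __name__ == "__main__":
    existing = {"algo": "SGDClassifier", "params": {"loss": "hinge"}}
    print(plan_fit(True, None, {"loss": "log"}))
    print(plan_fit(True, existing, {"loss": "log"}))
    print(plan_fit(False, existing, {"loss": "log"}))
```

The second call shows the case that is easiest to overlook: loss=log is supplied but the update keeps the hinge loss chosen when the model was created.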

Example

The following example uses SGDClassifier on a test set.

... | fit SGDClassifier SLA_violation from * into sla_model

The following example uses the partial_fit parameter to incrementally update the model My_Incremental_Model.

| inputlookup iris.csv | fit SGDClassifier species from * partial_fit=true into My_Incremental_Model

Local availability

  • Local class: SGDClassifier
  • Source file: Splunk_ML_Toolkit/bin/algos/SGDClassifier.py
  • algos.conf stanza: [SGDClassifier]
  • Class bases: ClassifierMixin, BaseAlgo

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: classifier).
