Algorithms

PCA

The Principal Component Analysis (PCA) algorithm uses the scikit-learn PCA algorithm to reduce the number of fields by extracting new, uncorrelated features out of the data.

Parameters

  • The k parameter specifies the number of features to be extracted from the data.
  • The variance parameter is short for percentage variance ratio explained. It sets the fraction of the total variance that the retained principal components must explain. The algorithm computes the number of principal components dynamically so that the specified variance ratio is preserved.
  • The variance parameter defaults to 1 if k is not provided.
  • The variance parameter can take a value between 0 and 1.
  • The explained_variance parameter measures how much of the dispersion in the dataset each principal component accounts for. A higher value means the component captures more variation.
  • The explained_variance_ratio parameter is the percentage of variance explained by each of the selected components.
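The relationship between explained_variance and explained_variance_ratio can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's code: the function name explained_variance_ratio_from and the sample variances are hypothetical, and the sketch assumes each ratio is the component's variance divided by the total variance across all components.

```python
def explained_variance_ratio_from(component_variances):
    """Convert per-component variances into fractions of the total variance.

    Assumption: component_variances lists the variance of every principal
    component (not only the selected ones), largest first.
    """
    total = sum(component_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [v / total for v in component_variances]


# Hypothetical variances for four principal components (illustrative only).
variances = [4.0, 2.0, 1.0, 1.0]
ratios = explained_variance_ratio_from(variances)
print(ratios)  # [0.5, 0.25, 0.125, 0.125]
```

Under this assumption the ratios sum to 1 when every component is included, which is why a variance value of 1 corresponds to keeping all of the variance in the data.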

Syntax

fit PCA <fields> [into <model name>] [k=<int>] [variance=<float>]

You can save PCA models using the into keyword and apply them to new data later using the apply command.

...into example_hard_drives_PCA_2 | apply example_hard_drives_PCA_2

You can inspect the model learned by PCA with the summary command.

| summary example_hard_drives_PCA_2

Syntax constraints

The variance parameter and k parameter cannot be used together. They are mutually exclusive.
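The parameter rules above can be expressed as a small validation routine. The following Python sketch is illustrative only: resolve_pca_options is a hypothetical helper, not part of the toolkit, and two details are assumptions rather than documented behavior: that k must be a positive integer, and that the variance range excludes 0 but includes 1.

```python
def resolve_pca_options(k=None, variance=None):
    """Return the single option that controls how many components to keep."""
    # k and variance are mutually exclusive.
    if k is not None and variance is not None:
        raise ValueError("k and variance cannot be used together")
    if k is not None:
        # Assumption: k must be a positive integer.
        if not isinstance(k, int) or k < 1:
            raise ValueError("k must be a positive integer")
        return {"k": k}
    # variance defaults to 1 when k is not provided.
    if variance is None:
        variance = 1.0
    # Assumption: 0 is excluded and 1 is included.
    if not 0 < variance <= 1:
        raise ValueError("variance must be between 0 and 1")
    return {"variance": float(variance)}


print(resolve_pca_options(k=2))            # {'k': 2}
print(resolve_pca_options(variance=0.5))   # {'variance': 0.5}
print(resolve_pca_options())               # {'variance': 1.0}
```

Passing both k and variance raises a ValueError in this sketch. How the toolkit itself reports the conflict is not shown here.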

Examples

The following example uses PCA on a test set.

| fit PCA "SS_SMART_1_Raw", "SS_SMART_2_Raw", "SS_SMART_3_Raw", "SS_SMART_4_Raw", "SS_SMART_5_Raw" k=2 into example_hard_drives_PCA_2

The following example includes the variance parameter. The value variance=0.5 tells the algorithm to keep adding principal components until together they explain 50% of the variance in the original dataset.

| fit PCA "SS_SMART_1_Raw", "SS_SMART_2_Raw", "SS_SMART_3_Raw", "SS_SMART_4_Raw", "SS_SMART_5_Raw" variance=0.50 into example_hard_drives_PCA_2
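The selection that variance=0.50 triggers can be sketched as a cumulative sum over explained variance ratios. This is a minimal sketch under stated assumptions: components_for_variance is a hypothetical helper, the ratios are made-up values rather than results from the hard drive data, and the sketch stops as soon as the cumulative ratio reaches the target. The underlying scikit-learn implementation may treat an exact tie differently.

```python
from itertools import accumulate


def components_for_variance(ratios, variance):
    """Return how many leading components are needed to explain `variance`.

    Assumption: ratios are sorted largest first and sum to about 1.
    """
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= variance:
            return count
    # Floating-point rounding can leave the sum just under the target,
    # so fall back to keeping every component.
    return len(ratios)


# Hypothetical explained variance ratios for four components.
ratios = [0.45, 0.30, 0.15, 0.10]
print(components_for_variance(ratios, 0.50))  # 2
```

With these illustrative ratios the first component explains 45% of the variance, which is below the target, so a second component is added and the pair explains 75%.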

Local availability

  • Local class: PCA
  • Source file: Splunk_ML_Toolkit/bin/algos/PCA.py (in-repo path Splunk_ML_Toolkit/bin/algos/PCA.py)
  • algos.conf stanza: [PCA]
  • Class bases: TransformerMixin, BaseAlgo

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: preprocessor).
