Algorithms
PCA
The Principal Component Analysis (PCA) algorithm uses the scikit-learn PCA algorithm to reduce the number of fields by extracting new, uncorrelated features out of the data.
Parameters
- The k parameter specifies the number of features to be extracted from the data.
- The variance parameter is short for percentage variance ratio explained. This parameter determines the percentage of variance ratio explained in the principal components of the PCA. It computes the number of principal components dynamically by preserving the specified variance ratio.
- The variance parameter defaults to 1 if k is not provided.
- The variance parameter can take a value between 0 and 1.
- The explained_variance parameter measures the proportion to which the principal component accounts for dispersion of a given dataset. A higher value denotes a higher variation.
- The explained_variance_ratio parameter is the percentage of variance explained by each of the selected components.
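The parameters above resemble options and attributes of the scikit-learn PCA class. The following Python sketch is illustrative only. It assumes that k behaves like an integer n_components value, which this page does not confirm, and it uses synthetic data and variable names that are not part of the toolkit. It shows what the explained variance and explained variance ratio look like for each extracted component.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 rows, 5 numeric fields
# with deliberately different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Assumed analogue of k=2: extract two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X)

# Variance captured by each extracted component.
explained_variance = pca.explained_variance_
# Fraction of the total variance captured by each extracted component.
explained_variance_ratio = pca.explained_variance_ratio_

print(components.shape)
print(explained_variance)
print(explained_variance_ratio)
```

The components are ordered by the variance they capture, so the first entry of each array is the largest.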
Syntax
fit PCA <fields> [into <model name>] [k=<int>] [variance=<float>]
You can save PCA models using the into keyword and apply them to new data later using the apply command.
...into example_hard_drives_PCA_2 | apply example_hard_drives_PCA_2
You can inspect the model learned by PCA with the summary command.
| summary example_hard_drives_PCA_2
Syntax constraints
The variance parameter and k parameter cannot be used together. They are mutually exclusive.
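The constraint can be expressed as a small validation routine. The Python sketch below is a hypothetical helper, not code from the toolkit. The function name, the returned dictionary, and the exact handling of the boundary values 0 and 1 are assumptions made for illustration.

```python
def resolve_pca_options(k=None, variance=None):
    """Hypothetical helper that applies the documented rules for k and variance."""
    # k and variance are mutually exclusive.
    if k is not None and variance is not None:
        raise ValueError("k and variance cannot be used together")

    if k is not None:
        # Assumption: k must be a positive integer.
        if not isinstance(k, int) or k < 1:
            raise ValueError("k must be a positive integer")
        return {"k": k}

    # k was not provided, so variance defaults to 1.
    if variance is None:
        variance = 1.0
    # Assumption: 0 is excluded and 1 is allowed.
    if not 0 < variance <= 1:
        raise ValueError("variance must be between 0 and 1")
    return {"variance": float(variance)}


print(resolve_pca_options(k=2))
print(resolve_pca_options(variance=0.5))
print(resolve_pca_options())
```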
Examples
The following example uses PCA on a test set.
| fit PCA "SS_SMART_1_Raw", "SS_SMART_2_Raw", "SS_SMART_3_Raw", "SS_SMART_4_Raw", "SS_SMART_5_Raw" k=2 into example_hard_drives_PCA_2
The following example includes the variance parameter. The value variance=0.5 tells the algorithm to keep as many principal components as are needed to explain 50% of the variance in the original dataset.
| fit PCA "SS_SMART_1_Raw", "SS_SMART_2_Raw", "SS_SMART_3_Raw", "SS_SMART_4_Raw", "SS_SMART_5_Raw" variance=0.50 into example_hard_drives_PCA_2
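To make the effect of a variance target concrete, the following Python sketch uses NumPy only. It computes the explained variance ratio of each component and returns the smallest number of components whose cumulative ratio reaches the target. The helper name and the toy data are hypothetical, and the exact selection rule at the boundary is an assumption, so the toolkit may differ in edge cases.

```python
import numpy as np


def components_for_variance(X, variance):
    """Hypothetical helper: smallest number of components whose cumulative
    explained variance ratio reaches the requested target."""
    # Center each field, as PCA does before decomposing the data.
    Xc = X - X.mean(axis=0)
    # Singular values come back in descending order; their squares are
    # proportional to the variance captured by each component.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    # Index of the first cumulative value that is >= the target, plus one.
    n = int(np.searchsorted(cumulative, variance, side="left")) + 1
    # Guard against floating-point round-off when the target is 1.
    return min(n, len(ratio)), ratio


# Toy data (an assumption): three fields with spreads 3, 2, and 1.
X = np.array([
    [3.0, 0.0, 0.0], [-3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
])

n_half, ratio = components_for_variance(X, 0.5)
n_most, _ = components_for_variance(X, 0.9)
print(ratio)
print(n_half, n_most)
```

In this toy data the first component alone explains more than half of the variance, so a target of 0.5 keeps one component, while a target of 0.9 needs two.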
Local availability
- Local class: PCA
- Source file: Splunk_ML_Toolkit/bin/algos/PCA.py (in-repo path Splunk_ML_Toolkit/bin/algos/PCA.py)
- algos.conf stanza: [PCA]
- Class bases: TransformerMixin, BaseAlgo
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: preprocessor).