Algorithms
X-means
Use the X-means algorithm when you have unlabeled data and no prior knowledge of the total number of labels into which that data could be divided. The X-means clustering algorithm is an extended K-means that automatically determines the number of clusters based on Bayesian Information Criterion (BIC) scores. Starting with a single cluster, the X-means algorithm goes into action after each run of K-means, making local decisions about which subset of the current centroids should split themselves in order to fit the data better.
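The splitting decision described above can be illustrated with a small sketch. It scores a clustering with a BIC computed under identical spherical Gaussians, the model family used in the X-means paper, and keeps a split only if the two-centroid score beats the one-centroid score. The function names `centroid` and `clustering_bic`, the toy data, and the exact variance estimate are assumptions made for illustration; this is not the toolkit's code, and implementations differ in the constants they use.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def clustering_bic(points, centers, labels):
    """BIC of a hard clustering under identical spherical Gaussians.

    Higher is better. Illustrative only: the variance estimate and
    parameter count follow one common formulation, not a specific library.
    """
    R = len(points)       # number of points
    K = len(centers)      # number of clusters
    M = len(points[0])    # number of dimensions
    if R <= K:
        raise ValueError("need more points than clusters")

    # Total squared distance from each point to its assigned center.
    sq_dist = sum(
        sum((x - c) ** 2 for x, c in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )
    # Per-dimension variance shared by all clusters.
    variance = sq_dist / (M * (R - K))
    if variance <= 0.0:
        raise ValueError("zero variance: BIC is undefined in this sketch")

    sizes = [labels.count(k) for k in range(K)]
    log_likelihood = (
        sum(n * math.log(n / R) for n in sizes if n > 0)    # cluster priors
        - R * M / 2.0 * math.log(2.0 * math.pi * variance)  # normalization
        - sq_dist / (2.0 * variance)                        # squared-error term
    )
    # Free parameters: K-1 priors, M*K center coordinates, 1 variance.
    n_params = (K - 1) + M * K + 1
    return log_likelihood - n_params / 2.0 * math.log(R)


# Two well-separated blobs, labeled by hand for illustration.
blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
blob_b = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
points = blob_a + blob_b

bic_parent = clustering_bic(points, [centroid(points)], [0] * len(points))
bic_children = clustering_bic(
    points,
    [centroid(blob_a), centroid(blob_b)],
    [0] * len(blob_a) + [1] * len(blob_b),
)

# Keep the split only if the two-centroid model scores higher.
split = bic_children > bic_parent
```

For this toy data the two-centroid model scores higher, so the split is kept. In a full X-means run the child centroids would come from a local 2-means step rather than hand labeling.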
Using the X-means algorithm has the following advantages:
- Eliminates the requirement of having to provide the value of k.
- Normally produces tighter clusters than hierarchical clustering.
Using the X-means algorithm has the following disadvantages:
- Sensitive to scaling. See StandardScaler.
- Different initializations might result in different final clusters.
- Does not work well with clusters of different sizes and density.
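To see why scaling matters for a distance-based method, consider the following sketch. It standardizes each field to zero mean and unit variance and compares Euclidean distances before and after. The `standardize` helper and the sample events are hypothetical and only mimic the idea of a standard scaler; they are not the toolkit's StandardScaler.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract the mean, divide by the population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column has std 0; divide by 1.0 instead to avoid an error.
        tuple((value - m) / (s or 1.0) for value, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical events: (response_time_ms, error_rate).
rows = [(120.0, 0.01), (130.0, 0.09), (4000.0, 0.02), (4100.0, 0.10)]

# Unscaled: response_time_ms dominates, so rows 0 and 1 look almost identical
# even though their error rates differ by a factor of nine.
raw_near = math.dist(rows[0], rows[1])
raw_far = math.dist(rows[0], rows[2])

# Scaled: both fields contribute comparably to the distance.
scaled = standardize(rows)
scaled_near = math.dist(scaled[0], scaled[1])
scaled_far = math.dist(scaled[0], scaled[2])
```

On the unscaled values the clustering is effectively driven by response time alone. After standardization, a large difference in error rate counts about as much as a large difference in response time.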
Parameters
- The splitting decision is done by computing the BIC.
- The cluster for each event is set in a new field named cluster, and the total number of clusters is set in a new field named n_clusters.
- By default, the cluster label field name is cluster. You can change the default behavior by using the as keyword to specify a different field name.
Syntax
fit XMeans <fields> [into <model name>]
You can save X-means models using the into keyword, and then apply new data to the saved model using the apply command.
... | apply cluster_model
You can inspect the model learned by X-means with the summary command.
... | summary cluster_model
Example
The following example uses X-means on a test set.
... | fit XMeans * | stats count by cluster
Local availability
- Local class: XMeans
- Source file: Splunk_ML_Toolkit/bin/algos/XMeans.py
- algos.conf stanza: [XMeans]
- Class bases: KMeans
Module docstring
Implementation of XMeans algorithm based on
Pelleg, Dan, and Andrew W. Moore. "X-means: Extending K-means with Efficient Estimation of the Number of Clusters."
ICML. Vol. 1. 2000.
https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).