X-means — AITK Info Portal

Use the X-means algorithm when you have unlabeled data and no prior knowledge of the total number of labels into which that data could be divided. The X-means clustering algorithm is an extended K-means that automatically determines the number of clusters based on Bayesian Information Criterion (BIC) scores. Starting with a single cluster, the X-means algorithm goes into action after each run of K-means, making local decisions about which subset of the current centroids should split themselves in order to fit the data better.
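The local split decision described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in XMeans.py: it assumes a spherical Gaussian model with one shared variance, a simplified BIC of the form log-likelihood minus (p/2) log R, and a deterministic 2-means split seeded from the two most distant points. The helper names bic, two_means, and should_split are hypothetical.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def bic(clusters):
    """Simplified BIC for a hard clustering (assumption: spherical Gaussians
    with one shared variance). Higher is better. The caller must ensure the
    total number of points exceeds the number of clusters."""
    k = len(clusters)
    r = sum(len(c) for c in clusters)
    d = len(clusters[0][0])
    sse = 0.0
    for c in clusters:
        center = _mean(c)
        sse += sum(_sq_dist(p, center) for p in c)
    var = max(sse / (d * (r - k)), 1e-12)  # clamp to avoid log(0)
    loglik = sum(len(c) * math.log(len(c) / r) for c in clusters)
    loglik -= 0.5 * r * d * math.log(2 * math.pi * var)
    loglik -= sse / (2 * var)
    n_params = (k - 1) + k * d + 1  # mixing weights, centroids, variance
    return loglik - 0.5 * n_params * math.log(r)


def two_means(points, iterations=20):
    """Deterministic 2-means: seed with the point farthest from the mean
    and the point farthest from that seed, then run Lloyd iterations."""
    c1 = max(points, key=lambda p: _sq_dist(p, _mean(points)))
    c2 = max(points, key=lambda p: _sq_dist(p, c1))
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if _sq_dist(p, c1) <= _sq_dist(p, c2)]
        right = [p for p in points if _sq_dist(p, c1) > _sq_dist(p, c2)]
        if not left or not right:
            break
        new_c1, new_c2 = _mean(left), _mean(right)
        if (new_c1, new_c2) == (c1, c2):
            break
        c1, c2 = new_c1, new_c2
    return left, right


def should_split(points):
    """Split a cluster only if two children score a higher BIC than one parent."""
    if len(points) < 4:
        return False
    left, right = two_means(points)
    if not left or not right:
        return False
    return bic([left, right]) > bic([points])


# Illustrative one-dimensional data (made up for this sketch).
two_blobs = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
             (10.0,), (10.1,), (10.2,), (10.3,), (10.4,)]
one_blob = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
            (0.5,), (0.6,), (0.7,), (0.8,), (0.9,)]
print(should_split(two_blobs), should_split(one_blob))
```

In this sketch the two well-separated groups are split, while the single evenly spread group is kept whole because the extra parameters of a second centroid cost more than the gain in likelihood. A full X-means run would repeat this test for every centroid after each K-means pass.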

Using the X-means algorithm has the following advantages:

- Eliminates the requirement of having to provide the value of k.
- Normally produces tighter clusters than hierarchical clustering.

Using the X-means algorithm has the following disadvantages:

- Sensitive to scaling. See StandardScaler.
- Different initializations might result in different final clusters.
- Does not work well with clusters of different sizes and densities.
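To illustrate the sensitivity to scaling, the following sketch standardizes each feature to zero mean and unit variance before any distance is computed. It is only an illustration of what standardization does, not the toolkit's StandardScaler: the function name standardize and the sample rows are hypothetical, and the population standard deviation is assumed.

```python
import statistics


def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1.
    Constant columns are left centered but unscaled (divisor of 1.0)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical events: the first feature is in the thousands,
# the second is a fraction between 0 and 1.
rows = [(1000.0, 0.1), (2000.0, 0.9), (3000.0, 0.2), (4000.0, 0.8)]
scaled = standardize(rows)
for row in scaled:
    print(row)
```

Without this step, Euclidean distances between the raw rows are dominated almost entirely by the first feature, so the second feature has little influence on which clusters are found.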

Parameters

- The decision to split a centroid is made by computing the BIC.
- The cluster for each event is written to a new field named cluster, and the total number of clusters is written to a new field named n_clusters.
- By default, the cluster label field is named cluster. You can specify a different field name by using the as keyword.

Syntax

fit XMeans <fields> [into <model name>]

You can apply new data to the saved X-means model using the apply command.

... | apply cluster_model

You can save X-means models using the into keyword of the fit command. You can inspect the model learned by X-means with the summary command.

... | summary cluster_model

Example

The following example uses X-means on a test set.

... | fit XMeans * | stats count by cluster

Local availability

Local class: XMeans
Source file (in-repo path): Splunk_ML_Toolkit/bin/algos/XMeans.py
algos.conf stanza: [XMeans]
Class bases: KMeans

Module docstring

Implementation of XMeans algorithm based on
Pelleg, Dan, and Andrew W. Moore. "X-means: Extending K-means with Efficient Estimation of the Number of Clusters."
ICML. Vol. 1. 2000.
https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf

Source

Adapted from the Splunk AI Toolkit 5.7.3 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.7.3/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
