Algorithms

X-means

Use the X-means algorithm when you have unlabeled data and no prior knowledge of how many clusters the data could be divided into. X-means is an extension of K-means that automatically determines the number of clusters based on Bayesian Information Criterion (BIC) scores. It starts with a single cluster and, after each run of K-means, makes local decisions about which of the current centroids should split in two to better fit the data.
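The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the toolkit's implementation: it uses Lloyd's K-means, a BIC for a hard-assignment spherical Gaussian model with one shared variance, and a simplified seeding for the two children of a cluster (the point farthest from the cluster mean, then the point farthest from that point) in place of the seeding described by Pelleg and Moore. All function names (`kmeans`, `bic`, `split_seeds`, `xmeans`, `grid`) and the `k_max` cap are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(p[d] for p in points) / len(points) for d in range(len(points[0])))


def assign(points, centroids):
    """Group points by nearest centroid; ties go to the lower index."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means from the given seeds; clusters that end up empty are dropped."""
    centroids = list(centroids)
    for _ in range(max_iter):
        updated = [mean(g) for g in assign(points, centroids) if g]
        if updated == centroids:
            break
        centroids = updated
    groups = assign(points, centroids)
    kept = [i for i, g in enumerate(groups) if g]
    return [centroids[i] for i in kept], [groups[i] for i in kept]


def bic(groups, centroids):
    """BIC (higher is better) of a hard-assignment spherical Gaussian model.

    Assumes one variance shared by every cluster and dimension and mixing weights
    R_n / R; free parameters are (K - 1) weights, K * M coordinates, 1 variance.
    """
    r = sum(len(g) for g in groups)
    m, k = len(centroids[0]), len(centroids)
    sse = sum(sq_dist(p, c) for g, c in zip(groups, centroids) for p in g)
    var = max(sse / (m * r), 1e-12)  # MLE variance, floored to avoid log(0)
    log_lik = (sum(len(g) * math.log(len(g) / r) for g in groups)
               - r * m / 2 * math.log(2 * math.pi * var)
               - sse / (2 * var))
    n_params = (k - 1) + k * m + 1
    return log_lik - n_params / 2 * math.log(r)


def split_seeds(points):
    """Simplified child seeds: the point farthest from the mean, then the point
    farthest from that one (not the seeding used in the original paper)."""
    center = mean(points)
    a = max(points, key=lambda p: sq_dist(p, center))
    b = max(points, key=lambda p: sq_dist(p, a))
    return [a, b]


def xmeans(points, k_max=10):
    """Start with one cluster; split each centroid whose two children win on BIC."""
    centroids, groups = kmeans(points, [mean(points)])
    while len(centroids) < k_max:
        proposed = []
        for c, g in zip(centroids, groups):
            if len(g) >= 2:
                child_c, child_g = kmeans(g, split_seeds(g))
                # Local decision: compare both models on this cluster's points only.
                if len(child_c) == 2 and bic(child_g, child_c) > bic([g], [mean(g)]):
                    proposed.extend(child_c)
                    continue
            proposed.append(c)
        if len(proposed) == len(centroids):
            break  # no centroid wanted to split
        # Refine all centroids together (capped at k_max), then test splits again.
        centroids, groups = kmeans(points, proposed[:k_max])
    return centroids, groups


def grid(cx, cy, n=5, step=0.25):
    """Hypothetical test blob: an n x n square grid of points centered on (cx, cy)."""
    half = (n - 1) / 2
    return [(cx + (i - half) * step, cy + (j - half) * step)
            for i in range(n) for j in range(n)]


centroids, groups = xmeans(grid(0, 0) + grid(6, 0) + grid(0, 20))
print(len(centroids))  # 3
```

The loop stops when no centroid improves its local BIC by splitting, or when `k_max` is reached. On the three small, well-separated grids it settles on three clusters without being told k.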

Using the X-means algorithm has the following advantages:

  • Removes the need to specify the number of clusters, k, in advance.
  • Normally produces tighter clusters than hierarchical clustering.

Using the X-means algorithm has the following disadvantages:

  • Sensitive to scaling. See StandardScaler.
  • Different initializations might result in different final clusters.
  • Does not work well with clusters of different sizes and densities.
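Scaling matters because K-means-style algorithms, X-means included, assign points by Euclidean distance, so a feature measured in large units can dominate the result. The sketch below applies z-score standardization (subtract each column's mean, divide by its population standard deviation), which is the transformation StandardScaler is generally described as performing. The function name `standardize` and the sample events are hypothetical, and the toolkit's StandardScaler options may differ.

```python
import math
import statistics


def standardize(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has standard deviation 0; leave it unscaled (divide by 1).
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in rows]


# Hypothetical events: (duration in seconds, bytes transferred).
rows = [(0, 1000), (10, 1000), (0, 1100), (10, 1100)]
scaled = standardize(rows)

# Raw data: the bytes column dominates, so row 0 looks 10x closer to row 1 than to row 2.
print(math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))  # 10.0 100.0
# Standardized data: both features contribute equally.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))  # 2.0 2.0
```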

Parameters

  • The decision to split a cluster is made by comparing BIC scores.

  • Each event's cluster assignment is written to a new field named cluster, and the total number of clusters is written to a new field named n_clusters.

  • By default, the cluster label field name is cluster.

    • You can change the default behavior by using the as keyword to specify a different field name.
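For reference, one way to write a BIC for this split test, assuming K spherical Gaussian clusters that share a single variance, R points in M dimensions, R_n points in cluster n, and μ_(i) the centroid nearest to point x_i. This follows the general form in Pelleg and Moore (2000) but uses the maximum-likelihood variance; the exact expression used by the toolkit is not documented here.

```latex
\hat{\sigma}^2 = \frac{1}{MR}\sum_{i=1}^{R}\lVert x_i - \mu_{(i)}\rVert^2,
\qquad
\hat{\ell} = \sum_{n=1}^{K} R_n \log\frac{R_n}{R}
           - \frac{RM}{2}\log\!\left(2\pi\hat{\sigma}^2\right)
           - \frac{RM}{2},
\qquad
\mathrm{BIC} = \hat{\ell} - \frac{p}{2}\log R,
\quad p = (K-1) + KM + 1.
```

Under this formulation, a cluster is split when the BIC of its two-child model, computed on that cluster's points alone, exceeds the BIC of the one-centroid model on the same points.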

Syntax

fit XMeans <fields> [into <model name>]

You can apply a saved X-means model to new data using the apply command.

... | apply cluster_model
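Scoring new events against a saved centroid model usually means assigning each event to its nearest centroid. The sketch below assumes that behavior; `apply_model`, its arguments, and the `label_field` parameter (standing in for the as keyword) are hypothetical and do not reflect how the toolkit actually stores or applies models.

```python
import math


def apply_model(centroids, fields, events, label_field="cluster"):
    """Copy each event and add the index of its nearest centroid under label_field."""
    labelled = []
    for event in events:
        point = tuple(float(event[f]) for f in fields)
        nearest = min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
        labelled.append({**event, label_field: nearest})  # input event is not modified
    return labelled


# Hypothetical saved centroids for the fields x and y.
centroids = [(0.0, 0.0), (6.0, 0.0), (0.0, 20.0)]
events = [{"x": 0.4, "y": -0.2}, {"x": 5.1, "y": 0.3}, {"x": 1, "y": 18}]
for row in apply_model(centroids, ["x", "y"], events):
    print(row)
```

Because the centroids are fixed at apply time, new events never change the number of clusters; only fitting again can do that.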

You can save X-means models using the into keyword of the fit command. You can inspect the model learned by X-means with the summary command.

... | summary cluster_model

Example

The following example runs X-means on all fields of a test set and counts the events in each cluster.

... | fit XMeans * | stats count by cluster

Module docstring

Implementation of XMeans algorithm based on
Pelleg, Dan, and Andrew W. Moore. "X-means: Extending K-means with Efficient Estimation of the Number of Clusters."
ICML. Vol. 1. 2000.
https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
