Algorithms

G-means

G-means is a clustering algorithm based on K-means. The G-means algorithm is similar in purpose to the X-means algorithm. G-means uses the Anderson-Darling statistical test to determine when to split a cluster.
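The split decision can be illustrated with a small standalone sketch. This is not the toolkit's implementation. It follows the general recipe in the Hamerly and Elkan paper cited in the module docstring: project a cluster's points onto the axis joining two candidate child centers, then test the projected values for normality with the Anderson-Darling statistic. The function names, the small-sample correction factor, and the default critical value of 1.8692 are assumptions made for illustration and should be checked against the paper before relying on them.

```python
import math
from statistics import NormalDist, mean, stdev


def project_onto_split_axis(points, center_a, center_b):
    """Project each point onto the vector joining two candidate child centers."""
    axis = [a - b for a, b in zip(center_a, center_b)]
    norm_sq = sum(c * c for c in axis)
    return [sum(p * c for p, c in zip(point, axis)) / norm_sq for point in points]


def anderson_darling_statistic(values):
    """Anderson-Darling statistic for normality, with mean and variance estimated."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    standard_normal = NormalDist()
    z = sorted((v - mu) / sigma for v in values)
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(standard_normal.cdf(x), 1e-12), 1 - 1e-12) for x in z]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - total / n
    # Assumed small-sample correction for estimated parameters.
    return a_squared * (1 + 4 / n - 25 / n ** 2)


def should_split(values, critical_value=1.8692):
    """Return True when the values look non-Gaussian (assumed critical value)."""
    return anderson_darling_statistic(values) > critical_value


# Deterministic toy data: evenly spaced standard-normal quantiles.
quantiles = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
one_blob = quantiles
two_blobs_2d = [(q - 5, q) for q in quantiles] + [(q + 5, -q) for q in quantiles]
projected = project_onto_split_axis(two_blobs_2d, (5.0, 0.0), (-5.0, 0.0))

print(should_split(one_blob), should_split(projected))
```

In this sketch a cluster that already looks Gaussian along the split axis is kept whole, while a cluster made of two well-separated groups fails the test and would be split.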

Using the G-means algorithm has the following advantages:

  • The parameter k is computed automatically
  • G-means can produce more accurate clusters than X-means in some real-world scenarios

Parameters

  • The decision to split a cluster is made using the Anderson-Darling statistical test.

  • The cluster assigned to each event is written to a new field named cluster, and the total number of clusters is written to a new field named n_clusters.

  • By default, the cluster label field name is cluster.

    • You can change the default behavior by using the as keyword to specify a different field name.
  • Optionally use the random_state parameter to set a seed value.

    • random_state must be an integer.

Syntax

| fit GMeans <fields> [into <cluster_model>]

You can apply a saved G-means model to new data using the apply command.

... | apply cluster_model

You can save a G-means model by adding the into keyword to the fit command. You can inspect the model learned by G-means with the summary command.

... | summary cluster_model

Example

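The following example runs G-means on three fields from the housing.csv lookup, sets a seed value with random_state, and saves the result as cluster_model.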

| inputlookup housing.csv
| fields median_house_value distance_to_employment_center crime_rate
| fit GMeans * random_state=42 into cluster_model

Module docstring

Implementation of GMeans algorithm based on
"Learning the k in k-means", by Greg Hamerly, Charles Elkan
https://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
