G-means — AITK Info Portal

G-means is a clustering algorithm based on K-means. The G-means algorithm is similar in purpose to the X-means algorithm. G-means uses the Anderson-Darling statistical test to determine when to split a cluster.

Using the G-means algorithm has the following advantages:

- The parameter k is computed automatically.
- G-means can produce more accurate clusters than X-means in some real-world scenarios.
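To illustrate how k can be found automatically by repeated Anderson-Darling tests, here is a minimal pure-Python sketch for one-dimensional data. It follows the splitting idea from Hamerly and Elkan's "Learning the k in k-means" (the paper cited in the module docstring), not the toolkit's GMeans.py. Assumptions: the critical value 1.8692 (alpha = 0.0001) is taken from the paper, `MIN_SPLIT_SIZE` is a hypothetical threshold, and the loop is a simplified recursive bisection that does not re-run k-means over all centers after each round as the paper does.

```python
import math
from statistics import NormalDist, fmean, pstdev, stdev

# Critical value for alpha = 0.0001 as reported by Hamerly & Elkan.
# Assumption: the toolkit's own threshold is not documented on this page.
CRITICAL_VALUE = 1.8692
# Hypothetical minimum cluster size before a split is attempted.
MIN_SPLIT_SIZE = 8


def anderson_darling(values):
    """Corrected Anderson-Darling statistic A*^2 for normality, with the
    mean and standard deviation estimated from the data."""
    n = len(values)
    if n < 2:
        return 0.0
    mu, sigma = fmean(values), stdev(values)
    if sigma == 0:
        return 0.0  # identical values: nothing to split
    eps = 1e-12  # clamp CDF values so log() never receives 0
    z = [min(max(NormalDist().cdf((v - mu) / sigma), eps), 1 - eps)
         for v in sorted(values)]
    total = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1 - z[n - i]))
                for i in range(1, n + 1))
    a2 = -n - total / n
    return a2 * (1 + 4 / n - 25 / n ** 2)


def two_means(values, iters=50):
    """Split values into two groups with Lloyd's algorithm, or return None."""
    mu = fmean(values)
    # In 1-D the paper's offset s*sqrt(2*lambda/pi) reduces to sigma*sqrt(2/pi).
    offset = pstdev(values) * math.sqrt(2 / math.pi)
    c1, c2 = mu - offset, mu + offset
    left, right = [], []
    for _ in range(iters):
        left = [v for v in values if abs(v - c1) <= abs(v - c2)]
        right = [v for v in values if abs(v - c1) > abs(v - c2)]
        if not left or not right:
            return None
        new_c1, new_c2 = fmean(left), fmean(right)
        if new_c1 == c1 and new_c2 == c2:
            break  # centers stopped moving
        c1, c2 = new_c1, new_c2
    return left, right


def gmeans_1d(values):
    """Split clusters until each one passes the normality test, so the
    caller never supplies k."""
    pending, accepted = [list(values)], []
    while pending:
        cluster = pending.pop()
        children = two_means(cluster) if len(cluster) >= MIN_SPLIT_SIZE else None
        # In 1-D, projecting onto v = c1 - c2 and standardizing gives the same
        # statistic as standardizing the values directly (A^2 is sign-symmetric).
        if children and anderson_darling(cluster) > CRITICAL_VALUE:
            pending.extend(children)  # not Gaussian: keep both children
        else:
            accepted.append(cluster)  # Gaussian enough: keep one center


    return accepted


def make_group(center, size=50):
    """Deterministic test data: evenly spaced normal quantiles around center."""
    return [center + NormalDist().inv_cdf((i + 0.5) / size) for i in range(size)]


clusters = gmeans_1d(make_group(0) + make_group(30) + make_group(100))
print(len(clusters), sorted(len(c) for c in clusters))
```

Because each accepted cluster must look Gaussian, the number of clusters falls out of the tests rather than being fixed in advance.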

Parameters

The cluster-splitting decision is made using the Anderson-Darling statistical test.
The cluster assigned to each event is stored in a new field named cluster, and the total number of clusters is stored in a new field named n_clusters.
- To use a field name other than cluster for the cluster label, specify it with the as keyword.
Optionally, use the random_state parameter to set a seed value so that results are reproducible.
- random_state must be an integer.
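The output fields and the seed requirement described above can be mimicked with a small Python sketch. `annotate_events` and `seeded_rng` are hypothetical helpers for illustration only; they are not part of the toolkit, and counting distinct labels to get n_clusters is an assumption about how that field relates to the labels.

```python
import random


def annotate_events(events, labels, field="cluster"):
    """Hypothetical helper: copy each event and add its cluster label under
    `field` (what the SPL `as` keyword would rename) plus n_clusters."""
    if len(events) != len(labels):
        raise ValueError("need exactly one label per event")
    # Assumption: n_clusters is the number of distinct cluster labels.
    n_clusters = len(set(labels))
    return [{**event, field: label, "n_clusters": n_clusters}
            for event, label in zip(events, labels)]


def seeded_rng(random_state):
    """Hypothetical helper: an integer seed makes randomized steps repeatable."""
    if isinstance(random_state, bool) or not isinstance(random_state, int):
        raise TypeError("random_state must be an integer")
    return random.Random(random_state)


rows = annotate_events([{"x": 1.0}, {"x": 1.2}, {"x": 9.5}], [0, 0, 1])
renamed = annotate_events([{"x": 1.0}], [0], field="my_cluster")
first = seeded_rng(42).sample(range(100), 3)
second = seeded_rng(42).sample(range(100), 3)
```

Two generators built from the same integer seed draw the same sample, which is why a fixed random_state gives repeatable runs.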

Syntax

| fit GMeans <fields> [random_state=<int>] [into <cluster_model>]

You can apply new data to the saved G-means model using the apply command.

... | apply cluster_model
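Because the GMeans class is based on KMeans, a reasonable mental model of apply is that each new event is assigned to the nearest learned cluster center. The sketch below shows that nearest-center rule in Python; it is an assumption about the behavior, not the toolkit's code, and `apply_model` is a hypothetical name.

```python
import math


def apply_model(centers, points):
    """Assign each new point to the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


centers = [(0.0, 0.0), (10.0, 10.0)]
labels = apply_model(centers, [(1.0, -1.0), (9.0, 11.0), (6.0, 6.0)])
print(labels)
```

The point (6.0, 6.0) is closer to (10.0, 10.0) than to the origin, so it receives label 1.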

You can save G-means models using the into keyword. You can inspect the model learned by G-means with the summary command.

... | summary cluster_model

Example

The following example uses G-means on a test set.

| inputlookup housing.csv
| fields median_house_value distance_to_employment_center crime_rate
| fit GMeans * random_state=42 into cluster_model

Local availability

Local class: GMeans
Source file: Splunk_ML_Toolkit/bin/algos/GMeans.py
algos.conf stanza: [GMeans]
Class bases: KMeans

Module docstring

Implementation of GMeans algorithm based on
"Learning the k in k-means", by Greg Hamerly, Charles Elkan
https://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf
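For reference, the split test described in the cited paper can be written out as follows. This follows Hamerly and Elkan's description and has not been checked against GMeans.py. For a center c with points x_1, ..., x_n, two children are initialized at c ± s√(2λ/π), where s is the main principal component of the cluster and λ its eigenvalue, and are refined with 2-means into c_1 and c_2. The points are then projected onto the vector joining the children and tested for normality:

```latex
\begin{aligned}
v &= c_1 - c_2, \qquad x'_i = \frac{\langle x_i, v \rangle}{\lVert v \rVert^2}, \\
z_i &= \Phi\!\left(\frac{x'_{(i)} - \bar{x}'}{\hat{\sigma}'}\right), \qquad i = 1, \dots, n, \\
A^2 &= -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln z_i + \ln\left(1 - z_{n+1-i}\right)\right], \\
A_*^2 &= A^2\left(1 + \frac{4}{n} - \frac{25}{n^2}\right).
\end{aligned}
```

Here x'_(i) is the i-th smallest projected value and Φ is the standard normal CDF. If A*² is below the critical value (1.8692 for α = 0.0001 in the paper), the cluster keeps its single center c; otherwise c is replaced by c_1 and c_2.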

Source

Adapted from the Splunk AI Toolkit 5.7.3 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.7.3/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
