Algorithms
G-means
G-means is a clustering algorithm based on K-means. The G-means algorithm is similar in purpose to the X-means algorithm. G-means uses the Anderson-Darling statistical test to determine when to split a cluster.
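To make the split decision more concrete, the following is a minimal Python sketch of an Anderson-Darling normality check on one-dimensional values, such as a cluster's points projected onto a single axis. It is an illustration only, not the toolkit's implementation. The function names `anderson_darling_statistic` and `should_split` are hypothetical, and the small-sample correction factor and the default threshold of 1.8692 are assumptions modeled on the approach described in the paper cited in the module docstring. This document does not state the threshold the toolkit actually uses.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_statistic(values):
    """Return a corrected Anderson-Darling statistic for normality.

    Expects at least two values that are not all identical.
    """
    n = len(values)
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("values must not all be identical")
    # Standardize and sort so the sample can be compared to N(0, 1).
    z = sorted((v - mu) / sigma for v in values)
    std_normal = NormalDist()
    eps = 1e-12  # keep log() arguments strictly inside (0, 1)
    cdf = [min(max(std_normal.cdf(zi), eps), 1.0 - eps) for zi in z]
    total = 0.0
    for i in range(n):
        # (2i - 1) with 1-based i becomes (2i + 1) with 0-based i.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    a_squared = -n - total / n
    # Assumed correction for a mean and variance estimated from the data.
    return a_squared * (1.0 + 4.0 / n - 25.0 / (n * n))


def should_split(projected_values, critical_value=1.8692):
    """Hypothetical helper: True when the values do not look Gaussian.

    critical_value is an assumed threshold, not a documented toolkit setting.
    """
    return anderson_darling_statistic(projected_values) > critical_value


if __name__ == "__main__":
    nd = NormalDist()
    # Evenly spaced normal quantiles stand in for one Gaussian-shaped cluster.
    one_blob = [nd.inv_cdf((i + 0.5) / 200) for i in range(200)]
    # Two shifted copies stand in for a cluster that hides two groups.
    two_blobs = [v - 5 for v in one_blob] + [v + 5 for v in one_blob]
    print(should_split(one_blob), should_split(two_blobs))
```

In this sketch a cluster whose values look Gaussian is kept as is, while a clearly bimodal one is flagged for splitting. A full G-means loop would repeat this check for each cluster after every K-means refinement.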
Using the G-means algorithm has the following advantages:
- The parameter k is computed automatically
- G-means can produce more accurate clusters than X-means in some real-world scenarios
Parameters
- The cluster splitting decision is done using the Anderson-Darling statistical test.
- The cluster for each event is set in a new field named cluster, and the total number of clusters is set in a new field named n_clusters.
- By default, the cluster label field name is cluster.
  - You can change the default behavior by using the as keyword to specify a different field name.
- Optionally use the random_state parameter to set a seed value. random_state must be an integer.
Syntax
| fit GMeans <fields> [into <cluster_model>]
You can apply the saved G-means model to new data using the apply command.
... | apply cluster_model
You can save G-means models using the into keyword. You can inspect the model learned by G-means with the summary command.
... | summary cluster_model
Example
The following example uses G-means on a test set.
| inputlookup housing.csv
| fields median_house_value distance_to_employment_center crime_rate
| fit GMeans * random_state=42 into cluster_model
Local availability
- Local class: GMeans
- Source file: Splunk_ML_Toolkit/bin/algos/GMeans.py (in-repo path Splunk_ML_Toolkit/bin/algos/GMeans.py)
- algos.conf stanza: [GMeans]
- Class bases: KMeans
Module docstring
Implementation of GMeans algorithm based on
"Learning the k in k-means", by Greg Hamerly, Charles Elkan
https://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).