K-means — AITK Info Portal

K-means clustering is a type of unsupervised learning. It groups similar data points into clusters, with the number of clusters set by the variable k. The K-means algorithm uses the scikit-learn K-means implementation. The cluster assigned to each event is written to a new field named cluster. Use the K-means algorithm when you have unlabeled data and at least approximate knowledge of the total number of groups into which the data can be divided.
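To make the grouping step concrete, the following is a minimal sketch of the classic Lloyd iteration that K-means is based on, written in plain Python. It is an illustration only, not the scikit-learn or toolkit code: the function name kmeans, the max_iter argument, the sample points, and the simple "pick k input points" initialization are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, random_state=None, max_iter=100):
    """Plain Lloyd iteration (illustrative). Returns (centroids, labels)."""
    rng = random.Random(random_state)
    # Pick k distinct input points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable, so the run has converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical two-field events forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2, random_state=0)
```

Each entry in labels plays the role of the cluster field for the matching event. Which group receives label 0 depends on the starting centroids, which is why the sketch accepts a seed.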

Using the K-means algorithm has the following advantages:

Computationally faster than most other clustering algorithms.
Simple algorithm to explain and understand.
Normally produces tighter clusters than hierarchical clustering.

Using the K-means algorithm has the following disadvantages:

Difficult to determine the optimal or true value of k. See X-means.
Sensitive to the scale of the input fields. See StandardScaler.
Each run may produce slightly different clusters unless you specify the random_state parameter.
Does not work well with clusters of different sizes and densities.
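The sensitivity to scaling comes from K-means relying on Euclidean distance, so a field with large numeric values can dominate the result. The sketch below illustrates this with hand-rolled standardization (zero mean, unit population variance per field). The field meanings, the sample values, and the standardize helper are hypothetical and exist only for this example.

```python
import math
import statistics


def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical events: (response_time_seconds, bytes_transferred)
rows = [(0.1, 50_000.0), (0.2, 50_300.0), (5.0, 50_100.0), (5.1, 50_400.0)]

# Unscaled, the bytes field dominates: row 0 looks closer to row 2
# than to row 1, even though rows 0 and 1 have similar response times.
raw_prefers_row2 = math.dist(rows[0], rows[2]) < math.dist(rows[0], rows[1])

scaled = standardize(rows)
# After scaling, both fields contribute comparably and row 1 is nearer.
scaled_prefers_row2 = math.dist(scaled[0], scaled[2]) < math.dist(scaled[0], scaled[1])
```

Because cluster assignment depends on these distances, the same events can end up grouped differently before and after scaling.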

For a description of the default value of k, see the scikit-learn documentation at http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Parameters

The k parameter specifies the number of clusters to divide the data into. By default, the cluster label is written to a field named cluster. To write it to a different field, specify the field name with the as keyword.

Syntax

fit KMeans <fields> [into <model name>]  [k=<int>]  [random_state=<int>]

You can save K-means models using the into keyword when using the fit command.

You can apply the model to new data using the apply command.

... | apply cluster_model
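Conceptually, applying a fitted K-means model means labeling each new event with its nearest cluster center, which is how prediction works for K-means in general. The sketch below shows that idea in plain Python. The centroid values, the event values, and the apply_model helper are hypothetical stand-ins, and the code does not reflect how the toolkit stores or loads a saved model.

```python
import math
from collections import Counter

# Hypothetical centroids, standing in for what a saved model would hold.
centroids = [(1.0, 1.0), (8.0, 8.0), (1.0, 8.0)]


def apply_model(events, centroids):
    """Label each event with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(event, centroids[c]))
        for event in events
    ]


new_events = [(0.8, 1.3), (7.5, 8.4), (1.2, 7.7), (8.3, 7.9)]
clusters = apply_model(new_events, centroids)

# Tally how many events landed in each cluster.
counts = Counter(clusters)
```

No centroid moves during this step, so applying the same model to the same events always yields the same labels.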

You can inspect the model using the summary command.

... | summary cluster_model

Example

The following example uses K-means on a test set.

... | fit KMeans * k=3 | stats count by cluster

Local availability

Local class: KMeans
Source file: Splunk_ML_Toolkit/bin/algos/KMeans.py
algos.conf stanza: [KMeans]
Class bases: ClustererMixin, BaseAlgo

Source

Adapted from the Splunk AI Toolkit 5.7.3 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.7.3/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
