Algorithms

DBSCAN

The DBSCAN algorithm uses the scikit-learn DBSCAN clustering algorithm to divide a result set into distinct clusters. The cluster for each event is set in a new field named cluster. Unlike K-Means, which finds exactly the number of clusters you specify (for example, k=5 finds 5 clusters), DBSCAN clusters results based on local density and uncovers a variable number of clusters.

Parameters

  • The eps parameter specifies the maximum distance between two samples for them to be considered in the same cluster.
  • By default, the cluster label field name is cluster. Change that behavior by using the as keyword to specify a different field name.
  • The min_samples parameter defines the number of samples, or the total weight, in a neighborhood for a point to be considered a core point, including the point itself. Choose the min_samples value based on your preference for cluster density or on the amount of noise in your dataset.
    • The min_samples parameter is optional.
    • The min_samples default value is 5.
    • The minimum value for the min_samples parameter is 3.
    • If min_samples=8, you need at least 8 data points to form a dense cluster.

Note: If you choose the min_samples value based on the noise in your dataset, it is recommended that you use a larger dataset.
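To see how eps and min_samples interact, the following is a minimal pure-Python sketch of density-based clustering. It is an illustration only, not the toolkit's or scikit-learn's actual implementation. The function name dbscan, the sample points, and the use of Euclidean distance are assumptions made for this example. The label -1 for noise follows the scikit-learn convention.

```python
from math import dist

NOISE = -1  # scikit-learn convention for points that belong to no cluster


def dbscan(points, eps=0.5, min_samples=5):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    # Neighborhood of each point: indices within eps, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Hypothetical data: two tight groups of four points and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]

labels_loose = dbscan(points, eps=0.5, min_samples=3)
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
labels_strict = dbscan(points, eps=0.5, min_samples=5)
# → [-1, -1, -1, -1, -1, -1, -1, -1, -1]
```

With min_samples=3, each group of four nearby points forms a cluster and the isolated point is noise. With min_samples=5, no neighborhood holds enough points, so every point is labeled as noise.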

Syntax

| fit DBSCAN <fields> [eps=<number>] [min_samples=<integer>]

Syntax constraints

You cannot save DBSCAN models using the into keyword. To predict cluster assignments for future data, combine the DBSCAN algorithm with any classifier algorithm. For example, first cluster the data using DBSCAN, then fit RandomForestClassifier to predict the cluster.
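The cluster-then-classify workflow can be sketched in Python as follows. This is an illustration under stated assumptions: a simple nearest-centroid classifier stands in for RandomForestClassifier, the training labels are hypothetical values of the kind a DBSCAN run produces, and the names fit_centroids and predict are not part of the toolkit.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels, noise=-1):
    """Average the points of each cluster; noise points are skipped."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise:
            grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Assign a new point to the cluster with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical training data: features plus the labels a DBSCAN run produced.
train_points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.1), (10.0, 0.0)]
train_labels = [0, 0, 1, 1, -1]

# Step 1: learn from the clustered data. Step 2: label data seen later.
centroids = fit_centroids(train_points, train_labels)
new_labels = [predict(centroids, p) for p in [(0.2, 0.0), (4.8, 5.3)]]
# → [0, 1]
```

The classifier is the part that can be stored and reused, which is why it is paired with DBSCAN when future data must be assigned to the clusters found earlier.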

Examples

The following example uses DBSCAN without the min_samples parameter.

... | fit DBSCAN * | stats count by cluster

The following example uses DBSCAN with the eps and min_samples parameters.

| inputlookup track_day.csv | fit DBSCAN eps=0.5 min_samples=1000 speed | table speed cluster

Local availability

  • Local class: DBSCAN
  • Source file: Splunk_ML_Toolkit/bin/algos/DBSCAN.py
  • algos.conf stanza: [DBSCAN]
  • Class bases: ClustererMixin, BaseAlgo

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).
