Algorithms
DBSCAN
The DBSCAN algorithm uses the scikit-learn DBSCAN clustering algorithm to divide a result set into distinct clusters. The cluster for each event is set in a new field named `cluster`. DBSCAN differs from K-Means in that it clusters results based on local density and uncovers a variable number of clusters, whereas K-Means finds exactly the number of clusters you specify. For example, K-Means with k=5 finds 5 clusters.
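To make the density-based behavior concrete, the following is a minimal pure-Python sketch of the general DBSCAN idea. It is not the toolkit's or scikit-learn's implementation. The function name `dbscan`, the sample points, and the use of `-1` as the noise label are assumptions chosen for illustration. The `eps` and `min_samples` arguments mirror the parameters described under Parameters.

```python
from math import dist

NOISE = -1  # illustrative label for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks unclustered points."""
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may become a border point later
            continue
        cluster += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core, keep expanding
    return labels


# Two dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in. Two clusters emerge here only because the data contains two dense regions, and the isolated point is left as noise.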
Parameters
- The `eps` parameter specifies the maximum distance between two samples for them to be considered in the same cluster.
  - By default, the cluster label field name is `cluster`. Change that behavior by using the `as` keyword to specify a different field name.
- The `min_samples` parameter defines the number of samples, or the total weight, in a neighborhood for a point to be considered as a core point, including the point itself. You can choose the `min_samples` parameter's best value based on preference for cluster density or noise in your dataset.
- The `min_samples` parameter is optional.
- The `min_samples` default value is 5.
- The minimum value for the `min_samples` parameter is 3.
- If `min_samples=8`, you need at least 8 data points to form a dense cluster.
Note: If you choose the best `min_samples` value based on noise in your dataset, a larger dataset is recommended.
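As a rough illustration of how `eps` and `min_samples` interact, the sketch below counts the samples within `eps` of a point, including the point itself, and compares that count to `min_samples`. The helper name `is_core_point` and the sample data are hypothetical, and the sketch ignores sample weights.

```python
from math import dist


def is_core_point(index, points, eps, min_samples):
    """True if at least min_samples points lie within eps of points[index]."""
    # The point itself is at distance 0, so it is always counted.
    neighborhood_size = sum(
        1 for q in points if dist(points[index], q) <= eps
    )
    return neighborhood_size >= min_samples


# Hypothetical samples: four close together and one far away.
points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.4), (0.2, 0.2), (3.0, 3.0)]

# Point 0 has 4 samples within eps=0.5: itself plus three neighbors.
core_with_3 = is_core_point(0, points, eps=0.5, min_samples=3)  # True
core_with_5 = is_core_point(0, points, eps=0.5, min_samples=5)  # False
```

Raising `min_samples` demands denser neighborhoods, so fewer points qualify as core points and more of the data is treated as noise.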
Syntax
| fit DBSCAN <fields> [eps=<number>] [min_samples=<integer>]
Syntax constraints
You cannot save DBSCAN models using the into keyword. To predict cluster assignments for future data, combine the DBSCAN algorithm with any classifier algorithm. For example, first cluster the data using DBSCAN, then fit RandomForestClassifier to predict the cluster.
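The cluster-then-classify workflow can be sketched outside the toolkit as follows. This is an assumption-laden illustration. It substitutes a simple 1-nearest-neighbor classifier for RandomForestClassifier so that the example needs only the Python standard library. The function name, the data, and the hard-coded cluster labels (standing in for the output of a prior clustering step) are all hypothetical.

```python
from math import dist


def fit_nearest_neighbor(points, labels, noise_label=-1):
    """Build a 1-nearest-neighbor predictor from already-clustered points."""
    # Noise points carry no cluster, so they are left out of the training set.
    training = [(p, c) for p, c in zip(points, labels) if c != noise_label]

    def predict(point):
        # Assign the cluster of the closest training point.
        _, cluster = min(training, key=lambda pc: dist(pc[0], point))
        return cluster

    return predict


# Points and the cluster labels assumed to come from a clustering step.
clustered_points = [(0, 0), (0, 1), (1, 0),
                    (10, 10), (10, 11), (11, 10),
                    (50, 50)]
cluster_labels = [0, 0, 0, 1, 1, 1, -1]

predict = fit_nearest_neighbor(clustered_points, cluster_labels)
near_first = predict((0.5, 0.5))   # 0
near_second = predict((9, 12))     # 1
```

The classifier, unlike the clustering step, can be stored and reused, which is why the clustering output is treated as training labels for a second model.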
Examples
The following example uses DBSCAN without the min_samples parameter.
... | fit DBSCAN * | stats count by cluster
The following example uses DBSCAN with the min_samples parameter.
| inputlookup track_day.csv | fit DBSCAN eps=0.5 min_samples=1000 speed | table speed cluster
Local availability
- Local class: `DBSCAN`
- Source file: `Splunk_ML_Toolkit/bin/algos/DBSCAN.py` (in-repo path `Splunk_ML_Toolkit/bin/algos/DBSCAN.py`)
- algos.conf stanza: `[DBSCAN]`
- Class bases: `ClustererMixin`, `BaseAlgo`
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: clusterer).