Algorithm catalog
Machine-generated catalog of the 47 AI Toolkit algorithms, with category filters, upstream-source backlinks, and local app source pointers.
- ACF
- AITKContainer AITKContainer is a local Splunk_ML_Toolkit algorithm registered at `bin/algos/AITKContainer.py`.
- ARIMA
- AutoPrediction AutoPrediction automatically determines whether the data type is categorical or numeric, then invokes the RandomForestClassifier algorithm to carry out the prediction. For further details, see RandomForestClassifier. AutoPredictio…
- BernoulliNB The BernoulliNB algorithm uses the scikit-learn BernoulliNB estimator to fit a model to predict the value of categorical fields where explanatory variables are assumed to be binary-valued. BernoulliNB is an implementation of the Naive Ba…
- Birch The Birch algorithm uses the scikit-learn Birch clustering algorithm to divide data points into a set of distinct clusters. The cluster for each event is set in a new field named `cluster`. This algorithm supports incremental fit.
- DBSCAN The DBSCAN algorithm uses the scikit-learn DBSCAN clustering algorithm to divide a result set into distinct clusters. The cluster for each event is set in a new field named `cluster`. DBSCAN is distinct from K-Means in that it clusters r…
- DecisionTreeClassifier The DecisionTreeClassifier algorithm uses the scikit-learn DecisionTreeClassifier estimator to fit a model to predict the value of categorical fields. For further information, see the scikit-learn documentation: http://scikit-learn.org/…
- DecisionTreeRegressor The DecisionTreeRegressor algorithm uses the scikit-learn DecisionTreeRegressor estimator to fit a model to predict the value of numeric fields. The `kfold` cross-validation command can be used with DecisionTreeRegressor. See K-fold cross-validation.
- DensityFunction The DensityFunction algorithm provides a consistent and streamlined workflow to create and store density functions and utilize them for anomaly detection. DensityFunction allows for grouping of the data using the `by` clause, where for e…
- ElasticNet The ElasticNet algorithm uses the scikit-learn ElasticNet estimator to fit a model to predict the value of numeric fields. ElasticNet is a linear regression model that includes both L1 and L2 regularization and is a generalization of Las…
- FieldSelector The FieldSelector algorithm uses the scikit-learn GenericUnivariateSelect to select the best predictor fields based on univariate statistical tests. For descriptions of the `mode` and `param` parameters, see the scikit-learn documentatio…
- G-means G-means is a clustering algorithm based on K-means. The G-means algorithm is similar in purpose to the X-means algorithm. G-means uses the Anderson-Darling statistical test to determine when to split a cluster.
- GaussianNB The GaussianNB algorithm uses the scikit-learn GaussianNB estimator to fit a model to predict the value of categorical fields, where the likelihood of explanatory variables is assumed to be Gaussian. GaussianNB is an implementation of Ga…
- GradientBoostingClassifier This algorithm uses the GradientBoostingClassifier from scikit-learn to build a classification model by fitting regression trees on the negative gradient of a deviance loss function. For further information, see the scikit-learn documen…
- GradientBoostingRegressor This algorithm uses the GradientBoostingRegressor algorithm from scikit-learn to build a regression model by fitting regression trees on the negative gradient of a loss function. The `kfold` cross-validation command can be used with Grad…
- HashingVectorizer The HashingVectorizer algorithm converts text documents to a matrix of token occurrences. It uses a feature hashing strategy to allow for hash collisions when measuring the occurrence of tokens. It is a stateless transformer, meaning tha…
- ICA ICA (Independent component analysis) separates a multivariate signal into additive sub-components that are maximally independent. Typically, ICA is not used for reducing dimensionality, but for separating superimposed signals. The ICA mo…
- Imputer The Imputer algorithm is a preprocessing step wherein missing data is replaced with substitute values. The substitute values can be estimated, or based on other statistics or values in the dataset. To use Imputer, the user passes in the…
- K-fold cross-validation With the `kfold_cv` parameter, the training set is randomly partitioned into k equal-sized subsamples. Each subsample then takes a turn as the validation (test) set, predicted by a model trained on the remaining k-1 subsamples. Each sample is used…
- K-means K-means clustering is a type of unsupervised learning. It is a clustering algorithm that groups similar data points, with the number of groups represented by the variable `k`. The K-means algorithm uses the scikit-learn K-means implement…
- KernelPCA The KernelPCA algorithm uses the scikit-learn KernelPCA to reduce the number of fields by extracting uncorrelated new features out of data. The difference between KernelPCA and PCA is the use of kernels in the former, which helps with fi…
- KernelRidge The KernelRidge algorithm uses the scikit-learn KernelRidge algorithm to fit a model to predict numeric fields. This algorithm uses the radial basis function (rbf) kernel by default. The `kfold` cross-validation command can be used with…
- Lasso The Lasso algorithm uses the scikit-learn Lasso estimator to fit a model to predict the value of numeric fields. Lasso is like LinearRegression, but it uses L1 regularization to learn linear models with fewer coefficients and smaller c…
- LinearRegression The LinearRegression algorithm uses the scikit-learn LinearRegression estimator to fit a model to predict the value of numeric fields. The `kfold` cross-validation command can be used with LinearRegression. See K-fold cross-validation.
- LocalOutlierFactor The LocalOutlierFactor algorithm uses the scikit-learn Local Outlier Factor (LOF) to measure the local deviation of density of a given sample with respect to its neighbors. LocalOutlierFactor performs one-shot learning and is limited to…
- LogisticRegression The LogisticRegression algorithm uses the scikit-learn LogisticRegression estimator to fit a model to predict the value of categorical fields.
- MLPClassifier The MLPClassifier algorithm uses the scikit-learn Multi-layer Perceptron estimator for classification. MLPClassifier uses a feedforward artificial neural network model that trains using backpropagation. This algorithm supports incremental fit.
- MultivariateOutlierDetection MultivariateOutlierDetection accepts a multivariate dataset. The algorithm receives multiple fields as input, then runs StandardScaler on the multivariate dataset to scale the dataset. Then PCA runs on the scaled dataset, deriving the fi…
- NPR The Normalized Perlich Ratio (NPR) algorithm converts high cardinality categorical field values into numeric field entries while intelligently handling space optimization. NPR offers low computational costs to perform feature extraction…
- OneClassSVM The OneClassSVM algorithm uses the scikit-learn OneClassSVM to fit a model from a set of features or fields for detecting anomalies and outliers, where features are expected to contain numerical values. OneClassSVM is an unsupervised out…
- PACF
- PCA The Principal Component Analysis (PCA) algorithm uses the scikit-learn PCA algorithm to reduce the number of fields by extracting new, uncorrelated features out of the data.
- PredictAI PredictAI is a local Splunk_ML_Toolkit algorithm registered at `bin/algos/PredictAI.py`.
- RandomForestClassifier The RandomForestClassifier algorithm uses the scikit-learn RandomForestClassifier estimator to fit a model to predict the value of categorical fields.
- RandomForestRegressor
- Ridge
- RobustScaler The RobustScaler algorithm uses the scikit-learn RobustScaler algorithm to standardize data fields by scaling their median and interquartile range to 0 and 1, respectively. It is very similar to the StandardScaler algorithm, in that it h…
- SGDClassifier The SGDClassifier algorithm uses the scikit-learn SGDClassifier estimator to fit a model to predict the value of categorical fields. This algorithm supports incremental fit.
- SGDRegressor
- SpectralClustering The SpectralClustering algorithm uses the scikit-learn SpectralClustering clustering algorithm to divide a result set into a set of distinct clusters. SpectralClustering first transforms the input data using the Radial Basis Function (rbf)…
- StandardScaler The StandardScaler algorithm uses the scikit-learn StandardScaler algorithm to standardize data fields by scaling their mean and standard deviation to 0 and 1, respectively. This preprocessing step helps to avoid dominance of one or more…
- StateSpaceForecast
- SVM The SVM algorithm uses the scikit-learn kernel-based SVC estimator to fit a model to predict the value of categorical fields. It uses the radial basis function (rbf) kernel by default. For descriptions of the `C` and `gamma` parameters,…
- System Identification
- TFIDF The TFIDF algorithm uses the scikit-learn TfidfVectorizer to convert raw text data into a matrix making it possible to use other machine learning estimators on the data. For descriptions of `max_df`, `min_df`, `ngram_range`, `analyzer`,…
- X-means Use the X-means algorithm when you have unlabeled data and no prior knowledge of the total number of labels into which that data could be divided. The X-means clustering algorithm is an extended K-means that automatically determines the…
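The K-fold cross-validation rotation described in the catalog entry above can be sketched in plain Python. This is an illustrative stand-in only: in the toolkit, cross-validation runs inside the `fit` command via the `kfold_cv` parameter, and `kfold_indices` below is a hypothetical helper (the real backing implementation is scikit-learn's `sklearn.model_selection.KFold`). The sketch also omits the random shuffling the toolkit performs before partitioning.

```python
def kfold_indices(n_samples, k):
    """Partition indices 0..n_samples-1 into k roughly equal folds.

    Yields (train_indices, test_indices) pairs: each fold takes a turn
    as the validation (test) set while the other k-1 folds form the
    training set, so every sample is validated exactly once.
    """
    indices = list(range(n_samples))
    # Spread any remainder across the first (n_samples % k) folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test

# Five rotations over ten samples; each sample lands in exactly one
# validation fold.
splits = list(kfold_indices(10, k=5))
```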
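K-means, per its catalog entry above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. A minimal one-dimensional sketch in pure Python, under a naive evenly-spaced seeding assumption; the toolkit's actual implementation is scikit-learn's `sklearn.cluster.KMeans`, and `kmeans_1d` is a hypothetical name:

```python
def kmeans_1d(points, k, iterations=20):
    """Cluster 1-D points into k groups with plain Lloyd iterations:
    assign each point to its nearest centroid, then move each centroid
    to its cluster's mean. Naive seeding: evenly spaced sorted points.
    """
    centroids = sorted(points)[::max(1, len(points) // k)][:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

X-means and G-means (above) layer a model-selection step on this same loop, splitting clusters until a statistical criterion says to stop.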
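The standardization that StandardScaler's entry above describes (scale mean to 0 and standard deviation to 1) reduces to the z-score transform. A stdlib-only sketch, not the toolkit's scikit-learn-backed code (`sklearn.preprocessing.StandardScaler`); `standard_scale` is a hypothetical helper name:

```python
import statistics


def standard_scale(values):
    """Return z-scores: subtract the mean and divide by the population
    standard deviation, so the output has mean 0 and std dev 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant field: no spread to scale, map everything to 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

RobustScaler (above) follows the same shape but centers on the median and divides by the interquartile range, which makes it less sensitive to outliers.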
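The TFIDF entry above converts raw text into term scores: term frequency weighted by inverse document frequency. A pure-Python sketch of that idea using the textbook formula `tf * log(N / df)`; note that scikit-learn's TfidfVectorizer, which actually backs the toolkit algorithm, uses a smoothed variant of the idf term, so the numbers differ. `tfidf` here is a hypothetical helper:

```python
import math
from collections import Counter


def tfidf(docs):
    """Score each term in each document: term frequency (count over
    document length) times inverse document frequency, log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({term: count / len(tokens) * math.log(n / df[term])
                       for term, count in counts.items()})
    return scores
```

A term appearing in every document gets idf log(N/N) = 0, so it contributes nothing, which is why common words score low.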
This catalog lists every algorithm exposed by the AI Toolkit, sourced from the Splunk AITK 5.6.4 documentation and enriched with this repository's Splunk_ML_Toolkit/bin/algos/ Python classes and default/algos.conf registrations.
Categories
- classifier: 10 algorithms
- regressor: 10 algorithms
- clusterer: 6 algorithms
- anomaly: 4 algorithms
- forecasting: 2 algorithms
- preprocessor: 13 algorithms
- nlp: 2 algorithms