Algorithms
StateSpaceForecast
StateSpaceForecast is a forecasting algorithm for time series data in MLTK. It is based on Kalman filters. The algorithm supports incremental fit.
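This page does not describe the exact state-space model inside StateSpaceForecast. As a rough illustration of the Kalman filter idea it builds on, the following sketch implements a scalar local-level (random walk plus noise) filter. `LocalLevelKalman`, `filter_series`, the model form, and the default variances are hypothetical choices for illustration, not MLTK's implementation.

```python
from dataclasses import dataclass


@dataclass
class LocalLevelKalman:
    """Scalar Kalman filter for a local level (random walk plus noise) model.

    Illustrative only: not the state-space model used by StateSpaceForecast.
    """
    process_var: float   # q: variance of each random-walk step of the level
    obs_var: float       # r: variance of the observation noise
    level: float = 0.0   # current estimate of the hidden level
    var: float = 1e6     # its variance; a large value means an uninformative start

    def predict(self) -> tuple[float, float]:
        # The level is expected to stay put, but its uncertainty grows by q.
        self.var += self.process_var
        return self.level, self.var + self.obs_var

    def update(self, y: float) -> None:
        # The Kalman gain weighs the new observation against the prediction.
        gain = self.var / (self.var + self.obs_var)
        self.level += gain * (y - self.level)
        self.var *= 1.0 - gain

    def forecast(self, k: int) -> list[tuple[float, float]]:
        # k steps ahead with no new data: flat mean, variance growing by q per step.
        out, var = [], self.var
        for _ in range(k):
            var += self.process_var
            out.append((self.level, var + self.obs_var))
        return out


def filter_series(ys: list[float], q: float = 0.01, r: float = 1.0) -> LocalLevelKalman:
    """Run predict/update over a series and return the fitted filter."""
    kf = LocalLevelKalman(process_var=q, obs_var=r)
    for y in ys:
        kf.predict()
        kf.update(y)
    return kf
```

Because each observation is absorbed by a single `update` call, a filter of this kind can take in new data without revisiting old data, which is the general property that makes incremental fitting a natural fit for Kalman-filter-based models.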
Advantages of StateSpaceForecast over ARIMA include:
- It persists models created using the `fit` command, which can then be used with `apply`.
- A `specialdays` field allows you to account for the effects of a specified list of special days.
- It is automatic in that you do not need to choose parameters or mode.
- It supports multivariate forecasting.
Parameters
- By default, the historical data results from running the `fit` command are not shown. To change this behavior, set `output_fit=True`.
- The `fields` segment of the search supports the wildcard (`*`) character.
- Use the `target` field to specify the fields to forecast using historical data and other values.
  - The `target` field is a comma-separated list of fields that can be univariate or multivariate. These fields must be specified during the `fit` process.
  - Optionally, use the `target` field to fit multiple fields during the `fit` process but apply only a selection of those target fields during the `apply` process.
  - If the `target` field is not specified, all fields are forecast together using historical data.
- The `specialdays` field specifies the field that indicates effects due to special days such as holidays.
  - The `specialdays` field values must be numeric and are typically 0 and 1, with 1 indicating the existence of a special day effect. Null values are treated as 0.
  - Most use cases have no `specialdays`. Events that occur regularly and frequently, such as weekends, should not be treated as `specialdays`. Use `specialdays` to capture events such as holiday sales.
  - If `specialdays` was specified during `fit`, also use it in the `apply` step and assign the same field or fields.
- Use the `period` parameter to specify a known periodicity in your data.
  - If the `period` parameter is not specified, it is computed automatically.
  - Set `period=1` to treat the time series as non-periodic.
- As with other MLTK algorithms, the `partial_fit` parameter controls whether a model is incrementally updated. This allows you to update a model using only new data, without retraining it on the full dataset.
  - The default for `partial_fit` is False.
  - Use `update_last` to modify the behavior of `partial_fit`. The default for `update_last` is False.
  - If `partial_fit=True`, StateSpaceForecast first updates the model parameters and then predicts.
  - If `partial_fit=True` and `update_last=True`, StateSpaceForecast first predicts and then updates the model parameters. This allows you to review the forecast before running new data through the model.
- The `conf_interval=<1..99>` parameter sets the confidence interval, as a percentage, around forecasted values. Enter an integer between 1 and 99; a larger number means a greater tolerance for forecast uncertainty and produces a wider band around the forecast. The default is 95.
- Use the `as` field to assign aliases to forecasted fields.
  - In univariate cases, the `as` field-list is a single field name.
  - In multivariate cases, the `as` field adheres to the following conventions:
    - The list must be in double quotes, with aliases separated by either spaces or commas.
    - The aliases correspond to the original fields in the given order.
    - The number of aliases can be smaller than the number of original fields.
- The `summary` command lists the names of the fields used in the `fit` command step, the name of the `specialdays` field, and the period.
- The `holdback` parameter is the number of data points held back from training, which is useful for comparing the forecast against known data points. The default `holdback` value is 0.
  - If you want to maintain the `holdback` position, add the number of points you want to forecast beyond it to your `holdback` value, and use the sum as `forecast_k`.
- The `forecast_k` parameter tells StateSpaceForecast how many points into the future to forecast. If `_time` is specified during fitting along with the fields to forecast, StateSpaceForecast also generates timestamps for the forecasted values. The default `forecast_k` value is 0.
- The `holdback` and `forecast_k` values can be of two types: an integer or a time range.
  - An integer specifies a number of events. For example, `forecast_k=10` forecasts 10 events into the future, and `holdback=10` withholds the last 10 events from training.
  - A time range takes the form `XY`, where X is a non-negative integer and Y is either empty or one of the formats in the time range table. If Y is empty, the value is interpreted as an integer, that is, a number of events. For example, `holdback=3day forecast_k=1week` withholds 3 days of events and forecasts 1 week's worth of events.
Note: The actual number of events withheld and forecasted using the time range option depends on the time interval between consecutive events.
| Time range | Acceptable formats for Y value |
|---|---|
| seconds | s, sec, secs, second, seconds |
| minutes | m, min, minute, minutes |
| hours | h, hr, hrs, hour, hours |
| days | d, day, days |
| weeks | w, week, weeks |
| months | mon, month, months |
| quarters | q, qtr, qtrs, quarter, quarters |
| years | y, yr, yrs, year, years |
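The `XY` rule and the table of Y formats can be sketched as a small parser. This is a hypothetical helper, not MLTK code: `parse_time_range` and `to_event_count` are illustrative names, events are assumed to be evenly spaced, suffixes are matched exactly as listed in the table, and calendar units are approximated with fixed lengths (30-day months, 90-day quarters, 365-day years) that MLTK may not use.

```python
import re
from typing import Optional

# Y formats from the time range table, mapped to a canonical unit.
UNIT_ALIASES = {
    "seconds": ["s", "sec", "secs", "second", "seconds"],
    "minutes": ["m", "min", "minute", "minutes"],
    "hours": ["h", "hr", "hrs", "hour", "hours"],
    "days": ["d", "day", "days"],
    "weeks": ["w", "week", "weeks"],
    "months": ["mon", "month", "months"],
    "quarters": ["q", "qtr", "qtrs", "quarter", "quarters"],
    "years": ["y", "yr", "yrs", "year", "years"],
}
SUFFIX_TO_UNIT = {s: unit for unit, suffixes in UNIT_ALIASES.items() for s in suffixes}

# Assumption: fixed lengths for calendar units, for illustration only.
UNIT_SECONDS = {
    "seconds": 1, "minutes": 60, "hours": 3600, "days": 86400,
    "weeks": 7 * 86400, "months": 30 * 86400, "quarters": 90 * 86400,
    "years": 365 * 86400,
}


def parse_time_range(value: str) -> tuple[int, Optional[str]]:
    """Split an XY value into (X, unit); unit is None when Y is empty."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip())
    if match is None:
        raise ValueError(f"not a valid XY value: {value!r}")
    x, suffix = int(match.group(1)), match.group(2)
    if not suffix:
        return x, None  # empty Y: X is read as a number of events
    if suffix not in SUFFIX_TO_UNIT:
        raise ValueError(f"unknown time unit: {suffix!r}")
    return x, SUFFIX_TO_UNIT[suffix]


def to_event_count(value: str, event_interval_s: float) -> int:
    """Convert a holdback/forecast_k value to events, assuming even spacing."""
    x, unit = parse_time_range(value)
    if unit is None:
        return x
    return int(x * UNIT_SECONDS[unit] // event_interval_s)
```

Exact suffix matching keeps `m` (minutes) and `mon` (months) apart. Because the count is derived from the spacing between events, the same `holdback=3day` maps to 72 events for hourly data but 3 events for daily data, which is the point of the note above the table.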
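One common way to turn a `conf_interval` percentage into a band is to assume Gaussian forecast errors and use the matching two-sided normal quantile. The sketch below does that with Python's `statistics.NormalDist`; the Gaussian assumption and the `interval_bounds` helper are illustrative and are not taken from the StateSpaceForecast source.

```python
from statistics import NormalDist


def interval_bounds(mean: float, std: float, conf_interval: int = 95) -> tuple[float, float]:
    """Two-sided Gaussian band around a forecast for conf_interval in 1..99."""
    if not 1 <= conf_interval <= 99:
        raise ValueError("conf_interval must be an integer between 1 and 99")
    # For 95 this is the 97.5th percentile of N(0, 1), roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + conf_interval / 200)
    return mean - z * std, mean + z * std
```

Under this assumption, a larger `conf_interval` gives a wider band because the quantile grows as the percentage approaches 99.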
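The ordering rules for `partial_fit` and `update_last` can be made concrete with a toy model that logs its steps. `IncrementalForecaster` is hypothetical, and its running mean stands in for whatever parameter update StateSpaceForecast actually performs; only the order of the update and predict steps mirrors the parameter description above.

```python
class IncrementalForecaster:
    """Hypothetical model that records the order of its update/predict steps."""

    def __init__(self) -> None:
        self.level = 0.0
        self.n = 0
        self.log: list[str] = []

    def _update(self, batch: list[float]) -> None:
        # A running mean stands in for "update the model parameters".
        for y in batch:
            self.n += 1
            self.level += (y - self.level) / self.n
        self.log.append("update")

    def _predict(self) -> float:
        self.log.append("predict")
        return self.level

    def run(self, batch: list[float], partial_fit: bool = False,
            update_last: bool = False) -> float:
        if not partial_fit:
            return self._predict()      # no incremental update: use the stored model
        if update_last:
            forecast = self._predict()  # forecast with the current model first...
            self._update(batch)         # ...then absorb the new data
            return forecast
        self._update(batch)             # absorb the new data first...
        return self._predict()          # ...then forecast
```

With `update_last=True` the returned forecast reflects the model as it was before the new batch, so it can be reviewed before the batch changes the model.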
Syntax
| fit StateSpaceForecast <fields> [from *] [specialdays=<field name>] [holdback=<int | time-range>] [forecast_k=<int | time-range>] [conf_interval=<float>] [period=<int>] [partial_fit=<true|false>] [update_last=<true|false>] [output_fit=<true|false>] [into <model name>] [as <field-list>]
You can apply the saved model to new data with the apply command.
| apply <model name> [specialdays=<field name>] [target=<fields>] [holdback=<int | time-range>] [forecast_k=<int | time-range>] [conf_interval=<float>]
You can inspect the model learned by StateSpaceForecast with the summary command.
| summary <model name>
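The `as` conventions for multivariate aliases (double quotes, spaces or commas as separators, positional matching, possibly fewer aliases than fields) can be sketched as follows. `parse_aliases` is a hypothetical helper, not part of MLTK, and rejecting a list with more aliases than fields is an assumption, since this page does not say what happens in that case.

```python
import re


def parse_aliases(as_value: str, fields: list[str]) -> dict[str, str]:
    """Map original fields to aliases taken, in order, from an `as` value."""
    # Drop the surrounding double quotes, then split on commas and/or whitespace.
    aliases = [a for a in re.split(r"[,\s]+", as_value.strip().strip('"')) if a]
    if len(aliases) > len(fields):
        # Assumption: extra aliases are treated as an error in this sketch.
        raise ValueError("more aliases than fields")
    # zip stops at the shorter list, so unaliased fields are simply left out.
    return dict(zip(fields, aliases))
```

Applied to the multivariate example below, `as "crm, erp"` would alias CRM and ERP while leaving Expenses under its original name.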
Syntax constraints
- For univariate analysis, the `fields` parameter is a single field; for multivariate analysis, it is a list of fields.
- For multivariate analysis, only one `specialdays` field can be specified, and it applies to all the fields.
- The `specialdays` field values must be numeric.
- Null values in the `specialdays` field are treated as 0.
- Double quotes are required around field lists.
- Scoring metric values are based on the `holdback` period data.
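Because scoring metrics are computed on the holdback period, evaluation amounts to splitting off the last `holdback` points and comparing them with the forecast for the same positions. The sketch below uses root mean squared error as one example metric and a naive last-value forecast in the checks; which metrics MLTK actually reports, and the names `split_holdback` and `rmse`, are assumptions for illustration.

```python
import math


def split_holdback(series: list[float], holdback: int) -> tuple[list[float], list[float]]:
    """Withhold the last `holdback` points from training."""
    if holdback <= 0:
        return list(series), []  # series[:-0] would be empty, so handle 0 explicitly
    return list(series[:-holdback]), list(series[-holdback:])


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between held-back values and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Any forecaster can be scored this way: fit on the training part, forecast `holdback` points, and compare against the withheld values.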
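The numeric and null rules for `specialdays` translate into a small cleaning step. `normalize_specialdays` is a hypothetical helper; treating NaN the same way as a null is an assumption made for illustration.

```python
import math
from typing import Optional


def normalize_specialdays(values: list[Optional[float]]) -> list[float]:
    """Coerce a specialdays column: nulls become 0, other values must be numeric."""
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(0.0)  # null (and, by assumption, NaN) is treated as 0
        elif isinstance(v, (int, float)):
            out.append(float(v))
        else:
            raise TypeError(f"specialdays values must be numeric, got {v!r}")
    return out
```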
Examples
The following is a univariate example of StateSpaceForecast on a test set. It is univariate because only a single field follows | fit StateSpaceForecast. The example dataset is derived from the milk.csv dataset that ships with MLTK. The milk2.csv dataset adds a column named holiday with two values: 0 represents no holiday and 1 represents a holiday on the associated date. The 1 values were set randomly.
| inputlookup milk2.csv
| fit StateSpaceForecast milk_production from * specialdays=holiday into milk_model
| apply milk_model specialdays=holiday forecast_k=30
The following is a multivariate example of StateSpaceForecast on a test set. The syntax is the same as that in the univariate example, except that this case has a list of fields (CRM, ERP, and Expenses) following | fit StateSpaceForecast, making it multivariate.
| inputlookup app_usage.csv
| fields CRM ERP Expenses
| fit StateSpaceForecast CRM ERP Expenses holdback=12 into app_usage_model as "crm, erp"
The following example is also multivariate and includes the target field. In this example, the CRM and ERP fields are forecast using historical data and the Expenses field. The apply command runs against app_usage_model, the model created in the preceding fit step.
Note: Double quotes are required around any field list.
| inputlookup app_usage.csv
| fields CRM ERP Expenses
| apply app_usage_model target="CRM, ERP" forecast_k=36 holdback=36
The following example is again multivariate but without the target field. This example forecasts the fields CRM, ERP, and Expenses using historical data.
| inputlookup app_usage.csv
| fields CRM ERP Expenses
| apply app_usage_model forecast_k=36 holdback=36
The following example uses the wildcard (*) character to specify the three fields of total_accidents, front_accidents, and rear_accidents.
| inputlookup UKfrontrearseatKSI.csv
| eval total_accidents='British drivers KSI'
| eval front_accidents='front seat KSI'
| eval rear_accidents='rear seat KSI'
| fit StateSpaceForecast *accidents holdback=30 from * forecast_k=10
The following example shows how to improve your output with StateSpaceForecast.
| inputlookup cyclical_business_process_with_external_anomalies.csv
| eval holiday=if(random()%100<98,0,1)
| fit StateSpaceForecast logons from logons into My_Model forecast_k=3000
Adding period=2016 to the fit command could improve the output, but it would not account for the period being seven days rather than twenty-four hours.
| inputlookup cyclical_business_process_with_external_anomalies.csv
| table _time,logons
| eval holiday=if(random()%100<98,0,1)
| eval dayOfWeek=strftime(_time,"%a")
| eval holidayWeekend=case(in(dayOfWeek,"Sat","Sun"),1,true(),0)
| apply MyBadModel specialdays=holidayWeekend forecast_k=3000
| eval old_predict='predicted(logons)'
| eval dayOfWeek=strftime(_time,"%a")
| eval holidayWeekend=case(in(dayOfWeek,"Sat","Sun"),1,true(),0)
| apply My_Model specialdays=holidayWeekend holdback=3000 forecast_k=3000
Local availability
- Local class: `StateSpaceForecast`
- Source file: `Splunk_ML_Toolkit/bin/algos/StateSpaceForecast.py` (in-repo path `Splunk_ML_Toolkit/bin/algos/StateSpaceForecast.py`)
- algos.conf stanza: `[StateSpaceForecast]`
- Class bases: `BaseAlgo`
Source
Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: forecasting).