Algorithms

StateSpaceForecast

StateSpaceForecast is a forecasting algorithm for time series data in MLTK. It is based on Kalman filters. The algorithm supports incremental fit.

Advantages of StateSpaceForecast over ARIMA include:

  • Persists models created using the fit command that can then be used with apply.
  • A specialdays field allows you to account for the effects of a specified list of special days.
  • It is automatic in that you do not need to choose parameters or a mode.
  • Supports multivariate forecasting.

Parameters

  • By default the historical data results from running the fit command are not shown. To modify this behavior set output_fit=True.

  • The fields segment of the search supports the wildcard (*) character.

  • Use the target field to specify which fields to forecast. The forecast draws on historical data and on the values of the other fields.

  • The target field is a comma-separated list of fields that can be univariate or multivariate. These fields must be specified during the fit process.

    • Optionally, fit multiple fields during the fit process and then use the target field to forecast only a selection of those fields during the apply process.
  • If the target field is not specified, then all fields will be forecast together using historical data.

  • The specialdays field specifies the field that indicates effects due to special days such as holidays.

  • The specialdays field values must be numeric and are typically 0 and 1, with 1 indicating the existence of a special day effect. Null values are treated as 0.

  • The majority of use cases have no specialdays. Events that occur regularly and frequently such as weekends should not be treated as specialdays. Use specialdays to capture events such as holiday sales.

  • Use specialdays in the apply step if it has been specified during fit. The same field(s) must be assigned.

  • Use the period parameter to specify if your data has a known periodicity.

  • If the period parameter is not specified it is computed automatically.

  • Set period=1 to treat the time series as non-periodic.

  • As with other MLTK algorithms, the partial_fit parameter controls whether a model should be incrementally updated or not. This allows you to update a model using only new data without having to retrain the model on the full dataset.

  • The default for partial_fit is False.

  • Use update_last to modify the behavior of partial_fit.

  • The default for update_last is False.

  • If partial_fit=True StateSpaceForecast first updates the model parameters and then predicts.

  • If partial_fit=True and update_last=True StateSpaceForecast first predicts and then updates the model parameters. This allows you to review the forecast before running new data through.

  • The conf_interval=<1..99> parameter sets the confidence level, as a percentage, of the interval around forecasted values. Input an integer between 1 and 99, where a larger number produces a wider interval and therefore a greater tolerance for forecast uncertainty. The default is 95.

  • Use the as field to assign aliases to forecasted fields.

  • In univariate cases the as field field-list is a single field name.

  • In multivariate cases, the as field adheres to the following conventions:

    • The list must be in double quotes, separated by either spaces or commas.
    • The aliases correspond to the original fields in the given order.
    • The number of aliases can be smaller than the number of original fields.
  • The summary command lists the names of the fields used in the fit command step, the name of the specialdays field, and the period.

  • The holdback parameter is the number of data points held back from training. This is useful for comparing the forecast against known data points. The default holdback value is 0.

  • If you want to maintain the holdback position, add the position number in forecast_k to your holdback value.

  • The forecast_k parameter tells StateSpaceForecast how many points into the future to forecast. If _time is specified during fitting along with the field_to_forecast, StateSpaceForecast also generates the timestamps for forecasted values. The default forecast_k value is 0.

  • The holdback and forecast_k values can be of two types: an integer or a time range.

    • An integer specifies a number of events. For example, forecast_k=10 forecasts 10 events into the future, and holdback=10 withholds the last 10 events from training.
    • A time range takes the form XY, where X is a non-negative integer and Y is either empty or adheres to a format in the time range table. If Y is empty, the value is interpreted as an integer number of events. For example, holdback=3day forecast_k=1week withholds 3 days of events and forecasts 1 week's worth of events.

Note: The actual number of events withheld and forecasted using the time range option depends on the time interval between consecutive events.

Time range    Acceptable formats for Y value
seconds       s, sec, secs, second, seconds
minutes       m, min, minute, minutes
hours         h, hr, hrs, hour, hours
days          d, day, days
weeks         w, week, weeks
months        mon, month, months
quarters      q, qtr, qtrs, quarter, quarters
years         y, yr, yrs, year, years
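
As a sketch of the time range option, the following search withholds the last 3 days of events and forecasts 1 week ahead. It reuses the cyclical_business_process_with_external_anomalies.csv lookup and its logons field from the Examples section, which includes a _time field. The number of events involved depends on the event spacing: if events were one hour apart, 3day would correspond to 72 events and 1week to 168 events.

```spl
| inputlookup cyclical_business_process_with_external_anomalies.csv
| fit StateSpaceForecast logons holdback=3day forecast_k=1week
```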

Syntax

| fit StateSpaceForecast <fields> [from *] [specialdays=<field name>] [holdback=<int | time-range>] [forecast_k=<int | time-range>] [conf_interval=<float>] [period=<int>]
[partial_fit=<true|false>] [update_last=<true|false>] [output_fit=<true|false>] [into <model name>] [as <field-list>]
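
The optional parameters can be combined in one fit command. The following sketch reuses the milk2.csv dataset from the Examples section. The period=12 value assumes monthly data with a yearly cycle, and the conf_interval=90 and forecast_k=12 values are arbitrary choices for illustration.

```spl
| inputlookup milk2.csv
| fit StateSpaceForecast milk_production from * specialdays=holiday period=12 conf_interval=90 forecast_k=12 output_fit=true into milk_model
```

Here output_fit=true shows the results on the historical data alongside the forecast.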

You can apply the saved model to new data with the apply command.

| apply <model name> [specialdays=<field name>] [target=<fields>] [holdback=<int | time-range>] [forecast_k=<int | time-range>] [conf_interval=<float>]

You can inspect the model learned by StateSpaceForecast with the summary command.

| summary <model name>
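
For example, assuming a model named milk_model was saved by an earlier fit command, as in the Examples section, the following search lists the fields used during fit, the name of the specialdays field, and the period.

```spl
| summary milk_model
```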

Syntax constraints

  • For univariate analysis the fields parameter is a single field, but for multivariate analysis it is a list of fields.
  • For multivariate analysis, only one specialdays field can be specified and it applies to all the fields.
  • The specialdays field values must be numeric.
  • Null values in the specialdays field are treated as 0.
  • Double quotes are required around field lists.
  • Scoring metric values are based on the holdback period data.

Examples

The following is a univariate example of StateSpaceForecast on a test set. The example is considered univariate because only a single field follows | fit StateSpaceForecast. The example dataset, milk2.csv, is derived from the milk.csv dataset that ships with MLTK and adds a column named holiday. This column has two values: 0 represents no holiday and 1 represents a holiday on the associated date. The 1 values were set randomly.

| inputlookup milk2.csv
| fit StateSpaceForecast milk_production from * specialdays=holiday into milk_model
| apply milk_model specialdays=holiday forecast_k=30
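
To update a saved model incrementally, a sketch along the following lines could be used. It assumes that milk_model already exists from the search above and that milk2_new.csv is a hypothetical lookup containing only newly arrived events with the same fields.

```spl
| inputlookup milk2_new.csv
| fit StateSpaceForecast milk_production from * specialdays=holiday partial_fit=true forecast_k=30 into milk_model
```

With partial_fit=true alone, the model parameters are updated before the forecast is made. Adding update_last=true reverses the order, so that the forecast is produced first and the model is updated afterward.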

The following is a multivariate example of StateSpaceForecast on a test set. The syntax is the same as that in the univariate example, except that this case has a list of fields (CRM, ERP, and Expenses) following | fit StateSpaceForecast, making it multivariate.

| inputlookup app_usage.csv
| fields CRM ERP Expenses
| fit StateSpaceForecast CRM ERP Expenses holdback=12 into app_usage_model as "crm, erp"

The following example is also multivariate and includes the target field. In this example the fields of CRM and ERP are forecast using historical data and the Expenses field. The apply command runs against app_usage_model, the model created in the fit command step.

Note: Double quotes are required around any field list.

| inputlookup app_usage.csv
| fields CRM ERP Expenses
| apply app_usage_model target="CRM, ERP" forecast_k=36 holdback=36

The following example is again multivariate but without the target field. This example forecasts the fields CRM, ERP, and Expenses using historical data.

| inputlookup app_usage.csv
| fields CRM ERP Expenses
| apply app_usage_model forecast_k=36 holdback=36
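
A model fit on several fields can also be applied to a single target with a different confidence interval. The following sketch assumes the app_usage_model model from the earlier fit step. The conf_interval=80 and forecast_k=12 values are arbitrary.

```spl
| inputlookup app_usage.csv
| fields CRM ERP Expenses
| apply app_usage_model target="CRM" conf_interval=80 forecast_k=12
```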

The following example uses the wildcard (*) character to specify the three fields of total_accidents, front_accidents, and rear_accidents.

| inputlookup UKfrontrearseatKSI.csv
| eval total_accidents='British drivers KSI'
| eval front_accidents='front seat KSI'
| eval rear_accidents='rear seat KSI'
| fit StateSpaceForecast *accidents holdback=30 from * forecast_k=10

The following example shows how to improve your output with StateSpaceForecast.

| inputlookup cyclical_business_process_with_external_anomalies.csv
| eval holiday=if(random()%100<98,0,1)
| fit StateSpaceForecast logons from logons into My_Model forecast_k=3000

Adding period=2016 to the fit command could improve the output, but would not account for the period being seven days rather than twenty-four hours.

| inputlookup cyclical_business_process_with_external_anomalies.csv
| table _time,logons
| eval holiday=if(random()%100<98,0,1)
| eval dayOfWeek=strftime(_time,"%a")
| eval holidayWeekend=case(in(dayOfWeek,"Sat","Sun"),1,true(),0)
| apply MyBadModel specialdays=holidayWeekend forecast_k=3000
| eval old_predict='predicted(logons)'
| eval dayOfWeek=strftime(_time,"%a")
| eval holidayWeekend=case(in(dayOfWeek,"Sat","Sun"),1,true(),0)
| apply My_Model specialdays=holidayWeekend holdback=3000 forecast_k=3000

Source

Adapted from the Splunk AI Toolkit 5.6.4 documentation at /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/algorithms-and-scoring-metrics-in-the-ai-toolkit/algorithms-in-the-ai-toolkit (section: forecasting).
