Models

Upload and inference pre-trained AWS SageMaker models in the AI Toolkit

In version 5.6.4 and higher of the AI Toolkit you can invoke pre-trained AWS SageMaker machine learning (ML) models for inferencing in the toolkit.

The SageMaker Inference Endpoint Integration feature lets AI Toolkit users invoke their own advanced, custom-built, AWS SageMaker–hosted models directly from Splunk platform searches, dashboards, and alerts, bringing model predictions into Splunk platform workflows using the familiar ML-SPL apply command.

Pro-code users can operationalize advanced ML workloads within the Splunk platform while leveraging SageMaker's managed infrastructure for scalable inference. This eliminates GPU, CPU, and Python library limitations, allowing for inference on large, complex, or custom ML models hosted in AWS, without overloading the search head.

Note: SageMaker models follow the same permission rules as other models you create in the AI Toolkit.

Key benefits

The SageMaker Inference Endpoint Integration feature offers the following benefits:

  • Advanced model support by using SageMaker to host and scale complex ML models that the AI Toolkit cannot run on the Splunk search head.

  • Improved scale and performance by offloading the heavy, high‑cardinality inference from the Splunk search head to managed SageMaker endpoints.

  • Faster operationalization by invoking SageMaker models directly from Splunk platform searches, dashboards, and alerts with a single ML-SPL command.

SageMaker feature permissions

See the following table for the permissions needed to perform SageMaker Inference Endpoint Integration feature operations:

Note: All users can run inference on registered models. Users without the edit_endpoints capability can run models but cannot register new models.

SageMaker model inference operation Required permissions
Edit, create, test, and delete edit_endpoints, edit_storage_passwords, and list_storage_passwords
Use the apply command to invoke the SageMaker model Search permissions and list_storage_passwords

SageMaker feature requirements

You must meet the following requirements to use the SageMaker Inference Endpoint Integration feature:

  • You must be a user of the AWS SageMaker service and are expected to manage your own AWS customer configurations.

    • All costs associated with SageMaker training and inference are borne directly by the customer in their AWS account.

  • You must complete the configuration steps within your instance of AWS SageMaker.

  • You must complete the configuration steps from the ML models tab of the AI Toolkit.

SageMaker feature workflow overview

See the following table for the high-level workflow of the SageMaker Inference Endpoint Integration feature:

Workflow step Description
Build and deploy in AWS Customers use their existing AWS accounts and SageMaker expertise to build, train, and deploy their models.
Register in the Splunk platform An administrator securely registers the SageMaker model in the AI Toolkit's governed catalog. This is a one-time, secure setup using IAM roles, with no exposed credentials. Note: The AI Toolkit supports certain content types for the SageMaker Inference Endpoint. See Supported content types.
Invoke the model in the Splunk platform Pro-code users leverage the apply command in SPL to run the model against their Splunk platform data. Predictions appear directly in Splunk platform searches, dashboards, and alerts for instant operationalization.

Supported content types

The AI Toolkit supports the following content types for the SageMaker Inference Endpoint:

application/json

Sample input feature mapping:

{
  "cpu_usage": "instances[*].cpu_usage",
  "memory_usage": "instances[*].memory_usage",
  "disk_io": "instances[*].disk_io",
  "network_latency": "instances[*].network_latency",
  "error_count": "instances[*].error_count"
}

Sample output feature mapping:

{
  "result[*].prediction": "log_severity",
  "result[*].confidence": "confidence"
}

Open API spec (supports only the json spec, version 3.0.0):

{
  "openapi": "3.0.0",
  "info": {
    "title": "Log Event Severity Classification API",
    "version": "1.0.0",
    "description": "Classifies log events into severity levels with confidence scores"
  },
  "paths": {
    "/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "instances": {
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "cpu_usage": { "type": "number" },
                        "memory_usage": { "type": "number" },
                        "disk_io": { "type": "number" },
                        "network_latency": { "type": "number" },
                        "error_count": { "type": "integer" }
                      },
                      "required": [
                        "cpu_usage",
                        "memory_usage",
                        "disk_io",
                        "network_latency",
                        "error_count"
                      ]
                    }
                  }
                },
                "required": ["instances"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "result": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "prediction": { "type": "integer" },
                          "confidence": { "type": "number" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

text/csv

For CSV, no mapping is needed: the data is converted to CSV without a column header and sent as the payload. Provide an empty mapping {} for both the input and output feature mappings.

Open API spec (supports only the json spec, version 3.0.0):

{
  "openapi": "3.0.0",
  "paths": {
    "/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "text/csv": {
              "schema": { "type": "string" },
              "example": "97,166,734,489\n84,523,892,347"
            }
          }
        },
        "responses": {
          "200": {
            "content": {
              "text/csv": {
                "schema": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}
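For text/csv, each batch of rows is serialized to a header-less CSV string like the example payload in the spec above. A minimal Python sketch of that serialization (the helper name rows_to_csv_payload is illustrative, not part of the toolkit):

```python
import csv
import io

def rows_to_csv_payload(rows):
    # Serialize feature rows to a header-less CSV payload string,
    # mirroring the "97,166,734,489\n84,523,892,347" example above.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().rstrip("\n")

payload = rows_to_csv_payload([[97, 166, 734, 489], [84, 523, 892, 347]])
```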

SPL examples

See the following SPL examples of the apply command being called to invoke a SageMaker model:

Example 1

| makeresults count=6
| streamstats count
| eval cpu_usage = case(count=1, 45.2, count=2, 89.5, count=3, 23.1, count=4, 34.15, count=5, 31.64, count=6, 98.45)
| eval memory_usage = case(count=1, 62.3, count=2, 94.7, count=3, 38.5, count=4, 50.4, count=5, 43.61, count=6, 104.17)
| eval disk_io = case(count=1, 125.5, count=2, 876.2, count=3, 45.8, count=4, 85.65, count=5, 87.85, count=6, 963.82)
| eval network_latency = case(count=1, 12.3, count=2, 156.7, count=3, 8.2, count=4, 10.25, count=5, 8.61, count=6, 172.37)
| eval error_count = case(count=1, 2, count=2, 47, count=3, 0, count=4, 1, count=5, 1, count=6, 51)
| table cpu_usage memory_usage disk_io network_latency error_count
| fields - _time
| apply sg_metric_alert_classification runtime=sagemaker features="cpu_usage,memory_usage,disk_io,network_latency,error_count"

Example 2

| makeresults count=6
| streamstats count
| eval cpu_usage = case(count=1, 45.2, count=2, 89.5, count=3, 23.1, count=4, 34.15, count=5, 31.64, count=6, 98.45)
| eval memory_usage = case(count=1, 62.3, count=2, 94.7, count=3, 38.5, count=4, 50.4, count=5, 43.61, count=6, 104.17)
| eval disk_io = case(count=1, 125.5, count=2, 876.2, count=3, 45.8, count=4, 85.65, count=5, 87.85, count=6, 963.82)
| eval network_latency = case(count=1, 12.3, count=2, 156.7, count=3, 8.2, count=4, 10.25, count=5, 8.61, count=6, 172.37)
| eval error_count = case(count=1, 2, count=2, 47, count=3, 0, count=4, 1, count=5, 1, count=6, 51)
| table cpu_usage memory_usage disk_io network_latency error_count
| fields - _time
| apply sg_classification-nested-model runtime=sagemaker features="cpu_usage,memory_usage,disk_io,network_latency,error_count"

Supported input and output feature mapping patterns

The AI Toolkit supports the following content types for the SageMaker Inference Endpoint:

  • application/json

  • text/csv

When completing model registration, use the following guidelines for the batch_size parameter:

  • The range is 1 to 10,000 records.

  • If batch_size=1, the model runs in single-record mode. Otherwise, it runs in multi-record mode.
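The batching behavior described above can be sketched in Python (batch_rows is a hypothetical helper, not toolkit code):

```python
def batch_rows(rows, batch_size):
    # Split SPL result rows into chunks of at most batch_size rows,
    # one chunk per inference invocation. batch_size=1 yields one
    # request per row (single-record mode); larger values group
    # multiple rows per request (multi-record mode).
    if not 1 <= batch_size <= 10000:
        raise ValueError("batch_size must be between 1 and 10,000")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
```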

Batch Named Object > Scalar Array

The pattern is detected through the object [*] field paths and the predictions [*] output path. Use this pattern when the request is an array of JSON objects and the response is a flat array of scalars.

Dataframe (rows > batch):
Columns: cpu_usage, memory_usage, error_count
Example rows:
75, 6.2, 1
42, 3.1, 0

Maps:

"input_feature_map": {
  "cpu_usage": "instances[*].cpu_usage",
  "memory_usage": "instances[*].memory_usage",
  "error_count": "instances[*].error_count"
},
"output_prediction_map": {
  "predictions[*]": "alert_severity"
}

Request:

{
  "instances": [
    {"cpu_usage": 75, "memory_usage": 6.2, "error_count": 1},
    {"cpu_usage": 42, "memory_usage": 3.1, "error_count": 0}
  ]
}

Response:

{ "predictions": ["HIGH", "LOW"] }

Final columns: cpu_usage, memory_usage, error_count, alert_severity
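As an illustration of this pattern, the following Python sketch builds the batch request from rows using the input feature map, then merges the flat predictions array back onto the rows as a new column. Both helpers (build_named_object_request, merge_scalar_predictions) are hypothetical and only handle the simple root[*].field paths shown here:

```python
import re

def build_named_object_request(rows, input_feature_map):
    # Build {"instances": [{...}, {...}]} from paths like "instances[*].cpu_usage".
    root, items = None, []
    for row in rows:
        item = {}
        for column, path in input_feature_map.items():
            match = re.fullmatch(r"(\w+)\[\*\]\.(\w+)", path)
            root, field = match.group(1), match.group(2)
            item[field] = row[column]
        items.append(item)
    return {root: items}

def merge_scalar_predictions(rows, response, output_prediction_map):
    # Attach the flat predictions array back onto the rows under the new column name.
    (path, column), = output_prediction_map.items()
    root = path.split("[*]")[0]
    return [dict(row, **{column: pred}) for row, pred in zip(rows, response[root])]

rows = [
    {"cpu_usage": 75, "memory_usage": 6.2, "error_count": 1},
    {"cpu_usage": 42, "memory_usage": 3.1, "error_count": 0},
]
input_map = {
    "cpu_usage": "instances[*].cpu_usage",
    "memory_usage": "instances[*].memory_usage",
    "error_count": "instances[*].error_count",
}
request = build_named_object_request(rows, input_map)
result = merge_scalar_predictions(
    rows, {"predictions": ["HIGH", "LOW"]}, {"predictions[*]": "alert_severity"}
)
```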

Batch Positional Array > Scalar Array

Use this pattern when the request expects [[f1,f2,...], ...] and the response is a flat array with one value per row. The validator orders columns by the map. CSV is also allowed, but JSON is shown here.

Dataframe:
Columns: request_count, cpu_load, memory_gb
Rows:
120, 0.73, 4.0
300, 0.91, 8.0

Maps:

"input_feature_map": {
  "request_count": "instances[*][0]",
  "cpu_load": "instances[*][1]",
  "memory_gb": "instances[*][2]"
},
"output_prediction_map": {
  "predictions[*]": "latency_ms"
}

Request:

{ "instances": [[120, 0.73, 4.0], [300, 0.91, 8.0]] }

Response:

{ "predictions": [12.4, 38.7] }

Final columns: request_count, cpu_load, memory_gb, latency_ms

Batch Nested Object > Scalar Array

Use this pattern when each item is a nested object. Path detection builds the nested structure; no hard-coded parent keys are required by the engine.

Dataframe:
Columns: pod_cpu, pod_mem, node_pressure
Rows:
0.65, 2.1, 0.3
0.90, 4.2, 0.6

Maps:

"input_feature_map": {
  "pod_cpu": "instances[*].k8s.pod.cpu",
  "pod_mem": "instances[*].k8s.pod.memory",
  "node_pressure": "instances[*].node.pressure"
},
"output_prediction_map": {
  "predictions[*]": "cluster_risk"
}

Request:

{
  "instances": [
    {"k8s": {"pod": {"cpu": 0.65, "memory": 2.1}}, "node": {"pressure": 0.3}},
    {"k8s": {"pod": {"cpu": 0.90, "memory": 4.2}}, "node": {"pressure": 0.6}}
  ]
}

Response:

{ "predictions": [0.12, 0.88] }

Final columns: pod_cpu, pod_mem, node_pressure, cluster_risk
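The dotted-path construction this pattern relies on can be sketched with a small Python helper (set_nested is illustrative, not toolkit code):

```python
def set_nested(obj, dotted_path, value):
    # Create intermediate dicts along a dotted path such as "k8s.pod.cpu",
    # then set the leaf value, building the nested request item.
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value

item = {}
set_nested(item, "k8s.pod.cpu", 0.65)
set_nested(item, "k8s.pod.memory", 2.1)
set_nested(item, "node.pressure", 0.3)
```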

Batch Positional Array > 2D Array (multi-output)

Use this pattern when each row returns multiple outputs, for example a score and a confidence. The output map uses positional indices.

Dataframe:
Columns: latency_p50, latency_p95, latency_p99, packet_loss, bandwidth_mbps

Maps:

"input_feature_map": {
  "latency_p50": "instances[*][0]",
  "latency_p95": "instances[*][1]",
  "latency_p99": "instances[*][2]",
  "packet_loss": "instances[*][3]",
  "bandwidth_mbps": "instances[*][4]"
},
"output_prediction_map": {
  "predictions[*][0]": "anomaly_score",
  "predictions[*][1]": "confidence"
}

Request:

{
  "instances": [
    [33.2, 85.0, 120.1, 0.01, 200.0],
    [45.0, 120.0, 200.2, 0.04, 150.0]
  ]
}

Response:

{ "predictions": [[0.12, 0.86], [0.98, 0.22]] }

Final columns: latency_p50, latency_p95, latency_p99, packet_loss, bandwidth_mbps, anomaly_score, confidence
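Fanning a 2D predictions array out into named columns can be sketched in Python (split_multi_output is a hypothetical helper that only handles root[*][index] paths like those shown here):

```python
import re

def split_multi_output(response, output_prediction_map):
    # Extract each positional index from paths like "predictions[*][0]"
    # and collect one output column per mapped index.
    columns = {}
    for path, column in output_prediction_map.items():
        match = re.fullmatch(r"(\w+)\[\*\]\[(\d+)\]", path)
        root, idx = match.group(1), int(match.group(2))
        columns[column] = [row[idx] for row in response[root]]
    return columns

resp = {"predictions": [[0.12, 0.86], [0.98, 0.22]]}
omap = {"predictions[*][0]": "anomaly_score", "predictions[*][1]": "confidence"}
cols = split_multi_output(resp, omap)
```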

Single Named Object > Single Value

Use this pattern for real-time inference with a single record in and a scalar out, with no [*] present in the maps. Detection distinguishes single from batch mode by the absence of [*].

Dataframe (1 row):
Columns: container_cpu, container_memory, pod_age_hours
Row: 0.82, 1.2, 36

Maps:

"input_feature_map": {
  "container_cpu": "container_cpu",
  "container_memory": "container_memory",
  "pod_age_hours": "pod_age_hours"
},
"output_prediction_map": {
  "prediction": "health_score"
}

Request:

{ "container_cpu": 0.82, "container_memory": 1.2, "pod_age_hours": 36 }

Response:

{ "prediction": 0.91 }

Final columns: container_cpu, container_memory, pod_age_hours, health_score

Single Nested Object > Single Value

Use this pattern when one nested object is required and the output is a nested scalar that is renamed using the output map.

Dataframe (1 row):
Columns: cpu_spike_percent, avg_cpu_usage, geo_distance_km
Row: 45, 0.31, 12.7

Maps:

"input_feature_map": {
  "cpu_spike_percent": "system.cpu_spike_percent",
  "avg_cpu_usage": "application.avg_cpu_usage",
  "geo_distance_km": "geo.distance_km"
},
"output_prediction_map": {
  "score.value": "risk_score"
}

Request:

{
  "system": {"cpu_spike_percent": 45},
  "application": {"avg_cpu_usage": 0.31},
  "geo": {"distance_km": 12.7}
}

Response:

{ "score": { "value": 0.73 } }

Final columns: cpu_spike_percent, avg_cpu_usage, geo_distance_km, risk_score

Single Positional Array > Single Value

Use this pattern for a single record sent as an array [f0,f1,...] with a scalar out. Parent keys are inferred from the schema; no names are hard-coded.

Dataframe (1 row):
Columns: query_complexity, table_rows, index_count
Row: 0.7, 1500000, 8

Maps (the example schema uses the root key data):

"input_feature_map": {
  "query_complexity": "[0]",
  "table_rows": "[1]",
  "index_count": "[2]"
},
"output_prediction_map": {
  "value": "estimate_ms"
}

Request:

{ "data": [[0.7, 1500000, 8]] }

Response:

{ "value": 42.0 }

Final columns: query_complexity, table_rows, index_count, estimate_ms

Batch Root Array Key > Array

Use this pattern when the schema's root contains the batch array, such as data[*], and the output uses a different root, such as result[*]. Parent keys come from the schema or paths and are not hard-coded.

Dataframe:
Columns: throughput, latency
Rows:
200, 12.0
120, 30.0

Maps:

"input_feature_map": {
  "throughput": "data[*][0]",
  "latency": "data[*][1]"
},
"output_prediction_map": {
  "result[*]": "slo_breach_prob"
}

Request:

{ "data": [[200, 12.0], [120, 30.0]] }

Response:

{ "result": [0.01, 0.77] }

Final columns: throughput, latency, slo_breach_prob

Batch Named Object > Object Array (multi-field)

Use this pattern when the response for each item is an object with multiple fields. The output map renames them. The robust parser fills missing fields with NaN and pads or truncates the results to match the input row count.

Dataframe:
Columns: req_rate, err_rate, p95
Rows:
1200, 0.02, 180
900, 0.05, 250

Maps:

"input_feature_map": {
  "req_rate": "inputs[*].req_rate",
  "err_rate": "inputs[*].err_rate",
  "p95": "inputs[*].latency.p95"
},
"output_prediction_map": {
  "predictions[*].class": "predicted_class",
  "predictions[*].prob": "confidence"
}

Request:

{
  "inputs": [
    {"req_rate": 1200, "err_rate": 0.02, "latency": {"p95": 180}},
    {"req_rate": 900, "err_rate": 0.05, "latency": {"p95": 250}}
  ]
}

Response:

{
  "predictions": [
    {"class": "NORMAL", "prob": 0.81},
    {"class": "AT_RISK", "prob": 0.62}
  ]
}

Final columns: req_rate, err_rate, p95, predicted_class, confidence
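A rough Python approximation of the missing-field behavior described for this pattern (parse_object_array is a hypothetical helper; the toolkit's actual parser may differ):

```python
import math

def parse_object_array(response, output_prediction_map, n_rows):
    # Rename per-item object fields via paths like "predictions[*].class",
    # filling missing fields with NaN and padding to n_rows items.
    root = next(iter(output_prediction_map)).split("[*]")[0]
    items = response.get(root, [])
    rows = []
    for i in range(n_rows):
        item = items[i] if i < len(items) else {}
        row = {}
        for path, column in output_prediction_map.items():
            field = path.split("[*].")[1]
            row[column] = item.get(field, math.nan)
        rows.append(row)
    return rows

# Second item is missing "prob", so it is filled with NaN.
resp = {"predictions": [{"class": "NORMAL", "prob": 0.81}, {"class": "AT_RISK"}]}
omap = {"predictions[*].class": "predicted_class", "predictions[*].prob": "confidence"}
out = parse_object_array(resp, omap, 2)
```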

SageMaker feature configuration steps

Configuration is a one-time, secure setup that uses IAM roles with no exposed credentials.

The following are the minimal AWS permissions this IAM role requires, along with an AllowAssumeRole permission to enable secure role assumption through STS:

  • "sagemaker:InvokeEndpoint"

  • "sagemaker:InvokeEndpointAsync"

  • "sagemaker:DescribeEndpoint"

  • "sagemaker:DescribeEndpointConfig"

  • "sagemaker:ListEndpoints"

  • "sagemaker:ListModels"

  • "sagemaker:DescribeModel"

Example IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SageMakerInferenceAccess",
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:InvokeEndpointAsync",
        "sagemaker:DescribeEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:ListEndpoints",
        "sagemaker:ListModels",
        "sagemaker:DescribeModel"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowAssumeRole",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "*"
    }
  ]
}

Complete the following steps:

  1. Log in to the AI Toolkit and navigate to the Models tab.

  2. As shown in the following image, from the +Model button, choose SageMaker:

This image shows the Models tab of the AI Toolkit. The +Model button on the far right is selected and the option for SageMaker is highlighted.

  3. On the Add SageMaker model window, complete the following fields:
Field name Description
Model name Required field. The name of the model as created in SageMaker. The model name is created using the AWS SageMaker "Create endpoint" workflow. The model name must be unique and free of special characters.
Description Optional field. Input a description to explain the model's purpose and intended use.
Endpoint Required field. The SageMaker Inference Endpoint name, created using the AWS SageMaker "Create endpoint" workflow.
AWS region Required field. Taken from your AWS credentials.
AWS access key ID Required field. Taken from your AWS credentials.
IAM role ARN Required field. The IAM role used for the SageMaker inference API call. It uses STS to generate temporary credentials and has the necessary permissions to invoke the SageMaker model.
  4. Select Test connection to confirm the connection information is correctly added.

    • If you see a Connection successful message, continue to step 5.

    • If you see an Unable to establish connection message, check the information you added in step 3 and try again.

  5. Complete the remaining fields:

Field name Description
Input feature mapping Required field.
Output feature mapping Required field.
Open API for inference endpoint Required field.
SPL results batch size Required field. The number of rows the deployed SageMaker endpoint can accept for each inference invocation. Default is 1 and maximum is 10,000.
  6. Select Add Model when done.

Source: /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/ai-toolkit-models/upload-and-inference-pre-trained-aws-sagemaker-models-in-the-ai-toolkit (upstream Splunk AITK 5.6.4 docs)
