Models

Upload and inference pre-trained AWS SageMaker models in the AI Toolkit

In version 5.6.4 and higher of the AI Toolkit you can invoke pre-trained AWS SageMaker machine learning (ML) models for inferencing in the toolkit.

The SageMaker Inference Endpoint Integration feature lets AI Toolkit users invoke their own advanced, custom-built, AWS SageMaker–hosted models directly from Splunk platform searches, dashboards, and alerts, bringing model predictions into Splunk platform workflows using the familiar ML-SPL apply command.

Pro-code users can operationalize advanced ML workloads within the Splunk platform while leveraging SageMaker's managed infrastructure for scalable inference. This eliminates GPU, CPU, and Python library limitations, allowing for inference on large, complex, or custom ML models hosted in AWS, without overloading the search head.

Note: SageMaker models follow the same permission rules as other models you create in the AI Toolkit.

Key benefits

The SageMaker Inference Endpoint Integration feature offers the following benefits:

  • Advanced model support by using SageMaker to host and scale complex ML models that the AI Toolkit cannot run on the Splunk search head.

  • Improved scale and performance by offloading the heavy, high‑cardinality inference from the Splunk search head to managed SageMaker endpoints.

  • Faster operationalization by invoking SageMaker models directly from Splunk platform searches, dashboards, and alerts with a single ML-SPL command.

SageMaker feature permissions

See the following table for the permissions needed to perform SageMaker Inference Endpoint Integration feature operations:

Note: All users can run inference on registered models. Users without the edit_endpoints capability can run models but cannot register new models.

SageMaker model inference operation Required permissions
Edit, create, test, and delete edit_endpoints, edit_storage_passwords, and list_storage_passwords
Use the apply command to invoke the SageMaker model Search permissions and list_storage_passwords

SageMaker feature requirements

You must meet the following requirements to use the SageMaker Inference Endpoint Integration feature:

  • You must be a user of the AWS SageMaker service and are expected to manage your own AWS customer configurations.

    • All costs associated with SageMaker training and inference are borne directly by the customer in their AWS account.

  • You must complete the configuration steps within your instance of AWS SageMaker.

  • You must complete the configuration steps from the ML models tab of the AI Toolkit.

SageMaker feature workflow overview

See the following table for the high-level workflow of the SageMaker Inference Endpoint Integration feature:

Workflow step Description
Build and deploy in AWS Customers use their existing AWS accounts and SageMaker expertise to build, train, and deploy their models.
Register in the Splunk platform An administrator securely registers the SageMaker model in the AI Toolkit's governed catalog. This is a one-time, secure setup using IAM roles, with no exposed credentials. Note: The AI Toolkit supports certain content types for the SageMaker Inference Endpoint. See Supported content types.
Invoke the model in the Splunk platform Pro-code users leverage the apply command in SPL to run the model against their Splunk platform data. Predictions appear directly in Splunk platform searches, dashboards, and alerts for instant operationalization.

Supported content types

The AI Toolkit supports the following content types for the SageMaker Inference Endpoint:

application/json

Sample input feature mapping:

{
  "cpu_usage": "instances[*].cpu_usage",
  "memory_usage": "instances[*].memory_usage",
  "disk_io": "instances[*].disk_io",
  "network_latency": "instances[*].network_latency",
  "error_count": "instances[*].error_count"
}

Sample output feature mapping:

{
  "result[*].prediction": "log_severity",
  "result[*].confidence": "confidence"
}

Open API spec (supports only the json spec, version 3.0.0):

{
  "openapi": "3.0.0",
  "info": {
    "title": "Log Event Severity Classification API",
    "version": "1.0.0",
    "description": "Classifies log events into severity levels with confidence scores"
  },
  "paths": {
    "/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "instances": {
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "cpu_usage": { "type": "number" },
                        "memory_usage": { "type": "number" },
                        "disk_io": { "type": "number" },
                        "network_latency": { "type": "number" },
                        "error_count": { "type": "integer" }
                      },
                      "required": [
                        "cpu_usage",
                        "memory_usage",
                        "disk_io",
                        "network_latency",
                        "error_count"
                      ]
                    }
                  }
                },
                "required": ["instances"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "result": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "prediction": { "type": "integer" },
                          "confidence": { "type": "number" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

text/csv

For CSV, no mapping is needed: the data is converted to CSV without a column header and sent as the payload. Provide an empty mapping {} for both the input and output feature mappings.

Open API spec (supports only the json spec, version 3.0.0):

{
  "openapi": "3.0.0",
  "paths": {
    "/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "text/csv": {
              "schema": { "type": "string" },
              "example": "97,166,734,489\n84,523,892,347"
            }
          }
        },
        "responses": {
          "200": {
            "content": {
              "text/csv": {
                "schema": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}
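For text/csv, each batch of rows is serialized to a header-less CSV string like the example payload in the spec above. A minimal Python sketch of that serialization (the helper name rows_to_csv_payload is illustrative, not part of the toolkit):

```python
import csv
import io

def rows_to_csv_payload(rows):
    # Serialize feature rows to a header-less CSV payload string,
    # mirroring the "97,166,734,489\n84,523,892,347" example above.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().rstrip("\n")

payload = rows_to_csv_payload([[97, 166, 734, 489], [84, 523, 892, 347]])
```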

SPL examples

See the following SPL examples of the apply command being called to invoke a SageMaker model:

Example 1

| makeresults count=6
| streamstats count
| eval cpu_usage = case(count=1, 45.2, count=2, 89.5, count=3, 23.1, count=4, 34.15, count=5, 31.64, count=6, 98.45)
| eval memory_usage = case(count=1, 62.3, count=2, 94.7, count=3, 38.5, count=4, 50.4, count=5, 43.61, count=6, 104.17)
| eval disk_io = case(count=1, 125.5, count=2, 876.2, count=3, 45.8, count=4, 85.65, count=5, 87.85, count=6, 963.82)
| eval network_latency = case(count=1, 12.3, count=2, 156.7, count=3, 8.2, count=4, 10.25, count=5, 8.61, count=6, 172.37)
| eval error_count = case(count=1, 2, count=2, 47, count=3, 0, count=4, 1, count=5, 1, count=6, 51)
| table cpu_usage memory_usage disk_io network_latency error_count
| fields - _time
| apply sg_metric_alert_classification runtime=sagemaker features="cpu_usage,memory_usage,disk_io,network_latency,error_count"

Example 2

| makeresults count=6
| streamstats count
| eval cpu_usage = case(count=1, 45.2, count=2, 89.5, count=3, 23.1, count=4, 34.15, count=5, 31.64, count=6, 98.45)
| eval memory_usage = case(count=1, 62.3, count=2, 94.7, count=3, 38.5, count=4, 50.4, count=5, 43.61, count=6, 104.17)
| eval disk_io = case(count=1, 125.5, count=2, 876.2, count=3, 45.8, count=4, 85.65, count=5, 87.85, count=6, 963.82)
| eval network_latency = case(count=1, 12.3, count=2, 156.7, count=3, 8.2, count=4, 10.25, count=5, 8.61, count=6, 172.37)
| eval error_count = case(count=1, 2, count=2, 47, count=3, 0, count=4, 1, count=5, 1, count=6, 51)
| table cpu_usage memory_usage disk_io network_latency error_count
| fields - _time
| apply sg_classification-nested-model runtime=sagemaker features="cpu_usage,memory_usage,disk_io,network_latency,error_count"

Supported input and output feature mapping patterns

The AI Toolkit supports the following content types for the SageMaker Inference Endpoint:

  • application/json

  • text/csv

When completing model registration, use the following guidelines for the batch_size parameter:

  • The range is 1 to 10,000 records.

  • If batch_size=1, the model runs in single-record mode. Otherwise, it runs in multi-record mode.
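The batching behavior described above can be sketched in Python (batch_rows is a hypothetical helper, not toolkit code):

```python
def batch_rows(rows, batch_size):
    # Split SPL result rows into chunks of at most batch_size rows,
    # one chunk per inference invocation. batch_size=1 yields one
    # request per row (single-record mode); larger values group
    # multiple rows per request (multi-record mode).
    if not 1 <= batch_size <= 10000:
        raise ValueError("batch_size must be between 1 and 10,000")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
```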

Batch Named Object > Scalar Array

The pattern is detected through the object [*] field paths and the predictions [*] output path. Use this pattern when the request is an array of JSON objects and the response is a flat array of scalars.

Dataframe (rows > batch):
Columns: cpu_usage, memory_usage, error_count
Example rows:
75, 6.2, 1
42, 3.1, 0

Maps:

"input_feature_map": {
  "cpu_usage": "instances[*].cpu_usage",
  "memory_usage": "instances[*].memory_usage",
  "error_count": "instances[*].error_count"
},
"output_prediction_map": {
  "predictions[*]": "alert_severity"
}

Request:

{
  "instances": [
    {"cpu_usage": 75, "memory_usage": 6.2, "error_count": 1},
    {"cpu_usage": 42, "memory_usage": 3.1, "error_count": 0}
  ]
}

Response:

{ "predictions": ["HIGH", "LOW"] }

Final columns: cpu_usage, memory_usage, error_count, alert_severity
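As an illustration of this pattern, the following Python sketch builds the batch request from rows using the input feature map, then merges the flat predictions array back onto the rows as a new column. Both helpers (build_named_object_request, merge_scalar_predictions) are hypothetical and only handle the simple root[*].field paths shown here:

```python
import re

def build_named_object_request(rows, input_feature_map):
    # Build {"instances": [{...}, {...}]} from paths like "instances[*].cpu_usage".
    root, items = None, []
    for row in rows:
        item = {}
        for column, path in input_feature_map.items():
            match = re.fullmatch(r"(\w+)\[\*\]\.(\w+)", path)
            root, field = match.group(1), match.group(2)
            item[field] = row[column]
        items.append(item)
    return {root: items}

def merge_scalar_predictions(rows, response, output_prediction_map):
    # Attach the flat predictions array back onto the rows under the new column name.
    (path, column), = output_prediction_map.items()
    root = path.split("[*]")[0]
    return [dict(row, **{column: pred}) for row, pred in zip(rows, response[root])]

rows = [
    {"cpu_usage": 75, "memory_usage": 6.2, "error_count": 1},
    {"cpu_usage": 42, "memory_usage": 3.1, "error_count": 0},
]
input_map = {
    "cpu_usage": "instances[*].cpu_usage",
    "memory_usage": "instances[*].memory_usage",
    "error_count": "instances[*].error_count",
}
request = build_named_object_request(rows, input_map)
result = merge_scalar_predictions(
    rows, {"predictions": ["HIGH", "LOW"]}, {"predictions[*]": "alert_severity"}
)
```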

Batch Positional Array > Scalar Array

Use this pattern when the request expects [[f1,f2,...], ...] and the response is a flat array with one value per row. The validator orders columns by the map. CSV is also allowed, but JSON is shown here.

Dataframe:
Columns: request_count, cpu_load, memory_gb
Rows:
120, 0.73, 4.0
300, 0.91, 8.0

Maps:

"input_feature_map": {
  "request_count": "instances[*][0]",
  "cpu_load": "instances[*][1]",
  "memory_gb": "instances[*][2]"
},
"output_prediction_map": {
  "predictions[*]": "latency_ms"
}

Request:

{ "instances": [[120, 0.73, 4.0], [300, 0.91, 8.0]] }

Response:

{ "predictions": [12.4, 38.7] }

Final columns: request_count, cpu_load, memory_gb, latency_ms

Batch Nested Object > Scalar Array

Use this pattern when each item is a nested object. Path detection builds the nested structure; no hard-coded parent keys are required by the engine.

Dataframe:
Columns: pod_cpu, pod_mem, node_pressure
Rows:
0.65, 2.1, 0.3
0.90, 4.2, 0.6

Maps:

"input_feature_map": {
  "pod_cpu": "instances[*].k8s.pod.cpu",
  "pod_mem": "instances[*].k8s.pod.memory",
  "node_pressure": "instances[*].node.pressure"
},
"output_prediction_map": {
  "predictions[*]": "cluster_risk"
}

Request:

{
  "instances": [
    {"k8s": {"pod": {"cpu": 0.65, "memory": 2.1}}, "node": {"pressure": 0.3}},
    {"k8s": {"pod": {"cpu": 0.90, "memory": 4.2}}, "node": {"pressure": 0.6}}
  ]
}

Response:

{ "predictions": [0.12, 0.88] }

Final columns: pod_cpu, pod_mem, node_pressure, cluster_risk
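The dotted-path construction this pattern relies on can be sketched with a small Python helper (set_nested is illustrative, not toolkit code):

```python
def set_nested(obj, dotted_path, value):
    # Create intermediate dicts along a dotted path such as "k8s.pod.cpu",
    # then set the leaf value, building the nested request item.
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value

item = {}
set_nested(item, "k8s.pod.cpu", 0.65)
set_nested(item, "k8s.pod.memory", 2.1)
set_nested(item, "node.pressure", 0.3)
```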

Batch Positional Array > 2D Array (multi-output)

Use this pattern when each row returns multiple outputs, for example a score and a confidence. The output map uses positional indices.

Dataframe:
Columns: latency_p50, latency_p95, latency_p99, packet_loss, bandwidth_mbps

Maps:

"input_feature_map": {
  "latency_p50": "instances[*][0]",
  "latency_p95": "instances[*][1]",
  "latency_p99": "instances[*][2]",
  "packet_loss": "instances[*][3]",
  "bandwidth_mbps": "instances[*][4]"
},
"output_prediction_map": {
  "predictions[*][0]": "anomaly_score",
  "predictions[*][1]": "confidence"
}

Request:

{
  "instances": [
    [33.2, 85.0, 120.1, 0.01, 200.0],
    [45.0, 120.0, 200.2, 0.04, 150.0]
  ]
}

Response:

{ "predictions": [[0.12, 0.86], [0.98, 0.22]] }

Final columns: latency_p50, latency_p95, latency_p99, packet_loss, bandwidth_mbps, anomaly_score, confidence
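Fanning a 2D predictions array out into named columns can be sketched in Python (split_multi_output is a hypothetical helper that only handles root[*][index] paths like those shown here):

```python
import re

def split_multi_output(response, output_prediction_map):
    # Extract each positional index from paths like "predictions[*][0]"
    # and collect one output column per mapped index.
    columns = {}
    for path, column in output_prediction_map.items():
        match = re.fullmatch(r"(\w+)\[\*\]\[(\d+)\]", path)
        root, idx = match.group(1), int(match.group(2))
        columns[column] = [row[idx] for row in response[root]]
    return columns

resp = {"predictions": [[0.12, 0.86], [0.98, 0.22]]}
omap = {"predictions[*][0]": "anomaly_score", "predictions[*][1]": "confidence"}
cols = split_multi_output(resp, omap)
```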

Single Named Object > Single Value

Use this pattern for real-time inference with a single record in and a scalar out, with no [*] present in the maps. Detection distinguishes single from batch mode by the absence of [*].

Dataframe (1 row):
Columns: container_cpu, container_memory, pod_age_hours
Row: 0.82, 1.2, 36

Maps:

"input_feature_map": {
  "container_cpu": "container_cpu",
  "container_memory": "container_memory",
  "pod_age_hours": "pod_age_hours"
},
"output_prediction_map": {
  "prediction": "health_score"
}

Request:

{ "container_cpu": 0.82, "container_memory": 1.2, "pod_age_hours": 36 }

Response:

{ "prediction": 0.91 }

Final columns: container_cpu, container_memory, pod_age_hours, health_score

Single Nested Object > Single Value

Use this pattern when one nested object is required and the output is a nested scalar that is renamed using the output map.

Dataframe (1 row):
Columns: cpu_spike_percent, avg_cpu_usage, geo_distance_km
Row: 45, 0.31, 12.7

Maps:

"input_feature_map": {
  "cpu_spike_percent": "system.cpu_spike_percent",
  "avg_cpu_usage": "application.avg_cpu_usage",
  "geo_distance_km": "geo.distance_km"
},
"output_prediction_map": {
  "score.value": "risk_score"
}

Request:

{
  "system": {"cpu_spike_percent": 45},
  "application": {"avg_cpu_usage": 0.31},
  "geo": {"distance_km": 12.7}
}

Response:

{ "score": { "value": 0.73 } }

Final columns: cpu_spike_percent, avg_cpu_usage, geo_distance_km, risk_score

Single Positional Array > Single Value

Use this pattern for a single record sent as an array [f0,f1,...] with a scalar out. Parent keys are inferred from the schema; no names are hard-coded.

Dataframe (1 row):
Columns: query_complexity, table_rows, index_count
Row: 0.7, 1500000, 8

Maps (the example schema uses the root key data):

"input_feature_map": {
  "query_complexity": "[0]",
  "table_rows": "[1]",
  "index_count": "[2]"
},
"output_prediction_map": {
  "value": "estimate_ms"
}

Request:

{ "data": [[0.7, 1500000, 8]] }

Response:

{ "value": 42.0 }

Final columns: query_complexity, table_rows, index_count, estimate_ms

Batch Root Array Key > Array

Use this pattern when the schema's root contains the batch array, such as data[*], and the output uses a different root, such as result[*]. Parent keys come from the schema or paths and are not hard-coded.

Dataframe:
Columns: throughput, latency
Rows:
200, 12.0
120, 30.0

Maps:

"input_feature_map": {
  "throughput": "data[*][0]",
  "latency": "data[*][1]"
},
"output_prediction_map": {
  "result[*]": "slo_breach_prob"
}

Request:

{ "data": [[200, 12.0], [120, 30.0]] }

Response:

{ "result": [0.01, 0.77] }

Final columns: throughput, latency, slo_breach_prob

Batch Named Object > Object Array (multi-field)

Use this pattern when the response for each item is an object with multiple fields. The output map renames them. The robust parser fills missing fields with NaN and pads or truncates the results to match the input row count.

Dataframe:
Columns: req_rate, err_rate, p95
Rows:
1200, 0.02, 180
900, 0.05, 250

Maps:

"input_feature_map": {
  "req_rate": "inputs[*].req_rate",
  "err_rate": "inputs[*].err_rate",
  "p95": "inputs[*].latency.p95"
},
"output_prediction_map": {
  "predictions[*].class": "predicted_class",
  "predictions[*].prob": "confidence"
}

Request:

{
  "inputs": [
    {"req_rate": 1200, "err_rate": 0.02, "latency": {"p95": 180}},
    {"req_rate": 900, "err_rate": 0.05, "latency": {"p95": 250}}
  ]
}

Response:

{
  "predictions": [
    {"class": "NORMAL", "prob": 0.81},
    {"class": "AT_RISK", "prob": 0.62}
  ]
}

Final columns: req_rate, err_rate, p95, predicted_class, confidence
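A rough Python approximation of the missing-field behavior described for this pattern (parse_object_array is a hypothetical helper; the toolkit's actual parser may differ):

```python
import math

def parse_object_array(response, output_prediction_map, n_rows):
    # Rename per-item object fields via paths like "predictions[*].class",
    # filling missing fields with NaN and padding to n_rows items.
    root = next(iter(output_prediction_map)).split("[*]")[0]
    items = response.get(root, [])
    rows = []
    for i in range(n_rows):
        item = items[i] if i < len(items) else {}
        row = {}
        for path, column in output_prediction_map.items():
            field = path.split("[*].")[1]
            row[column] = item.get(field, math.nan)
        rows.append(row)
    return rows

# Second item is missing "prob", so it is filled with NaN.
resp = {"predictions": [{"class": "NORMAL", "prob": 0.81}, {"class": "AT_RISK"}]}
omap = {"predictions[*].class": "predicted_class", "predictions[*].prob": "confidence"}
out = parse_object_array(resp, omap, 2)
```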

SageMaker feature configuration steps

Configuration is a one-time, secure setup that uses IAM roles with no exposed credentials.

The following are the minimal AWS permissions this IAM role requires, along with an AllowAssumeRole permission to enable secure role assumption through STS:

  • "sagemaker:InvokeEndpoint"

  • "sagemaker:InvokeEndpointAsync"

  • "sagemaker:DescribeEndpoint"

  • "sagemaker:DescribeEndpointConfig"

  • "sagemaker:ListEndpoints"

  • "sagemaker:ListModels"

  • "sagemaker:DescribeModel"

Example IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SageMakerInferenceAccess",
      "Effect": "Allow",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:InvokeEndpointAsync",
        "sagemaker:DescribeEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:ListEndpoints",
        "sagemaker:ListModels",
        "sagemaker:DescribeModel"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowAssumeRole",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "*"
    }
  ]
}

Complete the following steps:

  1. Log in to the AI Toolkit and navigate to the Models tab.

  2. As shown in the following image, from the +Model button, choose SageMaker:

This image shows the Models tab of the AI Toolkit. The +Model button on the far right is selected and the option for SageMaker is highlighted.

  3. On the Add SageMaker model window, complete the following fields:
Field name Description
Model name Required field. The name of the model as created in SageMaker. The model name is created using the AWS SageMaker "Create endpoint" workflow. The model name must be unique and free of special characters.
Description Optional field. Input a description to explain the model's purpose and intended use.
Endpoint Required field. The SageMaker Inference Endpoint name, created using the AWS SageMaker "Create endpoint" workflow.
AWS region Required field. Taken from your AWS credentials.
AWS access key ID Required field. Taken from your AWS credentials.
IAM role ARN Required field. The IAM role used for the SageMaker inference API call. It uses STS to generate temporary credentials and has the necessary permissions to invoke the SageMaker model.
  4. Select Test connection to confirm the connection information is correctly added.

    • If you see a Connection successful message, continue to step 5.

    • If you see an Unable to establish connection message, check the information you added in step 3 and try again.

  5. Complete the remaining fields:

Field name Description
Input feature mapping Required field.
Output feature mapping Required field.
Open API for inference endpoint Required field.
SPL results batch size Required field. The number of rows the deployed SageMaker endpoint can accept for each inference invocation. Default is 1 and maximum is 10,000.
  6. Select Add Model when done.

Source: /en/splunk-cloud-platform/apply-machine-learning/use-ai-toolkit/5.6.4/ai-toolkit-models/upload-and-inference-pre-trained-aws-sagemaker-models-in-the-ai-toolkit (upstream Splunk AITK 5.6.4 docs)
