
Security

The operator is designed for a security-conscious platform team: minimal privilege, no data exfiltration, a signed supply chain, and offline licensing.

The chart installs one ClusterRole (and a ClusterRoleBinding to the operator’s service account). Every rule and its justification:

API group | Resources | Verbs | Why
--- | --- | --- | ---
autoscaling | horizontalpodautoscalers | get, list, watch, patch, update | Read and tune HPAs. One of only two mutating grants on your resources (the other mutating grants cover objects the operator creates itself).
keda.sh | scaledobjects | get, list, watch, patch, update | Read and tune KEDA ScaledObjects. The other mutating grant on your resources.
"" (core) | secrets | get | Read the offline license Secret at runtime to verify it. get only, with no list or watch, so it cannot enumerate cluster Secrets.
apps | deployments | get, list, watch | Read-only context for the workload behind an HPA.
metrics.k8s.io | pods, nodes | get, list | Read live utilization metrics. Read-only.
stepscale.io | scalingrecommendations, scalingrecommendations/status | get, list, watch, create, update, patch, delete | Own the recommendation CRs the operator itself creates.
coordination.k8s.io | leases | get, create, update, patch | Leader election, so that running with replicaCount: 2 or more never applies a change twice.
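To audit the table mechanically, the rules can be modeled as data. The sketch below is a hypothetical Python helper, not part of the chart: it hand-copies the rows above rather than reading the installed ClusterRole, reports which grants outside the operator's own `stepscale.io` and `coordination.k8s.io` objects carry a mutating verb, and checks whether a resource can be enumerated.

```python
# Hypothetical audit helper: encodes the ClusterRole table above as data.
# The rule list is copied from the documented table, not read from the chart.

MUTATING_VERBS = {"create", "update", "patch", "delete", "deletecollection"}

# (api_group, resources, verbs); "" is the core API group.
RULES = [
    ("autoscaling", ["horizontalpodautoscalers"], ["get", "list", "watch", "patch", "update"]),
    ("keda.sh", ["scaledobjects"], ["get", "list", "watch", "patch", "update"]),
    ("", ["secrets"], ["get"]),
    ("apps", ["deployments"], ["get", "list", "watch"]),
    ("metrics.k8s.io", ["pods", "nodes"], ["get", "list"]),
    ("stepscale.io", ["scalingrecommendations", "scalingrecommendations/status"],
     ["get", "list", "watch", "create", "update", "patch", "delete"]),
    ("coordination.k8s.io", ["leases"], ["get", "create", "update", "patch"]),
]

# Groups whose grants here cover only objects the operator creates itself
# (recommendation CRs and its leader-election Lease).
OPERATOR_OWNED_GROUPS = {"stepscale.io", "coordination.k8s.io"}


def mutating_grants_on_user_resources(rules):
    """Return (group, resource) pairs outside the operator's own groups that allow writes."""
    found = []
    for group, resources, verbs in rules:
        if group in OPERATOR_OWNED_GROUPS:
            continue
        if MUTATING_VERBS & set(verbs):
            found.extend((group, r) for r in resources)
    return found


def can_enumerate(rules, group, resource):
    """True if any rule grants list or watch on the given resource."""
    return any(
        g == group and resource in res and bool({"list", "watch"} & set(verbs))
        for g, res, verbs in rules
    )


if __name__ == "__main__":
    print(mutating_grants_on_user_resources(RULES))
    print(can_enumerate(RULES, "", "secrets"))
```

The same two checks can be pointed at the rules of the installed ClusterRole once you have loaded them into this shape yourself.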

The operator’s only write access to your workloads is to HPAs and ScaledObjects. It cannot modify Deployments or workload pods, and it cannot list or watch Secrets.

  • Set watchNamespaces to the specific namespaces you want tuned. (The ClusterRole is cluster-scoped because HPAs may live anywhere; the operator only acts on the namespaces you list.)
  • The secrets get grant is cluster-wide in the shipped chart; if you keep the license Secret in the operator’s own namespace, you can tighten this to a namespaced Role in your own overlay.
  • All metric collection, analysis, apply, and rollback happen in your cluster. No workload metrics, configuration, or cluster identifiers are sent to stepscale.
  • The operator keeps no external datastore. Metric history lives in memory and is rebuilt from your Prometheus (or HPA status) each run; recommendations live as CRs in your cluster.
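The namespace scoping in the first bullet amounts to a filter applied before any action: the ClusterRole lets the operator see HPAs cluster-wide, but only those in listed namespaces are tuned. A minimal sketch, assuming `watchNamespaces` is a plain list of namespace names; `HPARef`, `in_scope`, and `select_targets` are hypothetical names, and treating an empty list as "act on nothing" is an assumption, not the chart's documented default.

```python
# Hypothetical sketch of watchNamespaces scoping: HPAs are visible cluster-wide,
# but only those in the listed namespaces are eligible for tuning.
from dataclasses import dataclass


@dataclass(frozen=True)
class HPARef:
    namespace: str
    name: str


def in_scope(hpa, watch_namespaces):
    # Assumption: an empty list means "act on nothing"; the real default may differ.
    return hpa.namespace in set(watch_namespaces)


def select_targets(hpas, watch_namespaces):
    """Filter a cluster-wide HPA listing down to the namespaces the operator may tune."""
    return [h for h in hpas if in_scope(h, watch_namespaces)]


if __name__ == "__main__":
    seen = [HPARef("payments", "api"), HPARef("kube-system", "coredns"), HPARef("search", "web")]
    for h in select_targets(seen, ["payments", "search"]):
        print(f"{h.namespace}/{h.name}")
```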

The operator makes no outbound calls except, optionally, to the LLM provider you configure:

Provider setting | Outbound calls
--- | ---
llm.provider=none | None. Fully air-gapped.
llm.provider=openai | HTTPS to api.openai.com, carrying the analysis prompt and your API key.
llm.provider=anthropic | HTTPS to api.anthropic.com, same.

Licensing involves no network traffic; the license is verified offline. If your security policy forbids any egress, run with llm.provider=none; the rule engine still produces recommendations deterministically.
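If you enforce egress through a proxy or firewall allowlist, the provider table above reduces to a per-provider set of destination hosts. The helper below is a hypothetical sketch, not shipped with the operator; it assumes the three documented `llm.provider` values are the only ones and rejects anything else.

```python
# Hypothetical helper: derive the egress allowlist implied by the llm.provider table.
# Licensing is verified offline, so it contributes no hosts.

PROVIDER_HOSTS = {
    "none": frozenset(),
    "openai": frozenset({"api.openai.com"}),
    "anthropic": frozenset({"api.anthropic.com"}),
}


def egress_allowlist(provider):
    """Return the HTTPS hosts the operator needs to reach for a given llm.provider."""
    try:
        return PROVIDER_HOSTS[provider]
    except KeyError:
        raise ValueError(f"unknown llm.provider: {provider!r}") from None


def is_air_gapped(provider):
    """True when the configured provider requires no outbound connections at all."""
    return not egress_allowlist(provider)


if __name__ == "__main__":
    for p in ("none", "openai", "anthropic"):
        hosts = sorted(egress_allowlist(p))
        print(p, hosts or "no egress")
```

Failing loudly on an unknown value keeps a typo in the provider setting from silently producing an empty (fully closed) allowlist.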

  • Signed image. Every release image is cosign-signed via keyless OIDC (no long-lived signing keys). Verify it before installing; see Installation §3.1.
  • Distroless, non-root. The image is a distroless base with no shell or package manager. The container runs as a non-root user with readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and all Linux capabilities dropped.
  • Subscription-gated updates. Pull access to images is tied to an active subscription, so patches and updates flow only to current customers.
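The runtime hardening in the second bullet can be checked against a rendered manifest. A minimal sketch, assuming you have already parsed the operator container's `securityContext` into a dict (for example from `helm template` output) yourself; the field names are standard Kubernetes SecurityContext fields, but `hardening_violations` is hypothetical, and it only inspects the container-level context, not pod-level defaults.

```python
# Hypothetical checker for the container hardening described above.
# Field names follow the Kubernetes SecurityContext API; only the
# container-level securityContext is inspected.

def hardening_violations(security_context):
    """Return a list of human-readable problems; an empty list means compliant."""
    sc = security_context or {}
    problems = []
    # Non-root can be declared via runAsNonRoot or an explicit non-zero UID.
    runs_non_root = sc.get("runAsNonRoot") is True or sc.get("runAsUser", 0) not in (None, 0)
    if not runs_non_root:
        problems.append("container is not declared non-root")
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem is not true")
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation is not false")
    dropped = {c.upper() for c in (sc.get("capabilities") or {}).get("drop", [])}
    if "ALL" not in dropped:
        problems.append("capabilities.drop does not include ALL")
    return problems


if __name__ == "__main__":
    rendered = {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }
    print(hardening_violations(rendered) or "compliant")
```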

The license is an ed25519-signed payload verified against a baked-in public key. A tampered payload or a wrong key fails verification, and the operator falls back to analysis-only mode: it will not apply changes. Clock rollback is guarded against by comparing the current time with a last-seen timestamp. See Licensing.
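The decision logic in the paragraph above can be sketched as a small gate. Python's standard library has no ed25519 implementation, so signature checking is abstracted here as a boolean computed elsewhere. Everything else is an assumption for illustration, not the operator's real license format: the `expires_at` field, the 300-second skew tolerance, and how the last-seen timestamp is stored between runs.

```python
# Hypothetical sketch of the license gate: "apply" mode or "analysis-only" fallback.
# expires_at, the skew tolerance, and last_seen storage are illustrative assumptions.
from dataclasses import dataclass

ANALYSIS_ONLY = "analysis-only"
APPLY = "apply"


@dataclass
class GateResult:
    mode: str
    reason: str
    last_seen: float  # high-water mark to store for the next check


def license_gate(signature_ok, expires_at, now, last_seen, skew_tolerance=300.0):
    """Decide the operating mode.

    signature_ok: result of verifying the ed25519 signature over the payload
                  against the baked-in public key (performed elsewhere).
    expires_at, now, last_seen: Unix timestamps in seconds.
    """
    if not signature_ok:
        return GateResult(ANALYSIS_ONLY, "signature verification failed", last_seen)
    # Clock-rollback guard: the wall clock must not run meaningfully behind
    # the latest time this operator has already observed.
    if now + skew_tolerance < last_seen:
        return GateResult(ANALYSIS_ONLY, "clock rollback detected", last_seen)
    # Judge expiry against the later of the two clocks so a rollback cannot extend it.
    effective_now = max(now, last_seen)
    if effective_now >= expires_at:
        return GateResult(ANALYSIS_ONLY, "license expired", effective_now)
    return GateResult(APPLY, "license valid", effective_now)


if __name__ == "__main__":
    r = license_gate(True, expires_at=2_000_000_000, now=1_700_000_000, last_seen=1_700_100_000)
    print(r.mode, r.reason)
```

Every failure path returns the analysis-only mode rather than raising, matching the fail-safe behavior described above: a bad license degrades to recommendations without applying them.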