Security
The operator is designed for a security-conscious platform team: minimal privilege, no data exfiltration, a signed supply chain, and offline licensing.
8.1 RBAC - least privilege
The chart installs one ClusterRole (and a ClusterRoleBinding to the operator’s service
account). Every rule and its justification:
| API group | Resources | Verbs | Why |
|---|---|---|---|
| autoscaling | horizontalpodautoscalers | get, list, watch, patch, update | Read and tune HPAs. One of only two grants that mutate your workload resources. |
| keda.sh | scaledobjects | get, list, watch, patch, update | Read and tune KEDA ScaledObjects. The other workload-mutating grant. |
| "" (core) | secrets | get | Read the offline license Secret at runtime to verify it. get only - no list/watch, so it cannot enumerate cluster Secrets. |
| apps | deployments | get, list, watch | Read-only context for the workload behind an HPA. |
| metrics.k8s.io | pods, nodes | get, list | Read live utilization metrics. Read-only. |
| stepscale.io | scalingrecommendations, scalingrecommendations/status | get, list, watch, create, update, patch, delete | Own the recommendation CRs the operator itself creates. |
| coordination.k8s.io | leases | get, create, update, patch | Leader election, so running with replicaCount of 2 or more never double-applies a change. |
The operator’s only write access to your workloads is to HPAs and ScaledObjects. It cannot modify Deployments, enumerate Secrets, or touch workload pods.
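To make the least-privilege claims easy to check, the rule table above can be modelled as plain data. This is a minimal sketch for reasoning about the grants, not the chart's actual manifest; the `RULES` list and the `is_allowed` helper are hypothetical names introduced here.

```python
# Toy model of the ClusterRole table above (illustrative, not the shipped manifest).
RULES = [
    # (API group, resources, verbs)
    ("autoscaling", {"horizontalpodautoscalers"},
     {"get", "list", "watch", "patch", "update"}),
    ("keda.sh", {"scaledobjects"},
     {"get", "list", "watch", "patch", "update"}),
    ("", {"secrets"}, {"get"}),
    ("apps", {"deployments"}, {"get", "list", "watch"}),
    ("metrics.k8s.io", {"pods", "nodes"}, {"get", "list"}),
    ("stepscale.io",
     {"scalingrecommendations", "scalingrecommendations/status"},
     {"get", "list", "watch", "create", "update", "patch", "delete"}),
    ("coordination.k8s.io", {"leases"}, {"get", "create", "update", "patch"}),
]


def is_allowed(group: str, resource: str, verb: str) -> bool:
    """Return True if any rule grants `verb` on `resource` in API group `group`."""
    return any(
        group == g and resource in resources and verb in verbs
        for g, resources, verbs in RULES
    )


if __name__ == "__main__":
    print(is_allowed("autoscaling", "horizontalpodautoscalers", "patch"))  # True
    print(is_allowed("apps", "deployments", "patch"))                      # False
    print(is_allowed("", "secrets", "list"))                               # False
```

Note that the `pods` grant lives in the `metrics.k8s.io` group, so a request against core pods (group `""`) matches no rule in this model.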
Narrowing the blast radius
- Set `watchNamespaces` to the specific namespaces you want tuned. (The ClusterRole is cluster-scoped because HPAs may live anywhere; the operator only acts on the namespaces you list.)
- The `secrets` `get` grant is cluster-wide in the shipped chart; if you keep the license Secret in the operator’s own namespace, you can tighten this to a namespaced `Role` in your own overlay.
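As an illustration of the tightened grant, the sketch below builds a namespaced Role that allows `get` on one named Secret only, using the standard RBAC `resourceNames` field. The Role name, namespace, and Secret name are hypothetical placeholders, and the matching RoleBinding plus the removal of the cluster-wide `secrets` rule are left to your overlay.

```python
import json


def license_secret_role(namespace: str, secret_name: str) -> dict:
    """Build a namespaced Role granting `get` on a single named Secret."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        # "stepscale-license-reader" is a placeholder name, not one the chart defines.
        "metadata": {"name": "stepscale-license-reader", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],               # core API group
                "resources": ["secrets"],
                "resourceNames": [secret_name],  # restrict to this one Secret
                "verbs": ["get"],                # still no list/watch
            }
        ],
    }


if __name__ == "__main__":
    # Namespace and Secret name below are assumptions for illustration only.
    role = license_secret_role("stepscale-system", "stepscale-license")
    print(json.dumps(role, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be piped to it directly.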
8.2 Data residency
- All metric collection, analysis, apply, and rollback happen in your cluster. No workload metrics, configuration, or cluster identifiers are sent to stepscale.
- The operator keeps no external datastore. Metric history lives in memory and is rebuilt from your Prometheus (or HPA status) each run; recommendations live as CRs in your cluster.
8.3 Egress
The operator makes no outbound calls except, optionally, to the LLM provider you configure:
| Provider setting | Outbound calls |
|---|---|
| llm.provider=none | None. Fully air-gapped. |
| llm.provider=openai | HTTPS to api.openai.com, carrying the analysis prompt and your API key. |
| llm.provider=anthropic | HTTPS to api.anthropic.com, same. |
Licensing involves no network traffic - it is verified offline. If your security policy
forbids any egress, run with llm.provider=none; the rule engine still produces
recommendations deterministically.
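If you audit egress (for example from proxy or flow logs), the table above reduces to a small allowlist per provider setting. The following is a minimal sketch under that assumption; `EGRESS_HOSTS` and `unexpected_egress` are hypothetical names, and the observed hosts would come from your own tooling.

```python
from typing import Iterable, Set

# Expected external hosts per llm.provider value, taken from the table above.
EGRESS_HOSTS = {
    "none": frozenset(),
    "openai": frozenset({"api.openai.com"}),
    "anthropic": frozenset({"api.anthropic.com"}),
}


def unexpected_egress(provider: str, observed_hosts: Iterable[str]) -> Set[str]:
    """Return observed external hosts that the provider setting does not explain.

    Raises KeyError for a provider value not listed in the table.
    """
    allowed = EGRESS_HOSTS[provider]
    return {host for host in observed_hosts if host not in allowed}


if __name__ == "__main__":
    # With llm.provider=none, any external host at all is a finding.
    print(unexpected_egress("none", ["api.openai.com"]))
    # With llm.provider=anthropic, only the provider endpoint is expected.
    print(unexpected_egress("anthropic", ["api.anthropic.com"]))
```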
8.4 Supply chain and runtime hardening
- Signed image. Every release image is cosign-signed via keyless OIDC (no long-lived signing keys). Verify it before install - see Installation §3.1.
- Distroless, non-root. The image is built on a distroless base with no shell or package manager. The container runs as a non-root user with `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and all Linux capabilities dropped.
- Subscription-gated updates. Pull access to images is tied to an active subscription, so patches and updates flow only to current customers.
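The runtime settings listed above can be checked against a rendered container `securityContext`. This sketch assumes the non-root guarantee is expressed as `runAsNonRoot: true` on the container, which the text does not confirm (it could equally be set at pod level or via `runAsUser`); `hardening_gaps` is a hypothetical helper.

```python
def hardening_gaps(security_context: dict) -> list:
    """Return the hardening settings from this section that the context lacks."""
    gaps = []
    # Assumption: non-root is declared via runAsNonRoot on the container.
    if security_context.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if security_context.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    # Must be explicitly false; a missing field counts as a gap here.
    if security_context.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation")
    dropped = (security_context.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        gaps.append("capabilities.drop")
    return gaps


if __name__ == "__main__":
    hardened = {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }
    print(hardening_gaps(hardened))  # []
    print(hardening_gaps({"readOnlyRootFilesystem": True}))
```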
8.5 Offline license integrity
The license is an Ed25519-signed payload verified against a baked-in public key. A tampered payload or a wrong key fails verification, and the operator falls back to analysis-only mode - it will not apply changes. Clock rollback is guarded against by comparing the current time with a last-seen timestamp. See Licensing.
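The decision logic described above can be sketched as follows. The Ed25519 check itself is treated as an input computed elsewhere, and the sketch assumes a detected clock rollback also results in analysis-only mode, which the text does not state explicitly. `LicenseCheck`, `operating_mode`, and `next_last_seen` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LicenseCheck:
    signature_valid: bool  # outcome of the Ed25519 verification, done elsewhere
    now: float             # current wall-clock time, Unix seconds
    last_seen: float       # highest timestamp recorded on a previous run


def operating_mode(check: LicenseCheck) -> str:
    """Return "apply" only when the license verifies and the clock is sane."""
    if not check.signature_valid:
        # Tampered payload or wrong key: never apply changes.
        return "analysis-only"
    if check.now < check.last_seen:
        # Clock appears rolled back. Falling back here is an assumption.
        return "analysis-only"
    return "apply"


def next_last_seen(check: LicenseCheck) -> float:
    """The stored timestamp only moves forward, so a rollback cannot lower it."""
    return max(check.now, check.last_seen)


if __name__ == "__main__":
    print(operating_mode(LicenseCheck(True, now=200.0, last_seen=100.0)))   # apply
    print(operating_mode(LicenseCheck(False, now=200.0, last_seen=100.0)))  # analysis-only
    print(operating_mode(LicenseCheck(True, now=50.0, last_seen=100.0)))    # analysis-only
```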