Security

The operator is designed for a security-conscious platform team: minimal privilege, no data exfiltration, a signed supply chain, and offline licensing.

8.1 RBAC - least privilege

The chart installs one ClusterRole and a ClusterRoleBinding that binds it to the operator’s service account. The table below lists every rule and its justification:

API group	Resources	Verbs	Why
`autoscaling`	`horizontalpodautoscalers`	`get, list, watch, patch, update`	Read and tune HPAs. One of only two mutating grants.
`keda.sh`	`scaledobjects`	`get, list, watch, patch, update`	Read and tune KEDA ScaledObjects. The other mutating grant.
`""` (core)	`secrets`	`get`	Read the offline license Secret at runtime to verify it. `get` only - no list/watch, so it cannot enumerate cluster Secrets.
`apps`	`deployments`	`get, list, watch`	Read-only context for the workload behind an HPA.
`metrics.k8s.io`	`pods, nodes`	`get, list`	Read live utilization metrics. Read-only.
`stepscale.io`	`scalingrecommendations`, `scalingrecommendations/status`	`get, list, watch, create, update, patch, delete`	Own the recommendation CRs the operator itself creates.
`coordination.k8s.io`	`leases`	`get, create, update, patch`	Leader election, so that running with `replicaCount` of 2 or more never applies the same change twice.

The operator’s only write access to your workloads is to HPAs and ScaledObjects. It cannot modify Deployments, enumerate Secrets, or touch workload pods.
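Rendered as a manifest, the rules in the table would look roughly like the following. This is a sketch transcribed from the table, not the chart's actual template; the `metadata.name` is a hypothetical placeholder.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: stepscale-operator   # hypothetical name; the chart's real name may differ
rules:
  # Mutating grant 1 of 2: read and tune HPAs.
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch", "patch", "update"]
  # Mutating grant 2 of 2: read and tune KEDA ScaledObjects.
  - apiGroups: ["keda.sh"]
    resources: ["scaledobjects"]
    verbs: ["get", "list", "watch", "patch", "update"]
  # License Secret: get only, so Secrets cannot be listed or watched.
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
  # Read-only context for the workload behind an HPA.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  # Read-only utilization metrics.
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods", "nodes"]
    verbs: ["get", "list"]
  # Recommendation CRs the operator itself creates.
  - apiGroups: ["stepscale.io"]
    resources: ["scalingrecommendations", "scalingrecommendations/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Leader election.
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update", "patch"]
```

After install, `kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<service-account>` is one way to confirm what the bound service account can actually do; substitute the namespace and service account name your release uses.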

Narrowing the blast radius

Set `watchNamespaces` to the specific namespaces you want tuned. (The ClusterRole’s grants apply cluster-wide because HPAs may live in any namespace; the operator only acts on the namespaces you list.)
The `secrets` `get` grant is cluster-wide in the shipped chart; if you keep the license Secret in the operator’s own namespace, you can tighten this to a namespaced Role in your own overlay.
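A tightened Secret grant could look like the sketch below. The namespace `stepscale-system`, the service account `stepscale-operator`, and the Secret name `stepscale-license` are all hypothetical; use the names from your own install. `resourceNames` narrows the grant further, to a single named Secret.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: stepscale-license-reader      # hypothetical
  namespace: stepscale-system         # hypothetical: the operator's own namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["stepscale-license"]   # hypothetical Secret name
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: stepscale-license-reader
  namespace: stepscale-system
subjects:
  - kind: ServiceAccount
    name: stepscale-operator          # hypothetical service account name
    namespace: stepscale-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: stepscale-license-reader
```

Kubernetes RBAC is additive, so this Role only narrows access if the overlay also removes the `secrets` rule from the ClusterRole; otherwise the cluster-wide `get` still applies.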

8.2 Data residency

All metric collection, analysis, apply, and rollback happen in your cluster. No workload metrics, configuration, or cluster identifiers are sent to stepscale.
The operator keeps no external datastore. Metric history lives in memory and is rebuilt from your Prometheus (or HPA status) each run; recommendations live as CRs in your cluster.

8.3 Egress

The operator makes no outbound calls except, optionally, to the LLM provider you configure:

Provider setting	Outbound calls
`llm.provider=none`	None. Fully air-gapped.
`llm.provider=openai`	HTTPS to `api.openai.com`, carrying the analysis prompt and your API key.
`llm.provider=anthropic`	HTTPS to `api.anthropic.com`, carrying the analysis prompt and your API key.

Licensing involves no network traffic - it is verified offline. If your security policy forbids any egress, run with `llm.provider=none`; the rule engine still produces recommendations deterministically.
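If you want the cluster to enforce the no-egress posture rather than rely on configuration alone, a NetworkPolicy is one option. The sketch below assumes `llm.provider=none`, a CNI that enforces NetworkPolicy, and hypothetical names throughout (namespace, pod label, and the API server address, which uses a documentation-range placeholder). It permits traffic to in-cluster pods (DNS, Prometheus) and to the Kubernetes API server, and nothing else.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: stepscale-operator-egress     # hypothetical
  namespace: stepscale-system         # hypothetical: the operator's namespace
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: stepscale-operator   # hypothetical pod label
  policyTypes:
    - Egress
  egress:
    # Pods in any namespace of this cluster (cluster DNS, Prometheus).
    - to:
        - namespaceSelector: {}
    # Kubernetes API server. Placeholder address: replace with your
    # API server endpoint; how this traffic is matched varies by CNI.
    - to:
        - ipBlock:
            cidr: 192.0.2.10/32
      ports:
        - protocol: TCP
          port: 6443
```

With `llm.provider=openai` or `llm.provider=anthropic` you would also need to allow HTTPS to the provider. Standard NetworkPolicy matches IP ranges rather than hostnames, so that typically means an `ipBlock` rule or a CNI-specific FQDN policy.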

8.4 Supply chain and runtime hardening

Signed image. Every release image is cosign-signed via keyless OIDC (no long-lived signing keys). Verify it before install - see Installation §3.1.
Distroless, non-root. The image is a distroless base with no shell or package manager. The container runs as a non-root user with `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, and all Linux capabilities dropped.
Subscription-gated updates. Pull access to images is tied to an active subscription, so patches and updates flow only to current customers.
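The runtime hardening described above corresponds to a container-level `securityContext` along these lines. This is a sketch of the standard Kubernetes fields, not the chart's actual template; `runAsNonRoot: true` is an assumption about how the non-root requirement is expressed.

```yaml
# Fragment of a container spec (spec.template.spec.containers[].securityContext)
securityContext:
  runAsNonRoot: true                # assumption: enforces the non-root user
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]                   # all Linux capabilities dropped
```

These settings are compatible with the `restricted` Pod Security Standard profile as far as they go; that profile additionally expects a `seccompProfile` to be set, which the text above does not mention.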

8.5 Offline license integrity

The license is an ed25519-signed payload verified against a baked-in public key. A tampered payload, or one signed with a different key, fails verification, and the operator falls back to analysis-only mode: it will not apply changes. Clock rollback is guarded against by comparing the current time with a last-seen timestamp. See Licensing.