Kubernetes Data Collector Agent

Overview

The K8s Agent (Data Collector) is a lightweight, non-intrusive component designed to provide comprehensive visibility into your containerized infrastructure. By deploying the agent, you bridge the gap between raw cluster telemetry and actionable operational insights.

  • K8s Agent: Runs inside your Kubernetes cluster (or outside it, with a kubeconfig pointing to the cluster). It discovers cluster topology, collects metrics and process/network data, and sends this data to the Cloudamize data collector over HTTPS.

  • Data Collector: A Cloudamize-hosted API that receives agent payloads and authenticates them using a Customer Key unique to your Cloudamize account.

  • Customer Key: The single secret, obtained from the Cloudamize Dashboard, that is used for both authentication and encryption.

    One Customer Key can serve multiple clusters; each cluster is identified by a unique Cluster Name and an optional Cluster ID.

Installation Prerequisites

  • The agent pod requires outbound HTTPS access to:

    • The collector endpoint (COLLECTOR_BASE_URL; default: <https://collector.cloudamize.com>) on port 443.

    • The Kubernetes API server (in-cluster via service account, or external kubeconfig).

    • The metrics-server in-cluster API (optional; used for CPU/memory metrics).

  • Permissions the Agent Needs (RBAC)

The agent runs under a dedicated ServiceAccount and requires a ClusterRole with read-only access to the listed resources, bound via a ClusterRoleBinding.

Below is the full ClusterRole:

CODE
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cmz-k8s-agent-reader
rules:
- apiGroups: [""]
  resources:
    - nodes
    - pods
    - services
    - endpoints
    - namespaces
    - persistentvolumes
    - persistentvolumeclaims
    - configmaps
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources:
    - deployments
    - replicasets
    - daemonsets
    - statefulsets
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
    - networkpolicies
    - ingresses
  verbs: ["get", "list", "watch"]
- apiGroups: ["discovery.k8s.io"]
  resources:
    - endpointslices
  verbs: ["get", "list", "watch"]
- apiGroups: ["policy"]
  resources:
    - podsecuritypolicies
    - poddisruptionbudgets
  verbs: ["get", "list", "watch"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources:
    - roles
    - rolebindings
    - clusterroles
    - clusterrolebindings
  verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources:
    - storageclasses
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources:
    - nodes
    - pods
  verbs: ["get", "list"]


Key points:

  • Principle of least privilege: Only get, list, and watch verbs are granted. No create, update, patch, or delete permissions are needed or granted.

  • Metrics are optional: If you do not run a metrics-server or do not want to expose it, omit the metrics.k8s.io block; the agent still sends discovery and other data but node/pod CPU/memory usage will be missing.

  • Namespace: The agent deploys in a dedicated namespace (cmz-k8s-agent by default) with a ClusterRoleBinding binding the ClusterRole to the agent's ServiceAccount.

  • Pod security: The agent pod runs as a non-root user, with a read-only root filesystem, all Linux capabilities dropped, and no privilege escalation allowed.
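One way to confirm the read-only grant from outside the agent is to ask the API server what the ServiceAccount is allowed to do. The sketch below wraps kubectl auth can-i in a hypothetical helper, agent_can. It assumes the plain-manifest names (namespace and ServiceAccount both cmz-k8s-agent) and that your own user may impersonate ServiceAccounts, which cluster admins typically can.

```shell
# Hypothetical helper: ask the API server whether the agent's ServiceAccount
# may perform a verb on a resource. The namespace and ServiceAccount names
# below are the plain-manifest defaults; adjust them for your installation.
agent_can() {
  verb="$1"
  resource="$2"
  kubectl auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:cmz-k8s-agent:cmz-k8s-agent" \
    --all-namespaces
}
```

With the ClusterRole above applied and bound, agent_can list pods should print yes, while agent_can delete pods and agent_can get secrets should print no.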

How to Deploy the K8s Agent

Prerequisites

  • Customer Key from the Cloudamize Dashboard.

  • kubectl is configured for your cluster.

  • Sufficient permissions to create a namespace, ServiceAccount, ClusterRole/ClusterRoleBinding, and Deployment (cluster admin or equivalent).

  • Outbound HTTPS (port 443) access from the cluster to the Cloudamize collector URL.
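Before installing, it can help to confirm that outbound HTTPS actually works. The helper below, check_https, is a hypothetical sketch built on curl: it treats any completed HTTP exchange as "reachable" because the status code returned by the collector's root path is not documented here. Run it from a machine or pod that shares the cluster's egress path; a check from your workstation does not prove that pods can reach the endpoint.

```shell
# Hypothetical helper: succeed if an HTTPS request to the URL completes with
# any HTTP status; fail if DNS resolution, the TCP connection, or TLS fails.
check_https() {
  url="$1"
  if code=$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$url"); then
    echo "reachable: $url (HTTP $code)"
  else
    echo "unreachable: $url" >&2
    return 1
  fi
}
```

For example, check_https https://collector.cloudamize.com tests the default collector endpoint, and check_https https://public.ecr.aws tests the registry the image is pulled from.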

Option A — Helm (recommended):

The agent is published as a Helm chart on Amazon ECR Public. No helm repo add step is needed. Helm handles RBAC creation automatically.

Prerequisites: Helm 3.8+ (OCI support is built-in).
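To check the Helm requirement in a script, you can compare the client version against 3.8. The function below, helm_supports_oci, is a hypothetical helper that parses a version string in the format printed by helm version --short (for example v3.12.1+g...).

```shell
# Hypothetical helper: succeed when the given Helm version string
# (e.g. "v3.12.1" or "v3.12.1+gabc1234") is 3.8 or newer.
helm_supports_oci() {
  v="${1#v}"          # strip the leading "v"
  major="${v%%.*}"    # text before the first dot
  rest="${v#*.}"      # text after the first dot
  minor="${rest%%.*}" # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}
```

A typical call is helm_supports_oci "$(helm version --short)" && echo "Helm is new enough".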

Step 1 — Install

Always-latest (easiest):

CODE
# Replace YOUR_KEY_FROM_DASHBOARD with your Customer Key and my-cluster with your cluster's name.
helm install cloudamize-agent \
  oci://public.ecr.aws/m5z6i3j5/cloudamize/helm/cloudamize-agent \
  --set customerKey=YOUR_KEY_FROM_DASHBOARD \
  --set clusterName=my-cluster

Using a Kubernetes Secret instead of inlining the key (recommended for production):

CODE
kubectl create namespace cloudamize-agent
kubectl create secret generic cloudamize-agent-secret \
  --from-literal=CUSTOMER_KEY="your-customer-key" \
  -n cloudamize-agent
helm install cloudamize-agent \
  oci://public.ecr.aws/m5z6i3j5/cloudamize/helm/cloudamize-agent \
  --set existingSecret=cloudamize-agent-secret \
  --set clusterName=my-cluster # update the name of the cluster


Step 2 — Verify

CODE
kubectl get pods -n cloudamize-agent
kubectl logs -f deployment/cloudamize-agent -n cloudamize-agent
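If you prefer a check that exits on its own instead of following logs, kubectl rollout status waits until the Deployment is available or a timeout expires. The wrapper below, wait_for_agent, is a hypothetical helper; its defaults take the namespace and Deployment name from the commands above, and the 120-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the agent Deployment has rolled out,
# or fail after 120 seconds.
# $1: namespace (default cloudamize-agent), $2: Deployment name (same default).
wait_for_agent() {
  ns="${1:-cloudamize-agent}"
  deploy="${2:-cloudamize-agent}"
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s
}
```

For a plain-manifest installation, pass the other names explicitly: wait_for_agent cmz-k8s-agent cmz-k8s-agent.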

Upgrade

CODE
helm upgrade cloudamize-agent \
  oci://public.ecr.aws/m5z6i3j5/cloudamize/helm/cloudamize-agent \
  --reuse-values

Uninstall

CODE
helm uninstall cloudamize-agent

Option B — Plain manifests (kubectl fallback):

Manifests are published to Amazon ECR Public as OCI artifacts alongside the Helm chart.

Requires the ORAS CLI (OCI Registry As Storage).

Step 1 — Apply RBAC

Unlike Helm, the plain manifest path requires you to apply RBAC manually before deploying the agent:

CODE
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: cmz-k8s-agent
  labels:
    name: cmz-k8s-agent
    app.kubernetes.io/name: cmz-data-collector-k8s-agent
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cmz-k8s-agent
  namespace: cmz-k8s-agent
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cmz-k8s-agent-reader
rules:
- apiGroups: [""]
  resources:
    - nodes
    - pods
    - services
    - endpoints
    - namespaces
    - persistentvolumes
    - persistentvolumeclaims
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources:
    - deployments
    - replicasets
    - daemonsets
    - statefulsets
  verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
    - networkpolicies
    - ingresses
  verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources:
    - storageclasses
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources:
    - nodes
    - pods
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cmz-k8s-agent-reader
subjects:
- kind: ServiceAccount
  name: cmz-k8s-agent
  namespace: cmz-k8s-agent
roleRef:
  kind: ClusterRole
  name: cmz-k8s-agent-reader
  apiGroup: rbac.authorization.k8s.io
EOF
This creates:

| Resource | Name | Purpose |
|---|---|---|
| `Namespace` | `cmz-k8s-agent` | Dedicated namespace for the agent |
| `ServiceAccount` | `cmz-k8s-agent` | Identity the agent pod runs as |
| `ClusterRole` | `cmz-k8s-agent-reader` | Read-only access to the Kubernetes resources listed in the ClusterRole above |
| `ClusterRoleBinding` | `cmz-k8s-agent-reader` | Binds the ClusterRole to the ServiceAccount |

Verify:

CODE
kubectl get namespace cmz-k8s-agent
kubectl get serviceaccount cmz-k8s-agent -n cmz-k8s-agent
kubectl get clusterrolebinding cmz-k8s-agent-reader

Step 2 — Pull the manifests

Always-latest:

CODE
oras pull public.ecr.aws/m5z6i3j5/cloudamize/manifests/cloudamize-agent:latest

Pinned to a specific release:

CODE
oras pull public.ecr.aws/m5z6i3j5/cloudamize/manifests/cloudamize-agent:0.0.10

Both produce a single file: cloudamize-agent-manifests.yaml.

Step 3 — Configure and deploy

Edit the two placeholder values in cloudamize-agent-manifests.yaml:

CODE
- name: CUSTOMER_KEY
  value: "YOUR_KEY_FROM_DASHBOARD"   # <-- Required, customer key
- name: CLUSTER_NAME
  value: "my-cluster"                # <-- Required

Then apply:

CODE
kubectl apply -f cloudamize-agent-manifests.yaml

For production, store the key in a Kubernetes Secret instead of inlining it:

CODE
kubectl create secret generic cmz-agent-secrets \
  --from-literal=CUSTOMER_KEY="your-customer-key" \
  -n cmz-k8s-agent

Then reference it in the manifest:

CODE
- name: CUSTOMER_KEY
  valueFrom:
    secretKeyRef:
      name: cmz-agent-secrets
      key: CUSTOMER_KEY

Verify

CODE
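kubectl get pods -n cmz-k8s-agent
# Follows the log stream; press Ctrl+C to stop
kubectl logs -f deployment/cmz-k8s-agent -n cmz-k8s-agent
# port-forward blocks until interrupted, so run it in a second terminal
kubectl port-forward svc/cmz-k8s-agent 8080:8080 -n cmz-k8s-agent
# While the port-forward is running, query the health and readiness endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready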

Container image

The agent image is hosted on Amazon ECR Public and does not require authentication to pull:

| Artifact | Location |
|---|---|
| Agent Image | `public.ecr.aws/m5z6i3j5/cloudamize/cmz-k8s-agent/prod:latest` |
| Agent Helm Charts | `public.ecr.aws/m5z6i3j5/cloudamize/helm/cloudamize-agent` |
| Agent K8S Manifest | `public.ecr.aws/m5z6i3j5/cloudamize/manifests/cloudamize-agent:latest` |

The :latest tag always points to the latest release. For reproducible deployments, pin to a specific image digest or SHA tag (Cloudamize uses the Git commit SHA as the image tag in CI).
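To find a digest to pin, one option is to read the image ID that a running agent pod has already resolved. The helper below, agent_image_ids, is a hypothetical sketch; it defaults to the plain-manifest namespace cmz-k8s-agent and assumes the agent is the first container in each pod.

```shell
# Hypothetical helper: print each pod name and the image ID (which normally
# includes the resolved digest) of its first container.
# $1: namespace (default cmz-k8s-agent).
agent_image_ids() {
  ns="${1:-cmz-k8s-agent}"
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].imageID}{"\n"}{end}'
}
```

The image ID usually ends in @sha256: followed by the digest, although the exact prefix depends on the container runtime. That digest can then replace the :latest tag in the image reference.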

Your cluster needs outbound access to public.ecr.aws on port 443 to pull the image on first deploy and on updates.

Environment variables reference

| Variable | Required | Default | Description |
|---|---|---|---|
| `CUSTOMER_KEY` | Yes | | Customer Key from Cloudamize Dashboard. Used for auth and deriving AES-256 encryption key. |
| `CLUSTER_NAME` | Yes | | Human-readable name for this cluster (must be unique per customer). |
| `CLUSTER_ID` | No | Auto-derived from kube-system namespace UID | Override the cluster ID sent to the collector. |
| `COLLECTOR_BASE_URL` | No | <https://collector.cloudamize.com> | Collector base URL. All endpoint paths derive from this. |
| `COLLECTION_INTERVAL` | No | `30s` | Default collection interval (overrides discovery and process schedulers). |
| `NETWORK_INTERVAL` | No | `1m` | Network collection interval. |
| `METRICS_ENABLED` | No | `true` | Set to false to disable metrics-server-based CPU/memory collection. |
| `LOG_LEVEL` | No | `info` | Log level: debug, info, warn, error. |
| `COLLECTOR_TIMEOUT` | No | | HTTP timeout for requests to the collector (e.g., 30s). |
| `RETRY_MAX_ATTEMPTS` | No | `3` | Maximum retry attempts for failed collector requests. |
| `RETRY_DELAY` | No | `1s` | Initial retry delay (backoff multiplier: 1.5×, capped at RETRY_MAX_DELAY). |
| `RETRY_MAX_DELAY` | No | `5s` | Maximum retry delay between attempts. |
| `SHUTDOWN_TIMEOUT` | No | `30s` | Graceful shutdown timeout. |
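The retry settings describe a capped exponential backoff: the wait starts at RETRY_DELAY, grows by 1.5× after each failure, and never exceeds RETRY_MAX_DELAY. The sketch below works through that arithmetic with a hypothetical function, backoff_schedule. It only illustrates the formula; whether the agent adds jitter, and whether the first request counts toward RETRY_MAX_ATTEMPTS, is not stated here.

```shell
# Hypothetical helper: print the wait before each retry.
# $1: initial delay in seconds, $2: maximum delay in seconds,
# $3: number of waits to print. The 1.5 multiplier comes from the table above.
backoff_schedule() {
  awk -v d="$1" -v max="$2" -v n="$3" 'BEGIN {
    for (i = 1; i <= n; i++) {
      if (d > max) d = max
      printf "retry %d: wait %.3fs\n", i, d
      d *= 1.5
    }
  }'
}
```

With the default delays, backoff_schedule 1 5 5 prints waits of 1.000s, 1.500s, 2.250s, 3.375s, and 5.000s; the fifth value would be 5.0625s without the cap.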

Multiple clusters

Use the same Customer Key for all clusters. Assign each cluster a unique CLUSTER_NAME (and optionally CLUSTER_ID) so the collector can differentiate them. For example:

CODE
# Cluster 1 – prod US East
- name: CUSTOMER_KEY
  value: "your-customer-key"
- name: CLUSTER_NAME
  value: "prod-us-east"
# Cluster 2 – staging
- name: CUSTOMER_KEY
  value: "your-customer-key"   # same key
- name: CLUSTER_NAME
  value: "staging"
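If you leave CLUSTER_ID unset, the agent derives it from the UID of the kube-system namespace. The helper below, default_cluster_id, is a hypothetical sketch that prints that UID so you can see what distinguishes each cluster. Whether the agent sends the UID unchanged or transforms it first is not stated here, so treat the output as the input to the derivation, not necessarily the final ID.

```shell
# Hypothetical helper: print the UID of the kube-system namespace.
# $1: optional kubeconfig context name; the current context is used if omitted.
default_cluster_id() {
  if [ -n "${1:-}" ]; then
    kubectl --context "$1" get namespace kube-system -o jsonpath='{.metadata.uid}'
  else
    kubectl get namespace kube-system -o jsonpath='{.metadata.uid}'
  fi
}
```

Running default_cluster_id once per kubeconfig context (the context name is whatever your kubeconfig defines) should yield a different value for each cluster.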

Kubernetes APIs and Resources the Agent Uses

The agent only reads from the Kubernetes API; it does not create, update, or delete resources. Below are the accessed API groups and resources.

Core API (core group, "")

| Resource | Verbs | Purpose |
|---|---|---|
| `nodes` | get, list, watch | Cluster nodes and capacity |
| `pods` | get, list, watch | Pods across all namespaces |
| `services` | get, list, watch | Services and cluster topology |
| `endpoints` | get, list, watch | Service endpoints |
| `namespaces` | get, list, watch | Namespace metadata |
| `persistentvolumes` | get, list, watch | PV definitions |
| `persistentvolumeclaims` | get, list, watch | PVC usage |
| `configmaps` | get, list, watch | (Optional; used in full RBAC) |

Apps API (apps)

| Resource | Verbs | Purpose |
|---|---|---|
| `deployments` | get, list, watch | Workload metadata |
| `replicasets` | get, list, watch | ReplicaSet metadata |
| `daemonsets` | get, list, watch | DaemonSet metadata |
| `statefulsets` | get, list, watch | StatefulSet metadata |

Networking API (networking.k8s.io)

| Resource | Verbs | Purpose |
|---|---|---|
| `networkpolicies` | get, list, watch | Network policy discovery |
| `ingresses` | get, list, watch | Ingress resources |

Discovery API (discovery.k8s.io)

| Resource | Verbs | Purpose |
|---|---|---|
| `endpointslices` | get, list, watch | EndpointSlice for service/network mapping |

Policy API (policy)

| Resource | Verbs | Purpose |
|---|---|---|
| `poddisruptionbudgets` | get, list, watch | PDB metadata |
| `podsecuritypolicies` | get, list, watch | (If present; legacy clusters only) |

RBAC API (rbac.authorization.k8s.io)

| Resource | Verbs | Purpose |
|---|---|---|
| `roles` | get, list, watch | Role metadata |
| `rolebindings` | get, list, watch | RoleBinding metadata |
| `clusterroles` | get, list, watch | ClusterRole metadata |
| `clusterrolebindings` | get, list, watch | ClusterRoleBinding metadata |

Storage API (storage.k8s.io)

| Resource | Verbs | Purpose |
|---|---|---|
| `storageclasses` | get, list, watch | Storage class discovery |

Metrics API (metrics.k8s.io)

| Resource | Verbs | Purpose |
|---|---|---|
| `nodes` | get, list | Node CPU/memory usage (requires metrics-server) |
| `pods` | get, list | Pod CPU/memory usage (requires metrics-server) |

Other

  • Server version: The agent calls the Kubernetes API server's version endpoint to discover cluster version.

  • APIService (metrics): On startup, the agent checks that the v1beta1.metrics.k8s.io APIService is registered and metrics-server is reachable. It logs a warning if not but continues without metrics.
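You can approximate the metrics check from outside the agent by reading the Available condition of the APIService. The function below, metrics_api_available, is a hypothetical helper; it mirrors the intent of the startup check described above, not the agent's actual implementation.

```shell
# Hypothetical helper: succeed when the v1beta1.metrics.k8s.io APIService
# exists and reports the condition Available=True.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' \
    2>/dev/null) || status=""
  [ "$status" = "True" ]
}
```

A typical use is metrics_api_available || echo "metrics-server not available; CPU/memory usage will be missing".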

Note: The agent does not read secrets or other sensitive resources by default. Minimal RBAC deployments may omit some resources (e.g., configmaps, policy, rbac), resulting in reduced metadata collection.

If you have any questions or encounter any issues, please contact our support team at helpdesk@cloudamize.com.
