# Deploy on Kubernetes
This guide covers deploying the Hiya Voice Verification engine on any conformant Kubernetes cluster — on-prem, self-managed, or any cloud distribution. For cloud-specific optimizations, see the dedicated guides for AWS EKS, GCP GKE, or Azure AKS.
## Prerequisites
- A running Kubernetes cluster (v1.24+)
- `kubectl` configured to access the cluster
- Container image pulled and authenticated — see Getting the Container Image
- Runtime configuration values for `API_KEY`, `ORG_HANDLE`, `PLATFORM_REGION`, and `MIN_ALLOCATION`
## Step 1 — Create an Image Pull Secret
Create a Kubernetes secret from the JSON key file provided by Hiya. This allows the cluster to pull the image from Google Artifact Registry.
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)"
```
The pull secret must exist in the same namespace as the Deployment. To use it across multiple namespaces, recreate the secret in each one or use a tool like Sealed Secrets or External Secrets Operator.
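If you manage configuration declaratively, the same pull secret can be expressed as a manifest and applied per namespace. This is a sketch: the filename, the `<target-namespace>` placeholder, and the payload placeholder are illustrative — the `.dockerconfigjson` value is the base64-encoded Docker config that `kubectl create secret docker-registry` would generate for you.

```yaml
# hiya-registry-secret.yaml (sketch — substitute the namespace and payload)
apiVersion: v1
kind: Secret
metadata:
  name: hiya-registry
  namespace: <target-namespace>
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config for europe-docker.pkg.dev>
```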
## Step 2 — Store Runtime Configuration
Store the required runtime environment variables in a Kubernetes secret:
```shell
kubectl create secret generic hiya-engine-config \
  --from-literal=api-key=<your-api-key> \
  --from-literal=org-handle=<your-org-handle> \
  --from-literal=platform-region=<eu-or-us> \
  --from-literal=min-allocation=1m
```
Use the following values:
- `API_KEY`: create one with Create a key or copy it from the Audio Intel keys UI
- `ORG_HANDLE`: get it from List your organizations or from your organization page in the Hiya UI
- `PLATFORM_REGION`: set to `eu` or `us`, matching the region you use to log in to the Hiya platform
- `MIN_ALLOCATION`: the allocation bag used by the container for minute consumption; Hiya recommends `1m` or `5m`
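Equivalently, the same configuration secret can be declared as a manifest using `stringData`, which accepts plain-text values and encodes them on apply. A sketch — the filename and all values below are placeholders to substitute:

```yaml
# hiya-engine-config.yaml (sketch — substitute your own values)
apiVersion: v1
kind: Secret
metadata:
  name: hiya-engine-config
type: Opaque
stringData:
  api-key: <your-api-key>
  org-handle: <your-org-handle>
  platform-region: eu
  min-allocation: 1m
```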
## Step 3 — Create the Deployment
```yaml
# hiya-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hiya-voice-verification
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hiya-voice-verification
  template:
    metadata:
      labels:
        app: hiya-voice-verification
    spec:
      imagePullSecrets:
        - name: hiya-registry
      containers:
        - name: engine
          image: europe-docker.pkg.dev/loccus-platform/onpremise-images/hiya-voice-verification:<version>
          ports:
            - containerPort: 8080
              name: health
            - containerPort: 8081
              name: ws
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: api-key
            - name: ORG_HANDLE
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: org-handle
            - name: PLATFORM_REGION
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: platform-region
            - name: MIN_ALLOCATION
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: min-allocation
          volumeMounts:
            - name: models-tmpfs
              mountPath: /opt/loccus/models
          startupProbe:
            grpc:
              port: 8080
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            grpc:
              port: 8080
            periodSeconds: 10
          resources:
            requests:
              memory: "6Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
      volumes:
        - name: models-tmpfs
          emptyDir:
            medium: Memory
            sizeLimit: 8Gi
```
```shell
kubectl apply -f hiya-deployment.yaml
```
The `emptyDir` volume with `medium: Memory` is the Kubernetes equivalent of Docker's `--tmpfs` flag — models are loaded into RAM at startup and never written to persistent disk.
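The manifest above relies on the startup probe to gate initial traffic. If you also want a pod temporarily removed from Service endpoints during transient failures (rather than restarted by the liveness probe alone), a readiness probe can be added next to `livenessProbe`. A sketch, under the assumption that the engine's gRPC health service on port 8080 also reflects readiness:

```yaml
# Sketch: add under the container spec alongside livenessProbe
# (assumes the gRPC health endpoint on 8080 also reports readiness)
readinessProbe:
  grpc:
    port: 8080
  periodSeconds: 10
```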
## Step 4 — Create the Service
Expose the engine within the cluster so that client applications can reach it:
```yaml
# hiya-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hiya-voice-verification
spec:
  selector:
    app: hiya-voice-verification
  ports:
    - name: health
      protocol: TCP
      port: 8080
      targetPort: 8080
    - name: ws
      protocol: TCP
      port: 8081
      targetPort: 8081
```
```shell
kubectl apply -f hiya-service.yaml
```
The service will be available at `hiya-voice-verification.<namespace>.svc.cluster.local` on ports 8080 (health checks) and 8081 (WebSocket API).
## Step 5 — Verify
Check that the pod is running and healthy:
```shell
kubectl get pods -l app=hiya-voice-verification
kubectl logs -l app=hiya-voice-verification --tail=20
```
The startup probe gives the engine up to 30 seconds to load its models. Once the pod shows `Running` and `1/1` ready, the engine is accepting requests.
## Scaling
The engine is stateless. To scale, increase the replica count:
```shell
kubectl scale deployment hiya-voice-verification --replicas=3
```
Each pod runs the full engine independently. The Kubernetes Service automatically load-balances requests across all ready pods.
We recommend maintaining instances at around 50% CPU utilization to balance prompt response times with computational resource efficiency. See Scalability & Recovery for more details.
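The 50% CPU target maps naturally onto a HorizontalPodAutoscaler, which scales the Deployment automatically instead of requiring manual `kubectl scale` calls. A sketch, assuming a metrics server is installed in the cluster and that the replica bounds below suit your traffic (note that CPU utilization is measured against the container's CPU request, `2` cores in the manifest above):

```yaml
# hiya-hpa.yaml (sketch — tune minReplicas/maxReplicas for your load)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hiya-voice-verification
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hiya-voice-verification
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```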
## Updating Registry Credentials
When Hiya rotates your registry credentials, update the pull secret in-place:
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  --dry-run=client -o yaml | kubectl apply -f -
```
## Supported Container Runtimes
The engine image is OCI-compliant and works with any container runtime supported by your cluster:
- Docker
- containerd
- CRI-O
- gVisor
## Recommended CPU Architectures
We strongly recommend 5th generation Intel Xeon Scalable Processors (Emerald Rapids) for superior performance and efficiency.
| Architecture | Notes |
|---|---|
| Intel Emerald Rapids | Recommended |
| Intel Sapphire Rapids | |
| Intel Ice Lake | |
| Intel Cascade Lake | |
| Intel Skylake | |
| AMD EPYC Genoa | |
| AMD EPYC Milan | |