# Deploy on Kubernetes
This guide covers deploying the Hiya Voice Verification engine on any conformant Kubernetes cluster — on-prem, self-managed, or any cloud distribution. For cloud-specific optimizations, see the dedicated guides for AWS EKS, GCP GKE, or Azure AKS.
## Prerequisites
- A running Kubernetes cluster (v1.24+)
- `kubectl` configured to access the cluster
- Container image pulled and authenticated — see Getting the Container Image
- Runtime configuration values for `API_KEY`, `ORG_HANDLE`, `PLATFORM_REGION`, and `MIN_ALLOCATION`
## Step 1 — Create an Image Pull Secret
Create a Kubernetes secret from the JSON key file provided by Hiya. This allows the cluster to pull the image from Google Artifact Registry.
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)"
```
The pull secret must exist in the same namespace as the Deployment. To use it across multiple namespaces, recreate the secret in each one or use a tool like Sealed Secrets or External Secrets Operator.
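If you manage configuration declaratively, the same pull secret can be expressed as a manifest and applied per namespace. This is a sketch: the filename, the `<target-namespace>` placeholder, and the payload placeholder are illustrative — the `.dockerconfigjson` value is the base64-encoded Docker config that `kubectl create secret docker-registry` would generate for you.

```yaml
# hiya-registry-secret.yaml (sketch — substitute the namespace and payload)
apiVersion: v1
kind: Secret
metadata:
  name: hiya-registry
  namespace: <target-namespace>
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config for europe-docker.pkg.dev>
```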
## Step 2 — Store Runtime Configuration
Store the required runtime environment variables in a Kubernetes secret:
```shell
kubectl create secret generic hiya-engine-config \
  --from-literal=api-key=<your-api-key> \
  --from-literal=org-handle=<your-org-handle> \
  --from-literal=platform-region=<eu-or-us> \
  --from-literal=min-allocation=1m
```
Use the following values:
- `API_KEY`: create one with Create a key or copy it from the Audio Intel keys UI
- `ORG_HANDLE`: get it from List your organizations or from your organization page in the Hiya UI
- `PLATFORM_REGION`: set to `eu` or `us`, matching the region you use to log in to the Hiya platform
- `MIN_ALLOCATION`: the allocation bag used by the container for minute consumption; Hiya recommends `1m` or `5m`
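Equivalently, the same configuration secret can be declared as a manifest using `stringData`, which accepts plain-text values and encodes them on apply. A sketch — the filename and all values below are placeholders to substitute:

```yaml
# hiya-engine-config.yaml (sketch — substitute your own values)
apiVersion: v1
kind: Secret
metadata:
  name: hiya-engine-config
type: Opaque
stringData:
  api-key: <your-api-key>
  org-handle: <your-org-handle>
  platform-region: eu
  min-allocation: 1m
```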
## Step 3 — Create the Deployment
```yaml
# hiya-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hiya-voice-verification
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hiya-voice-verification
  template:
    metadata:
      labels:
        app: hiya-voice-verification
    spec:
      imagePullSecrets:
        - name: hiya-registry
      containers:
        - name: engine
          image: europe-docker.pkg.dev/loccus-platform/onpremise-images/hiya-voice-verification:<version>
          ports:
            - containerPort: 8080
              name: health
            - containerPort: 8081
              name: ws
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: api-key
            - name: ORG_HANDLE
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: org-handle
            - name: PLATFORM_REGION
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: platform-region
            - name: MIN_ALLOCATION
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: min-allocation
          volumeMounts:
            - name: models-tmpfs
              mountPath: /opt/loccus/models
          startupProbe:
            grpc:
              port: 8080
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            grpc:
              port: 8080
            periodSeconds: 10
          resources:
            requests:
              memory: "6Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
      volumes:
        - name: models-tmpfs
          emptyDir:
            medium: Memory
            sizeLimit: 8Gi
```
```shell
kubectl apply -f hiya-deployment.yaml
```
The `emptyDir` volume with `medium: Memory` is the Kubernetes equivalent of Docker's `--tmpfs` flag — models are loaded into RAM at startup and never written to persistent disk.
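The manifest above relies on the startup probe to gate initial traffic. If you also want a pod temporarily removed from Service endpoints during transient failures (rather than restarted by the liveness probe alone), a readiness probe can be added next to `livenessProbe`. A sketch, under the assumption that the engine's gRPC health service on port 8080 also reflects readiness:

```yaml
# Sketch: add under the container spec alongside livenessProbe
# (assumes the gRPC health endpoint on 8080 also reports readiness)
readinessProbe:
  grpc:
    port: 8080
  periodSeconds: 10
```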
## Step 4 — Create the Service
Expose the engine within the cluster so that client applications can reach it:
```yaml
# hiya-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hiya-voice-verification
spec:
  selector:
    app: hiya-voice-verification
  ports:
    - name: health
      protocol: TCP
      port: 8080
      targetPort: 8080
    - name: ws
      protocol: TCP
      port: 8081
      targetPort: 8081
```
```shell
kubectl apply -f hiya-service.yaml
```
The service will be available at `hiya-voice-verification.<namespace>.svc.cluster.local` on ports 8080 (health checks) and 8081 (WebSocket API).
## Step 5 — Verify
Check that the pod is running and healthy:
```shell
kubectl get pods -l app=hiya-voice-verification
kubectl logs -l app=hiya-voice-verification --tail=20
```
The startup probe gives the engine up to 30 seconds to load its models. Once the pod shows `Running` and `1/1` ready, the engine is accepting requests.
## Scaling
The engine is stateless. To scale, increase the replica count:
```shell
kubectl scale deployment hiya-voice-verification --replicas=3
```
Each pod runs the full engine independently. The Kubernetes Service automatically load-balances requests across all ready pods.
We recommend maintaining instances at around 50% CPU utilization to balance prompt response times with computational resource efficiency. See Scalability & Recovery for more details.
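The 50% CPU target maps naturally onto a HorizontalPodAutoscaler, which scales the Deployment automatically instead of requiring manual `kubectl scale` calls. A sketch, assuming a metrics server is installed in the cluster and that the replica bounds below suit your traffic (note that CPU utilization is measured against the container's CPU request, `2` cores in the manifest above):

```yaml
# hiya-hpa.yaml (sketch — tune minReplicas/maxReplicas for your load)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hiya-voice-verification
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hiya-voice-verification
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```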
## Updating Registry Credentials
When Hiya rotates your registry credentials, update the pull secret in-place:
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  --dry-run=client -o yaml | kubectl apply -f -
```
## Supported Container Runtimes
The engine image is OCI-compliant and works with any container runtime supported by your cluster:
- Docker
- containerd
- CRI-O
- gVisor
## Recommended CPU Architectures
We strongly recommend 5th generation Intel Xeon Scalable Processors (Emerald Rapids) for superior performance and efficiency.
| Architecture | Notes |
|---|---|
| Intel Emerald Rapids | Recommended |
| Intel Sapphire Rapids | |
| Intel Ice Lake | |
| Intel Cascade Lake | |
| Intel Skylake | |
| AMD EPYC Genoa | |
| AMD EPYC Milan | |