Deploy on Kubernetes
This guide covers deploying the Hiya Voice Verification engine on any conformant Kubernetes cluster — on-prem, self-managed, or any cloud distribution. For cloud-specific optimizations, see the dedicated guides for AWS EKS, GCP GKE, or Azure AKS.
Prerequisites
- A running Kubernetes cluster (v1.24+)
- `kubectl` configured to access the cluster
- Container image pulled and authenticated — see Getting the Container Image
- A valid `API_KEY` from your Hiya account team
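A quick way to confirm the first prerequisite before proceeding:

```shell
# Confirm the cluster is reachable and check the server version (v1.24+)
kubectl version
kubectl get nodes
```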
Step 1 — Create an Image Pull Secret
Create a Kubernetes secret from the JSON key file provided by Hiya. This allows the cluster to pull the image from Google Artifact Registry.
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)"
```
The pull secret must exist in the same namespace as the Deployment. To use it across multiple namespaces, recreate the secret in each one or use a tool like Sealed Secrets or External Secrets Operator.
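For example, to make the same secret available in a second namespace (the name `staging` here is illustrative), rerun the command with `-n`:

```shell
# Recreate the pull secret in another namespace
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  -n staging
```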
Step 2 — Store the API Key
Store your runtime API key in a separate secret:
```shell
kubectl create secret generic hiya-engine-config \
  --from-literal=api-key=<your-api-key>
```
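To double-check what was stored, you can decode the secret value (this prints the key in plain text, so avoid running it in shared terminals):

```shell
# Read back and decode the stored API key
kubectl get secret hiya-engine-config \
  -o jsonpath='{.data.api-key}' | base64 -d
```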
Step 3 — Create the Deployment
```yaml
# hiya-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hiya-voice-verification
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hiya-voice-verification
  template:
    metadata:
      labels:
        app: hiya-voice-verification
    spec:
      imagePullSecrets:
        - name: hiya-registry
      containers:
        - name: engine
          image: europe-docker.pkg.dev/loccus-platform/onpremise-images/engine-api-standalone:<version>
          ports:
            - containerPort: 8080
              name: grpc
            - containerPort: 8081
              name: ws
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: hiya-engine-config
                  key: api-key
          volumeMounts:
            - name: models-tmpfs
              mountPath: /opt/loccus/models
          startupProbe:
            grpc:
              port: 8080
            periodSeconds: 1
            failureThreshold: 30
          livenessProbe:
            grpc:
              port: 8080
            periodSeconds: 10
          resources:
            requests:
              memory: "6Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
      volumes:
        - name: models-tmpfs
          emptyDir:
            medium: Memory
            sizeLimit: 4Gi
```

```shell
kubectl apply -f hiya-deployment.yaml
```
The `emptyDir` with `medium: Memory` is the Kubernetes equivalent of Docker's `--tmpfs` flag — models are loaded into RAM and never written to persistent disk.
Step 4 — Create the Service
Expose the engine within the cluster so that client applications can reach it:
```yaml
# hiya-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hiya-voice-verification
spec:
  selector:
    app: hiya-voice-verification
  ports:
    - name: grpc
      protocol: TCP
      port: 8080
      targetPort: 8080
    - name: ws
      protocol: TCP
      port: 8081
      targetPort: 8081
```

```shell
kubectl apply -f hiya-service.yaml
```
The Service will be available at `hiya-voice-verification.<namespace>.svc.cluster.local:8080`.
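Because the Deployment's probes use gRPC health checks, the engine exposes the standard `grpc.health.v1.Health` service, so you can sanity-check connectivity from inside the cluster with `grpcurl`. The throwaway pod name, image, and `default` namespace below are illustrative assumptions, not part of the product:

```shell
# Query the standard gRPC health endpoint from a temporary pod
kubectl run grpc-check --rm -it --restart=Never \
  --image=fullstorydev/grpcurl -- \
  -plaintext \
  hiya-voice-verification.default.svc.cluster.local:8080 \
  grpc.health.v1.Health/Check
```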
Step 5 — Verify
Check that the pod is running and healthy:
```shell
kubectl get pods -l app=hiya-voice-verification
kubectl logs -l app=hiya-voice-verification --tail=20
```
The startup probe gives the engine up to 30 seconds to load its models. Once the pod shows `Running` and `1/1` ready, the engine is accepting requests.
Scaling
The engine is stateless. To scale, increase the replica count:
```shell
kubectl scale deployment hiya-voice-verification --replicas=3
```
Each pod runs the full engine independently. The Kubernetes Service load-balances at the connection level, so new gRPC connections are spread across all ready pods. Note that a long-lived gRPC channel stays pinned to a single pod; clients that open one channel and reuse it should use client-side load balancing (for example, against a headless Service) or a service mesh for even request distribution.
We recommend maintaining instances at around 50% CPU utilization to balance prompt response times with computational resource efficiency. See Scalability & Recovery for more details.
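One way to hold pods near that target automatically is a HorizontalPodAutoscaler. The sketch below assumes the metrics server is installed in your cluster and that the replica bounds fit your traffic; adjust both to your workload:

```yaml
# hiya-hpa.yaml (illustrative sketch)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hiya-voice-verification
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hiya-voice-verification
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

Utilization is computed against the CPU *request* (`2` cores in the Deployment above), so a 50% target scales out when pods average roughly one core of sustained usage.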
Updating Registry Credentials
When Hiya rotates your registry credentials, update the pull secret in-place:
```shell
kubectl create secret docker-registry hiya-registry \
  --docker-server=europe-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat key.json)" \
  --dry-run=client -o yaml | kubectl apply -f -
```
Supported Container Runtimes
The engine image is OCI-compliant and works with any container runtime supported by your cluster:
- Docker
- containerd
- CRI-O
- gVisor
Recommended CPU Architectures
We strongly recommend 5th generation Intel Xeon Scalable Processors (Emerald Rapids) for superior performance and efficiency.
| Architecture | Notes |
|---|---|
| Intel Emerald Rapids | Recommended |
| Intel Sapphire Rapids | |
| Intel Ice Lake | |
| Intel Cascade Lake | |
| Intel Skylake | |
| AMD EPYC Genoa | |
| AMD EPYC Milan | |