Deployment & Scalability Analysis
This document outlines the architectural analysis for deploying the Visla Platform, addressing specific constraints related to VPN-enabled IoT devices (1NCE) and high-throughput ingestion.
The Golden Rules of Scaling
Before diving into the architecture, here is the summary of what scales and what does not.
| Component | Instances | Why? |
|---|---|---|
| OpenVPN (Client) | 1 Max (per credential) | Hard Constraint: A single VPN credential maps to a single static IP. Two instances with the same credential will conflict. To scale, you must add a NEW credential (new Ingestion Unit). |
| Decoder (Java) | 1 Max (per VPN) | Coupling: The decoder is tightly coupled to the VPN interface. Since the VPN is single-instance, the decoder listening on it must be too. |
| Redis | 1 Max (Cluster) | Central Source: All services must see the same state. Redis is extremely fast (3M+ msgs/sec), so horizontal scaling is rarely needed. |
| PostgreSQL | 1 Max (Primary) | Consistency: A single primary writer ensures data integrity. Read Replicas can be added later, but are rarely needed for IoT write loads. |
| Workers (Events/Pos) | Unlimited (∞) | Stateless: Thanks to Redis Consumer Groups, you can run 10, 100, or 1000 copies. They will automatically share the workload. |
| WebSocket | Unlimited (∞) | Stateless: Thanks to Redis Broadcast, any pod can serve any user. |
Core Constraint: The VPN Bottleneck
The primary architectural constraint is the 1NCE VPN Tunnel.
- Constraint: Each OpenVPN client connection requires a unique set of credentials (and thus a unique static IP within the 1NCE private network).
- Implication: You cannot simply load balance multiple instances of the VPN client behind a single virtual IP in a traditional way, because the routing is determined by the mobile network provider pointing to a specific VPN endpoint IP.
- Rule: 1 Unique VPN Credential = 1 Unique VPN Instance.
Solution: The "Ingestion Unit" Pattern
To solve this, we define an atom of scalability called the Ingestion Unit.
One Ingestion Unit consists of:
- OpenVPN Client: Connected to 1NCE with specific credentials (e.g., `user1`).
- Decoder Service (Netty): Running in the same network namespace as the VPN client (or routing through it).
To scale ingestion (e.g., going from 10k to 20k devices), we simply replicate this unit horizontally with new credentials:
- Unit A: Uses `user1.ovpn` (handles devices 1-10,000)
- Unit B: Uses `user2.ovpn` (handles devices 10,001-20,000)
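As a minimal sketch, two Ingestion Units could be replicated in one Compose file like this. All image names, file paths, and service names below are illustrative placeholders, not the actual project configuration:

```yaml
# Hypothetical docker-compose.yml fragment: two Ingestion Units,
# each pairing one VPN client (unique credential) with one decoder.
services:
  openvpn-a:
    image: openvpn-client:latest            # placeholder image
    cap_add: [NET_ADMIN]                    # needed to create the tun device
    volumes:
      - ./credentials/user1.ovpn:/etc/openvpn/client.conf:ro

  decoder-a:
    image: visla/decoder:latest             # placeholder image
    network_mode: service:openvpn-a         # share the VPN's network namespace

  openvpn-b:
    image: openvpn-client:latest
    cap_add: [NET_ADMIN]
    volumes:
      - ./credentials/user2.ovpn:/etc/openvpn/client.conf:ro

  decoder-b:
    image: visla/decoder:latest
    network_mode: service:openvpn-b
```

Because each decoder shares its VPN container's network namespace, it listens directly on that tunnel's interface, which is what enforces the 1-credential-per-unit rule in practice.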
Architecture Overview
We separate the system into two distinct layers decoupled by Redis Streams.
Data Flow & Duplication Prevention
A common concern with scaling processing workers is Message Duplication (e.g., two workers processing the same position and saving it twice).
Visla Platform prevents this natively using Redis Consumer Groups.
How it works
- Ingestion: The Decoder receives raw TCP data, parses it, and sends an `XADD` command to the stream `positions:raw`.
- Load Balancing: Multiple instances of the `positions` service (Workers) form a Consumer Group named `positions-workers`.
- Exclusive Fetch: When a worker asks for messages using `XREADGROUP ... >`, Redis delivers each message to only one consumer in the group.
  - If `Worker A` gets Message ID `1001`, `Worker B` will never see it unless `Worker A` crashes and the message is reclaimed.
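The exclusive-delivery behavior above can be illustrated with a toy in-memory model. This is deliberately NOT Redis (no redis-py, no server): it only sketches the semantics of `XADD`, `XREADGROUP` with `>`, the Pending Entries List, `XACK`, and `XCLAIM`, so the names mirror those commands but the implementation is purely illustrative.

```python
from collections import deque

class MiniStream:
    """Toy model of a Redis Stream with one Consumer Group.

    Illustrates the exclusive-delivery rule: each entry is handed to
    exactly ONE consumer, and stays in that consumer's Pending Entries
    List (PEL) until acknowledged or reclaimed."""

    def __init__(self):
        self.entries = deque()   # undelivered entries (the '>' cursor)
        self.pending = {}        # message id -> consumer name (the PEL)
        self.next_id = 1

    def xadd(self, payload):
        """Append an entry to the stream, returning its id."""
        msg_id = str(self.next_id)
        self.next_id += 1
        self.entries.append((msg_id, payload))
        return msg_id

    def xreadgroup(self, consumer):
        """Deliver the next NEW entry to `consumer` exclusively."""
        if not self.entries:
            return None
        msg_id, payload = self.entries.popleft()
        self.pending[msg_id] = consumer   # tracked until XACK
        return msg_id, payload

    def xack(self, msg_id):
        """Acknowledge processing; drop the entry from the PEL."""
        self.pending.pop(msg_id, None)

    def xclaim(self, msg_id, new_consumer):
        """Reclaim a pending message from a crashed consumer."""
        if msg_id in self.pending:
            self.pending[msg_id] = new_consumer

stream = MiniStream()
stream.xadd({"imei": "861234", "lat": 48.85})
stream.xadd({"imei": "861234", "lat": 48.86})

a = stream.xreadgroup("worker-a")   # worker A receives message "1"
b = stream.xreadgroup("worker-b")   # worker B receives "2", never "1"
```

The key point the sketch captures: once `worker-a` has fetched message `1`, no other group member sees it unless it is explicitly reclaimed (`XCLAIM`) after a crash, so no deduplication logic is needed in the workers themselves.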
Orchestrator Comparison: VM vs Nomad vs Kubernetes
Given the Ingestion Unit constraint, here is how the deployment maps to different infrastructures.
1. Kubernetes (K8s) - Complex but Standard
In K8s, the Ingestion Unit is modeled as a Pod containing two containers (Sidecar pattern).
- Pros: Industry standard, huge ecosystem (Cert-Manager, External-DNS), self-healing.
- Cons: Managing VPN capabilities (`NET_ADMIN`) can be tricky on managed clusters (EKS/GKE). "Stateful" pods for VPN ingress are harder to manage than stateless API pods.
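A minimal sketch of the sidecar-pattern Pod, assuming hypothetical image names and the standard Kubernetes `securityContext` mechanism for granting `NET_ADMIN`:

```yaml
# Hypothetical Pod spec: one Ingestion Unit as two containers.
apiVersion: v1
kind: Pod
metadata:
  name: ingestion-unit-a
spec:
  containers:
    - name: openvpn
      image: openvpn-client:latest        # placeholder image
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]              # required to create the tun device
    - name: decoder
      image: visla/decoder:latest         # placeholder image
  # Containers in a Pod share one network namespace, so the decoder
  # can listen directly on the VPN tunnel interface.
```

Note that on managed clusters, admission policies (e.g., Pod Security Standards) may block the `NET_ADMIN` capability by default, which is exactly the friction the "Cons" bullet refers to.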
2. Nomad (HashiCorp) - The "Hybrid" Choice
Nomad is excellent for mixed workloads. You can run the Ingestion Unit as a "Task Group".
- Pros: Much simpler than K8s. Single binary. Handles both Docker containers and raw binaries well. Networking is often more direct.
- Cons: Smaller ecosystem.
3. Docker Compose (VM) - The Recommended Solution
For the current scale (0 - 50k devices), a vertically scaled VM is the most robust and practically manageable solution.
- Architecture: A single `docker-compose.yml` file.
- Scaling: Simply increase VM size (RAM/CPU).
- VPN: `network_mode: service:openvpn` is natively supported and rock-solid.
Recommendation
- Start with Docker Compose on a single High-End VM.
- It can easily handle tens of thousands of connections if optimized (Netty is efficient).
- Use Managed Redis & Postgres.
- Don't host stateful databases yourself if possible. Let the cloud provider handle backups and HA.
- Migrate Logic to K8s Later.
  - Move only the stateless consumers (`positions`, `events`, `web`) to Kubernetes if you need auto-scaling.
  - Keep the Ingestion Unit (VPN+Decoder) on dedicated, stable VMs (Pet vs Cattle approach: Ingestion is a Pet, Workers are Cattle).
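Handling tens of thousands of concurrent TCP connections on a single VM usually also requires raising kernel and file-descriptor limits. The values below are illustrative starting points, not benchmarked recommendations for this platform:

```
# /etc/sysctl.d/99-ingestion.conf -- illustrative tuning for a VM
# terminating many concurrent device connections (values are assumptions).
net.core.somaxconn = 4096                  # deeper TCP accept backlog
net.ipv4.ip_local_port_range = 1024 65535  # widen ephemeral port range
fs.file-max = 1048576                      # system-wide open-file ceiling
```

The decoder container's own `ulimit -n` (nofile) limit must be raised to match, since every open socket consumes a file descriptor.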