Self‑Hosting WebXR Meeting Rooms After Meta Kills Workrooms: Architecture Guide
Build and operate self-hosted WebXR meeting rooms after Meta's Workrooms shutdown—practical architecture, signaling, SFUs, Kubernetes scaling and CI/CD.
Your org relies on immersive meetings, but Meta shut down Workrooms in February 2026 and Horizon managed services are winding down, leaving teams facing downtime, vendor lock-in risk, and uncertain data controls. This guide shows how to build and operate your own WebXR meeting rooms, with production-grade choices for rendering, signaling, media servers, and scaling on Kubernetes.
Why self‑host now (2026 context)
Late 2025 and early 2026 reshuffled the enterprise XR landscape—Meta announced the shutdown of Workrooms (Feb 16, 2026) and ended Horizon managed services as it refocused Reality Labs. That created an urgent gap for companies that need reliable VR meetings, data residency, and cost predictability. At the same time, browser APIs matured: WebTransport, WebCodecs, WebGPU and broad support for AV1+SVC make self-hosted WebXR systems more feasible and performant than before.
Design goals and constraints
Start with clear non-functional requirements. Typical goals for a self-hosted WebXR meeting system in 2026:
- Sub-50 ms perceived latency for head orientation and hand tracking within the same region
- Secure ownership of user data and recordings for compliance
- Predictable costs and autoscaled infrastructure
- Integration with CI/CD and developer workflows
- Support for browser and headset clients (WebXR/OpenXR bridges)
High-level architecture
Below is a pragmatic stack to self-host immersive rooms. Each component lists choices, tradeoffs and scaling guidance.
1) Client rendering: browser-based vs cloud-rendered
Two main approaches:
- Client-side rendering (preferred when possible): Use Three.js, Babylon.js or PlayCanvas with WebXR + WebGPU. Advantages: low server cost, minimal bandwidth for scene state (positional updates only), excellent user interactivity. Use when clients (PC, browser, or WebXR-enabled headsets) have enough GPU and local compute.
- Cloud-rendered frames (GPU stream): Use when target devices are thin (mobile or legacy headsets). The server renders frames and streams video (AV1 or H.264). This reduces client CPU but increases bandwidth and server GPU costs. Good for enterprise kiosk setups or when you need consistent visuals across heterogeneous gear.
2026 tip: Where supported, combine approaches—render environment locally but use cloud services for heavy effects or avatar compositing.
2) Signaling: WebSocket, WebTransport, or WebRTC datachannels?
Signaling coordinates room join, participant metadata, and session negotiation. Options:
- WebSocket: Simple and widely supported. Use for initial room discovery and token exchange. Scales well with Redis Streams or NATS for pub/sub across nodes.
- WebTransport (QUIC-based): Lower connection setup overhead and better resilience across lossy networks. Consider for binary, low-latency telemetry exchange (positional data). Browser support improved in 2025–2026.
- WebRTC datachannels: Built-in P2P path if you want direct peer updates and lower hops. Use when NAT traversal works and sessions are small.
Recommendation: Use WebSocket for control plane (auth, room management) and WebTransport or WebRTC datachannels for real-time telemetry (headset pose, hand tracking). Keep the signaling service stateless and push room state into a fast data store like Redis.
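To make the telemetry path concrete, here is a minimal sketch of a fixed-size pose datagram you could send over WebTransport datagrams or an unreliable datachannel. The field layout and 44-byte format are assumptions for illustration, not a standard:

```python
import struct
import time

# Hypothetical wire format for a pose datagram (illustrative, not a standard):
# uint32 participant_id, uint32 sequence, float64 timestamp,
# 3 x float32 position, 4 x float32 orientation quaternion.
POSE_FORMAT = "<IId3f4f"  # little-endian, no padding: 44 bytes total

def pack_pose(participant_id, seq, position, quaternion, ts=None):
    """Serialize one head-pose sample into a fixed-size datagram."""
    ts = time.time() if ts is None else ts
    return struct.pack(POSE_FORMAT, participant_id, seq, ts, *position, *quaternion)

def unpack_pose(payload):
    """Decode a datagram back into a pose dict; stale packets can be
    dropped by comparing the sequence number against the last one seen."""
    pid, seq, ts, px, py, pz, qx, qy, qz, qw = struct.unpack(POSE_FORMAT, payload)
    return {"id": pid, "seq": seq, "ts": ts,
            "position": (px, py, pz), "quaternion": (qx, qy, qz, qw)}
```

At 72 Hz this is roughly 3 KB/s per participant before transport overhead, which is why the telemetry path is cheap compared to media.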
3) Media server: SFU vs MCU and open‑source options
Choose your media topology based on features and cost:
- SFU (Selective Forwarding Unit) like mediasoup, Janus, or Pion/ion: forwards tracks between peers, supports simulcast and SVC, low CPU. Best for most multi-party VR meetings where each user sends mic + optional camera and receives mixed or multiple streams.
- MCU (Multipoint Control Unit): mixes streams into a single encoded stream (heavy encoding load). Use for simple clients that cannot decode many streams, or for recorded “director” streams.
- Hybrid (SFU + cloud compositing): SFU for real-time interactivity, occasional MCU or cloud render to generate a composite livestream or recording.
Open-source picks in 2026:
- mediasoup — battle-tested SFU, great for Node.js stacks, supports SVC and modern codecs
- Janus — modular C server, stable and plugin-friendly
- Pion (Go) — lightweight and embeddable for custom stacks
- Kurento or Jitsi — fuller-featured conferencing stacks; Kurento offers MCU-style mixing, while Jitsi pairs an SFU (the Jitsi Videobridge) with a complete client
Don't forget TURN (coturn) for NAT traversal. Deploy TURN in each region to minimize relay latency.
4) Positional data & voice: separating concerns
Split transport for:
- Positional telemetry: tiny datagrams at 30–90Hz (use WebTransport datagrams or unreliable WebRTC datachannels). Prioritize low latency over reliability; use packet loss concealment and motion extrapolation.
- Voice & video: use SFU with adaptive codecs (Opus for audio, AV1 or VP9 with SVC for video). Simulcast helps poor network clients.
2026 trend: AV1 SVC saw growing support across browser vendors, making multi-quality streams cheaper and improving scalability for mixed-reality meetings.
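The motion-extrapolation idea above can be sketched as simple linear dead reckoning on the client: project the last known position forward using the velocity implied by the two most recent samples. The 100 ms lead cap is an assumed tuning value:

```python
def extrapolate_position(p_prev, t_prev, p_last, t_last, t_render):
    """Linear dead reckoning for a remote participant's position.
    p_prev/p_last are (x, y, z) tuples; times are seconds."""
    dt = t_last - t_prev
    if dt <= 0:
        return p_last  # degenerate or duplicate samples: hold the last pose
    # Clamp how far we extrapolate so a stalled stream doesn't fling the avatar.
    lead = min(t_render - t_last, 0.1)  # assumed 100 ms cap
    return tuple(last + (last - prev) / dt * lead
                 for prev, last in zip(p_prev, p_last))
```

Orientation is usually handled the same way with quaternion slerp rather than linear interpolation; when fresh packets arrive, blend back toward the authoritative pose instead of snapping.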
Infrastructure: Kubernetes deployment patterns
Cluster layout
- Control plane (stateless): signaling services, auth, API servers—standard deployment with HPA and service mesh if needed.
- Media plane (stateful-ish): SFU pods, TURN servers—scale by room shard and region.
- Render plane (GPU): cloud-render workers or avatar compositors on GPU nodes (NVIDIA or AMD). Use GPU-accelerated containers with device plugin.
- State & caches: Redis for presence and heartbeat, Postgres for metadata, S3-compatible object store for recordings.
Kubernetes specific patterns
- Use namespaces per environment and resource quotas to bound costs.
- Deploy SFUs with PodDisruptionBudgets and headless services for peer connections.
- Use HPA based on custom metrics: CPU alone is insufficient. Use Prometheus Adapter to scale on network egress, packet count, or SFU peer count.
- For cold-start-heavy cloud rendering, use Virtual Kubelet or cluster autoscaler with GPU node pools to avoid paying idle GPUs.
- Use KEDA to autoscale signaling/worker pods by Redis stream length or queue lag.
Example Kubernetes hints
Pod spec guidance (conceptual): dedicate CPU and network bandwidth to SFU pods, and enable hostNetwork only where TURN or the media port range needs direct host access.
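A sketch of what such an SFU pod spec might look like; the image name, port, and resource figures are placeholders to be tuned from your own load tests:

```yaml
# Conceptual SFU Deployment sketch — all names and values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sfu
spec:
  replicas: 3
  selector:
    matchLabels: { app: sfu }
  template:
    metadata:
      labels: { app: sfu }
    spec:
      # hostNetwork exposes the UDP media ports directly; use it only when
      # your CNI cannot deliver the media port range efficiently.
      hostNetwork: true
      containers:
        - name: sfu
          image: registry.example.com/sfu:1.0.0  # placeholder image
          resources:
            requests: { cpu: "2", memory: 2Gi }
            limits: { cpu: "4", memory: 4Gi }
          ports:
            - containerPort: 40000
              protocol: UDP
```

Pair this with a PodDisruptionBudget and a headless Service as described above so peers can address individual SFU pods directly.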
Scaling strategies and room sharding
Large systems must shard rooms and distribute load. Practical approaches:
- Region sharding: Assign rooms to a region based on participant geography. Keep participants in-region where possible to minimize RTT.
- Room affinity: Route join requests to the SFU instance that owns the room; implement a consistent hashing scheme on the room ID to select a shard.
- Horizontal scaling of SFU: Each SFU node hosts many rooms. Monitor per-node peer capacity and spin up new SFU pods when average peers per SFU crosses threshold.
- Edge POPs: For global orgs, host small regional clusters (or use edge providers) and federate presence via a central control plane.
Operational note: SFU resource usage correlates strongly with encoded bitrate and number of simultaneous outgoing streams—measure during load tests.
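The consistent-hashing scheme for room affinity can be sketched as a hash ring with virtual nodes, so adding or removing an SFU shard only remaps the rooms it owned. Shard names and the vnode count here are illustrative:

```python
import bisect
import hashlib

class SfuRing:
    """Consistent-hash ring mapping room IDs to SFU shards.
    Illustrative sketch: ~100 virtual nodes per shard keeps rooms
    roughly balanced across shards."""

    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer: stable across processes,
        # unlike Python's randomized built-in hash().
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, room_id):
        """Route a room to the first virtual node clockwise of its hash."""
        idx = bisect.bisect(self._keys, self._hash(room_id)) % len(self._keys)
        return self._ring[idx][1]
```

The join service consults the ring, then routes the WebSocket/WebTransport session to the owning SFU pod; rooms never straddle shards.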
Latency optimization tactics
- Place TURN & SFU near users (edge or regional PoPs)
- Use WebTransport/QUIC for reduced handshake overhead
- Enable SVC/simulcast—send low-bitrate layers for distant participants
- Extrapolate head poses client-side to mask network delay (client-side prediction)
- Prioritize telemetry on unreliable datagrams with minimal encoding
- Optimize ICE setup: persistent ICE candidates and keep-alives reduce reconnection time
Security, compliance and E2EE
Security is a primary reason to self-host. Key controls:
- Authentication & Authorization: OAuth2/OIDC for SSO; short-lived tokens for WebRTC sessions; RBAC for admin APIs.
- Transport security: DTLS + SRTP for media; TLS for signaling and APIs.
- E2EE: True end-to-end encryption through an SFU is tricky: the SFU can forward encrypted frames without decrypting them, but server-side features such as recording, compositing, and transcoding need media access. Look into Insertable Streams (browser support improved in 2025–2026) for client-side frame encryption where feasible, or implement client-side selective encryption for private channels.
- Data residency: Keep recordings in regional S3 buckets with server-side encryption; log minimal PII and use redaction pipelines for stored telemetry.
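As an illustration of the short-lived session tokens mentioned above, here is a minimal HMAC-signed token sketch using only the standard library. In production you would use a proper OIDC/JWT library; the claim names and TTL are assumptions:

```python
import base64
import hashlib
import hmac
import json
import time

# Illustrative HMAC-signed session token (JWT-like, but NOT a real JWT).

def mint_token(secret: bytes, user: str, room: str, ttl_s: int = 120) -> str:
    """Issue a short-lived token binding a user to one room."""
    claims = {"sub": user, "room": room, "exp": int(time.time()) + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(secret, body, hashlib.sha256).digest()
    return (body + b"." + base64.urlsafe_b64encode(sig)).decode()

def verify_token(secret: bytes, token: str):
    """Return the claims if the signature checks out and the token
    has not expired; otherwise None."""
    body, sig = token.encode().rsplit(b".", 1)
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(base64.urlsafe_b64decode(sig), expected):
        return None  # tampered or wrong key
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims if claims["exp"] > time.time() else None  # expired
```

The SFU and signaling service only need the verification key, which keeps the control plane stateless: no session lookup is required to admit a peer.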
CI/CD and deployment workflows
Make deployments repeatable, auditable and reversible with GitOps and containerized builds.
Pipeline blueprint
- Build: Multi-stage Docker builds for server, SFU, and GPU renderers. Use image scanning (Trivy) and sign images.
- Test: Unit tests + contract tests. For WebXR, include smoke tests using headless browsers or custom harnesses that exercise WebXR endpoints and media negotiation.
- Load testing: Use k6, a WebRTC load generator (e.g., a Pion-based harness), and scenario-based tests (e.g., 50 five-minute rooms with 8 peers each).
- Deploy: GitOps with ArgoCD/Flux; progressive delivery with canary releases and automated rollback on error budgets.
- Observability: Prometheus + Grafana for metrics, Loki or Elasticsearch for logs, Jaeger for tracing. Instrument media metrics: RTT, jitter, packet loss, active peers, media bitrates.
Example GitHub Actions steps (summary)
- docker/build-push to container registry
- helm lint & template
- run k8s smoke tests in staging cluster
- update GitOps repo and create PRs for ArgoCD to pick up
Monitoring & SLOs
Important SLOs for immersive rooms:
- 90th percentile RTT for telemetry & media
- Audio continuity: percentage of session time with usable audio (target above 95%)
- Room join success rate
- SFU CPU/GPU utilization & network egress limits
Set up alerts for: high packet loss, SFU saturation, TURN relay surges, and region-wide outages.
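A minimal sketch of checking the RTT SLO from collected samples, using a nearest-rank percentile; the 50 ms budget mirrors the latency goal stated at the top of this guide:

```python
def percentile(samples, pct):
    """Nearest-rank percentile — simple and adequate for SLO reporting."""
    ordered = sorted(samples)
    k = max(0, int(len(ordered) * pct / 100 + 0.5) - 1)
    return ordered[k]

def rtt_slo_ok(rtt_ms_samples, budget_ms=50.0, pct=90):
    """True when the pct-th percentile RTT is within the latency budget."""
    return percentile(rtt_ms_samples, pct) <= budget_ms
```

In practice you would compute this in Prometheus (e.g., via histogram quantiles) rather than in application code, but the alert condition is the same.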
Cost control: predictable billing tactics
Media egress and GPU time are your biggest drivers. Control costs by:
- Using SFU (not MCU) where possible
- Autoscaling GPU pools and using spot instances for non-critical render tasks
- Compression & SVC to reduce high-bitrate streams
- Monitoring per-room egress so you can build quota or chargeback models
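Per-room egress accounting for quota or chargeback can be as simple as aggregating flow records exported by the SFU; the `(room_id, bytes_sent)` record shape here is an assumption about your metrics pipeline:

```python
from collections import defaultdict

def egress_by_room(flow_records):
    """Aggregate SFU flow records into per-room egress in GiB.
    Each record is assumed to be a (room_id, bytes_sent) pair."""
    totals = defaultdict(int)
    for room_id, sent in flow_records:
        totals[room_id] += sent
    return {room: sent / 2**30 for room, sent in totals.items()}
```

Feeding these totals into a billing or quota service lets you cap runaway rooms before they dominate your egress bill.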
Migration checklist from Workrooms / Horizon
Meta's shutdown is the catalyst, but migration needs method: export user lists, map policies, preserve recordings (if exportable), and onboard devices to your own device management.
- Inventory: users, rooms, policies, device fleet (headsets and versions).
- Export: request exports of metadata/recordings from the vendor if available.
- Prototype: build a single WebXR room, test latency and policy flows with pilot users.
- Integrate SSO and device enrollment.
- Rollout: region-by-region, use staged adoption and automation for client config.
Practical, actionable starter architecture (MVP)
This is what you should aim to deploy first—minimal components but production-ready.
- Frontend: WebXR app (Three.js + WebXR, WebGPU optional) hosted on CDN
- Auth: OIDC provider with short-lived claim tokens
- Signaling: Go or Node WebSocket service with Redis pub/sub and persistent room metadata in Postgres
- SFU: mediasoup cluster scaled by HPA/KEDA; each SFU behind a headless service
- TURN: coturn per region in k8s DaemonSet or bare VM
- State: Redis for presence & heartbeats; S3 for recordings
- CI/CD: GitHub Actions -> Image registry -> ArgoCD for GitOps deploys
- Observability: Prometheus, Grafana dashboards for media metrics
Deploy that stack to a single region, run load tests to determine per-node capacities, then copy and tune for other regions.
Operational playbook & DR
- Run canary deploys and keep easy rollbacks (ArgoCD rollbacks)
- Regularly exercise TURN failover and SFU restarts in staging
- Automate backups of Postgres and object storage with retention policies
- Practice incident runbooks for high packet loss and SFU overload
Testing and validation
Tests you must automate:
- Unit & integration tests for signaling state transitions
- Media negotiation tests (different codecs and simulcast setups)
- Load tests that simulate realistic VR telemetry patterns (60–90Hz small packets + audio streams)
- Chaos testing: kill SFU pods, network partitions, TURN loss
Future-proofing & 2026 predictions
Expect these trends to shape your roadmap:
- More WebTransport adoption for telemetry and media control, reducing handshake latency.
- Edge-native SFUs and managed edge compute offerings to host media near users.
- Hardware-accelerated AV1 SVC decoding in headsets, lowering bandwidth of high-fidelity rooms.
- Interoperability gains: better OpenXR+WebXR bridges, making multi-vendor headsets easier to support.
Key takeaways
- You can replace Workrooms with a self-hosted WebXR stack that gives you control over latency, data and cost.
- Prefer client rendering where possible; use SFU (mediasoup/Janus) for scalable media routing and TURN for NAT traversal.
- Use Kubernetes + GitOps for repeatable deployments, and scale SFU shards by room affinity and region.
- Instrument everything: media metrics, SLOs and billing data will keep your infra efficient.
- Start small: a single-region MVP with CI/CD and load testing will expose capacity and latency bottlenecks fast.
Quick checklist to get started (30–90 day plan)
- Day 0–7: Prototype a single WebXR room with local SFU and basic auth.
- Day 8–30: Containerize, add Redis & Postgres, deploy to a dev k8s cluster. Add TURN and WebTransport proof-of-concept.
- Day 31–60: Run load tests, instrument metrics, scale SFU. Integrate CI/CD and ArgoCD for GitOps.
- Day 61–90: Hardening, E2EE options, region rollout and device enrollment automation.
Closing thoughts
Meta's decision to kill Workrooms removed a convenient vendor—but it also accelerated the shift to self-hosted, standards-based immersive collaboration. With WebTransport, WebCodecs, AV1 SVC and mature SFUs, 2026 is a practical time to take control. The architecture patterns in this guide balance latency, cost, and operational complexity so your engineering and DevOps teams can deploy reliable VR meetings at scale.
Call to action: Ready to pilot a self-hosted WebXR meeting room in your environment? Start with our downloadable Helm charts and a reference ArgoCD GitOps repo to get a dev cluster running in under an hour—request the repo or schedule a migration workshop with our team.