Load Balancing & Worker Nodes

Scale stream processing across worker servers while the main xOTT server remains the orchestrator, dashboard, API, database owner, and public HLS output endpoint.

Overview

xOTT load balancing lets you add worker servers that run stream workloads away from the main application server. The main server still owns the panel, database, service accounts, orchestration, monitoring, licensing, and customer-facing output URLs.

This is true workload distribution, not just traffic forwarding. Eligible streams can be started on worker nodes so that CPU, memory, stream process count, and network pressure are spread across the available infrastructure.

Public output behavior: HLS links remain on the main app server domain or IP. Worker node URLs are internal implementation details and should not be exposed to customers.

Architecture

The load balancing system uses a controller-worker model:

  • Main server: Runs the xOTT dashboard, API, database access, scheduler, placement engine, public HLS routes, and license checks.
  • Worker nodes: Run assigned stream processes, report health, and accept signed commands from the controller.
  • Placement engine: Chooses where a stream should run based on global settings, service defaults, stream overrides, node health, limits, and strategy.
  • Output layer: Keeps stream URLs stable through the main server even when the stream process runs on a worker.

Request flow: Viewer / Player -> Main xOTT HLS URL -> Controller Output Route -> Assigned Worker Stream
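
The request flow can be pictured as a lookup on the controller: the public URL stays on the main server, and the output route decides which internal upstream serves the stream. The sketch below is illustrative only. The function name, the assignment table, and the worker base URLs are hypothetical and are not taken from xOTT.

```python
from typing import Dict, Optional


def resolve_upstream(
    stream_id: str,
    assignments: Dict[str, str],
    worker_base_urls: Dict[str, str],
) -> Optional[str]:
    """Return the internal base URL for a stream, or None if it runs locally.

    `assignments` maps stream IDs to worker names (hypothetical structure).
    `worker_base_urls` maps worker names to internal URLs that are never
    shown to customers.
    """
    node = assignments.get(stream_id)
    if node is None:
        # No worker assignment: the main server runs and serves the stream.
        return None
    return worker_base_urls[node]
```

In this model the player only ever sees the main server address. The value returned here is used inside the controller and is never placed in a customer-facing link.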

Requirements

Prepare each worker server before adding it to xOTT:

  • Ubuntu server with root SSH access for the initial installation.
  • Enough CPU, RAM, disk, and bandwidth for the stream workload you plan to assign.
  • Network access from the main xOTT server to the worker API port.
  • Network access from the worker back to the main xOTT server for heartbeats and controller communication.
  • Firewall rules that allow only the required controller-to-worker traffic whenever possible.

Add Worker

Worker installation is handled from the xOTT Server page. Add the server IP, SSH port, root username, and root password, then start the installer. xOTT connects to the server, installs the worker runtime, writes the node configuration, registers the worker, and generates a signed node token for secure command authentication.

Password handling: The root password is used for bootstrap only. It should not be stored as a reusable worker credential after installation.

Recommended first check

  1. Add the worker from the Server page.
  2. Wait for the first heartbeat to appear.
  3. Click the health check icon for the node.
  4. Assign a single non-critical stream to the worker.
  5. Confirm the public HLS URL still uses the main server address.
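
Step 5 can be checked by hand, or with a small script such as the sketch below. It only compares the host part of a URL against the main server address. The helper name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def uses_main_host(hls_url: str, main_host: str) -> bool:
    """Return True when the HLS URL points at the main server host.

    urlsplit() lowercases the hostname, so the expected host is lowercased
    as well before comparing.
    """
    return urlsplit(hls_url).hostname == main_host.lower()
```

For example, `uses_main_host("https://panel.example.com/live/demo/index.m3u8", "panel.example.com")` returns `True`, while a URL that points at a worker IP returns `False`.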

Health & Heartbeats

Workers send regular heartbeat data to the main server. The controller uses these reports to decide whether a node is eligible for new stream assignments.

  • Heartbeat interval: Workers report health frequently, usually around every 15 seconds.
  • Stale threshold: A node that stops reporting for roughly 45 seconds is treated as unavailable for placement.
  • Metrics: CPU load, memory usage, stream count, node status, and last heartbeat time are used by the placement logic.
  • Health check: The Server page health check sends a signed command to confirm the worker API can be reached and authenticated.
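
The relationship between the heartbeat interval and the stale threshold can be sketched as follows. The 15 and 45 second values come from the approximate figures above. The function names, and the choice to treat exactly 45 seconds as stale, are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Approximate values from the text; a real deployment may use different ones.
HEARTBEAT_INTERVAL = timedelta(seconds=15)
STALE_THRESHOLD = timedelta(seconds=45)


def is_stale(last_heartbeat: datetime, now: datetime) -> bool:
    """Return True when the node has been silent for the stale threshold or longer."""
    return now - last_heartbeat >= STALE_THRESHOLD


def missed_heartbeats(last_heartbeat: datetime, now: datetime) -> int:
    """Count whole heartbeat intervals that have passed since the last report."""
    return max(0, int((now - last_heartbeat) / HEARTBEAT_INTERVAL))
```

With these numbers a node is excluded from placement after about three consecutive missed heartbeats, which tolerates a single delayed report without flapping.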

Placement Policies

Stream placement is configurable at multiple levels. xOTT resolves the final assignment using this precedence:

  1. Stream-specific override
  2. Service account or service default
  3. Global load balancing policy

Placement modes

  • Auto / Inherit: Use the next applicable parent policy.
  • Main server only: Force the stream to run on the main xOTT server.
  • Any worker: Run the stream on any eligible healthy worker.
  • Specific worker: Pin the stream to one worker node.
  • Worker group: Restrict placement to nodes in selected groups.
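
The precedence rule and the Auto / Inherit mode can be sketched as a walk from the most specific level to the least specific one. The mode strings and the function name below are hypothetical, and the sketch assumes the global policy is always set to an explicit mode.

```python
from typing import Optional

INHERIT = "inherit"  # hypothetical identifier for the Auto / Inherit mode


def resolve_mode(
    stream_mode: Optional[str],
    service_mode: Optional[str],
    global_mode: str,
) -> str:
    """Return the first explicit mode, checking stream, then service, then global."""
    for mode in (stream_mode, service_mode):
        # None (not configured) and INHERIT both defer to the parent level.
        if mode is not None and mode != INHERIT:
            return mode
    return global_mode
```

A stream left on inherit under a service pinned to a worker group resolves to that group, while an explicit stream override wins regardless of the service setting.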

Strategies

  • Lowest load: Prefer the eligible node with the lowest current resource pressure and stream count.
  • Round robin: Rotate assignments across eligible nodes for simple distribution.
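
The two strategies can be compared with a minimal sketch. How xOTT actually weighs CPU, memory, and stream count is not documented here, so the scoring below (highest of CPU and memory first, stream count as the tie-breaker) is an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Node:
    name: str
    cpu_percent: float
    memory_percent: float
    stream_count: int


def lowest_load(nodes: List[Node]) -> Node:
    """Pick the node with the least resource pressure (assumed scoring)."""
    return min(
        nodes,
        key=lambda n: (max(n.cpu_percent, n.memory_percent), n.stream_count),
    )


def round_robin(nodes: List[Node]) -> Iterator[Node]:
    """Yield nodes in a fixed rotation, ignoring their current load."""
    return cycle(nodes)
```

Lowest load reacts to what nodes report, so it suits mixed hardware. Round robin is predictable and needs no metrics, but it can overfill a weaker node.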

Fallback behavior

  • Main server fallback: If no worker is eligible, run the stream on the main server.
  • Fail if unavailable: If the required worker or group is not available, do not start the stream.
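
The two fallback behaviors differ only in what happens when the list of eligible workers is empty. The following sketch assumes the caller has already filtered the workers. The names `place`, `MAIN_SERVER`, and `PlacementError` are hypothetical.

```python
from typing import List

MAIN_SERVER = "main"  # hypothetical label for the main xOTT server


class PlacementError(RuntimeError):
    """Raised when a stream cannot be placed and fallback is disabled."""


def place(eligible_workers: List[str], fallback_to_main: bool) -> str:
    """Return the node that should run the stream, or raise if none is allowed."""
    if eligible_workers:
        # A real strategy would choose among these; the first is enough here.
        return eligible_workers[0]
    if fallback_to_main:
        return MAIN_SERVER
    raise PlacementError("no eligible worker and main server fallback is disabled")
```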

Node Settings

Worker settings let you control capacity and routing without hardcoding service-specific behavior into the engine.

  • Groups: Assign nodes to groups such as us, eu, directv, or vix for policy targeting.
  • Max streams: Stop new assignments once the node reaches its stream process limit.
  • Max CPU: Prevent new assignments when CPU pressure exceeds the configured limit.
  • Max memory: Prevent new assignments when memory usage exceeds the configured limit.
  • Weight: Give stronger nodes more placement preference where supported by the strategy.
  • Drain mode: Keep existing streams running but stop sending new starts to the node during maintenance.
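
Taken together, the limits and drain mode act as an eligibility filter that runs before any strategy picks a node. The sketch below follows the wording above: the stream limit blocks new starts once it is reached, while the CPU and memory limits block them once they are exceeded. The field names are hypothetical, and weight and groups are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    healthy: bool
    draining: bool
    stream_count: int
    cpu_percent: float
    memory_percent: float
    max_streams: int
    max_cpu_percent: float
    max_memory_percent: float


def is_eligible(node: NodeState) -> bool:
    """Return True when the node may receive a new stream start."""
    if not node.healthy or node.draining:
        return False
    if node.stream_count >= node.max_streams:  # limit reached
        return False
    if node.cpu_percent > node.max_cpu_percent:  # limit exceeded
        return False
    if node.memory_percent > node.max_memory_percent:  # limit exceeded
        return False
    return True
```

A draining node fails this check even when it has spare capacity, which is what lets existing streams keep running while no new ones arrive.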

Stream Settings

Individual stream settings are the most precise control point. Use them when a channel needs a specific node, a strict region, or a controlled fallback policy that should not affect the rest of the service.

  • Use inherit for normal streams so global and service rules can keep balancing automatically.
  • Use specific worker for streams that need a fixed host due to proxy, account, or regional constraints.
  • Use worker group when several equivalent nodes can run the stream.
  • Use fail if unavailable only when running on the wrong node is worse than not starting the stream.

Service Defaults

Service-level defaults are useful when a provider, account group, or region should use a predictable set of worker nodes. For example, you can keep a service on a regional worker group while still letting individual streams override that behavior when needed.

This keeps the engine universal: the load balancer does not need custom code per provider. Service behavior is expressed through policies, groups, limits, and fallback settings.

Operations

Adding capacity

Add a new worker, wait for heartbeat health, assign groups and limits, then let the placement strategy begin sending eligible streams to the node.

Maintenance

Enable drain mode before planned maintenance. Existing streams continue to run while new starts are sent elsewhere. Stop or move any streams still on the node, complete the maintenance, and then disable drain mode.

Scaling guidance

  • Start with conservative max stream, CPU, and memory limits.
  • Use groups for geography, service families, or hardware classes.
  • Keep main server fallback enabled while validating a new worker.
  • Review failed stream starts for node eligibility problems before changing service scripts.

Troubleshooting

Worker does not heartbeat

  • Confirm the worker service is running on the node.
  • Confirm the controller URL in the worker environment points to the main xOTT server.
  • Check outbound firewall access from the worker to the main server.
  • Verify the node token is present in the worker environment.

Health check fails but heartbeat works

  • Confirm the main server can reach the worker API host and port.
  • Check any firewall rules between the controller and worker.
  • Verify the worker API is using the current signed command token.
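
To separate a network problem from a token problem, a plain TCP check run from the main server can help. The sketch below only tests whether the port accepts connections and does not speak the worker API. The function name is hypothetical, and the host and port must be supplied for your own worker.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns `True` but the health check still fails, the firewall path is probably fine and the token is the more likely cause.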

Streams still run on the main server

  • Check whether the global, service, or stream policy is set to main server only.
  • Confirm at least one worker is healthy and not in drain mode.
  • Review max stream, CPU, and memory limits for the candidate workers.
  • Check whether main server fallback was triggered because no worker was eligible.

Output URL shows the main server

This is expected. Public HLS output should remain on the main xOTT server even when the stream process runs on a worker node.

Security

Worker nodes use controller-issued node tokens and signed controller-to-worker commands. Keep worker APIs restricted to the main xOTT server where your network design allows it.

  • Do not expose worker API ports to the public internet unless firewall rules restrict access to the controller.
  • Rotate the node token or reinstall the worker if the token may have been exposed.
  • Use the Server page installer instead of manually copying credentials between systems.
  • Keep the main server license active; workers are part of the licensed xOTT deployment, not standalone panels.

Security model: The main app remains the trusted orchestrator. Workers execute assigned stream operations and report state, but they do not replace the central panel, database, or licensing authority.
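
The signing scheme used between controller and worker is not specified here. As a general illustration of how signed commands can work, the sketch below assumes HMAC-SHA256 over a timestamp and the command body, keyed with the node token, plus a freshness window to limit replays. None of these details are confirmed for xOTT, and the function names and the 60 second window are hypothetical.

```python
import hashlib
import hmac


def sign_command(token: bytes, body: bytes, timestamp: int) -> str:
    """Controller side: sign the timestamp and body with the node token."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(token, message, hashlib.sha256).hexdigest()


def verify_command(
    token: bytes,
    body: bytes,
    timestamp: int,
    signature: str,
    now: int,
    max_skew: int = 60,
) -> bool:
    """Worker side: reject stale commands, then compare signatures safely."""
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_command(token, body, timestamp)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

Under a scheme like this, an exposed token lets an attacker forge commands, which is why rotating the token or reinstalling the worker is the right response to a suspected leak.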