OS bundle download was buffering 1.2GB in RAM then writing → network
timeout or memory pressure killed it. Now:
Kiosk side:
- Streams directly to /var/tmp/betterframe/ in 256KB chunks
- On network error: resumes from last byte written (Range header)
- Up to 5 retries with 10s backoff between attempts
- Progress logged every ~50MB
- sha256 verified on the complete file on disk (not in memory)
Server side:
- /api/kiosk/os/download/:id supports Range: bytes=N- header
- Returns 206 Partial Content with Content-Range for resume
- streamBundle accepts start/end for partial reads via createReadStream
- Advertises Accept-Ranges: bytes on all responses
Replaces shared cluster_key for bundle encryption. Each kiosk gets a
unique 32-byte AES key generated at pairing time:
Server:
- confirmPairing generates randomBytes(32), stores encrypted with
server secret on kiosks.encrypt_key_encrypted column
- Delivers plaintext encrypt_key to kiosk in claim response (one-time)
- generateBundle prefers per-kiosk key over cluster_key for
encryptForCluster (same AES-256-GCM format, different key per kiosk)
Kiosk:
- ClaimResp gains encrypt_key field, stored encrypted at rest
- onvif_events prefers encrypt_key over cluster_key for decryption
- Backward compatible: old kiosks without encrypt_key still use
cluster_key (both delivered at pairing)
Security improvement: compromised SD card only exposes camera passwords
encrypted for THAT specific kiosk, not the entire fleet. Rotate by
deleting + re-pairing the compromised kiosk.
WebView "URL can't be shown" — Authorization header only applies to
the initial page load. CSS/JS/XHR/WebSocket sub-resources from the
loaded page don't inherit it → Angie auth_request rejects → page breaks.
Kiosk side: set_kiosk_cookie() injects betterframe_kiosk_key cookie
into WebKit's cookie jar via JS bridge before loading the URL. Cookie
persists across all sub-resource requests automatically.
Server side: extractBearerToken() now checks betterframe_kiosk_key
cookie as fallback when no Authorization header present. Same
verifyKioskKey path, just different transport.
Three bugs:
1. std::mem::forget(generation) leaked the Arc → old threads never
stopped on bundle reload. Now stored in a static Mutex; new start()
replaces it → old Arc drops → old Weak::upgrade() returns None.
2. CreatePullPoint Address uses namespace prefix (wsa5:Address,
a:Address, etc.). Parser only matched plain <Address>. New
extract_tag_ns tries common prefixes + fallback regex scan.
Also validates address starts with "http" and logs response
preview on failure for debugging.
3. Pull failure → immediate resubscribe with no delay → hammers camera.
Added 15s backoff after pull failure before resubscribe.
Full ONVIF event management overhaul:
DB: cameras gain event_source (auto|server|kiosk:<id>), event_sink
(auto|server|kiosk:<id>), and supported_event_topics (JSON array).
Server:
- GetEventProperties SOAP call in onvif.ts — queries camera for all
supported event topics (motion, ANPR, line crossing, etc.)
- POST /admin/cameras/:id/refresh-events route — runs GetEventProperties
via designated event source (kiosk WS relay or server direct)
- Camera edit form: event_source + event_sink dropdowns
- Camera detail: supported event topics table with refresh button
- Bundle includes event_source + event_sink so kiosk knows its role
Kiosk:
- onvif_events.rs respects event_source: only subscribes when "auto"
or "kiosk:<this_id>", skips when "server"
- Subscription status tracking: state (subscribing/active/failed),
last_event_at, error — reported in heartbeat for admin visibility
- BundleCamera gains event_source + event_sink fields
Auto logic for source: camera in kiosk's bundle → kiosk subscribes.
Auto logic for sink: TODO — same-subnet detection for WSBaseNotification.
Currently PullPoint only; push model is the next step.
The kiosk runs under NoNewPrivileges=yes (WebKit bwrap needs it). sudo
and nsenter both fail because they need privilege escalation which the
flag blocks. systemd-run --pipe spawns a SEPARATE service unit as root
in its own process tree, connected via stdin/stdout pipe. Not a child
of the kiosk process → NoNewPrivileges doesn't apply.
Also: enable rauc.service in pi-gen chroot (was never enabled → RAUC
daemon not running → rauc install fails → OS update silently broken).
Terminal spawns bash as bfkiosk (unprivileged) → can't read journal,
can't run rauc/systemctl, can't fix anything useful. Now runs
sudo bash --login (with fallback to plain bash if sudo unavailable).
Journal streaming: sudo journalctl instead of plain journalctl so
bfkiosk can read system journal without systemd-journal group.
Pi-gen image: drops /etc/sudoers.d/betterframe-kiosk granting bfkiosk
passwordless sudo. Gated by the on-screen code + lockout ladder, so
root access still requires physical presence.
Root cause: kiosk never stored cluster_key from pairing response.
Bundle ships onvif_password_encrypted (AES-256-GCM with cluster key).
decrypt_cluster was a stub returning None → empty password → WSSE auth
fails → CreatePullPoint rejected → no events ever.
Fix:
1. ClaimResp now includes cluster_key field
2. Stored encrypted at rest alongside kiosk_key (at_rest.rs)
3. Loaded at bundle render, passed to onvif_events::start()
4. decrypt_cluster implements full AES-256-GCM: parse v1.<iv>.<tag>.<ct>
format, base64url decode, decrypt with cluster key
Also: removed BF_ENABLE_ONVIF_EVENTS env gate — if camera is type=onvif
with onvif_host, subscribe. Gate was redundant with the type filter.
Also: bump Angie proxy_read_timeout to 600s on /api/admin/ for OS
bundle import (downloads ~1GB from GitHub, was timing out at 60s).
NOTE: existing paired kiosks won't have cluster_key stored. They need
to re-pair (delete + re-add) to receive it. New pairings get it
automatically.
Terminal: idle_add_local_once from non-GTK thread silently fails.
Forward ShowTerminalCode/DismissTerminalCode through WorkerMsg channel
which IS polled on the GTK main thread via timeout_add_local.
Journal: try --user-unit first, fall back to unfiltered journal if
permission denied (bfkiosk user may not be in systemd-journal group on
non-reflashed images). Send error line back to admin UI on spawn failure
instead of silent drop.
Three fixes:
1. Terminal code overlay replaces the main display window's child instead
of creating a new gtk::Window (cage compositor only shows one window).
Saves the previous child and restores on dismiss.
2. Code auto-expires after 60s — timeout does NOT increment lockout.
GTK overlay dismissed + pending_code cleared.
3. Journal-start handler already logs but relay might fail silently if
kiosk WS reconnected after admin debug WS connected.
Kiosk side (remote_debug.rs + ws_client.rs refactor):
- Journal streaming: server sends journal-start → kiosk spawns
journalctl -f, pipes lines back as journal-line messages via WS.
journal-stop kills the process. On-demand, not always-on.
- Terminal: server sends terminal-request → kiosk checks lockout +
firmware_channel == "dev" → generates 8-char code displayed on
screen as fullscreen overlay (NOT logged) → server relays admin's
code via terminal-auth → kiosk validates with constant-time compare
→ on success spawns bash, relays I/O as base64 terminal-data.
- Lockout: 3 failed codes per boot → lockout_count++. 3 lockouts
(9 total failures) → permanent (reflash only). Reboot resets
attempt counter, not lockout counter. Successful pairing resets all.
- ws_client.rs rewritten with split reader/writer + tokio::select!
for multiplexing incoming WS messages with outbound journal/terminal
data from sync threads.
Server side (coordinator-ws + routes-admin):
- New admin debug WS endpoint: /ws/admin/debug/:kioskId. Authenticated
via admin API key (query param) or session cookie. Relays messages
bidirectionally between admin browser ↔ kiosk.
- Admin pages: /admin/kiosks/:id/logs (journal viewer with start/
stop/clear) and /admin/kiosks/:id/terminal (code entry + terminal
area). Both open in new tabs from the kiosk detail page.
- Angie proxy config updated with /ws/admin/debug/ location block.
Security:
- Terminal only on dev channel
- Code displayed physically on screen, never logged or stored server-side
- Lockout: 3/boot, 3 lockouts = permanent, pairing resets
- Kiosk responds "locked" without specifying which lockout triggered
/opt/betterframe/kiosk/ now owned bfkiosk:bfkiosk so OTA can write
.new/.prev files. Marker path in Rust code aligned with rollback
script expectation (/var/lib/betterframe/kiosk/firmware-applying.json).
New kiosk/src/onvif_events.rs: for each ONVIF camera in the bundle,
creates a PullPoint subscription, polls every 3s, parses
NotificationMessage XML into structured JSON (topic + source key/values
+ data key/values + timestamp), and POSTs to /api/kiosk/event with
source_type=onvif + camera_id.
Forwards ALL event topics: motion, ANPR (LicensePlateRecognition),
line crossing, intrusion, digital input, analytics, tamper — everything
the camera exposes. Node-RED sorts what matters.
Subscription lifecycle:
- CreatePullPointSubscription with 60s InitialTerminationTime
- Renew every 55s before timeout
- Unsubscribe on bundle change / shutdown
- Auto-resubscribe on pull/renew failure (30s backoff)
- Generation tracking via Weak<()> so old workers self-terminate
when start() is called with a new bundle
WSSE PasswordDigest auth for SOAP calls — same scheme the server's
onvif.ts uses. sha1 crate added.
BundleCamera extended with onvif_host/port/username/password_encrypted
fields (server already ships them; kiosk just wasn't deserializing).
Gated by BF_ENABLE_ONVIF_EVENTS=1. Enabled by default in the pi-gen
image env file.
TODO: cluster-key-based decryption of onvif_password_encrypted. For
now relies on the RTSP URI having plaintext credentials embedded (which
the ONVIF import path already ensures via rtspWithCredentials).
New module kiosk/src/at_rest.rs. Derives an AES-256-GCM key via HKDF
from a Pi-bound value:
1. /proc/device-tree/serial-number (Pi 5 firmware exposes it)
2. /proc/cpuinfo Serial line (older kernels)
3. /etc/machine-id (non-Pi dev fallback)
File format: "BFE1" magic || 12-byte random nonce || ciphertext+tag.
Atomic write via tempfile + rename so a crash mid-write can't leave a
half-encrypted file.
Wired into kiosk/src/server.rs at every file I/O touching sensitive
state:
- kiosk.key (bearer token to BF server)
- local.key (LAN-side API auth key)
- bundle.json (cached bundle with RTSP credentials in URL form)
Migration: read paths tolerate legacy plaintext (kiosks upgraded from a
pre-at_rest build) AND re-store as ciphertext on the first read. One-
shot upgrade — subsequent boots skip the migration write.
Threat model defended: SD card extraction. Attacker who pulls the card
can't decrypt without also having the same physical Pi (CPU serial is
hardware-bound). Doesn't defeat an attacker who has both — at that
point they ARE the kiosk. Bar is raised from "trivially extract every
camera password" to "must steal the device intact."
Not defended: TPM-style attestation, remote attestation, sealed boot.
Pi 5 has no TPM and we don't ship a secure-boot config.
Tests in-module: round-trip short bytes, round-trip JSON, legacy
plaintext passthrough.
Phase 3 of the OS OTA pipeline. New module kiosk/src/os_update.rs polls
/api/kiosk/os/check with the kiosk's compatibility string and current OS
version (read from /etc/betterframe/os-compatibility +
/etc/betterframe/os-version, both written by the image build), downloads
the bundle, sha256-verifies the transport, and hands off to
`rauc install`. RAUC takes it from there: CMS signature verify against
/etc/rauc/keyring.pem, copy into inactive A/B slot, arm tryboot via the
custom bootloader backend, return. We then post /api/kiosk/os/applied
and `systemctl reboot` into the new slot.
Wired into the existing 60s heartbeat loop in ui.rs, gated by
BF_ENABLE_OS_OTA=1 (default OFF so dev kiosks on non-A/B images don't
keep trying + failing). Runs BEFORE the kiosk-binary check on each tick
so an OS bundle that ships an updated kiosk binary doesn't race the
firmware path.
On clean-boot heartbeat success we now also call `rauc status
mark-good` so the boot-attempts counter resets — three bad boots in a
row will auto-roll back without us needing a separate rollback path.
What's NOT in this commit:
- A/B partition layout in the pi-gen image (task #6, blocks actual
deployment — bundles can be served + accepted but `rauc install`
will refuse without two valid slots).
- Admin UI for managing releases + rollouts (task #4).
When admin opens an entity preview, find a kiosk whose active layout
references the camera (new repo.listKiosksRenderingCamera). Probe each
candidate's LAN snapshot endpoint with a 4s timeout. On success, stream
the bytes back with x-bf-snapshot-source: kiosk:<id>. Falls through to
the existing server-direct ffmpeg/gst pull only when no kiosk is reachable
or has the camera in its active layout.
Kiosk side adds /local/snapshot/:camera_id?key=<local_key>. Spawns a
one-shot gst-launch (rtspsrc → decodebin → jpegenc ! filesink
num-buffers=1) on a blocking worker so axum's reactor stays free.
Prefers sub stream for snapshots to keep bandwidth low. Single-frame
pipeline tears down after the first JPEG.
LAN IP picking extracted to shared/kiosk-lan.ts so route handler and
KioskLocalPanel agree on which interface to talk to (the previously-
duplicated logic in admin-pages stays for now since it also renders the
interface list).
Why a parallel pipeline instead of teeing the warm one: cross-thread
gtk4paintablesink → appsink sample extraction is non-trivial. A 1-frame
parallel pull is cheap when the kiosk's RTSP session to that camera is
already known to work (precondition: it's in the active layout).
WebView showed a grey/black half-square in the top-left — that's GDK's
"none" cursor rendered by WebKit's own surface, which ignores the
GTK-window CSS we set elsewhere. Inject a WebKit UserStyleSheet that
applies cursor:none !important to every page + frame at User priority,
overriding page-author CSS.
For the boot gap (cage start → first kiosk frame), set XCURSOR_SIZE=1
and WLR_NO_HARDWARE_CURSORS=1 in the systemd unit. SW fallback honors
the 1-pixel size; HW cursors don't, which is why a default arrow leaks
through on some Pi GPUs.
ONVIF-imported cameras with rtsp_url but no camera_streams rows showed
"(no stream)" in the kiosk because the bundle fallback was gated to
type=rtsp only. Drop the type check + backfill existing rows so old
imports get a main stream row created.
feat(kiosk-mgmt): report hostname + all network interfaces
Behind Docker/Angie the server only saw the proxy bridge IP (172.31.0.2).
Kiosk now shells `ip -j addr show`, reports every non-loopback IPv4/v6
with CIDR, MAC, and operstate. Plus `hostname` for verifying that
managed-config applies landed. Admin UI renders interface list with
LAN IPs preferred for the copy-paste local-LAN endpoint.
feat(managed-config): auto-sync hostname from kiosk name
When admin renames a managed-image kiosk, slugify the name → DNS-safe
hostname and bump managed_config_version so the kiosk applies it on
next heartbeat. Empty form hostname now falls back to slug too, so
DHCP shows the friendly name.
feat(events): forward firmware + OS update outcomes as kiosk.log
Kiosk POSTs `/api/kiosk/event` with topic=kiosk.log on firmware-apply
attempts. Server-side firmware/os-update endpoints also insert into
event_log so admins can audit upgrades without correlating per-source.
Wire schema heartbeat gains reported_hostname + network_interfaces for
Rust import parity.
Reverts misdiagnosis. pi-gen defaults to trixie since the Debian 13
release, which has gtk4 4.14 + libwebkitgtk-6.0 stock — no backports
needed. Build container, kiosk gtk feature gate, and pi-gen target all
realigned to trixie.
Actual reason last image run failed: our custom stage was missing the
mandatory prerun.sh (pi-gen calls it to seed ROOTFS_DIR from the
previous stage) and the EXPORT_IMAGE marker file (signals 'bake an
image at the end of this stage'). Both added.
Asset upload now globs deploy/*.img.xz so any extra exports stage2
produces ship alongside our customised one.
- WorkerMsg made pub + re-exported at crate root so local_server can send
through the UI channel.
- ed25519_dalek::pkcs8::DecodePublicKey trait import — needed for
VerifyingKey::from_public_key_pem call site.
- Workflow: pushes to master now auto-trigger a dev-channel build (in
addition to tag-pushes for stable/beta). Concurrency group cancels
superseded master builds; tag builds never cancel each other.
Pi OS Bookworm + Debian bookworm both ship libgtk-4 4.8.3. No code in
the kiosk uses 4.12+ APIs (compute_bounds, WidgetPaintable, Picture,
add_tick_callback, Fixed, set_content_fit are all <= 4.8). Swap
gtk4 feature v4_12 → v4_8 and drop the bookworm-backports juggling
in CI.
- server Dockerfile installs wget — bookworm-slim doesn't include it
by default, so the healthcheck CMD silently failed → Coolify marked
the container unhealthy.
- nodered healthcheck swapped to /nrdp/ (always 200 when runtime up)
via wget --spider; previous /nrdp/auth/login returned non-2xx when
adminAuth disabled.
- start_period bumped to 90s for nodered's flow load on smaller hosts.
- Kiosk discovery: cloud fallback now frame-eu.betterportal.net per
the managed-fleet endpoint.
When the active layout switches, cells that exist in both old + new (same
camera, same URL, same HTML) now slide + scale from their old screen
position to the new one over 350ms (ease-out cubic). Fresh cells fade in;
removed cells fade out where they were.
Implementation:
- Each cell widget gets a stable widget_name (cam:<id>:<selector>,
web:<url>, html:<hash>) so old/new can be matched.
- Before swap, capture each cell's bounds + a WidgetPaintable snapshot.
- New grid wrapped in an Overlay; a Fixed ghost layer hosts the animated
Picture widgets driven by add_tick_callback + ease-out cubic.
- Once the window finishes the animation timer, the overlay is unwrapped
back to a plain grid so subsequent renders don't accumulate layers.
Kiosk now exposes :18090 with two surfaces:
- GET /local/layout/:id?key=<kiosk_local_key>
Bookmark-friendly layout switch on this kiosk. Auth = kiosk-generated
local key (32 random bytes, hex, stored at <state_dir>/local.key).
- ANY /proxy/* — forwards to BF server with the request's Authorization
header preserved. Lets LAN clients reach a cloud-hosted BF server via
the kiosk's local socket; kiosk adds no auth of its own.
Heartbeat reports {local_key, local_port}; kiosks table grows
local_key/local_port/local_last_ip columns. Admin kiosk edit page now
shows the local URLs as a copy-paste block.
Override port: BF_KIOSK_LOCAL_PORT. Disable: BF_KIOSK_LOCAL_DISABLE=1.
Web and HTML cells were rebuilt + reloaded on every layout switch,
losing JS state and incurring a full page load each time. Mirror the
camera pool: hold WebViews in WARM_WEBVIEWS keyed by URL (or hash of
inline HTML), reuse on switch-back, unparent + cool on switch-away,
drop after the same cooling timer. Identical content in two layouts
shares one WebView.
Three related fixes:
1. Idle reverts (and any other kiosk-initiated layout switch) now POST
layout.changed to /api/kiosk/event. Previously the server only emitted
on admin-initiated switches, so Node-RED never saw the idle revert.
2. Server's /api/kiosk/event splays the payload to the top level when
the topic has a dedicated trigger node (layout.changed, kiosk.changed,
kiosk.status, display.power.changed, camera.changed). The trigger
nodes expect flat shapes matching the admin emit; the old wrapped
shape left every field undefined.
3. Auto-provisioning of bf-server-config in Node-RED: extend retry
window to ~5 min, log per attempt, force v2 API + full-deploy header
so credentials inline get accepted, surface response body on failure.
Pool was keyed by camera_id, so a cell flipping M→S tore down the old
pipeline and started fresh. With (camera_id, badge) keys the main and
sub variants live alongside each other: switching badge promotes the
new one to Warm and leaves the previous one to cool down via the normal
state machine, so flipping back inside the cooldown is instant.
ensure_warm no longer touches sibling badge entries. recompute_global_
state computes warm/hot sets as (cam, badge) pairs by calling
pick_stream per cell with its area fraction, so the planner sees what
ensure_warm will actually create.