- Track subscribed_at timestamp per camera in SubStatus
- Fix mark_event_received to use epoch seconds (was OS version string)
- needs_refresh() returns true when any sub is failed/stopped or >24h old
- Heartbeat loop calls maybe_refresh_onvif() every 60s — reloads
cameras from cached bundle and restarts onvif_events::start() which
kills old generation threads and creates fresh PullPoint subscriptions
- mark_event_received called on each successful event forward
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Event insert: if source_camera_id FK fails (stale kiosk sending old
integer IDs), retry with camera_id=NULL. Event still logs, just
without camera association. Stops 500 spam until kiosk updates.
- Kiosk cleanup on first healthy boot: remove stale OS update staging
files (>24h old) from /var/lib/betterframe/tmp/, and old firmware
.prev binaries (>7 days) from /opt/betterframe/kiosk/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All bundle struct ID fields (kiosk_id, display_id, layout_id,
camera_id, stream_id, gpio_id) now String with de_flexible_id
deserializer accepting both JSON numbers and strings.
- PoolKey, DisplayState hashmap, WorkerMsg, ServerMsg all use String
IDs throughout. Zero u32 ID references remain.
- ONVIF event image proxy: kiosk detects PictureUri in event data,
downloads image from camera (basic/digest auth), base64 encodes,
attaches to event payload before forwarding to server.
- Add md5 crate for HTTP Digest auth on camera image fetch.
- ws_client: flexible_id_from_value helper for WS message ID parsing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add betterframe-expand-data systemd service: growpart + resize2fs on
BF_DATA (last partition) so it fills the full SD card on first boot.
Solves the "No space left on device" issue with OS update downloads.
- Change OS update staging dir from /var/tmp/betterframe to
/var/lib/betterframe/tmp (on BF_DATA partition, not rootfs).
- Wire firmware and OS update progress callbacks into the GTK overlay
banner — shows "OS Update v1.2.3: Downloading — 45%" etc.
- Add per-partition disk reporting in heartbeat (/, /boot/firmware,
/var/lib/betterframe) with total/used/free/percent.
- Display partition table on kiosk detail page in admin UI.
- PG + SQLite migrations for partitions_json column on kiosks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added WorkerMsg::UpdateProgress(Option<(label, percent)>) for
showing firmware/OS update progress as an overlay banner on the
display. Handler + label management in place. Actual progress
reporting from firmware.rs/os_update.rs to be wired next.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Version label was only on pairing screen. Now also shown on the
idle/awaiting-layout logo screen (bottom-left overlay).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SOAP errors now extract fault Reason/Text/Code from XML instead of
dumping raw envelope. Logs whether ONVIF password was decrypted
(has_pass=true/false). Added NTP config to pi-gen (pool.ntp.org +
Google/Cloudflare fallback) — WSSE PasswordDigest fails with clock
skew.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ONVIF: only subscribe to cameras actually in layout cells, not all
bundle cameras. Purge warm camera pool entries for cameras removed
from the bundle entirely — immediate stop, no cooling period.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Credentials embedded in RTSP URL can skip digest negotiation on
some cameras. Now extract user:pass from URL, set as user-id/user-pw
properties on rtspsrc, pass clean URL as location.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Version label at bottom-left of pairing screen shows firmware
version (compile-time) and OS version (from /etc/betterframe/
os-version). Spinner removed from pairing screen per request.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After _check confirms key still valid, code fell through to parse
the bf_kiosk_deleted JSON as a KioskBundle causing parse error.
Now returns None to skip bundle processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HTTP 400 from camera gave no detail. Now includes first 500 chars
of response body in error message so Axiom shows the actual SOAP
fault reason.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
kiosk_id only set during claim. After restart, loaded from disk,
kiosk_id stayed empty. Now set from bundle.kiosk_id after fetch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kiosk checks for stable firmware update before pairing. If available,
downloads + verifies + swaps binary and restarts. No auth needed.
Server: GET /api/firmware/public/check (stable channel, no auth)
GET /api/firmware/public/download/:id (rate-limited, no auth)
Kiosk: check_public() + apply_public() in firmware.rs. Called from
ui.rs worker thread before entering pairing loop. kiosk_app_version
made pub for access from ui.rs.
Also includes kiosk_id deserialization fix (Value instead of String).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server returns kiosk_id as integer (not yet migrated to UUIDv7).
ClaimResp.kiosk_id changed from Option<String> to Option<Value>
to handle both integer and string. This was causing a panic on
deserialization after successful pairing.
Also simplified Coolify version arg.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server returns {bf_kiosk_deleted: true} (200) instead of 401 when
kiosk key not found on bundle/heartbeat. Kiosk then confirms via
GET /api/kiosk/_check — only wipes config if _check also returns
401. Prevents proxy glitches from nuking valid kiosks.
Flow: bf_kiosk_deleted signal → confirm via _check → 401 = wipe,
200 = ignore (false alarm).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After successful claim, kiosk_id from server response is stored
globally and included in all subsequent Axiom log entries for
kiosk identification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kiosk binary now forwards all tracing logs to Axiom when
BF_AXIOM_KEY + BF_AXIOM_DATASET are set at compile time via
option_env!(). Batches up to 50 entries or flushes every 10s.
No-op when keys not baked in (local dev builds).
CI build.yml passes secrets as env vars for cargo build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Smart URL actions: multi-step browser automation for web cells behind
login pages. Steps: navigate, fill (form fields), click, wait, wait_for
(element selector), javascript (raw eval). Passwords in fill steps
encrypted with per-kiosk key for transport.
Schema: server/src/schemas/wire/smart-url.ts defines step types.
Stored in layout_cells.options.smart_url (no migration needed).
Bundle: includes smart_url config per cell. Fill step values encrypted
at bundle generation time with per-kiosk key (or cluster key fallback).
Kiosk: execute_smart_url_steps() builds an async JS sequence from the
steps and injects via WebKit evaluate_javascript on LoadEvent::Finished.
Supports session expiry detection via login_detect_url.
Admin UI: step builder TODO (currently configure via cell options JSON).
Data model + kiosk execution + bundle transport are complete.
OS bundle download was buffering 1.2GB in RAM then writing → network
timeout or memory pressure killed it. Now:
Kiosk side:
- Streams directly to /var/tmp/betterframe/ in 256KB chunks
- On network error: resumes from last byte written (Range header)
- Up to 5 retries with 10s backoff between attempts
- Progress logged every ~50MB
- sha256 verified on the complete file on disk (not in memory)
Server side:
- /api/kiosk/os/download/:id supports Range: bytes=N- header
- Returns 206 Partial Content with Content-Range for resume
- streamBundle accepts start/end for partial reads via createReadStream
- Advertises Accept-Ranges: bytes on all responses
Replaces shared cluster_key for bundle encryption. Each kiosk gets a
unique 32-byte AES key generated at pairing time:
Server:
- confirmPairing generates randomBytes(32), stores encrypted with
server secret on kiosks.encrypt_key_encrypted column
- Delivers plaintext encrypt_key to kiosk in claim response (one-time)
- generateBundle prefers per-kiosk key over cluster_key for
encryptForCluster (same AES-256-GCM format, different key per kiosk)
Kiosk:
- ClaimResp gains encrypt_key field, stored encrypted at rest
- onvif_events prefers encrypt_key over cluster_key for decryption
- Backward compatible: old kiosks without encrypt_key still use
cluster_key (both delivered at pairing)
Security improvement: compromised SD card only exposes camera passwords
encrypted for THAT specific kiosk, not the entire fleet. Rotate by
deleting + re-pairing the compromised kiosk.
WebView "URL can't be shown" — Authorization header only applies to
the initial page load. CSS/JS/XHR/WebSocket sub-resources from the
loaded page don't inherit it → Angie auth_request rejects → page breaks.
Kiosk side: set_kiosk_cookie() injects betterframe_kiosk_key cookie
into WebKit's cookie jar via JS bridge before loading the URL. Cookie
persists across all sub-resource requests automatically.
Server side: extractBearerToken() now checks betterframe_kiosk_key
cookie as fallback when no Authorization header present. Same
verifyKioskKey path, just different transport.
Three bugs:
1. std::mem::forget(generation) leaked the Arc → old threads never
stopped on bundle reload. Now stored in a static Mutex; new start()
replaces it → old Arc drops → old Weak::upgrade() returns None.
2. CreatePullPoint Address uses namespace prefix (wsa5:Address,
a:Address, etc.). Parser only matched plain <Address>. New
extract_tag_ns tries common prefixes + fallback regex scan.
Also validates address starts with "http" and logs response
preview on failure for debugging.
3. Pull failure → immediate resubscribe with no delay → hammers camera.
Added 15s backoff after pull failure before resubscribe.
Full ONVIF event management overhaul:
DB: cameras gain event_source (auto|server|kiosk:<id>), event_sink
(auto|server|kiosk:<id>), and supported_event_topics (JSON array).
Server:
- GetEventProperties SOAP call in onvif.ts — queries camera for all
supported event topics (motion, ANPR, line crossing, etc.)
- POST /admin/cameras/:id/refresh-events route — runs GetEventProperties
via designated event source (kiosk WS relay or server direct)
- Camera edit form: event_source + event_sink dropdowns
- Camera detail: supported event topics table with refresh button
- Bundle includes event_source + event_sink so kiosk knows its role
Kiosk:
- onvif_events.rs respects event_source: only subscribes when "auto"
or "kiosk:<this_id>", skips when "server"
- Subscription status tracking: state (subscribing/active/failed),
last_event_at, error — reported in heartbeat for admin visibility
- BundleCamera gains event_source + event_sink fields
Auto logic for source: camera in kiosk's bundle → kiosk subscribes.
Auto logic for sink: TODO — same-subnet detection for WSBaseNotification.
Currently PullPoint only; push model is the next step.
The kiosk runs under NoNewPrivileges=yes (WebKit bwrap needs it). sudo
and nsenter both fail because they need privilege escalation which the
flag blocks. systemd-run --pipe spawns a SEPARATE service unit as root
in its own process tree, connected via stdin/stdout pipe. Not a child
of the kiosk process → NoNewPrivileges doesn't apply.
Also: enable rauc.service in pi-gen chroot (was never enabled → RAUC
daemon not running → rauc install fails → OS update silently broken).
Terminal spawns bash as bfkiosk (unprivileged) → can't read journal,
can't run rauc/systemctl, can't fix anything useful. Now runs
sudo bash --login (with fallback to plain bash if sudo unavailable).
Journal streaming: sudo journalctl instead of plain journalctl so
bfkiosk can read system journal without systemd-journal group.
Pi-gen image: drops /etc/sudoers.d/betterframe-kiosk granting bfkiosk
passwordless sudo. Gated by the on-screen code + lockout ladder, so
root access still requires physical presence.
Root cause: kiosk never stored cluster_key from pairing response.
Bundle ships onvif_password_encrypted (AES-256-GCM with cluster key).
decrypt_cluster was a stub returning None → empty password → WSSE auth
fails → CreatePullPoint rejected → no events ever.
Fix:
1. ClaimResp now includes cluster_key field
2. Stored encrypted at rest alongside kiosk_key (at_rest.rs)
3. Loaded at bundle render, passed to onvif_events::start()
4. decrypt_cluster implements full AES-256-GCM: parse v1.<iv>.<tag>.<ct>
format, base64url decode, decrypt with cluster key
Also: removed BF_ENABLE_ONVIF_EVENTS env gate — if camera is type=onvif
with onvif_host, subscribe. Gate was redundant with the type filter.
Also: bump Angie proxy_read_timeout to 600s on /api/admin/ for OS
bundle import (downloads ~1GB from GitHub, was timing out at 60s).
NOTE: existing paired kiosks won't have cluster_key stored. They need
to re-pair (delete + re-add) to receive it. New pairings get it
automatically.
Terminal: idle_add_local_once from non-GTK thread silently fails.
Forward ShowTerminalCode/DismissTerminalCode through WorkerMsg channel
which IS polled on the GTK main thread via timeout_add_local.
Journal: try --user-unit first, fall back to unfiltered journal if
permission denied (bfkiosk user may not be in systemd-journal group on
non-reflashed images). Send error line back to admin UI on spawn failure
instead of silent drop.
Three fixes:
1. Terminal code overlay replaces the main display window's child instead
of creating a new gtk::Window (cage compositor only shows one window).
Saves the previous child and restores on dismiss.
2. Code auto-expires after 60s — timeout does NOT increment lockout.
GTK overlay dismissed + pending_code cleared.
3. Journal-start handler already logs but relay might fail silently if
kiosk WS reconnected after admin debug WS connected.
Kiosk side (remote_debug.rs + ws_client.rs refactor):
- Journal streaming: server sends journal-start → kiosk spawns
journalctl -f, pipes lines back as journal-line messages via WS.
journal-stop kills the process. On-demand, not always-on.
- Terminal: server sends terminal-request → kiosk checks lockout +
firmware_channel == "dev" → generates 8-char code displayed on
screen as fullscreen overlay (NOT logged) → server relays admin's
code via terminal-auth → kiosk validates with constant-time compare
→ on success spawns bash, relays I/O as base64 terminal-data.
- Lockout: 3 failed codes per boot → lockout_count++. 3 lockouts
(9 total failures) → permanent (reflash only). Reboot resets
attempt counter, not lockout counter. Successful pairing resets all.
- ws_client.rs rewritten with split reader/writer + tokio::select!
for multiplexing incoming WS messages with outbound journal/terminal
data from sync threads.
Server side (coordinator-ws + routes-admin):
- New admin debug WS endpoint: /ws/admin/debug/:kioskId. Authenticated
via admin API key (query param) or session cookie. Relays messages
bidirectionally between admin browser ↔ kiosk.
- Admin pages: /admin/kiosks/:id/logs (journal viewer with start/
stop/clear) and /admin/kiosks/:id/terminal (code entry + terminal
area). Both open in new tabs from the kiosk detail page.
- Angie proxy config updated with /ws/admin/debug/ location block.
Security:
- Terminal only on dev channel
- Code displayed physically on screen, never logged or stored server-side
- Lockout: 3/boot, 3 lockouts = permanent, pairing resets
- Kiosk responds "locked" without specifying which lockout triggered
/opt/betterframe/kiosk/ now owned bfkiosk:bfkiosk so OTA can write
.new/.prev files. Marker path in Rust code aligned with rollback
script expectation (/var/lib/betterframe/kiosk/firmware-applying.json).
New kiosk/src/onvif_events.rs: for each ONVIF camera in the bundle,
creates a PullPoint subscription, polls every 3s, parses
NotificationMessage XML into structured JSON (topic + source key/values
+ data key/values + timestamp), and POSTs to /api/kiosk/event with
source_type=onvif + camera_id.
Forwards ALL event topics: motion, ANPR (LicensePlateRecognition),
line crossing, intrusion, digital input, analytics, tamper — everything
the camera exposes. Node-RED sorts what matters.
Subscription lifecycle:
- CreatePullPointSubscription with 60s InitialTerminationTime
- Renew every 55s before timeout
- Unsubscribe on bundle change / shutdown
- Auto-resubscribe on pull/renew failure (30s backoff)
- Generation tracking via Weak<()> so old workers self-terminate
when start() is called with a new bundle
WSSE PasswordDigest auth for SOAP calls — same scheme the server's
onvif.ts uses. sha1 crate added.
BundleCamera extended with onvif_host/port/username/password_encrypted
fields (server already ships them; kiosk just wasn't deserializing).
Gated by BF_ENABLE_ONVIF_EVENTS=1. Enabled by default in the pi-gen
image env file.
TODO: cluster-key-based decryption of onvif_password_encrypted. For
now relies on the RTSP URI having plaintext credentials embedded (which
the ONVIF import path already ensures via rtspWithCredentials).
New module kiosk/src/at_rest.rs. Derives an AES-256-GCM key via HKDF
from a Pi-bound value:
1. /proc/device-tree/serial-number (Pi 5 firmware exposes it)
2. /proc/cpuinfo Serial line (older kernels)
3. /etc/machine-id (non-Pi dev fallback)
File format: "BFE1" magic || 12-byte random nonce || ciphertext+tag.
Atomic write via tempfile + rename so a crash mid-write can't leave a
half-encrypted file.
Wired into kiosk/src/server.rs at every file I/O touching sensitive
state:
- kiosk.key (bearer token to BF server)
- local.key (LAN-side API auth key)
- bundle.json (cached bundle with RTSP credentials in URL form)
Migration: read paths tolerate legacy plaintext (kiosks upgraded from a
pre-at_rest build) AND re-store as ciphertext on the first read. One-
shot upgrade — subsequent boots skip the migration write.
Threat model defended: SD card extraction. Attacker who pulls the card
can't decrypt without also having the same physical Pi (CPU serial is
hardware-bound). Doesn't defeat an attacker who has both — at that
point they ARE the kiosk. Bar is raised from "trivially extract every
camera password" to "must steal the device intact."
Not defended: TPM-style attestation, remote attestation, sealed boot.
Pi 5 has no TPM and we don't ship a secure-boot config.
Tests in-module: round-trip short bytes, round-trip JSON, legacy
plaintext passthrough.
Phase 3 of the OS OTA pipeline. New module kiosk/src/os_update.rs polls
/api/kiosk/os/check with the kiosk's compatibility string and current OS
version (read from /etc/betterframe/os-compatibility +
/etc/betterframe/os-version, both written by the image build), downloads
the bundle, sha256-verifies the transport, and hands off to
`rauc install`. RAUC takes it from there: CMS signature verify against
/etc/rauc/keyring.pem, copy into inactive A/B slot, arm tryboot via the
custom bootloader backend, return. We then post /api/kiosk/os/applied
and `systemctl reboot` into the new slot.
Wired into the existing 60s heartbeat loop in ui.rs, gated by
BF_ENABLE_OS_OTA=1 (default OFF so dev kiosks on non-A/B images don't
keep trying + failing). Runs BEFORE the kiosk-binary check on each tick
so an OS bundle that ships an updated kiosk binary doesn't race the
firmware path.
On clean-boot heartbeat success we now also call `rauc status
mark-good` so the boot-attempts counter resets — three bad boots in a
row will auto-roll back without us needing a separate rollback path.
What's NOT in this commit:
- A/B partition layout in the pi-gen image (task #6, blocks actual
deployment — bundles can be served + accepted but `rauc install`
will refuse without two valid slots).
- Admin UI for managing releases + rollouts (task #4).
When admin opens an entity preview, find a kiosk whose active layout
references the camera (new repo.listKiosksRenderingCamera). Probe each
candidate's LAN snapshot endpoint with a 4s timeout. On success, stream
the bytes back with x-bf-snapshot-source: kiosk:<id>. Falls through to
the existing server-direct ffmpeg/gst pull only when no kiosk is reachable
or has the camera in its active layout.
Kiosk side adds /local/snapshot/:camera_id?key=<local_key>. Spawns a
one-shot gst-launch (rtspsrc → decodebin → jpegenc ! filesink
num-buffers=1) on a blocking worker so axum's reactor stays free.
Prefers sub stream for snapshots to keep bandwidth low. Single-frame
pipeline tears down after the first JPEG.
LAN IP picking extracted to shared/kiosk-lan.ts so route handler and
KioskLocalPanel agree on which interface to talk to (the previously-
duplicated logic in admin-pages stays for now since it also renders the
interface list).
Why a parallel pipeline instead of teeing the warm one: cross-thread
gtk4paintablesink → appsink sample extraction is non-trivial. A 1-frame
parallel pull is cheap when the kiosk's RTSP session to that camera is
already known to work (precondition: it's in the active layout).