Commit graph

355 commits

Author SHA1 Message Date
Mitchell R
46fcbe5197
fix(os-update): missing format arg in sha256 error message 2026-05-23 01:53:33 +02:00
Mitchell R
595521db88
feat(os-ota): resumable chunked download with Range header support
OS bundle download buffered the whole 1.2GB in RAM before writing to
disk → a network timeout or memory pressure killed it. Now:

Kiosk side:
  - Streams directly to /var/tmp/betterframe/ in 256KB chunks
  - On network error: resumes from last byte written (Range header)
  - Up to 5 retries with 10s backoff between attempts
  - Progress logged every ~50MB
  - sha256 verified on the complete file on disk (not in memory)

Server side:
  - /api/kiosk/os/download/:id supports Range: bytes=N- header
  - Returns 206 Partial Content with Content-Range for resume
  - streamBundle accepts start/end for partial reads via createReadStream
  - Advertises Accept-Ranges: bytes on all responses
2026-05-23 01:44:34 +02:00
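The server-side Range handling described in 595521db88 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `plan_range_response` is a hypothetical helper, only the open-ended `bytes=N-` form named in the commit is handled, and the 416 branch is an assumption about how an out-of-range offset would be answered.

```python
import re


def plan_range_response(range_header, file_size):
    """Return (status, headers, start, end) for a `Range: bytes=N-` request.

    Hypothetical helper: anything other than `bytes=N-` falls back to a
    full 200 response.
    """
    headers = {"Accept-Ranges": "bytes"}  # advertised on every response
    match = re.fullmatch(r"bytes=(\d+)-", range_header or "")
    if match is None:
        headers["Content-Length"] = str(file_size)
        return 200, headers, 0, file_size - 1

    start = int(match.group(1))
    if start >= file_size:
        # Assumption: an offset past the end gets the standard 416 reply.
        headers["Content-Range"] = f"bytes */{file_size}"
        return 416, headers, 0, -1

    end = file_size - 1
    headers["Content-Range"] = f"bytes {start}-{end}/{file_size}"
    headers["Content-Length"] = str(end - start + 1)
    return 206, headers, start, end
```

A kiosk that has already written 256 bytes would send `Range: bytes=256-` and expect a 206 whose `Content-Range` starts at byte 256.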
Mitchell R
53739ada20
feat(ws): offline message queue per kiosk (100 cap, drain on reconnect) 2026-05-23 01:40:34 +02:00
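A capped per-kiosk queue like the one named in 53739ada20 might look like the sketch below. The class name is hypothetical, and dropping the oldest message when the cap is hit is an assumption; the commit subject only states the cap of 100 and the drain on reconnect.

```python
from collections import deque


class OfflineQueue:
    """Hypothetical per-kiosk buffer for messages sent while a kiosk is offline."""

    def __init__(self, cap=100):
        self._cap = cap
        self._by_kiosk = {}

    def enqueue(self, kiosk_id, message):
        # Assumption: deque(maxlen=...) silently discards the oldest entry when full.
        queue = self._by_kiosk.setdefault(kiosk_id, deque(maxlen=self._cap))
        queue.append(message)

    def drain(self, kiosk_id):
        """Return queued messages in order and clear the queue (called on reconnect)."""
        queue = self._by_kiosk.pop(kiosk_id, None)
        return list(queue) if queue else []
```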
Mitchell R
a414f98c56
feat(events): dedup ONVIF events within 2s window (Hikvision double-fire fix) 2026-05-23 01:39:22 +02:00
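The 2s dedup window from a414f98c56 can be illustrated with a small filter. The class and the `(camera_id, topic)` key are assumptions made for the sketch; the commit does not say which fields identify a duplicate.

```python
class EventDeduper:
    """Hypothetical filter that drops repeats of the same event inside a time window."""

    def __init__(self, window_seconds=2.0):
        self._window = window_seconds
        self._last_emitted = {}

    def should_emit(self, camera_id, topic, now):
        key = (camera_id, topic)  # assumption: camera + topic identify a duplicate
        last = self._last_emitted.get(key)
        if last is not None and now - last < self._window:
            return False  # double-fire inside the window, suppress it
        self._last_emitted[key] = now
        return True
```

The window is measured from the last emitted event, so a suppressed duplicate does not extend it.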
Mitchell R
a92e927b3b
feat(cameras): periodic offline detection via TCP probe + camera.offline events 2026-05-23 01:38:23 +02:00
Mitchell R
caf6095b6e
feat(security): per-kiosk encryption keys for camera passwords
Replaces shared cluster_key for bundle encryption. Each kiosk gets a
unique 32-byte AES key generated at pairing time:

Server:
  - confirmPairing generates randomBytes(32), stores encrypted with
    server secret on kiosks.encrypt_key_encrypted column
  - Delivers plaintext encrypt_key to kiosk in claim response (one-time)
  - generateBundle prefers per-kiosk key over cluster_key for
    encryptForCluster (same AES-256-GCM format, different key per kiosk)

Kiosk:
  - ClaimResp gains encrypt_key field, stored encrypted at rest
  - onvif_events prefers encrypt_key over cluster_key for decryption
  - Backward compatible: old kiosks without encrypt_key still use
    cluster_key (both delivered at pairing)

Security improvement: compromised SD card only exposes camera passwords
encrypted for THAT specific kiosk, not the entire fleet. Rotate by
deleting + re-pairing the compromised kiosk.
2026-05-23 01:36:43 +02:00
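The key preference described in caf6095b6e comes down to two small steps, sketched here with hypothetical function names. Encrypting the key at rest with the server secret and the AES-256-GCM step itself are omitted.

```python
import secrets


def new_kiosk_key():
    """Generate a fresh 32-byte key, as done once at pairing time."""
    return secrets.token_bytes(32)


def pick_bundle_key(kiosk_key, cluster_key):
    """Prefer the per-kiosk key; fall back to cluster_key for older kiosks."""
    return kiosk_key if kiosk_key is not None else cluster_key
```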
Mitchell R
9bbbdd19ea
feat(kiosk): camera error overlay with warning icon + name + reason (replaces black rectangle) 2026-05-23 01:32:47 +02:00
Mitchell R
0b3eaa3ef7
perf(bundle): ETag content-hash — 304 Not Modified when bundle unchanged 2026-05-23 01:31:38 +02:00
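A content-hash ETag check in the spirit of 0b3eaa3ef7 could be sketched like this. SHA-256 and the helper names are assumptions; the commit only says the ETag is derived from the bundle content. Only an exact `If-None-Match` match is handled.

```python
import hashlib


def bundle_etag(body):
    """Strong ETag derived from the bundle bytes (hash choice is an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload); 304 with no payload when unchanged."""
    etag = bundle_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```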
Mitchell R
890271d4c8
feat(store): event_log + audit_log rotation (30d/90d TTL + 100k row cap, 6h interval) 2026-05-23 01:30:26 +02:00
Mitchell R
2d157e900d
feat(cameras): health indicator on list page (green/yellow/red dot + status badge) 2026-05-23 01:29:05 +02:00
Mitchell R
592bdad10b
fix(webview): set kiosk auth cookie for sub-resource requests
WebView "URL can't be shown" — Authorization header only applies to
the initial page load. CSS/JS/XHR/WebSocket sub-resources from the
loaded page don't inherit it → Angie auth_request rejects → page breaks.

Kiosk side: set_kiosk_cookie() injects betterframe_kiosk_key cookie
into WebKit's cookie jar via JS bridge before loading the URL. Cookie
persists across all sub-resource requests automatically.

Server side: extractBearerToken() now checks betterframe_kiosk_key
cookie as fallback when no Authorization header present. Same
verifyKioskKey path, just different transport.
2026-05-23 01:23:56 +02:00
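The header-then-cookie fallback from 592bdad10b might look like the sketch below. It is a Python stand-in for the server's `extractBearerToken()`, and it assumes headers arrive as a plain dict with canonical key names.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "betterframe_kiosk_key"  # cookie name taken from the commit message


def extract_bearer_token(headers):
    """Return the kiosk key from Authorization, else from the cookie, else None."""
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()

    # Fallback for sub-resource requests, which carry only the cookie.
    jar = SimpleCookie()
    jar.load(headers.get("Cookie", ""))
    morsel = jar.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

Either transport yields the same token, so the caller can pass the result into a single verification path.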
Mitchell R
a513d165dc
fix(terminal): match pairing screen layout but red warning theme for code overlay 2026-05-23 01:16:39 +02:00
Mitchell R
864e66fbc8
feat(multi-tenant): schema-per-tenant model + PostgreSQL migration DDL
Prep for multi-tenant PostgreSQL:

shared/tenant.ts: tenant model, schema name derivation, search_path
SQL helper. Schema-per-tenant: each tenant gets tenant_<uuid> schema,
public schema holds tenant registry + global admins.

migrations-pg.ts: two migration sets:
  - PUBLIC_MIGRATIONS: tenants + global_admins + schema_migrations tables
  - TENANT_MIGRATIONS: full BetterFrame table set in PG-native types
    (SERIAL, TIMESTAMPTZ, JSONB, native BOOLEAN). Mirrors SQLite schema
    1:1 but with PG conventions.

DbAdapter + SqliteAdapter + PgAdapter already existed. Next steps:
  1. Repository async conversion (155 sync calls → await adapter.*)
  2. Tenant provisioning endpoint (create schema + run migrations)
  3. Request middleware: session → tenant_id → SET search_path
  4. Global admin UI for tenant management
2026-05-23 01:15:49 +02:00
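The schema name derivation and search_path helper mentioned in 864e66fbc8 could be sketched as follows. How shared/tenant.ts actually encodes the UUID is not stated; this sketch assumes the hyphens are dropped so the name is a valid identifier, and that `public` stays on the search path.

```python
import uuid


def schema_name(tenant_id):
    """Map a tenant UUID to tenant_<uuid>; raises ValueError on malformed input."""
    parsed = uuid.UUID(tenant_id)
    return "tenant_" + parsed.hex  # assumption: 32 hex chars, no hyphens


def search_path_sql(tenant_id):
    """Build the statement a request middleware would run for this tenant."""
    # The name is rebuilt from a parsed UUID, so nothing user-controlled reaches the SQL.
    return f'SET search_path TO "{schema_name(tenant_id)}", public'
```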
Mitchell R
0be9665458
feat(os-ota): add Push OS update now button + os_check WS message 2026-05-23 01:07:34 +02:00
Mitchell R
d6e65a4168
fix(onvif-events): fix generation leak + namespace Address parsing + backoff
Three bugs:
1. std::mem::forget(generation) leaked the Arc → old threads never
   stopped on bundle reload. Now stored in a static Mutex; new start()
   replaces it → old Arc drops → old Weak::upgrade() returns None.

2. CreatePullPoint Address uses namespace prefix (wsa5:Address,
   a:Address, etc.). Parser only matched plain <Address>. New
   extract_tag_ns tries common prefixes + fallback regex scan.
   Also validates address starts with "http" and logs response
   preview on failure for debugging.

3. Pull failure → immediate resubscribe with no delay → hammers camera.
   Added 15s backoff after pull failure before resubscribe.
2026-05-23 00:58:11 +02:00
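Bug 2 in d6e65a4168 (namespace-prefixed Address elements) can be illustrated with a regex-based extractor. This is a Python approximation of the idea behind `extract_tag_ns`, not the Rust implementation; the prefix list beyond `wsa5:` and `a:` is an assumption.

```python
import re

COMMON_PREFIXES = ("", "wsa5:", "wsa:", "a:")  # "wsa:" is an assumed extra


def extract_tag_ns(xml, tag):
    """Return the text of <tag> or <prefix:tag>, trying known prefixes first."""
    name = re.escape(tag)
    for prefix in COMMON_PREFIXES:
        qualified = re.escape(prefix) + name
        m = re.search(rf"<{qualified}(?:\s[^>]*)?>(.*?)</{qualified}>", xml, re.DOTALL)
        if m:
            return m.group(1).strip()
    # Fallback scan: any prefix, as long as open and close tags agree.
    m = re.search(rf"<([\w.-]+):{name}(?:\s[^>]*)?>(.*?)</\1:{name}>", xml, re.DOTALL)
    return m.group(2).strip() if m else None


def extract_pullpoint_address(xml):
    """Accept the address only if it looks like an HTTP(S) URL."""
    address = extract_tag_ns(xml, "Address")
    if address is None or not address.startswith("http"):
        return None
    return address
```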
Mitchell R
b1e8e00eb1
feat(onvif): event routing config + GetEventProperties + subscription status
Full ONVIF event management overhaul:

DB: cameras gain event_source (auto|server|kiosk:<id>), event_sink
(auto|server|kiosk:<id>), and supported_event_topics (JSON array).

Server:
  - GetEventProperties SOAP call in onvif.ts — queries camera for all
    supported event topics (motion, ANPR, line crossing, etc.)
  - POST /admin/cameras/:id/refresh-events route — runs GetEventProperties
    via designated event source (kiosk WS relay or server direct)
  - Camera edit form: event_source + event_sink dropdowns
  - Camera detail: supported event topics table with refresh button
  - Bundle includes event_source + event_sink so kiosk knows its role

Kiosk:
  - onvif_events.rs respects event_source: only subscribes when "auto"
    or "kiosk:<this_id>", skips when "server"
  - Subscription status tracking: state (subscribing/active/failed),
    last_event_at, error — reported in heartbeat for admin visibility
  - BundleCamera gains event_source + event_sink fields

Auto logic for source: camera in kiosk's bundle → kiosk subscribes.
Auto logic for sink: TODO — same-subnet detection for WSBaseNotification.
Currently PullPoint only; push model is the next step.
2026-05-23 00:38:54 +02:00
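The kiosk-side event_source rule from b1e8e00eb1 reduces to a small predicate. The function name is hypothetical, and the sketch assumes the camera is already part of this kiosk's bundle, which is the precondition for the "auto" case.

```python
def should_subscribe(event_source, this_kiosk_id):
    """Decide whether this kiosk opens the ONVIF subscription for a bundled camera."""
    if event_source == "auto":
        return True   # camera is in this kiosk's bundle, so the kiosk subscribes
    if event_source == "server":
        return False  # server subscribes directly
    kind, _, target = event_source.partition(":")
    return kind == "kiosk" and target == str(this_kiosk_id)
```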
Mitchell R
70bdc3bb8b
fix(cursor): correct Xcursor binary format (was missing version field)
Previous generator packed 5 fields in the image chunk header but Xcursor
format needs 9 (header_size, type, nominal, version, w, h, xhot, yhot,
delay). Missing version field → malformed → wlroots ignored it → fell
back to default visible cursor. Now writes correct 68-byte Xcursor with
all 9 header fields. Added more cursor names (x_cursor, pirate, sides).

Also: terminal UI shows bash-style cwd$ prompt, separates command from
output visually, auto-detects pwd after each command for prompt update.
2026-05-23 00:22:28 +02:00
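The 68-byte layout mentioned in 70bdc3bb8b can be reproduced with `struct`: a 16-byte file header, one 12-byte table-of-contents entry, the 36-byte image chunk header with all nine fields, and a single 4-byte pixel. This is an independent sketch of the Xcursor format, not the project's generator; the zero hotspot and zero delay are assumptions.

```python
import struct

IMAGE_TYPE = 0xFFFD0002  # Xcursor image chunk type


def transparent_xcursor(nominal_size=1):
    """Build a 1x1 fully transparent Xcursor file (little-endian throughout)."""
    # File header: magic, header size, file version, number of TOC entries.
    file_header = struct.pack("<4sIII", b"Xcur", 16, 0x00010000, 1)
    # TOC entry: chunk type, subtype (nominal size), byte offset of the chunk.
    toc_entry = struct.pack("<III", IMAGE_TYPE, nominal_size, 16 + 12)
    # Image chunk header: header_size, type, nominal, version, w, h, xhot, yhot, delay.
    image_header = struct.pack("<9I", 36, IMAGE_TYPE, nominal_size, 1, 1, 1, 0, 0, 0)
    pixel = struct.pack("<I", 0x00000000)  # one ARGB pixel with alpha 0
    return file_header + toc_entry + image_header + pixel
```

Leaving out the version field shifts every later field by four bytes, which matches the malformed-file symptom described in the commit.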
Mitchell R
ee980509c7
fix(ci): retry firmware auto-import on TLS/transient failure 2026-05-23 00:05:21 +02:00
Mitchell R
0aaa1d931a
perf(rauc): switch from verity to plain bundle format (skip hash tree) 2026-05-22 23:59:47 +02:00
Mitchell R
750ff1eab2
fix(terminal): plain bash as bfkiosk, no sudo/root + journal via group 2026-05-22 23:35:40 +02:00
Mitchell R
16412d5ad6
fix(terminal+journal): use systemd-run to escape NoNewPrivileges
The kiosk runs under NoNewPrivileges=yes (WebKit bwrap needs it). sudo
and nsenter both fail because they need privilege escalation, which the
flag blocks. systemd-run --pipe spawns a SEPARATE service unit as root
in its own process tree, connected via stdin/stdout pipe. Not a child
of the kiosk process → NoNewPrivileges doesn't apply.

Also: enable rauc.service in pi-gen chroot (was never enabled → RAUC
daemon not running → rauc install fails → OS update silently broken).
2026-05-22 23:34:49 +02:00
Mitchell R
6244fe26e0
fix(terminal+journal): run as root via sudo + add bfkiosk NOPASSWD sudoers
Terminal spawns bash as bfkiosk (unprivileged) → can't read journal,
can't run rauc/systemctl, can't fix anything useful. Now runs
sudo bash --login (with fallback to plain bash if sudo unavailable).

Journal streaming: sudo journalctl instead of plain journalctl so
bfkiosk can read system journal without systemd-journal group.

Pi-gen image: drops /etc/sudoers.d/betterframe-kiosk granting bfkiosk
passwordless sudo. Gated by the on-screen code + lockout ladder, so
root access still requires physical presence.
2026-05-22 23:30:13 +02:00
Mitchell R
4cf9704350
fix(onvif-events): store cluster_key at pairing + implement AES-256-GCM decrypt
Root cause: kiosk never stored cluster_key from pairing response.
Bundle ships onvif_password_encrypted (AES-256-GCM with cluster key).
decrypt_cluster was a stub returning None → empty password → WSSE auth
fails → CreatePullPoint rejected → no events ever.

Fix:
1. ClaimResp now includes cluster_key field
2. Stored encrypted at rest alongside kiosk_key (at_rest.rs)
3. Loaded at bundle render, passed to onvif_events::start()
4. decrypt_cluster implements full AES-256-GCM: parse v1.<iv>.<tag>.<ct>
   format, base64url decode, decrypt with cluster key

Also: removed BF_ENABLE_ONVIF_EVENTS env gate — if camera is type=onvif
with onvif_host, subscribe. Gate was redundant with the type filter.

Also: bump Angie proxy_read_timeout to 600s on /api/admin/ for OS
bundle import (downloads ~1GB from GitHub, was timing out at 60s).

NOTE: existing paired kiosks won't have cluster_key stored. They need
to re-pair (delete + re-add) to receive it. New pairings get it
automatically.
2026-05-22 22:18:25 +02:00
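The envelope parsing step of `decrypt_cluster` in 4cf9704350 (the `v1.<iv>.<tag>.<ct>` format with base64url fields) could look like the sketch below. Function names are hypothetical, unpadded base64url is assumed but padded input is also accepted, and the AES-256-GCM decryption itself is left out because it needs a crypto library outside the standard library.

```python
import base64


def b64url_decode(text):
    """Decode base64url, restoring any padding that was stripped."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def parse_envelope(envelope):
    """Split v1.<iv>.<tag>.<ct> into raw (iv, tag, ciphertext) bytes."""
    parts = envelope.split(".")
    if len(parts) != 4 or parts[0] != "v1":
        raise ValueError("expected v1.<iv>.<tag>.<ct>")
    iv, tag, ciphertext = (b64url_decode(part) for part in parts[1:])
    return iv, tag, ciphertext
```

The three byte strings would then be handed to an AES-256-GCM implementation together with the stored key.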
Mitchell R
d4ac406f58
fix(ci): wait for GitHub CDN before OS bundle auto-import (504 race) 2026-05-22 22:11:31 +02:00
Mitchell R
a1727547df
feat(harden): transparent cursor + full VT lockdown + auto-reboot + purge all setup wizards
1. Transparent cursor theme: 1x1 pixel Xcursor for every shape, set as
   system default via XCURSOR_THEME=betterframe-empty. Nuclear fix for
   Pi 5 GPU ignoring XCURSOR_SIZE.

2. Full VT lockdown: mask ALL gettys (tty1-6 + templates), logind
   NAutoVTs=0 + ReserveVT=0, mask emergency/rescue targets. Ctrl+Alt+Fx
   reaches nothing. No login screen ever.

3. Auto-reboot: FailureAction=reboot-force + StartLimitAction=reboot-force
   on kiosk unit. If cage/app can't stay running → system reboots rather
   than showing a blank screen or login prompt.

4. Purge ALL Pi setup wizards: piwiz, userconf-pi, rpi-first-boot-wizard,
   initial-setup, pi-greeter, rpd-plym-splash. Nuke autostart files,
   mask systemd units. "Configure your Raspberry" never shows.
2026-05-22 21:38:42 +02:00
Mitchell R
6d577b5411
fix(terminal+journal): forward via WorkerMsg (GTK thread) + journal fallback
Terminal: idle_add_local_once from non-GTK thread silently fails.
Forward ShowTerminalCode/DismissTerminalCode through WorkerMsg channel
which IS polled on the GTK main thread via timeout_add_local.

Journal: try --user-unit first, fall back to unfiltered journal if
permission denied (bfkiosk user may not be in systemd-journal group on
non-reflashed images). Send error line back to admin UI on spawn failure
instead of silent drop.
2026-05-22 21:08:24 +02:00
Mitchell R
7425fa9c63
fix(terminal): overlay on existing window (cage single-window) + 60s timeout
Three fixes:
1. Terminal code overlay replaces the main display window's child instead
   of creating a new gtk::Window (cage compositor only shows one window).
   Saves the previous child and restores on dismiss.
2. Code auto-expires after 60s — timeout does NOT increment lockout.
   GTK overlay dismissed + pending_code cleared.
3. Journal-start handler already logs, but the relay might fail silently
   if the kiosk WS reconnected after the admin debug WS connected.
2026-05-22 21:00:05 +02:00
Mitchell R
9ebdc894a1
fix(terminal): get channel from server heartbeat response, not env/build 2026-05-22 20:51:18 +02:00
Mitchell R
98723f21b8
fix(terminal): detect dev channel from build version string, not env var 2026-05-22 20:49:41 +02:00
Mitchell R
76f725c149
fix(coordinator): use config.cookieName directly, not envStr 2026-05-22 20:42:48 +02:00
Mitchell R
14ee081f61
fix(config): add cookieName to coordinator-ws sec-config (was null → 401) 2026-05-22 20:41:42 +02:00
Mitchell R
5198a681eb
debug(ws): log admin debug WS auth failure details 2026-05-22 20:39:19 +02:00
Mitchell R
31ba05b703
fix(debug-ws): route via /admin/ws/debug/ so Angie forwards correctly 2026-05-22 20:28:26 +02:00
Mitchell R
aff76b41f9
fix(kiosk): report os_version in heartbeat (was never sent) 2026-05-22 20:25:29 +02:00
Mitchell R
1f0bcd1084
fix(remote-debug): successful auth resets lockout + drop empty WS token param 2026-05-22 20:23:20 +02:00
Mitchell R
c5068615ee
feat(remote-debug): journal streaming + secure terminal via WebSocket
Kiosk side (remote_debug.rs + ws_client.rs refactor):
  - Journal streaming: server sends journal-start → kiosk spawns
    journalctl -f, pipes lines back as journal-line messages via WS.
    journal-stop kills the process. On-demand, not always-on.
  - Terminal: server sends terminal-request → kiosk checks lockout +
    firmware_channel == "dev" → generates 8-char code displayed on
    screen as fullscreen overlay (NOT logged) → server relays admin's
    code via terminal-auth → kiosk validates with constant-time compare
    → on success spawns bash, relays I/O as base64 terminal-data.
  - Lockout: 3 failed codes per boot → lockout_count++. 3 lockouts
    (9 total failures) → permanent (reflash only). Reboot resets
    attempt counter, not lockout counter. Successful pairing resets all.
  - ws_client.rs rewritten with split reader/writer + tokio::select!
    for multiplexing incoming WS messages with outbound journal/terminal
    data from sync threads.

Server side (coordinator-ws + routes-admin):
  - New admin debug WS endpoint: /ws/admin/debug/:kioskId. Authenticated
    via admin API key (query param) or session cookie. Relays messages
    bidirectionally between admin browser ↔ kiosk.
  - Admin pages: /admin/kiosks/:id/logs (journal viewer with start/
    stop/clear) and /admin/kiosks/:id/terminal (code entry + terminal
    area). Both open in new tabs from the kiosk detail page.
  - Angie proxy config updated with /ws/admin/debug/ location block.

Security:
  - Terminal only on dev channel
  - Code displayed physically on screen, never logged or stored server-side
  - Lockout: 3/boot, 3 lockouts = permanent, pairing resets
  - Kiosk responds "locked" without specifying which lockout triggered
2026-05-22 20:13:39 +02:00
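The lockout ladder and constant-time code check from c5068615ee can be modelled as a small state machine. This is a sketch under assumptions: the class, the code alphabet, and the "no_code" and "denied" replies are invented for illustration, and a successful check here resets only the per-boot attempt counter.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed code alphabet


class TerminalGate:
    MAX_ATTEMPTS_PER_BOOT = 3
    MAX_LOCKOUTS = 3

    def __init__(self, lockout_count=0):
        self.lockout_count = lockout_count  # persisted, survives reboot
        self.failed_attempts = 0            # fresh on every boot
        self.pending_code = None

    def locked(self):
        return (self.lockout_count >= self.MAX_LOCKOUTS
                or self.failed_attempts >= self.MAX_ATTEMPTS_PER_BOOT)

    def request_code(self):
        """Generate the 8-char code shown on screen, or None when locked."""
        if self.locked():
            return None
        self.pending_code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        return self.pending_code

    def check(self, candidate):
        if self.locked():
            return "locked"  # no hint about which lockout applies
        if self.pending_code is None:
            return "no_code"
        ok = hmac.compare_digest(candidate.encode(), self.pending_code.encode())
        self.pending_code = None  # every code is single-use
        if ok:
            self.failed_attempts = 0  # assumption: success clears the boot counter
            return "ok"
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_ATTEMPTS_PER_BOOT:
            self.lockout_count += 1
        return "denied"
```

A reboot corresponds to constructing a new `TerminalGate` with the persisted `lockout_count`, so three locked-out boots (nine failures) leave the gate permanently locked.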
Mitchell R
e0b9955522
fix(admin): only show Live Events panel for ONVIF cameras 2026-05-22 19:48:41 +02:00
Mitchell R
90a8f256d5
fix(docker): remove COPY .git — Coolify excludes it from build context
Coolify doesn't include .git in Docker build context, causing build
failure. Revert to ARG-based version stamping: compose passes
BF_SERVER_VERSION from Coolify's SOURCE_COMMIT/COOLIFY_GIT_COMMIT
env vars as a build arg, Dockerfile writes it to .bf-version. Removed
git from builder apt install (no longer needed).
2026-05-22 19:30:18 +02:00
Mitchell R
ee281fc9dc
fix(ci): always build kiosk binary + image on every master push 2026-05-22 18:37:23 +02:00
Mitchell R
05ca368f29
fix(onvif): import discovered cameras as type=onvif with credentials
importDiscoveredCamera was hardcoded to type="rtsp", losing ONVIF
identity. As a result, camera edit showed RTSP fields, ONVIF event
subscription was skipped (checks cam_type=="onvif"), and re-discovery
was impossible.

Now creates type="onvif" with onvif_host/port/username/password stored
on the camera row. Streams still go into camera_streams (unchanged).
Bundle ships onvif fields → kiosk subscribes to PullPoint events.

Also passes host + port as hidden form fields from discover results
page so the add handler has them available. Basic manual camera
creation via UI stays rtsp-only (simpler); discovery flow produces
onvif type.
2026-05-22 18:30:41 +02:00
Mitchell R
2e40e78413
fix(admin): mask passwords in stream RTSP URIs on camera detail page 2026-05-21 16:29:24 +02:00
Mitchell R
4870426158
fix(rauc): use CA cert for bundle verify + don't fail build on verify error 2026-05-21 16:22:36 +02:00
Mitchell R
516a4ca4a0
fix(firmware): grant bfkiosk write access to binary dir + align marker path
/opt/betterframe/kiosk/ now owned bfkiosk:bfkiosk so OTA can write
.new/.prev files. Marker path in Rust code aligned with rollback
script expectation (/var/lib/betterframe/kiosk/firmware-applying.json).
2026-05-21 16:03:42 +02:00
Mitchell R
7d81891b0e
fix(version): derive server version from git at Docker build time
Coolify pulls from GitHub and runs docker compose build — no guaranteed
env vars like SOURCE_COMMIT. Previous approach relied on ARG/ENV
passthrough that silently defaulted to "dev".

Fix: install git in the builder stage, COPY .git into context, run
git describe --tags --always to derive the version, write it to
/app/server/.bf-version. version.ts reads this file as a fallback
between env vars and the "dev" literal.

Chain: BF_SERVER_VERSION env → BF_BUILD_VERSION env → .bf-version file
→ COOLIFY_GIT_COMMIT env → SOURCE_COMMIT env → "dev".

Also: fix .gitignore for rauc-signing/ (was under wrong path).
2026-05-21 16:02:21 +02:00
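The fallback chain listed in 7d81891b0e can be expressed as an ordered lookup. This is a Python stand-in for what version.ts is described as doing; treating empty or whitespace-only values as unset is an assumption.

```python
def resolve_version(env, version_file_text=None):
    """Walk the chain: env vars, then .bf-version contents, then commit vars, then "dev"."""

    def from_env(name):
        return env.get(name, "").strip() or None

    candidates = [
        from_env("BF_SERVER_VERSION"),
        from_env("BF_BUILD_VERSION"),
        (version_file_text or "").strip() or None,  # contents of .bf-version
        from_env("COOLIFY_GIT_COMMIT"),
        from_env("SOURCE_COMMIT"),
    ]
    return next((value for value in candidates if value), "dev")
```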
Mitchell R
653f2ce910
chore(rauc): regenerate CA cert as ECDSA P-256 2026-05-21 15:49:15 +02:00
Mitchell R
c4ce9e7880
fix(rauc): switch signing keys from Ed25519 to ECDSA P-256
RAUC uses OpenSSL CMS signing. CMS doesn't support Ed25519 on
OpenSSL < 3.2 — Ubuntu 24.04 ships 3.0.13 → "pkey nid=1087" error.
ECDSA P-256 is universally supported in CMS, fast, and small.

Operator must regenerate keys + re-set GitHub secrets:
  rm -rf rauc-signing
  bash scripts/gen-rauc-signing-keys.sh
  cp rauc-signing/ca-cert.pem deploy/rauc/ca-cert.pem
  git add + commit + push
  Update BF_RAUC_SIGNING_CERT + BF_RAUC_SIGNING_KEY secrets
2026-05-21 15:45:26 +02:00
Mitchell R
6e10913380
fix(admin): cell edit no longer corrupts grid when spans change
Use hx-retarget/hx-reswap response headers to replace full grid
when cell dimensions change (overlap resolution may move other cells).
Single-cell swap when only content fields change.
2026-05-21 15:12:55 +02:00
Mitchell R
b05cdfc153
fix(ci): skip kiosk+image build when only server code changes
Master pushes now check git diff for kiosk/, deploy/, .github/ changes.
Server-only commits skip the expensive Rust cross-compile + pi-gen image.
Tag pushes and workflow_dispatch always build everything.
2026-05-21 15:12:48 +02:00
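The path filter from b05cdfc153 amounts to a prefix check over the changed files. The sketch below is a Python rendering of that rule, not the workflow's actual shell logic; the function and parameter names are hypothetical.

```python
KIOSK_PREFIXES = ("kiosk/", "deploy/", ".github/")


def needs_kiosk_build(changed_paths, is_tag=False, is_dispatch=False):
    """Return True when the Rust cross-compile + pi-gen image should run."""
    if is_tag or is_dispatch:
        return True  # tag pushes and workflow_dispatch always build everything
    return any(path.startswith(KIOSK_PREFIXES) for path in changed_paths)
```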
Mitchell R
157bdd49bb
fix(repartition): label rootfs ext4 + bootfs FAT before dd into image
Kernel dropped to initramfs because root=LABEL=BF_ROOT_A in cmdline.txt
but the ext4 filesystem had no label set (pi-gen's default is unlabeled).
dd copies raw bytes — any label must be set on the standalone file BEFORE
writing into the output image.

Add e2label BF_ROOT_A on rootfs.ext4 + fatlabel BF_BOOT_A / BF_BOOT_B
on each bootfs copy after patching cmdline.txt but before dd.
2026-05-21 15:10:22 +02:00
Mitchell R
78538cef9c
fix(rauc): remove invalid type=image from manifest (RAUC rejects it) 2026-05-21 14:49:20 +02:00