Commit graph

196 commits

Author SHA1 Message Date
Mitchell R
69e4bcb14a
ci(pi-gen): tonistiigi/binfmt --install arm64 (F flag, kernel-resident QEMU)
apt's qemu-user-static + update-binfmts produces a registration that
pi-gen's nested Docker container still couldn't see. Switch to the
canonical tonistiigi/binfmt approach: privileged container that
installs QEMU statically with the F (fix-binary) flag, so the kernel
opens the qemu-aarch64-static binary at registration time and uses it
for all subsequent arm64 execs — independent of which container the
exec happens in.

Plus diagnostic: ls /proc/sys/fs/binfmt_misc + cat qemu-aarch64
detail, so next run's log surfaces whether registration actually
landed.
2026-05-20 00:31:42 +02:00
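Note on the diagnostic in the commit above: a minimal sketch of checking a binfmt_misc entry for the F flag, assuming the usual text layout of `/proc/sys/fs/binfmt_misc/<name>` (an `enabled` line, an `interpreter` line, a `flags:` line). The function names and the sample text are illustrative and are not taken from the workflow.

```python
def parse_binfmt_entry(text: str) -> dict:
    """Parse the text of a /proc/sys/fs/binfmt_misc/<name> entry."""
    info = {"enabled": False, "interpreter": None, "flags": ""}
    for line in text.splitlines():
        line = line.strip()
        if line == "enabled":
            info["enabled"] = True
        elif line.startswith("interpreter "):
            info["interpreter"] = line.split(" ", 1)[1]
        elif line.startswith("flags:"):
            info["flags"] = line.split(":", 1)[1].strip()
    return info


def uses_fix_binary(text: str) -> bool:
    """True when the handler is enabled and registered with the F flag."""
    info = parse_binfmt_entry(text)
    return info["enabled"] and "F" in info["flags"]


# Sample entry text (illustrative, not captured from a real run).
SAMPLE = """enabled
interpreter /usr/bin/qemu-aarch64-static
flags: OCF
offset 0
magic 7f454c460201010000000000000000000200b700
"""
```

On a real runner the text would come from reading `/proc/sys/fs/binfmt_misc/qemu-aarch64`. A missing file means the registration never landed.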
Mitchell R
ab955e12da
ci(pi-gen): install qemu-user-static via apt instead of setup-qemu-action
docker/setup-qemu-action registers binfmt via a privileged side container;
pi-gen-action's own nested Docker container doesn't inherit the
registration. Result: arm64 ELFs in the pi-gen chroot still fail to
exec, exit 1 before any stage runs.

apt-installed qemu-user-static + binfmt-support writes persistent
binfmt_misc entries to the kernel that propagate to every container
sharing that kernel. Pair with update-binfmts --enable qemu-aarch64 and a sanity
ls -la /proc/sys/fs/binfmt_misc/qemu-aarch64.
2026-05-20 00:23:16 +02:00
Mitchell R
3746f685be
ci: bump action versions to latest + add QEMU arm64 binfmt for pi-gen
Real cause of last pi-gen failure was surfaced by verbose-output:
  WARNING: Only a native build environment is supported.
  arm64: not supported on this machine/kernel

ubuntu-latest is x86_64; pi-gen builds an arm64 image and chroots into
it during stages, requiring binfmt_misc handlers for arm64. Add
docker/setup-qemu-action before the pi-gen step.

While here, audit + bump every action version (pinned to current
majors):
  actions/checkout            v4 → v6
  actions/upload-artifact     v4 → v7
  actions/download-artifact   v4 → v8
  softprops/action-gh-release v2 → v3
  docker/setup-qemu-action    @v4 (new)
  usimd/pi-gen-action         @v1 (already current major)
  dtolnay/rust-toolchain      @stable (rolling channel — keep)
2026-05-20 00:11:45 +02:00
Mitchell R
0f664fe1c1
ci(pi-gen): verbose pi-gen output + IMG_SUFFIX in EXPORT_IMAGE for diagnostics 2026-05-19 23:57:26 +02:00
Mitchell R
b7ec18e52e
ci(pi-gen): trixie everywhere + missing prerun.sh + EXPORT_IMAGE marker
Reverts misdiagnosis. pi-gen defaults to trixie since the Debian 13
release, which has gtk4 4.14 + libwebkitgtk-6.0 stock — no backports
needed. Build container, kiosk gtk feature gate, and pi-gen target all
realigned to trixie.

Actual reason last image run failed: our custom stage was missing the
mandatory prerun.sh (pi-gen calls it to seed ROOTFS_DIR from the
previous stage) and the EXPORT_IMAGE marker file (signals 'bake an
image at the end of this stage'). Both added.

Asset upload now globs deploy/*.img.xz so any extra exports stage2
produces ship alongside our customised one.
2026-05-19 05:19:32 +02:00
Mitchell R
7097de6f19
ci: include flashable .img.xz on every release, dev included
Repo is public → unlimited Actions minutes, so the 30-60 min pi-gen
bake doesn't have a cost gate. Master pushes now produce the full
asset set (binaries + image), same as tag releases.
2026-05-19 05:09:04 +02:00
Mitchell R
3f20d03520
ci: block-style with: in build.yml checkout steps (flow-style + ${{ }} parser conflict) 2026-05-19 05:04:20 +02:00
Mitchell R
8f457c5ca9
ci: single reusable build.yml + release.yml orchestrator (auto-tag on master)
Replaces release-kiosk.yml + release-image.yml with two coupled workflows:

  release.yml — entrypoint. Computes version/channel/tag:
    - master push → semver patch bump from latest stable tag, append
      -dev.<shortsha>, create lightweight tag + prerelease record
    - v* tag push → use tag verbatim, channel from suffix (-beta./-dev. or
      stable), create release if missing
    Then invokes build.yml via uses: ./.github/workflows/build.yml.

  build.yml — reusable (workflow_call). Single source of truth for asset
    production:
    - kiosk binary matrix (aarch64, x86_64) in debian:trixie-slim
    - flashable .img.xz via pi-gen using the aarch64 artifact (gated by
      build-image input; master pushes default false to keep dev cycles
      fast, tag pushes default true for a full release)
    Both jobs attach to the release at tag_name=${{ inputs.tag }}.

Concurrency: master-branch runs cancel superseded peers; tag runs never
cancel. CI auto-import to a running BF server (BF_AUTOIMPORT_URL +
BF_AUTOIMPORT_API_KEY repo secrets) still wired.
2026-05-19 04:58:23 +02:00
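Note on the version rules in the commit above: a minimal sketch of the patch bump and channel detection, assuming stable tags look like `vMAJOR.MINOR.PATCH` and the channel names are `dev`, `beta` and `stable`. The function names are hypothetical, and the real release.yml logic may differ.

```python
import re

# Assumed stable tag shape: v1.2.3 (no suffix).
STABLE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_dev_version(latest_stable_tag: str, short_sha: str) -> str:
    """Bump the patch number of the latest stable tag and append -dev.<shortsha>."""
    match = STABLE_TAG.match(latest_stable_tag)
    if match is None:
        raise ValueError(f"not a stable tag: {latest_stable_tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"v{major}.{minor}.{patch + 1}-dev.{short_sha}"


def channel_for_tag(tag: str) -> str:
    """Derive the release channel from the tag suffix."""
    if "-dev." in tag:
        return "dev"
    if "-beta." in tag:
        return "beta"
    return "stable"
```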
Mitchell R
9699036bb2
feat(release): pi-gen image build pipeline (flashable .img.xz on tag push)
New workflow .github/workflows/release-image.yml takes a tagged kiosk
release binary, layers it onto Raspberry Pi OS Trixie Lite via a custom
pi-gen stage, and publishes the resulting .img.xz back to the GitHub
Release.

Custom stage deploy/pi-gen/stage-betterframe-client/:
  - 00-install-packages: cage, seatd, plymouth, gtk4 runtime, gstreamer,
    libwebkitgtk-6.0, wlr-randr, ca-certificates
  - 01-install-kiosk: drops the prebuilt kiosk binary, systemd unit,
    cage PAM stack, firmware-rollback hook, plymouth theme. Creates
    bfkiosk user, sets multi-user.target, masks all display managers,
    purges piwiz, edits cmdline/config for the BF splash. Mirrors
    setup-pi-kiosk.sh but baked into the image.

End state: rpi-imager → SD → boot → pairing screen on the HDMI display,
no operator setup steps. Kiosk auto-discovers server via discover_server()
(localhost → mDNS → frame-eu.betterportal.net).

Heavy build (~30-60 min on GH-hosted Ubuntu) so tag-push triggered, not
master. Workflow_dispatch also supports baking an existing release tag's
binary into a fresh image without re-tagging.
2026-05-19 04:34:21 +02:00
Mitchell R
093f4947a1
chore(kiosk): silence dead_code warnings on intentionally-held fields 2026-05-19 04:30:42 +02:00
Mitchell R
d9c59d9276
fix(kiosk): export WorkerMsg, import DecodePublicKey trait; CI master-push → dev
- WorkerMsg made pub + re-exported at crate root so local_server can send
  through the UI channel.
- ed25519_dalek::pkcs8::DecodePublicKey trait import — needed for
  VerifyingKey::from_public_key_pem call site.
- Workflow: pushes to master now auto-trigger a dev-channel build (in
  addition to tag-pushes for stable/beta). Concurrency group cancels
  superseded master builds; tag builds never cancel each other.
2026-05-19 04:25:59 +02:00
Mitchell R
411d9900a9
chore: target latest-stable everywhere — Debian Trixie + gtk4 v4_14
- CI workflow container: debian:trixie-slim (was bookworm-slim)
- Server image base: node:23-trixie-slim (was bookworm-slim)
- Kiosk Cargo.toml: gtk4 features v4_14 (was v4_8) — matches Trixie's
  stock gtk 4.14 without backports juggling
- setup-pi-kiosk.sh header: Trixie+ target (was Bookworm+)

Glibc matches across Pi OS Trixie, Coolify host (Trixie), CI build
container — no symbol drift at runtime.
2026-05-19 04:21:14 +02:00
Mitchell R
b2f61d2bc9
fix(kiosk): build against stock bookworm gtk 4.8.3 (drop v4_12 feature)
Pi OS Bookworm + Debian bookworm both ship libgtk-4 4.8.3. No code in
the kiosk uses 4.12+ APIs (compute_bounds, WidgetPaintable, Picture,
add_tick_callback, Fixed, set_content_fit are all <= 4.8). Swap
gtk4 feature v4_12 → v4_8 and drop the bookworm-backports juggling
in CI.
2026-05-19 04:18:54 +02:00
Mitchell R
fa4c1684a3
fix(deploy+kiosk): server healthcheck wget, nodered spider, cloud discovery
- server Dockerfile installs wget — bookworm-slim doesn't include it
  by default, so the healthcheck CMD silently failed → Coolify marked
  the container unhealthy.
- nodered healthcheck swapped to /nrdp/ (always 200 when runtime up)
  via wget --spider; previous /nrdp/auth/login returned non-2xx when
  adminAuth disabled.
- start_period bumped to 90s for nodered's flow load on smaller hosts.
- Kiosk discovery: cloud fallback now frame-eu.betterportal.net per
  the managed-fleet endpoint.
2026-05-19 04:15:25 +02:00
Mitchell R
a523e678c7
fix(nodered): base is Alpine — use apk + su-exec, not apt + gosu 2026-05-19 04:06:36 +02:00
Mitchell R
eb1ac8245a
fix(nodered): install gosu, swap su-exec → gosu (debian base, not alpine) 2026-05-19 04:04:53 +02:00
Mitchell R
f087fdc056
fix(nodered): entrypoint runs as root to fix stale /data state, drops to node-red via su-exec
Previous deploy left /data/settings.js as a DIRECTORY (Docker auto-mkdir
from a failed bind mount earlier). cp from non-root user then failed
'Permission denied' writing inside it.

Entrypoint now:
- Detects + rm -rf the stale directory
- Seeds /data/settings.js from /usr/src/bf-settings.js
- Chowns /data to node-red
- exec su-exec node-red:node-red to drop privileges before npm start
2026-05-19 04:00:58 +02:00
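Note on the entrypoint steps in the commit above: a minimal sketch of the stale-directory check and the seeding step only. The chown and the privilege drop are left out. Keeping an existing file is carried over from the earlier entrypoint-wrapper commit. The function name and return strings are hypothetical, and the real entrypoint is presumably a shell script.

```python
import os
import shutil


def seed_settings(data_dir: str, default_settings: str) -> str:
    """Make sure <data_dir>/settings.js is a regular file.

    A directory at that path (left by a failed bind mount) is removed and
    replaced by a copy of the default; an existing file is left untouched.
    """
    target = os.path.join(data_dir, "settings.js")
    action = "seeded"
    if os.path.isdir(target):
        shutil.rmtree(target)
        action = "replaced-stale-directory"
    elif os.path.isfile(target):
        return "kept-existing"
    shutil.copyfile(default_settings, target)
    return action
```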
Mitchell R
7baa1a07f9
fix(nodered): seed /data/settings.js via entrypoint wrapper
The /data named volume hides anything Dockerfile COPYs into /data, so
the previous CMD override pointing at /usr/src/bf-settings.js didn't
help — Node-RED's launch script still looks for /data/settings.js by
default, which doesn't exist after the volume overlays.

Solution: entrypoint wrapper copies /usr/src/bf-settings.js to
/data/settings.js on first boot when missing, then exec's npm start.
Subsequent boots keep the user-edited version in the volume.
2026-05-19 03:57:42 +02:00
Mitchell R
6473f0fc95
fix(firmware): diagnostic dump + smart-quote / BOM / multi-quote handling
Adds aggressive normalisation to tryParsePrivateKey:
- Strip UTF-8 BOM
- Replace smart quotes (“ ” ‘ ’) with ASCII
- Strip multiple layers of wrapping quotes
- Combine escape-unfold with quote-strip (env vars that quote AND escape)
- Strip whitespace inside base64 candidate before decode

On parse failure, dumps length + head/tail samples + first-byte hex so
the operator can spot exactly what shape the env var arrived in.
2026-05-18 22:52:35 +02:00
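Note on the normalisation steps in the commit above: a minimal sketch of the BOM, smart-quote and wrapping-quote handling, assuming the env value has already been decoded to text. The name `normalise_key_text` is hypothetical. The real tryParsePrivateKey is TypeScript and does more (escape unfolding, base64 handling, diagnostics).

```python
# Typographic quotes mapped to their ASCII equivalents.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def normalise_key_text(raw: str) -> str:
    """Strip a leading BOM, replace smart quotes, peel wrapping quote layers."""
    text = raw.lstrip("\ufeff")
    text = text.translate(str.maketrans(SMART_QUOTES))
    text = text.strip()
    # Remove as many layers of matching outer quotes as are present.
    while len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text
```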
Mitchell R
936e6170a6
feat(store): Postgres adapter foundation + BF_DB selector (phase 1)
Lays groundwork for sqlite|postgres backend selection without yet
converting Repository. Adds:

- db-adapter.ts: async DbAdapter interface (run/get/all/exec/transaction)
- sqlite-adapter.ts: wraps node:sqlite sync API in Promise-returning shape,
  caches prepared statements
- pg-adapter.ts: pg Pool + ? → $N placeholder rewrite + RETURNING-id
  capture + savepoint-nested transactions
- service-store config: driver (sqlite|postgres), pgUrl
- BF_DB env override, plumbed via envStr

Selecting BF_DB=postgres throws at init() until the Repository is
converted off DatabaseSync. This commit ships the foundation only.

Next phases (separate commits):
  2. Convert Repository methods sync → async via DbAdapter
  3. Update every caller to await
  4. Split MIGRATIONS into sqlite + portable / pg-specific sets
  5. UUIDv7 IDs for new tables on PG path

Adds deps: pg ^8.13.1, uuidv7 ^1.0.2, @types/pg ^8.20.0
2026-05-18 22:50:48 +02:00
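Note on the `?` → `$N` rewrite in the commit above: a minimal sketch of one way to do it, skipping question marks inside single-quoted SQL literals. Whether the real pg-adapter.ts handles literals this way is not stated in the commit. Double-quoted identifiers, comments and dollar-quoting are ignored here.

```python
def rewrite_placeholders(sql: str) -> str:
    """Rewrite positional ? placeholders to PostgreSQL-style $1, $2, ..."""
    out = []
    index = 0
    in_string = False
    for ch in sql:
        if ch == "'":
            # A doubled '' escape toggles twice, so the state stays correct.
            in_string = not in_string
            out.append(ch)
        elif ch == "?" and not in_string:
            index += 1
            out.append(f"${index}")
        else:
            out.append(ch)
    return "".join(out)
```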
Mitchell R
8082571b03
fix(firmware): tolerate mangled PEM in BF_FIRMWARE_SIGNING_KEY env
Coolify / docker compose env injection routinely strips real newlines or
wraps in quotes, causing createPrivateKey to throw ERR_OSSL_UNSUPPORTED
and crashing the server before it can even start.

tryParsePrivateKey now attempts: literal, \n→LF, CRLF→LF, quote-stripped,
base64-decoded, and single-line PEM re-wrapped to 64-col. On total
failure, logs a clear warning and falls back to on-disk / generated key
instead of crashing.
2026-05-18 22:47:07 +02:00
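Note on the fallback chain in the commit above: a minimal sketch that only generates the candidate strings in the listed order. A caller would try each candidate with the real key parser and stop at the first success. The function names are hypothetical, and the regex assumes a single PEM block.

```python
import base64
import binascii
import re
from typing import List, Optional

PEM_RE = re.compile(r"-----BEGIN ([A-Z ]+)-----(.*?)-----END \1-----", re.S)


def rewrap_pem(text: str) -> Optional[str]:
    """Re-wrap a PEM whose newlines were lost into 64-column body lines."""
    match = PEM_RE.search(text)
    if match is None:
        return None
    label = match.group(1)
    body = re.sub(r"\s+", "", match.group(2))
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join(
        [f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]
    ) + "\n"


def candidate_forms(raw: str) -> List[str]:
    """Candidate key texts: literal, \\n unfolded, CRLF fixed, unquoted, ..."""
    stripped = raw.strip().strip("\"'")
    forms = [raw, raw.replace("\\n", "\n"), raw.replace("\r\n", "\n"), stripped]
    try:
        forms.append(base64.b64decode(stripped, validate=True).decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError):
        pass  # not a base64-wrapped value
    rewrapped = rewrap_pem(stripped)
    if rewrapped is not None:
        forms.append(rewrapped)
    return forms
```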
Mitchell R
d242f0eb12
feat(deploy): docker-compose.coolify.yml variant (no host ports, Traefik fronts) 2026-05-18 22:39:28 +02:00
Mitchell R
c8fa5d95a2
fix(deploy): bake configs into images — no host bind mounts
Coolify deployments don't always carry the full source tree on disk
at the bind-mount source path. Mounting a missing file lets Docker
auto-create a directory at the target, which then fails to mount over
the file the image expects.

Fix: bake config files into the images themselves:
- Dockerfile.server COPYs deploy/docker/sec-config.yaml → /app/server/.
  Env vars (BF_*) still override at runtime per env-overrides.ts.
- New Dockerfile.angie wraps nginx:alpine + baked betterframe.docker.conf.
- Dockerfile.nodered COPYs nodered-settings.js to /usr/src/bf-settings.js
  (outside the /data volume) and uses --settings to point at it.

Compose drops the three bind mounts; volumes are now strictly
runtime state (DB + secrets, Node-RED flows). Users who want a
different sec-config still get full control via env overrides or
Coolify's Storage UI.
2026-05-18 12:18:46 +02:00
Mitchell R
024d380d7e
ci(release-kiosk): pull libgtk-4-dev from bookworm-backports (need >=4.12) 2026-05-18 12:05:46 +02:00
Mitchell R
a7abef1bba
fix(deploy): move docker-compose.yml to repo root
Coolify passes --project-directory <repo-root> so relative paths in
compose resolved from there, not from the compose file's directory.
context: ../.. then climbed to / and lstat /deploy failed.

Moving compose to repo root makes every relative path
project-dir-relative regardless of who's invoking compose. Local
'docker compose up' from repo root and Coolify's
--project-directory + -f both resolve identically.

Coolify users: update the resource's compose path to 'docker-compose.yml'
(was 'deploy/docker/docker-compose.yml'). Existing named volumes carry
over since the named: directive keeps them.
2026-05-18 12:05:09 +02:00
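Note on the path resolution described in the commit above: a minimal sketch of why `context: ../..` behaves differently depending on the base directory. All paths here are hypothetical examples and are not taken from the deployment.

```python
import posixpath


def resolve_context(base_dir: str, context: str) -> str:
    """Resolve a relative build context against the directory compose uses as base."""
    return posixpath.normpath(posixpath.join(base_dir, context))


# Relative to the compose file's own directory, ../.. lands on the repo root.
from_compose_dir = resolve_context("/srv/app/deploy/docker", "../..")

# Relative to a project directory passed via --project-directory, the same
# value climbs two levels above the repo root instead.
from_project_dir = resolve_context("/srv/app", "../..")
```

With the compose file at the repo root, the context can be `.` and both invocation styles resolve to the same directory.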
Mitchell R
f3c5504b4f
feat(deploy): env-overridable volume names + host port for Coolify
BF_DATA_VOLUME_NAME, NODERED_DATA_VOLUME_NAME, BF_HOST_PORT keep the
compose public while letting per-deployment specifics (host paths,
multiple staging/prod instances on one host, alternate edge ports)
land in Coolify's env tab. Defaults preserve current behaviour.
2026-05-18 11:50:51 +02:00
Mitchell R
afc560bbf5
ci(release-kiosk): whitelist workspace as safe.directory (container UID mismatch) 2026-05-18 11:40:45 +02:00
Mitchell R
6bad53da37 feat(kiosk): per-cell morph animation on layout swap
When the active layout switches, cells that exist in both old + new (same
camera, same URL, same HTML) now slide + scale from their old screen
position to the new one over 350ms (ease-out cubic). Fresh cells fade in;
removed cells fade out where they were.

Implementation:
- Each cell widget gets a stable widget_name (cam:<id>:<selector>,
  web:<url>, html:<hash>) so old/new can be matched.
- Before swap, capture each cell's bounds + a WidgetPaintable snapshot.
- New grid wrapped in an Overlay; a Fixed ghost layer hosts the animated
  Picture widgets driven by add_tick_callback + ease-out cubic.
- Once the window finishes the animation timer, the overlay is unwrapped
  back to a plain grid so subsequent renders don't accumulate layers.
2026-05-18 11:15:30 +02:00
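Note on the animation in the commit above: a minimal sketch of interpolating a cell's bounds with an ease-out cubic curve over 350 ms. The `Rect` type and function names are hypothetical. The kiosk itself is Rust/GTK and drives this from a tick callback.

```python
from dataclasses import dataclass

DURATION_MS = 350.0


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def ease_out_cubic(t: float) -> float:
    """Fast start, slow finish; t is clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return 1.0 - (1.0 - t) ** 3


def morph(old: Rect, new: Rect, elapsed_ms: float) -> Rect:
    """Bounds of the ghost widget at elapsed_ms into the animation."""
    k = ease_out_cubic(elapsed_ms / DURATION_MS)

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * k

    return Rect(
        lerp(old.x, new.x),
        lerp(old.y, new.y),
        lerp(old.width, new.width),
        lerp(old.height, new.height),
    )
```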
Mitchell R
70ecdd1b03 docs: dual-license declaration + vendored AGPL-3.0 text
LICENSE.md states AGPL-3.0-only OR Commercial dual license (matches the
SPDX expression in every package.json + Cargo.toml). LICENSE-AGPL.txt
is the canonical FSF text. LICENSE-COMMERCIAL.md covers when a
commercial license is required and how to obtain one.
2026-05-15 04:47:46 +02:00
Mitchell R
f22ca6b51a ci(release-kiosk): build in debian:bookworm-slim container to match Pi glibc 2026-05-15 01:05:43 +02:00
Mitchell R
6b63d71e3e ci(release-kiosk): use ubuntu-2404 runners (jammy lacks libwebkitgtk-6.0-dev) 2026-05-15 01:04:30 +02:00
Mitchell R
150972a272 fix(server): move rate-limit creation inside register fns (BSB schema extractor)
Schema extractor evaluates module top-level statically; createRateLimiter
calls at module scope threw ReferenceError during bsb-plugin-cli build.
Lifting into the per-route register functions keeps build clean.

Also: standardise display.standby/wake audit hooks.
2026-05-14 07:49:57 +02:00
Mitchell R
17f8c7ce02 feat(server): generic MQTT telemetry bridge (off by default) 2026-05-14 07:46:56 +02:00
Mitchell R
aa4e91491b feat(server): backup + restore (AES-256-GCM, PBKDF2, admin UI) 2026-05-14 07:44:01 +02:00
Mitchell R
a6c1fb4d8d feat(server): rate limit (login + pair) + CSP/security headers 2026-05-14 07:40:22 +02:00
Mitchell R
3ec2f3bf85 feat(server): audit log — schema, helper, admin UI, hooks for login/pair/firmware 2026-05-14 07:38:18 +02:00
Mitchell R
d1fd128ea0 feat(server): env-var overrides for sec-config keys + docker healthchecks 2026-05-14 07:33:10 +02:00
Mitchell R
69cd0391b5 feat(ota): phase 3 — rollouts + automated rollback
Rollouts (server side):
- /admin/firmware/rollouts page lists + creates campaigns. Pick release,
  target kiosk_ids (empty = whole channel), percentage (1-100).
- Active rollouts override channel-latest in /api/kiosk/firmware/check.
- Deterministic bucket via sha256(rollout_id:kiosk_id) % 100 — same kiosk
  consistently lands in the same bucket across re-checks.
- Pause / resume / complete state controls.

Rollback (kiosk side):
- Before swap, kiosk writes firmware-applying.json marker.
- After clean boot + first successful heartbeat, marker deleted.
- New ExecStartPre hook (/usr/local/sbin/betterframe-firmware-rollback.sh)
  runs every service start; stale marker (>120s) + .prev present →
  restore .prev. Pairs with systemd's StartLimit to catch crash loops.
2026-05-14 07:28:20 +02:00
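Note on the deterministic bucket in the commit above: a minimal sketch of sha256(rollout_id:kiosk_id) % 100. How the digest is reduced to an integer (here the whole digest, big-endian) and the `bucket < percentage` comparison are assumptions, since the commit does not spell them out.

```python
import hashlib


def rollout_bucket(rollout_id: str, kiosk_id: str) -> int:
    """Stable bucket in [0, 100) for a (rollout, kiosk) pair."""
    digest = hashlib.sha256(f"{rollout_id}:{kiosk_id}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 100


def in_rollout(rollout_id: str, kiosk_id: str, percentage: int) -> bool:
    """Assumed rule: a kiosk is included when its bucket is below the percentage."""
    return rollout_bucket(rollout_id, kiosk_id) < percentage
```

Because the rollout id is part of the hashed string, a kiosk that falls outside one campaign's percentage is not systematically outside the next one.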
Mitchell R
6a8f6d76af feat(kiosk): LAN-side local HTTP server (GET layout API + admin proxy)
Kiosk now exposes :18090 with two surfaces:

- GET /local/layout/:id?key=<kiosk_local_key>
  Bookmark-friendly layout switch on this kiosk. Auth = kiosk-generated
  local key (32 random bytes, hex, stored at <state_dir>/local.key).

- ANY /proxy/* — forwards to BF server with the request's Authorization
  header preserved. Lets LAN clients reach a cloud-hosted BF server via
  the kiosk's local socket; kiosk adds no auth of its own.

Heartbeat reports {local_key, local_port}; kiosks table grows
local_key/local_port/local_last_ip columns. Admin kiosk edit page now
shows the local URLs as a copy-paste block.

Override port: BF_KIOSK_LOCAL_PORT. Disable: BF_KIOSK_LOCAL_DISABLE=1.
2026-05-14 07:24:21 +02:00
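Note on the local key in the commit above: a minimal sketch of generating 32 random bytes as hex and checking a supplied `key` parameter. The constant-time comparison is an addition made here and is not something the commit states. The function names are hypothetical.

```python
import hmac
import secrets


def generate_local_key() -> str:
    """32 random bytes rendered as 64 hex characters."""
    return secrets.token_hex(32)


def key_matches(expected: str, supplied: str) -> bool:
    """Compare keys without leaking how many leading characters matched."""
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```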
Mitchell R
e5009fdd14 feat(ota): replacement pairing + firmware OTA (admin UI, kiosk client, CI) 2026-05-13 20:56:42 +02:00
Mitchell R
2bfecb2819 feat(deploy): apt full-upgrade on every setup run
Adds an OS + dist upgrade step before the BetterFrame install logic so
re-running the script keeps the host current. Uses
  --force-confdef --force-confold
so package maintainer scripts never block on prompts, and follows with
autoremove + autoclean. Kernel/libc updates set /var/run/reboot-required
which the existing REBOOT_NEEDED guard picks up → auto-reboot at end.

BF_SKIP_UPGRADE=1 bypasses the upgrade for fast iteration.
2026-05-13 13:08:36 +02:00
Mitchell R
8bd831c183 feat(kiosk): warm pool for WebView cells
Web and HTML cells were rebuilt + reloaded on every layout switch,
losing JS state and incurring a full page load each time. Mirror the
camera pool: hold WebViews in WARM_WEBVIEWS keyed by URL (or hash of
inline HTML), reuse on switch-back, unparent + cool on switch-away,
drop after the same cooling timer. Identical content in two layouts
shares one WebView.
2026-05-13 13:07:01 +02:00
Mitchell R
b10958def7 fix(nodered): kiosk-side layout.changed events + provisioning retries
Three related fixes:

1. Idle reverts (and any other kiosk-initiated layout switch) now POST
   layout.changed to /api/kiosk/event. Previously the server only emitted
   on admin-initiated switches, so Node-RED never saw the idle revert.

2. Server's /api/kiosk/event splays the payload to the top level when
   the topic has a dedicated trigger node (layout.changed, kiosk.changed,
   kiosk.status, display.power.changed, camera.changed). The trigger
   nodes expect flat shapes matching the admin emit; the old wrapped
   shape left every field undefined.

3. Auto-provisioning of bf-server-config in Node-RED: extend retry
   window to ~5 min, log per attempt, force v2 API + full-deploy header
   so credentials inline get accepted, surface response body on failure.
2026-05-13 13:03:51 +02:00
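Note on fix 2 in the commit above: a minimal sketch of splaying the payload to the top level for topics with a dedicated trigger node. The event shape (`topic` plus `payload`) and the function name are hypothetical. Only the topic list comes from the commit.

```python
# Topics that have a dedicated trigger node, per the commit message.
FLAT_TOPICS = {
    "layout.changed",
    "kiosk.changed",
    "kiosk.status",
    "display.power.changed",
    "camera.changed",
}


def shape_event(topic: str, payload: dict) -> dict:
    """Flatten the payload for dedicated topics; keep it wrapped otherwise."""
    if topic in FLAT_TOPICS:
        return {**payload, "topic": topic}
    return {"topic": topic, "payload": payload}
```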
Mitchell R
77b58c07fd feat(kiosk): track main/sub pipelines independently in warm pool
Pool was keyed by camera_id, so a cell flipping M→S tore down the old
pipeline and started fresh. With (camera_id, badge) keys the main and
sub variants live alongside each other: switching badge promotes the
new one to Warm and leaves the previous one to cool down via the normal
state machine, so flipping back inside the cooldown is instant.

ensure_warm no longer touches sibling badge entries.
recompute_global_state computes warm/hot sets as (cam, badge) pairs by calling
pick_stream per cell with its area fraction, so the planner sees what
ensure_warm will actually create.
2026-05-13 13:00:35 +02:00
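Note on the pool keying in the commit above: a minimal sketch of a pool keyed by (camera_id, badge), where warming one badge leaves the sibling entry alone. The class, the `release` method and the state strings are hypothetical. The real pool is Rust and also has a cooldown timer, which is omitted here.

```python
from typing import Dict, Tuple

PoolKey = Tuple[str, str]  # (camera_id, badge), badge being "M" or "S"


class WarmPool:
    def __init__(self) -> None:
        self.state: Dict[PoolKey, str] = {}

    def ensure_warm(self, camera_id: str, badge: str) -> None:
        """Warm exactly one (camera, badge) entry; siblings are untouched."""
        self.state[(camera_id, badge)] = "warm"

    def release(self, camera_id: str, badge: str) -> None:
        """Mark an entry as cooling instead of tearing it down immediately."""
        key = (camera_id, badge)
        if key in self.state:
            self.state[key] = "cooling"
```

Flipping a cell from M to S then back to M reuses the cooling M entry instead of creating a new one, which is the behaviour the commit describes.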
Mitchell R
d5bd64d05c feat(deploy): self-update + auto-reboot on boot-file changes
Two ergonomics fixes so one invocation does the right thing:

  1. After git pull, re-exec the script if the installer itself changed
     in the pull. Previously you'd need a second run to pick up new
     logic. BF_REEXEC=1 guard prevents loops.

  2. Track REBOOT_NEEDED when cmdline.txt / config.txt get edited or
     /var/run/reboot-required appears (apt kernel/libc update). At end
     of run, auto-reboot after a 10s cancellable window. Override with
     BF_NO_REBOOT=1.
2026-05-13 12:58:20 +02:00
Mitchell R
786febbb9b fix(kiosk): strip caps so WebKit's bwrap sandbox can start
WebKitGTK launches bubblewrap for its web-content process; bwrap refuses
to run when the parent process still carries unexpected CAP_* bits ("but
not setuid, old file caps config?"). Setting CapabilityBoundingSet= +
AmbientCapabilities= empty and NoNewPrivileges=yes gives bwrap a clean
caps slate to drop from, so the sandbox initialises and web/dashboard
cells render instead of crashing the kiosk.
2026-05-13 12:53:31 +02:00
Mitchell R
f2dd5b9386 feat(kiosk): show empty display reference 2026-05-13 04:04:03 +02:00
Mitchell R
7c88d7f733 fix(displays): use kiosk-local indices
Kiosk heartbeat reports local display positions so the server can sync physical outputs without consuming global display indices.

Migrate displays.index away from global uniqueness because display numbering is only meaningful within a kiosk.
2026-05-13 03:57:12 +02:00
Mitchell R
54d4dfefa8 Fix kiosk fan control state updates 2026-05-13 03:47:34 +02:00
Mitchell R
d018b34955 fix(displays): sync layout attachment UI 2026-05-13 03:46:58 +02:00