Commits on Source (13)

  • Fix build-agent stall: only create test-hil job when HIL runner exists · 3ced1124
    Pierros Papadeas authored
    
    
    The test-hil job was being created in every pipeline (matching MR and
    default branch rules) but stuck pending forever because no hil-tagged
    runner is registered yet. Since the job existed but never completed,
    build-agent's needs: dependency blocked even with optional: true.
    
    Fix: gate test-hil on $TALOS_HIL_RUNNER variable. The job is only
    created when this variable is set (configured on the RPi runner's
    project/group CI variables). Without it, the job is excluded from the
    pipeline entirely, and build-agent proceeds normally.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • v0.4.0: Security, observability, and architecture modernization · 28b5c56d
    Pierros Papadeas authored
    
    
    21-item release implementing security hardening, observability, and
    complete architecture modernization.
    
    SECURITY:
    - MQTT authentication enforced (allow_anonymous false) with password file
    - MQTT topic ACLs restrict agents to their own station prefix
    - Agent authenticates with station_id/key as MQTT credentials
    - Director authenticates with MQTT_USER/MQTT_PASS env vars
    - MQTT TLS support with cert generation script (ops/scripts/gen_tls_certs.sh)
    - CORS middleware with configurable CORS_ORIGINS allowlist
    - Rate limiting on /auth/login (5/min) and /auth/verify (10/min)
    - Legacy endpoints (/missions/*, /stations/create) deprecated with headers
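
The login limit listed above (5 requests per minute) can be illustrated with a sliding-window counter. This is a minimal stdlib sketch under stated assumptions, not the slowapi configuration used in the release: the class name `SlidingWindowLimiter` and the per-client key are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)

    def allow(self, key):
        """Return True and record the call if `key` is under its limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Hypothetical usage: one limiter per endpoint, keyed by client address.
login_limiter = SlidingWindowLimiter(limit=5, window=60.0)
```

A handler would call `login_limiter.allow(client_ip)` and reject the request when it returns False.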
    
    OBSERVABILITY:
    - Prometheus metrics on Core (/metrics) and Director (:8001/metrics)
    - Director HTTP health endpoint (:8001/health) with tick-age monitoring
    - Structured JSON logging (TALOS_LOG_FORMAT=json) across all services
    - Grafana dashboard with pre-built panels (ops/grafana/)
    - Monitoring stack docker-compose overlay (Prometheus + Grafana)
    - Docker build caching in CI (--cache-from)
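
How an environment switch such as TALOS_LOG_FORMAT=json might select structured output can be sketched with the standard library alone. The release uses python-json-logger; the formatter below, its field names, and `build_handler` are assumptions made for illustration.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_handler(stream=None):
    """Return a stream handler whose format follows TALOS_LOG_FORMAT."""
    handler = logging.StreamHandler(stream)
    if os.environ.get("TALOS_LOG_FORMAT", "").lower() == "json":
        handler.setFormatter(JsonFormatter())
    else:
        # Plain-text fallback when the variable is unset or not "json".
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    return handler
```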
    
    ARCHITECTURE:
    - core/main.py split from 2357 lines into 17-module package
      (app.py, config.py, deps.py, mqtt_client.py, sync.py, models.py,
      and 11 route modules under core/routes/)
    - Agent rewritten to asyncio + aiomqtt with dataclass state,
      built-in reconnection, and MQTT authentication
    - 1261 lines of inline JS extracted into 10 ES module files
      under core/static/js/
    - Paho MQTT JS replaced with MQTT.js for browser connections
    - Legacy Mission code removed from Director (_tick_legacy deleted)
    
    New dependencies: prometheus-client, python-json-logger, slowapi, aiomqtt
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Fix CI failures: update test fixtures for async agent, fix lint/mypy · 17865342
    Pierros Papadeas authored
    
    
    - test_agent_hardware.py: replace paho-mqtt monkey-patch with env var
      based subprocess launch (async agent reads TALOS_BROKER_HOST/PORT)
    - conftest_hil.py: same fix for HIL agent fixture
    - .gitlab-ci.yml: add aiomqtt + python-json-logger to test-agent-hardware
    - shared/logging_config.py: add type annotation to fix mypy error
    - director/director.py: move logging import to top-level (ruff E402)
    - Auto-fix ruff import sorting in test files
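
The env-var based launch mentioned for the test fixtures could look roughly like the sketch below. It assumes the two variables are spelled TALOS_BROKER_HOST and TALOS_BROKER_PORT; the helper name and the stand-in child command are hypothetical, and a real fixture would more likely keep a long-lived process with `subprocess.Popen`.

```python
import os
import subprocess
import sys


def run_with_broker_env(cmd, host, port, timeout=10):
    """Run `cmd` with the broker address passed through the environment."""
    env = os.environ.copy()
    env["TALOS_BROKER_HOST"] = host
    env["TALOS_BROKER_PORT"] = str(port)  # environment values must be strings
    return subprocess.run(
        cmd, env=env, capture_output=True, text=True, timeout=timeout
    )


# Stand-in child process that only echoes the settings it received.
CHILD = [
    sys.executable,
    "-c",
    "import os; print(os.environ['TALOS_BROKER_HOST'], "
    "os.environ['TALOS_BROKER_PORT'])",
]
```

Compared with monkey-patching the client library, this keeps the test outside the agent's process, so it does not depend on which MQTT library the agent imports.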
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Complete v0.4.0: MQTT auth enforcement, dependency updates, README refresh · 435ca326
    Pierros Papadeas authored
    
    
    - mosquitto.conf: enforce allow_anonymous false + password_file + acl_file
    - core/config.py: use structured logging from shared.logging_config
    - core/routes/legacy.py: deprecation headers on all legacy endpoints
    - core/requirements.txt: add prometheus-client, python-json-logger, slowapi
    - director/requirements.txt: add prometheus-client, python-json-logger
    - ops/docker-compose.yml: director healthcheck uses HTTP endpoint
    - README.md: updated project structure and tech stack for v0.4.0
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • docs: update README and DEVELOPMENT.md for v0.4.0 architecture · 65002a98
    Pierros Papadeas authored
    
    
    - README: updated project structure showing core/ package split, new
      dependencies, monitoring stack, and current feature list
    - DEVELOPMENT.md: updated Makefile targets, project structure diagram,
      test tier descriptions reflecting 17-module core package
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Fix 500 error: resolve circular import in core.routes.auth · fa2862dc
    Pierros Papadeas authored
    
    
    core.routes.auth imported limiter from core.app, but core.app imports
    core.routes.auth to register its router, creating a circular
    dependency. This caused an ImportError in production (which CI
    detected as a 500) even though tests passed: the test fixtures import
    core.main, which loads core.app first and masks the cycle.
    
    Fix: move limiter creation to core.config (no circular deps) and
    import from there in both core.app and core.routes.auth.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Add circular import detection to CI lint stage · b88bf548
    Pierros Papadeas authored
    
    
    New test_imports.py spawns a fresh Python subprocess for each module
    to verify it can be imported independently. This catches circular
    dependencies that are masked when modules load in a specific order
    (the root cause of the v0.4.0 500 error).
    
    25 modules tested: all core.*, shared.*, director.* modules.
    Runs in lint stage as import-check job; test-unit depends on it.
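
The per-module check described above can be sketched as follows. This is not the project's test_imports.py: the toy package `pkg`, its module names, and the helper functions are hypothetical, built only to reproduce an import cycle that succeeds in one load order and fails in the other.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def check_imports(modules, root):
    """Import each module in its own interpreter; map name -> success."""
    env = os.environ.copy()
    env["PYTHONPATH"] = str(root)  # make the package importable in the child
    results = {}
    for name in modules:
        proc = subprocess.run(
            [sys.executable, "-c", f"import {name}"],
            env=env, capture_output=True, text=True,
        )
        results[name] = proc.returncode == 0
    return results


def make_demo_package(root):
    """Write a toy package 'pkg' with an order-dependent import cycle."""
    pkg = Path(root) / "pkg"
    routes = pkg / "routes"
    routes.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (routes / "__init__.py").write_text("")
    # app defines limiter first, then pulls in the route module.
    (pkg / "app.py").write_text(
        "limiter = object()\nfrom pkg.routes.auth import router\n"
    )
    # The route module reaches back into app, which closes the cycle.
    (routes / "auth.py").write_text(
        "from pkg.app import limiter\nrouter = object()\n"
    )


def demo():
    """Build the toy package and check both modules in isolation."""
    with tempfile.TemporaryDirectory() as root:
        make_demo_package(root)
        return check_imports(["pkg.app", "pkg.routes.auth"], root)
```

In this toy layout, importing `pkg.app` first succeeds because `limiter` already exists when the route module asks for it, while importing `pkg.routes.auth` on its own fails. A test suite that always enters through the app module would never see the failure, which is why each module gets a fresh interpreter.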
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Add CLAUDE.md: architectural invariants and agent instructions · 35a9c2e5
    Pierros Papadeas authored
    
    
    This file is read at the start of every agent session. It captures
    rules that were learned the hard way (circular imports, test isolation,
    import order masking) so future sessions don't repeat mistakes.
    
    Key invariants documented:
    - No imports from core.app in route modules (circular import rule)
    - CI runs test suites in isolation (not combined)
    - ROADMAP.md is the single task list (no inventing new work)
    - Agent uses aiomqtt (not paho-mqtt)
    - core/main.py is a thin shim (no new code)
    - Templates use external JS modules (no inline JS)
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • docs: v0.5 research documents + archive v0.3 research · fef0edfe
    Pierros Papadeas authored
    
    
    New research for v0.5+ planning:
    - 00-executive-summary.md: strategic priorities post-v0.4.0
    - 01-scheduling-architecture.md: OR-Tools CP-SAT, conflict detection
    - 02-performance-scaling.md: dSGP4, background threading, load targets
    - 03-data-resilience.md: CelesTrak fallback, TimescaleDB telemetry
    - 04-visualization-standards.md: CesiumJS, HTMX, CCSDS OMM/TDM
    - 05-satnogs-coexistence.md: hardware mutex, IQ capture, observations
    - 06-technology-decisions.md: tech evaluations and recommendations
    
    Archived v0.3 research to docs/research/archive/v0.3/
    Updated mkdocs.yml navigation.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • Enforce consistency: session discipline, scope control, smaller releases · 004ee969
    Pierros Papadeas authored
    
    
    CLAUDE.md:
    - Added "Session Discipline" section: one session = one version,
      scope control rules, agent orchestration file boundaries
    - Added "Lessons Learned" section documenting v0.4.0 incidents
      (circular import, agent test breakage, HIL runner stall)
    - Updated current state with test counts and research status
    
    ROADMAP.md:
    - Split v0.5.0 (13 items, 8 weeks) into three focused releases:
      v0.5.0 (7 items: performance + data resilience)
      v0.5.1 (4 items: scheduling)
      v0.5.2 (2 items: batch propagation + telemetry)
    - Max 7 items per release to reduce blast radius
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • docs: defer Prometheus/Grafana deployment to v0.6.0 · b1327b35
    Pierros Papadeas authored
    
    
    Items 21 and 24 have code/config written (metrics endpoints, Grafana
    dashboard JSON, Prometheus scrape config, docker-compose overlay) but
    the monitoring stack is not deployed. Corrected status to "DONE (code
    only)" and added item 66 to v0.6.0 Infrastructure for actual
    deployment (Grafana Cloud or self-hosted).
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • ci: skip lint/test jobs on docs-only changes · 1bfb3f45
    Pierros Papadeas authored
    
    
    Added .skip-if-docs-only rule template that checks changes: paths.
    When a commit only touches docs/**, *.md, mkdocs.yml, CLAUDE.md,
    LICENSE, .editorconfig, or .gitattributes, all lint and test jobs
    are skipped. Only pages (MkDocs rebuild) runs.
    
    This prevents a full 19-job pipeline on documentation edits.
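
The docs-only decision can be modelled in a few lines of Python. This sketch is only an approximation of the rule described above, not GitLab's own `changes:` matching: it treats any `.md` file as documentation regardless of directory, and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Root-level files named in the rule description.
ROOT_DOC_FILES = {
    "mkdocs.yml", "CLAUDE.md", "LICENSE", ".editorconfig", ".gitattributes",
}


def is_docs_path(path):
    """Return True if a changed path counts as documentation."""
    p = PurePosixPath(path)
    if not p.parts:
        return False
    under_docs = p.parts[0] == "docs" and len(p.parts) > 1
    return under_docs or p.suffix == ".md" or str(p) in ROOT_DOC_FILES


def is_docs_only(changed_paths):
    """True when every changed path is documentation (and there is one)."""
    paths = list(changed_paths)
    return bool(paths) and all(is_docs_path(p) for p in paths)
```

An empty change list is deliberately treated as not docs-only, so a pipeline with no detectable changes still runs its lint and test jobs.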
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  • fix: dashboard 500 - remove legacy Mission query causing JSON serialization crash · 4c491a1a
    Pierros Papadeas authored
    
    
    The /dashboard route queried the legacy Mission table and passed the
    SQLModel object directly to the template context. The TALOS_CONFIG
    JS bridge tried to serialize it to JSON, causing:
    
        TypeError: Object of type Mission is not JSON serializable
    
    Fix: remove the Mission import and query entirely (legacy code that
    should have been cleaned up in v0.4.0 item 32). The dashboard now
    uses Campaigns exclusively.
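
The failure mode is easy to reproduce with the standard library. In the sketch below a dataclass stands in for the legacy SQLModel class, and `render_config` is a hypothetical stand-in for the JS bridge. The point is only that `json.dumps` rejects arbitrary objects placed in the context.

```python
import json
from dataclasses import dataclass


@dataclass
class Mission:
    """Stand-in for the legacy model class (not the real SQLModel)."""
    id: int
    name: str


def render_config(context):
    """Serialize a template context the way a JSON bridge would."""
    return json.dumps(context)


def try_render(context):
    """Return the JSON string, or the error text if serialization fails."""
    try:
        return render_config(context)
    except TypeError as exc:
        return f"TypeError: {exc}"
```

Passing a context that contains a `Mission` instance yields the error text, while a context made of plain dicts, lists, strings, and numbers serializes normally.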
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>