Commits on Source 13
-
Pierros Papadeas authored
The test-hil job was created in every pipeline (it matched both the MR and default-branch rules) but stayed pending forever because no hil-tagged runner is registered yet. Since the job existed but never completed, build-agent's needs: dependency on it blocked the pipeline even with optional: true (optional only helps when the needed job is absent from the pipeline).

Fix: gate test-hil on the $TALOS_HIL_RUNNER variable. The job is created only when this variable is set (configured in the RPi runner's project/group CI variables). Without it, the job is excluded from the pipeline entirely and build-agent proceeds normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
21-item release implementing security hardening, observability, and a complete architecture modernization.

SECURITY:
- MQTT authentication enforced (allow_anonymous false) with a password file
- MQTT topic ACLs restrict each agent to its own station prefix
- Agent authenticates with its station_id/key as MQTT credentials
- Director authenticates with the MQTT_USER/MQTT_PASS env vars
- MQTT TLS support with a cert generation script (ops/scripts/gen_tls_certs.sh)
- CORS middleware with a configurable CORS_ORIGINS allowlist
- Rate limiting on /auth/login (5/min) and /auth/verify (10/min)
- Legacy endpoints (/missions/*, /stations/create) marked deprecated via response headers

OBSERVABILITY:
- Prometheus metrics on Core (/metrics) and Director (:8001/metrics)
- Director HTTP health endpoint (:8001/health) with tick-age monitoring
- Structured JSON logging (TALOS_LOG_FORMAT=json) across all services
- Grafana dashboard with pre-built panels (ops/grafana/)
- Monitoring stack docker-compose overlay (Prometheus + Grafana)
- Docker build caching in CI (--cache-from)

ARCHITECTURE:
- core/main.py split from 2357 lines into a 17-module package (app.py, config.py, deps.py, mqtt_client.py, sync.py, models.py, and 11 route modules under core/routes/)
- Agent rewritten to asyncio + aiomqtt with dataclass state, built-in reconnection, and MQTT authentication
- 1261 lines of inline JS extracted into 10 ES module files under core/static/js/
- Paho MQTT JS replaced with MQTT.js for browser connections
- Legacy Mission code removed from Director (_tick_legacy deleted)

New dependencies: prometheus-client, python-json-logger, slowapi, aiomqtt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
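The per-station topic ACL can be modeled in a few lines to make the rule concrete. This is a minimal Python sketch, not the project's mosquitto configuration: it assumes a hypothetical topic layout talos/<station_id>/#, and it mimics mosquitto's "pattern" ACL lines, in which %u is replaced by the authenticated username (here the station_id the agent logs in with). The matcher follows MQTT's "+" (one level) and "#" (this level and everything below) wildcard rules.

```python
# Minimal model of a per-station MQTT topic ACL (hypothetical topic layout).

ACL_PATTERNS = ["talos/%u/#"]  # hypothetical; %u = authenticated MQTT username


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if topic matches topic_filter using MQTT '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # In a valid filter '#' is the last level; it matches the parent and all below.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without '#', the topic must have exactly as many levels as the filter.
    return len(filter_levels) == len(topic_levels)


def agent_may_access(username: str, topic: str) -> bool:
    """Allow access only if some ACL pattern, with %u substituted, matches the topic."""
    return any(topic_matches(p.replace("%u", username), topic) for p in ACL_PATTERNS)


if __name__ == "__main__":
    print(agent_may_access("station-01", "talos/station-01/telemetry"))
    print(agent_may_access("station-01", "talos/station-02/telemetry"))
```

Because the username is substituted into the pattern, one ACL line covers every station while still denying cross-station publishes and subscriptions.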
-
Pierros Papadeas authored
- test_agent_hardware.py: replace the paho-mqtt monkey-patch with an env-var-based subprocess launch (the async agent reads TALOS_BROKER_HOST/PORT)
- conftest_hil.py: same fix for the HIL agent fixture
- .gitlab-ci.yml: add aiomqtt + python-json-logger to test-agent-hardware
- shared/logging_config.py: add a type annotation to fix a mypy error
- director/director.py: move the logging import to the top level (ruff E402)
- Auto-fix ruff import sorting in test files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
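The env-var launch in the first bullet can be sketched as a small fixture helper. This is a minimal sketch under assumptions: launch_agent is a hypothetical name, the port variable is assumed to be TALOS_BROKER_PORT (the commit abbreviates it as TALOS_BROKER_HOST/PORT), and since the real agent command line is not shown, a tiny probe script stands in for it. Unlike monkey-patching an MQTT client inside the test process, this works regardless of which client library the agent uses, because the test never touches the client object.

```python
import os
import subprocess
import sys


def launch_agent(cmd: list[str], broker_host: str, broker_port: int) -> subprocess.Popen:
    """Start the agent as a subprocess, pointing it at a test broker via env vars."""
    env = os.environ.copy()  # inherit PATH, virtualenv settings, etc.
    env["TALOS_BROKER_HOST"] = broker_host
    env["TALOS_BROKER_PORT"] = str(broker_port)  # env values must be strings (assumed name)
    return subprocess.Popen(cmd, env=env, stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    # Stand-in for the real agent entry point: echo what it would read from the env.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['TALOS_BROKER_HOST'], os.environ['TALOS_BROKER_PORT'])",
    ]
    proc = launch_agent(probe, "127.0.0.1", 1883)
    out, _ = proc.communicate(timeout=30)
    print(out.strip())
```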
-
Pierros Papadeas authored
- mosquitto.conf: enforce allow_anonymous false + password_file + acl_file
- core/config.py: use structured logging from shared.logging_config
- core/routes/legacy.py: deprecation headers on all legacy endpoints
- core/requirements.txt: add prometheus-client, python-json-logger, slowapi
- director/requirements.txt: add prometheus-client, python-json-logger
- ops/docker-compose.yml: director healthcheck uses the HTTP endpoint
- README.md: updated project structure and tech stack for v0.4.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
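Structured logging switched by TALOS_LOG_FORMAT can be approximated with the standard library alone. The commit uses python-json-logger through shared.logging_config; this sketch swaps in a hand-written logging.Formatter subclass instead, and the configure_logging helper and the JSON field names are assumptions for illustration, not the project's API.

```python
import io
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(name: str, stream=None) -> logging.Logger:
    """Hypothetical helper: JSON output when TALOS_LOG_FORMAT=json, plain text otherwise."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    if os.environ.get("TALOS_LOG_FORMAT", "").lower() == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace rather than stack handlers on re-configuration
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    os.environ["TALOS_LOG_FORMAT"] = "json"
    buf = io.StringIO()
    configure_logging("director", buf).info("tick complete")
    print(json.loads(buf.getvalue())["message"])
```

One JSON object per line is what log shippers typically expect, which is why the formatter avoids pretty-printing.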
-
Pierros Papadeas authored
- README: updated project structure showing the core/ package split, new dependencies, monitoring stack, and current feature list
- DEVELOPMENT.md: updated Makefile targets, project structure diagram, and test tier descriptions to reflect the 17-module core package

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
core.routes.auth imported limiter from core.app, but core.app imports core.routes.auth to register its router: a circular dependency. This caused an ImportError in production (surfaced in CI as an HTTP 500) even though tests passed, because the test fixtures import core.main, which loads core.app first and masks the cycle.

Fix: move limiter creation to core.config (which has no circular dependencies) and import it from there in both core.app and core.routes.auth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
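The import-order masking can be reproduced with a throwaway package. This is a hypothetical, stripped-down model, not the real TALOS modules: limiter and router are placeholder objects, and app.py reads auth.router at import time the way router registration would. In this toy, importing core.app first succeeds while importing core.routes.auth first fails; the exact exception depends on which name is touched first, so the check only looks at the exit code. Each import runs in a fresh interpreter so one check's load order cannot leak into the next.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature of the cycle: app.py owns the limiter and registers auth's router.
BROKEN = {
    "core/__init__.py": "",
    "core/routes/__init__.py": "",
    "core/app.py": (
        "limiter = object()\n"
        "from core.routes import auth\n"
        "routers = [auth.router]\n"  # touches auth at import time
    ),
    "core/routes/auth.py": (
        "from core.app import limiter\n"  # reaches back into app.py: the cycle
        "router = object()\n"
    ),
}

# Fix: the limiter lives in a leaf module (config.py) that imports nothing from core.
FIXED = {
    "core/__init__.py": "",
    "core/routes/__init__.py": "",
    "core/config.py": "limiter = object()\n",
    "core/app.py": (
        "from core.config import limiter\n"
        "from core.routes import auth\n"
        "routers = [auth.router]\n"
    ),
    "core/routes/auth.py": (
        "from core.config import limiter\n"
        "router = object()\n"
    ),
}


def imports_cleanly(files: dict[str, str], module: str) -> bool:
    """Write the package to a temp dir and import `module` in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel_path, source in files.items():
            path = Path(tmp) / rel_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)
        env = {**os.environ, "PYTHONPATH": tmp}
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            cwd=tmp,  # keep any real 'core' package in the caller's cwd off sys.path
            env=env,
            capture_output=True,
            text=True,
        )
        return result.returncode == 0


if __name__ == "__main__":
    for label, files in (("broken", BROKEN), ("fixed", FIXED)):
        for mod in ("core.app", "core.routes.auth"):
            print(label, mod, imports_cleanly(files, mod))
```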
-
Pierros Papadeas authored
The new test_imports.py spawns a fresh Python subprocess for each module to verify that it can be imported independently. This catches circular dependencies that are masked when modules load in a specific order (the root cause of the v0.4.0 HTTP 500 error).

25 modules are tested: all core.*, shared.*, and director.* modules. The check runs in the lint stage as the import-check job; test-unit depends on it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
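The per-module isolation check can be sketched with the standard library. This is a minimal sketch, not the project's test_imports.py: check_import and failing_imports are hypothetical names, and in a pytest suite the same check would more naturally be one parametrized test per module. The property that matters is that every import happens in a new interpreter, so nothing loaded for an earlier module can mask a cycle in a later one.

```python
import subprocess
import sys

# Runs importlib.import_module(argv[1]) in a brand-new interpreter.
_PROBE = "import importlib, sys; importlib.import_module(sys.argv[1])"


def check_import(module: str, timeout: float = 60.0) -> tuple[bool, str]:
    """Import `module` in a fresh subprocess; return (ok, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stderr.strip()


def failing_imports(modules: list[str]) -> dict[str, str]:
    """Map each module that fails to import on its own to its error output."""
    failures = {}
    for module in modules:
        ok, stderr = check_import(module)
        if not ok:
            failures[module] = stderr
    return failures


if __name__ == "__main__":
    # Hypothetical module list; the real test covers the project's own modules.
    for module, error in failing_imports(["json", "core.app"]).items():
        print(f"{module}: {(error.splitlines() or ['<no output>'])[-1]}")
```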
-
Pierros Papadeas authored
This file is read at the start of every agent session. It captures rules that were learned the hard way (circular imports, test isolation, import-order masking) so that future sessions don't repeat those mistakes.

Key invariants documented:
- No imports from core.app in route modules (circular import rule)
- CI runs test suites in isolation (not combined)
- ROADMAP.md is the single task list (no inventing new work)
- Agent uses aiomqtt (not paho-mqtt)
- core/main.py is a thin shim (no new code)
- Templates use external JS modules (no inline JS)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
New research for v0.5+ planning:
- 00-executive-summary.md: strategic priorities post-v0.4.0
- 01-scheduling-architecture.md: OR-Tools CP-SAT, conflict detection
- 02-performance-scaling.md: dSGP4, background threading, load targets
- 03-data-resilience.md: CelesTrak fallback, TimescaleDB telemetry
- 04-visualization-standards.md: CesiumJS, HTMX, CCSDS OMM/TDM
- 05-satnogs-coexistence.md: hardware mutex, IQ capture, observations
- 06-technology-decisions.md: tech evaluations and recommendations

Archived v0.3 research to docs/research/archive/v0.3/.
Updated mkdocs.yml navigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
CLAUDE.md:
- Added a "Session Discipline" section: one session = one version, scope control rules, agent orchestration file boundaries
- Added a "Lessons Learned" section documenting v0.4.0 incidents (circular import, agent test breakage, HIL runner stall)
- Updated the current-state section with test counts and research status

ROADMAP.md:
- Split v0.5.0 (13 items, 8 weeks) into three focused releases:
  v0.5.0 (7 items: performance + data resilience)
  v0.5.1 (4 items: scheduling)
  v0.5.2 (2 items: batch propagation + telemetry)
- Max 7 items per release to reduce blast radius

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
Items 21 and 24 have code and config written (metrics endpoints, Grafana dashboard JSON, Prometheus scrape config, docker-compose overlay), but the monitoring stack is not deployed. Corrected their status to "DONE (code only)" and added item 66 to v0.6.0 Infrastructure for the actual deployment (Grafana Cloud or self-hosted).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-
Pierros Papadeas authored
Added a .skip-if-docs-only rule template that checks changes: paths. When a commit only touches docs/**, *.md, mkdocs.yml, CLAUDE.md, LICENSE, .editorconfig, or .gitattributes, all lint and test jobs are skipped and only pages (the MkDocs rebuild) runs. This prevents the full 19-job pipeline from running on documentation-only edits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
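The decision this rule template encodes is "every changed path is documentation". This Python sketch models that decision only; it is not GitLab's matcher, and is_docs_only is a hypothetical name. An empty change list is treated as not docs-only, so the full pipeline runs when nothing can be inferred.

```python
from fnmatch import fnmatchcase

# Path patterns from the commit message, matched here with Python's fnmatch (see note below).
DOCS_ONLY_PATTERNS = (
    "docs/**",
    "*.md",
    "mkdocs.yml",
    "CLAUDE.md",
    "LICENSE",
    ".editorconfig",
    ".gitattributes",
)


def is_docs_only(changed_paths: list[str]) -> bool:
    """True if there is at least one change and every changed path is documentation."""
    return bool(changed_paths) and all(
        any(fnmatchcase(path, pattern) for pattern in DOCS_ONLY_PATTERNS)
        for path in changed_paths
    )


if __name__ == "__main__":
    print(is_docs_only(["docs/index.md", "README.md"]))
    print(is_docs_only(["README.md", "core/app.py"]))
```

Python's fnmatch lets * cross directory separators, so *.md here also matches nested Markdown files such as core/README.md. GitLab's changes: globbing may treat * more strictly, which is one reason this is only a model of the intent rather than a drop-in check.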
-
Pierros Papadeas authored
The /dashboard route queried the legacy Mission table and passed the SQLModel object directly to the template context. The TALOS_CONFIG JS bridge tried to serialize it to JSON, causing:

  TypeError: Object of type Mission is not JSON serializable

Fix: remove the Mission import and query entirely (legacy code that should have been cleaned up in v0.4.0 item 32). The dashboard now uses Campaigns exclusively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
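The failure is generic to json.dumps: it raises TypeError for any object it has no encoder for, whether a SQLModel row or anything else. A minimal sketch using a hypothetical dataclass stand-in for the Mission row and a hypothetical to_js_config helper; the real TALOS_CONFIG bridge's implementation is not shown in the commit.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Mission:
    """Stand-in for the legacy SQLModel table row (hypothetical fields)."""
    id: int
    name: str


def to_js_config(context: dict) -> str:
    """Serialize a template context for a JS bridge; only plain JSON types survive."""
    return json.dumps(context)


if __name__ == "__main__":
    row = Mission(id=1, name="legacy pass")
    try:
        to_js_config({"missions": [row]})
    except TypeError as exc:
        print(exc)  # Object of type Mission is not JSON serializable
    # Converting to plain dicts works; the actual fix simply stops passing the object.
    print(to_js_config({"missions": [asdict(row)]}))
```

Anything handed to a JSON bridge has to be reduced to dicts, lists, strings, numbers, booleans, and None first; removing the legacy query, as the commit does, sidesteps the problem entirely.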