First usable release of a signal-first job discovery pipeline for infrastructure and systems roles.

---

jobpipe ingests RSS/OPML job feeds, normalizes entries, scores them using heuristic plus ML-assisted ranking, and stores results in SQLite for querying.

Target roles:

* Infrastructure / Platform / SRE
* Rust / Nix / distributed systems
* Remote-first engineering roles

---

* OPML-based configuration
* Parallel RSS crawling (Tokio async)
* Sources:
  * Hacker News (filtered queries)
  * Reddit job feeds
  * GitHub issue feeds
  * Remote job boards (RemoteOK, WWR, Remotive)

---

* Deduplication (title + link key)
* Time filtering (--days)
* Normalized job fields:
  * title
  * link
  * summary
  * pub_date
  * source

---

Heuristic scoring in Rust:

Positive signals:

* Infrastructure / platform / SRE roles
* Kubernetes, Terraform, Rust, Nix
* Distributed systems
* Remote-first roles
* Open source / nonprofit / security domains

Penalties:

* Support / helpdesk roles
* QA / automation-heavy roles
* ML / AI / frontend noise
* Shift / NOC patterns
* Geo bias filters

---

* Logistic regression over labeled jobs
* Keyword feature extraction
* Outputs Rust-ready weights
* Label via SQL (jobs.want = 0/1)

---

Schema:

    jobs(
      id,
      title,
      link UNIQUE,
      summary,
      pub_date,
      source,
      score,
      want
    )

* Persistent storage
* Label preservation
* Queryable ranking

---

Example:

    sqlite3 jobs.db <<EOF > top_jobs.txt
    SELECT title, score, link
    FROM jobs
    ORDER BY score DESC
    LIMIT 20;
    EOF

---

label.sql -> jobs.want -> train.py -> score.rs

* Minimum: 10 labeled jobs
* Recommended: 30+
* Outputs keyword weights

---

    main.rs
    crawler.rs
    opml.rs
    model.rs
    score.rs
    db.rs
    output.rs

---

    jobpipe [OPTIONS] <OPML_FILE>

    Options:
      -d, --days <DAYS>        default 15
      -f, --format <FORMAT>    plain | md | org | sql

---

* Exact dedupe only (no fuzzy matching)
* Rebuild required after scoring changes
* Model not persisted (manual paste)
* Small dataset may overfit
* No alerting yet

---

Near-term:

* Dynamic weights (no rebuild)
* Better dedupe (fuzzy match)

Mid-term:

* Alerts (email / Telegram)
* Company scoring

Long-term:

* Embedded model inference
* Dataset + experiment tracking
* UI

---

Alpha

* End-to-end pipeline works
* Stable for personal use
* Interfaces may change

---

Shift from: collecting job feeds

to: building a personalized job ranking system
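
---

The exact deduplication (title + link key) and `--days` time filtering listed earlier could look roughly like the sketch below. It is an illustration, not the code in this release: the `Job` struct, the `dedupe_key` and `dedupe_and_filter` names, the use of Unix seconds for `pub_date`, and the lowercasing of the title are all assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical normalized job record. Field names follow the normalized
/// job fields in these notes; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub title: String,
    pub link: String,
    pub summary: String,
    pub pub_date: i64, // assumed: Unix timestamp in seconds
    pub source: String,
}

/// Build an exact-match key from title and link.
/// Trimming both and lowercasing the title is an assumption.
fn dedupe_key(job: &Job) -> (String, String) {
    (job.title.trim().to_lowercase(), job.link.trim().to_string())
}

/// Drop jobs older than `days` (relative to `now`, in Unix seconds),
/// then keep only the first job seen for each (title, link) key.
pub fn dedupe_and_filter(jobs: Vec<Job>, now: i64, days: i64) -> Vec<Job> {
    let cutoff = now - days * 86_400;
    let mut seen = HashSet::new();
    jobs.into_iter()
        .filter(|j| j.pub_date >= cutoff)
        .filter(|j| seen.insert(dedupe_key(j)))
        .collect()
}
```

Because the key is an exact match, two postings of the same role with slightly different titles or tracking parameters in the link both survive, which is the limitation noted as "Exact dedupe only".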
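
---

As a rough illustration of the heuristic scoring (positive signals minus penalties), here is a minimal keyword-table sketch. The keywords are drawn from the signal lists in these notes, but the numeric weights, the `POSITIVE` and `PENALTY` tables, and the `normalize`, `weighted_hits`, and `score` functions are hypothetical and may differ from what `score.rs` does.

```rust
/// Hypothetical (keyword, weight) tables. The weights are placeholders.
const POSITIVE: &[(&str, i32)] = &[
    ("sre", 3),
    ("platform", 3),
    ("infrastructure", 3),
    ("kubernetes", 2),
    ("terraform", 2),
    ("rust", 2),
    ("nix", 2),
    ("distributed systems", 2),
    ("remote", 1),
    ("open source", 1),
];

const PENALTY: &[(&str, i32)] = &[
    ("support", 3),
    ("helpdesk", 3),
    ("qa", 2),
    ("frontend", 2),
    ("noc", 2),
    ("shift", 1),
];

/// Lowercase the text, turn punctuation into spaces, collapse whitespace,
/// and pad with spaces so keywords can be matched on word boundaries.
fn normalize(text: &str) -> String {
    let cleaned: String = text
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { ' ' })
        .collect();
    let words: Vec<&str> = cleaned.split_whitespace().collect();
    format!(" {} ", words.join(" "))
}

/// Sum the weights of every keyword that occurs as a whole word or phrase.
fn weighted_hits(text: &str, table: &[(&str, i32)]) -> i32 {
    table
        .iter()
        .filter(|(kw, _)| text.contains(&format!(" {} ", kw)))
        .map(|(_, w)| *w)
        .sum()
}

/// Heuristic score: positive signals minus penalties.
pub fn score(title: &str, summary: &str) -> i32 {
    let text = normalize(&format!("{} {}", title, summary));
    weighted_hits(&text, POSITIVE) - weighted_hits(&text, PENALTY)
}
```

Matching on space-padded words keeps "rust" from firing on "trust" and "nix" from firing on "Unix", which plain substring search would get wrong. Since the tables here are compile-time constants, changing a weight means recompiling, which matches the "Rebuild required after scoring changes" limitation.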
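
---

The training step is described as producing Rust-ready keyword weights from a logistic regression. One way such weights might be applied on the Rust side is sketched below: binary keyword features, a weighted sum plus bias, and a sigmoid. The `Model` struct, the `MODEL` constant, its numbers, and the `predict` function are placeholders for illustration, not output from `train.py`.

```rust
/// Hypothetical exported model: a bias plus one weight per keyword.
pub struct Model {
    pub bias: f64,
    pub weights: &'static [(&'static str, f64)],
}

/// Placeholder values, not trained weights. In the workflow described in
/// these notes, the trained values would be pasted in manually.
pub const MODEL: Model = Model {
    bias: -1.0,
    weights: &[("kubernetes", 1.5), ("rust", 2.0), ("helpdesk", -2.5)],
};

/// Standard logistic function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Estimate P(want = 1) for a job text. Each keyword is a binary feature:
/// 1 if it appears as a whole word in the lowercased text, else 0.
pub fn predict(model: &Model, text: &str) -> f64 {
    let lower = text.to_lowercase();
    let words: Vec<&str> = lower.split(|c: char| !c.is_alphanumeric()).collect();
    let z = model.bias
        + model
            .weights
            .iter()
            .filter(|(kw, _)| words.iter().any(|w| *w == *kw))
            .map(|(_, w)| *w)
            .sum::<f64>();
    sigmoid(z)
}
```

With only 10 to 30 labeled jobs, a weight can be driven almost entirely by one or two examples, so probabilities from a model this small are better read as a ranking aid than as calibrated estimates.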