First usable release of a signal-first job discovery pipeline for infrastructure and systems roles.

---

jobpipe reads a list of RSS job feeds from an OPML file, normalizes the entries, scores them with hand-written heuristics plus weights learned by logistic regression, and stores the results in SQLite for querying.

Target roles:

* Infrastructure / Platform / SRE
* Rust / Nix / distributed systems
* Remote-first engineering roles

---

* OPML-based configuration
* Parallel RSS crawling (Tokio async)
* Sources:

  * Hacker News (filtered queries)
  * Reddit job feeds
  * GitHub issue feeds
  * Remote job boards (RemoteOK, We Work Remotely, Remotive)

---

* Deduplication (title + link key)
* Time filtering (--days)
* Normalized job fields:

  * title
  * link
  * summary
  * pub_date
  * source
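
The time filter and exact dedupe described above can be sketched roughly as follows. This is a minimal illustration, not the actual `crawler.rs` or `model.rs` code: the `Job` struct mirrors the normalized fields, but storing `pub_date` as Unix seconds and the function name `filter_and_dedupe` are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Normalized job entry; field names follow the list above.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub title: String,
    pub link: String,
    pub summary: String,
    /// Publication time as Unix seconds (assumed representation).
    pub pub_date: i64,
    pub source: String,
}

/// Keep jobs published within the last `days` days before `now`,
/// then drop exact (title, link) duplicates. The first occurrence wins.
pub fn filter_and_dedupe(jobs: Vec<Job>, now: i64, days: i64) -> Vec<Job> {
    let cutoff = now - days * 86_400;
    let mut seen: HashSet<(String, String)> = HashSet::new();
    jobs.into_iter()
        .filter(|j| j.pub_date >= cutoff)
        // `insert` returns false when the key was already present.
        .filter(|j| seen.insert((j.title.clone(), j.link.clone())))
        .collect()
}
```

Because the key is an exact string pair, two postings that differ only in whitespace or tracking parameters would both survive.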

---

Heuristic scoring in Rust:

Positive signals:

* Infrastructure / platform / SRE roles
* Kubernetes, Terraform, Rust, Nix
* Distributed systems
* Remote-first roles
* Open source / nonprofit / security domains

Penalties:

* Support / helpdesk roles
* QA / automation-heavy roles
* ML / AI / frontend noise
* Shift / NOC patterns
* Geo bias filters
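
One way such a heuristic could look is a pair of keyword tables summed over the title and summary. The sketch below is an assumption about the general shape only: the keywords are taken from the lists above, but every weight, the `heuristic_score` name, and the whole-word matching strategy are hypothetical and may differ from what `score.rs` actually does.

```rust
/// Hypothetical (keyword, weight) tables; the real terms and weights may differ.
const POSITIVE: &[(&str, f64)] = &[
    ("sre", 3.0),
    ("kubernetes", 2.0),
    ("terraform", 2.0),
    ("rust", 2.0),
    ("nix", 2.0),
    ("distributed systems", 2.0),
    ("platform", 1.5),
    ("remote", 1.0),
    ("open source", 1.0),
];
const PENALTY: &[(&str, f64)] = &[
    ("helpdesk", -3.0),
    ("support", -2.0),
    ("qa", -2.0),
    ("frontend", -2.0),
    ("noc", -2.0),
    ("shift", -1.5),
];

/// Lowercase the text and join its alphanumeric words with single spaces,
/// padded with a space on each side so whole-word matching is a substring test.
fn normalize(text: &str) -> String {
    let words: Vec<String> = text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    format!(" {} ", words.join(" "))
}

/// Sum the weights of every keyword or phrase that occurs as whole words.
pub fn heuristic_score(title: &str, summary: &str) -> f64 {
    let text = normalize(&format!("{} {}", title, summary));
    POSITIVE
        .iter()
        .chain(PENALTY.iter())
        .filter(|(kw, _)| text.contains(&format!(" {} ", kw)))
        .map(|(_, w)| *w)
        .sum()
}
```

Matching on whole words keeps "rust" from firing on "trusted" and "nix" from firing on "unix", which plain substring search would get wrong.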

---

* Logistic regression over labeled jobs
* Keyword feature extraction
* Outputs Rust-ready weights
* Label via SQL (jobs.want = 0/1)
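
For illustration, this is roughly how learned weights might be applied on the Rust side once they have been exported. The bias, the keyword weights, and the `want_probability` name are all hypothetical; the sketch assumes binary keyword features (1 if the word is present, 0 otherwise) fed through the standard logistic function.

```rust
/// Hypothetical learned parameters, in the shape a training script might emit.
const BIAS: f64 = -1.0;
const WEIGHTS: &[(&str, f64)] = &[
    ("rust", 1.2),
    ("kubernetes", 0.8),
    ("helpdesk", -1.5),
];

/// Standard logistic function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Estimated probability that a job would be labeled want = 1.
/// Each keyword contributes its weight once if it appears as a whole word.
pub fn want_probability(text: &str) -> f64 {
    let lower = text.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    let mut z = BIAS;
    for (kw, w) in WEIGHTS {
        if words.iter().any(|word| word == kw) {
            z += *w;
        }
    }
    sigmoid(z)
}
```

With these made-up numbers, a posting mentioning both "rust" and "kubernetes" lands above 0.5, while one mentioning only "helpdesk" lands well below it.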

---

Schema:

jobs(
id,
title,
link UNIQUE,
summary,
pub_date,
source,
score,
want
)

* Persistent storage
* Label preservation
* Queryable ranking

---

Example:

sqlite3 jobs.db <<EOF > top_jobs.txt
SELECT title, score, link FROM jobs ORDER BY score DESC LIMIT 20;
EOF

---

label.sql -> jobs.want -> train.py -> score.rs

* Minimum: 10 labeled jobs
* Recommended: 30+
* Outputs keyword weights

---

main.rs
crawler.rs
opml.rs
model.rs
score.rs
db.rs
output.rs

---

jobpipe [OPTIONS] <OPML_FILE>

Options:
-d, --days <DAYS>        default 15
-f, --format <FORMAT>    plain | md | org | sql
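
As a rough illustration of the interface above, the following std-only sketch parses the same options. It is not the project's actual argument handling (which may well use a CLI crate); the `Cli`, `Format`, and `parse_args` names are hypothetical, and `plain` as the default format is an assumption, since only the `--days` default is documented.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Plain,
    Md,
    Org,
    Sql,
}

#[derive(Debug, PartialEq)]
pub struct Cli {
    pub days: u32,
    pub format: Format,
    pub opml_file: String,
}

/// Parse arguments (without the program name) into a `Cli`.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Cli, String> {
    let mut days: u32 = 15; // documented default
    let mut format = Format::Plain; // assumed default
    let mut opml_file: Option<String> = None;
    let mut it = args.into_iter();

    while let Some(arg) = it.next() {
        if arg == "-d" || arg == "--days" {
            let v = it.next().ok_or("--days needs a value")?;
            days = v
                .parse::<u32>()
                .map_err(|e| format!("invalid --days value: {}", e))?;
        } else if arg == "-f" || arg == "--format" {
            let v = it.next().ok_or("--format needs a value")?;
            format = match v.as_str() {
                "plain" => Format::Plain,
                "md" => Format::Md,
                "org" => Format::Org,
                "sql" => Format::Sql,
                other => return Err(format!("unknown format: {}", other)),
            };
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {}", arg));
        } else if opml_file.is_none() {
            opml_file = Some(arg);
        } else {
            return Err(format!("unexpected argument: {}", arg));
        }
    }

    let opml_file = opml_file.ok_or("missing <OPML_FILE>")?;
    Ok(Cli { days, format, opml_file })
}
```

In a binary this would typically be called as `parse_args(std::env::args().skip(1))`.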

---

* Exact dedupe only (no fuzzy matching)
* Scoring changes require a rebuild (weights are compiled in)
* Trained weights are not persisted; they are pasted into score.rs by hand
* Small labeled sets may overfit
* No alerting yet

---

Near-term:

* Dynamic weights (no rebuild)
* Better dedupe (fuzzy match)

Mid-term:

* Alerts (email / Telegram)
* Company scoring

Long-term:

* Embedded model inference
* Dataset + experiment tracking
* UI

---

Alpha

* End-to-end pipeline works
* Stable for personal use
* Interfaces may change

---

Shift from collecting job feeds to building a personalized job ranking system.