Tags

  • alpha

    fd226757 · minor
    First usable release of a signal-first job discovery pipeline for infrastructure and systems roles.
    
    ---
    
    jobpipe ingests RSS job feeds listed in an OPML file, normalizes the entries, scores them with heuristic rules plus ML-assisted ranking, and stores the results in SQLite for querying.
    
    Target roles:
    
    * Infrastructure / Platform / SRE
    * Rust / Nix / distributed systems
    * Remote-first engineering roles
    
    ---
    
    Ingestion:
    
    * OPML-based configuration
    * Parallel RSS crawling (Tokio async)
    * Sources:
    
      * Hacker News (filtered queries)
      * Reddit job feeds
      * GitHub issue feeds
      * Remote job boards (RemoteOK, WWR, Remotive)
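A minimal sketch of how feed URLs might be read from an OPML file. The release does this in Rust (opml.rs); Python is used here only for brevity. The function name `feed_urls` and the sample document are hypothetical, and the sketch assumes feeds are `outline` elements carrying an `xmlUrl` attribute, which is the common OPML convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real feed lists will differ.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Job feeds</title></head>
  <body>
    <outline text="Remote boards">
      <outline text="Board A" type="rss" xmlUrl="https://example.com/a.rss"/>
      <outline text="Board B" type="rss" xmlUrl="https://example.com/b.rss"/>
    </outline>
    <outline text="Folder without a feed"/>
  </body>
</opml>
"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline element, in document order."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so folders are handled transparently.
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


if __name__ == "__main__":
    for url in feed_urls(SAMPLE_OPML):
        print(url)
```

Outlines without `xmlUrl` (folders, notes) are skipped rather than treated as errors.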
    
    ---
    
    Processing:
    
    * Deduplication (title + link key)
    * Time filtering (--days)
    * Normalized job fields:
    
      * title
      * link
      * summary
      * pub_date
      * source
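The deduplication and time filter above can be sketched as follows. This is an illustration in Python, not the Rust code from the release; the helper names `dedupe` and `within_days` are hypothetical, and jobs are assumed to be dictionaries using the normalized field names with `pub_date` already parsed into a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def dedupe(jobs):
    """Keep the first job for each exact (title, link) pair."""
    seen = set()
    unique = []
    for job in jobs:
        key = (job["title"], job["link"])
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def within_days(jobs, days, now):
    """Keep jobs published no earlier than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [job for job in jobs if job["pub_date"] >= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [
        {"title": "SRE", "link": "https://example.com/1", "pub_date": now - timedelta(days=2)},
        {"title": "SRE", "link": "https://example.com/1", "pub_date": now - timedelta(days=2)},
        {"title": "SRE", "link": "https://example.com/2", "pub_date": now - timedelta(days=40)},
    ]
    print(within_days(dedupe(jobs), 15, now))
```

Because the key is an exact match, the same posting with a slightly different title or a tracking parameter in the link would still appear twice.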
    
    ---
    
    Heuristic scoring in Rust:
    
    Positive signals:
    
    * Infrastructure / platform / SRE roles
    * Kubernetes, Terraform, Rust, Nix
    * Distributed systems
    * Remote-first roles
    * Open source / nonprofit / security domains
    
    Penalties:
    
    * Support / helpdesk roles
    * QA / automation-heavy roles
    * ML / AI / frontend noise
    * Shift / NOC patterns
    * Geo bias filters
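A rough sketch of keyword-based scoring along the lines of the signals and penalties above. The release implements scoring in Rust (score.rs); this Python version only illustrates the idea. The keyword lists and every weight below are invented for the example and are not the project's actual values.

```python
import re

# Hypothetical weights: positive signals add to the score, penalties subtract.
POSITIVE = {"sre": 3, "kubernetes": 2, "terraform": 2, "rust": 2, "nix": 2, "remote": 1}
PENALTY = {"helpdesk": -3, "qa": -2, "frontend": -2, "noc": -2}


def score(title, summary):
    """Sum the weight of each keyword that occurs as a whole word in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", f"{title} {summary}".lower()))
    weights = {**POSITIVE, **PENALTY}
    return sum(weight for keyword, weight in weights.items() if keyword in tokens)


if __name__ == "__main__":
    print(score("Senior SRE (Remote)", "Kubernetes and Rust"))
    print(score("Helpdesk technician", "night NOC shifts"))
```

Matching on whole tokens rather than substrings avoids accidental hits such as "nix" inside "unix". Each keyword counts once no matter how often it appears.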
    
    ---
    
    ML-assisted ranking:
    
    * Logistic regression over labeled jobs
    * Keyword feature extraction
    * Outputs Rust-ready weights
    * Label via SQL (jobs.want = 0/1)
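To make the training step concrete, here is a small, dependency-free sketch of logistic regression over binary keyword features. It is not the project's train.py. The keyword list, the hyperparameters, and the emitted Rust constant (`WEIGHTS`) are all assumptions made for illustration.

```python
import math

# Hypothetical keyword vocabulary; one binary feature per keyword.
KEYWORDS = ["rust", "kubernetes", "helpdesk"]


def features(text):
    """1.0 if the keyword occurs in the text, else 0.0."""
    lowered = text.lower()
    return [1.0 if kw in lowered else 0.0 for kw in KEYWORDS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=500, lr=0.5):
    """Stochastic gradient descent on log loss; rows are (text, want) pairs."""
    weights = [0.0] * len(KEYWORDS)
    bias = 0.0
    for _ in range(epochs):
        for text, want in rows:
            x = features(text)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - want  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def as_rust(weights):
    """Render the weights as a Rust constant (hypothetical output format)."""
    pairs = ", ".join(f'("{kw}", {w:.3f})' for kw, w in zip(KEYWORDS, weights))
    return f"const WEIGHTS: &[(&str, f64)] = &[{pairs}];"


if __name__ == "__main__":
    labeled = [
        ("Rust platform engineer", 1),
        ("Kubernetes SRE", 1),
        ("Helpdesk support", 0),
        ("Helpdesk technician", 0),
    ]
    learned, _bias = train(labeled)
    print(as_rust(learned))
```

With only a handful of labels the weights mostly memorize the examples, so treat the output of a run like this as a starting point rather than a calibrated model.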
    
    ---
    
    Schema:
    
    jobs(
      id,
      title,
      link UNIQUE,
      summary,
      pub_date,
      source,
      score,
      want
    )
    
    * Persistent storage
    * Label preservation
    * Queryable ranking
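One way label preservation can work with a schema like this is an upsert keyed on the unique link: a re-crawl refreshes the score but never touches `want`. The sketch below uses Python's sqlite3 module. The column types, the `save` and `top` helpers, and the upsert statement itself are assumptions, since the release notes list only the column names. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.

```python
import sqlite3

# Column types are assumed; the release notes give only names and link UNIQUE.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    link     TEXT NOT NULL UNIQUE,
    summary  TEXT,
    pub_date TEXT,
    source   TEXT,
    score    REAL,
    want     INTEGER
);
"""

# `want` is deliberately absent from the update list, so labels survive re-crawls.
UPSERT = """
INSERT INTO jobs (title, link, summary, pub_date, source, score)
VALUES (:title, :link, :summary, :pub_date, :source, :score)
ON CONFLICT(link) DO UPDATE SET
    title = excluded.title,
    summary = excluded.summary,
    pub_date = excluded.pub_date,
    source = excluded.source,
    score = excluded.score
"""


def save(conn, job):
    """Insert a job, or refresh everything except the label if the link exists."""
    conn.execute(UPSERT, job)


def top(conn, limit=20):
    """Return (title, score, link) rows ordered by descending score."""
    query = "SELECT title, score, link FROM jobs ORDER BY score DESC LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    job = {"title": "SRE", "link": "https://example.com/1", "summary": "",
           "pub_date": "2024-01-01", "source": "example", "score": 4.0}
    save(conn, job)
    conn.execute("UPDATE jobs SET want = 1 WHERE link = ?", (job["link"],))
    save(conn, {**job, "score": 6.5})  # re-crawl with a new score
    print(conn.execute("SELECT score, want FROM jobs").fetchall())
```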
    
    ---
    
    Example:
    
    sqlite3 jobs.db <<EOF > top_jobs.txt
    SELECT title, score, link FROM jobs ORDER BY score DESC LIMIT 20;
    EOF
    
    ---
    
    Training loop:
    
    label.sql -> jobs.want -> train.py -> score.rs
    
    * Minimum: 10 labeled jobs
    * Recommended: 30+
    * Outputs keyword weights
    
    ---
    
    Source modules:
    
    main.rs
    crawler.rs
    opml.rs
    model.rs
    score.rs
    db.rs
    output.rs
    
    ---
    
    Usage:
    
    jobpipe [OPTIONS] <OPML_FILE>
    
    Options:
    -d, --days <DAYS>        time filter window in days (default: 15)
    -f, --format <FORMAT>    output format: plain | md | org | sql
    
    ---
    
    Known limitations:
    
    * Exact dedupe only (no fuzzy matching)
    * Rebuild required after scoring changes
    * Model weights are not persisted (pasted manually into score.rs)
    * Small labeled datasets may overfit
    * No alerting yet
    
    ---
    
    Near-term:
    
    * Dynamic weights (no rebuild)
    * Better dedupe (fuzzy match)
    
    Mid-term:
    
    * Alerts (email / Telegram)
    * Company scoring
    
    Long-term:
    
    * Embedded model inference
    * Dataset + experiment tracking
    * UI
    
    ---
    
    Status: alpha
    
    * End-to-end pipeline works
    * Stable for personal use
    * Interfaces may change
    
    ---
    
    Focus shift: from collecting job feeds to building a personalized job ranking system.