datalakehouse

Projects with this topic


Alexandre TOTO / Architecture Lakehouse

Reference project for a modern Lakehouse architecture applied to bank fraud detection.

Simulates a production environment with three heterogeneous data sources (CSV files, a PostgreSQL database, Kafka/Redpanda streaming) ingested continuously into S3-compatible object storage (MinIO).

Tech stack:
Batch ingestion: Apache Spark (PySpark) + Delta Lake
Streaming ingestion: Spark Structured Streaming + Redpanda (Kafka)
Orchestration: Apache Airflow
Transformation: dbt (DuckDB)
Storage: MinIO (S3), Delta Lake (Bronze/Silver), Parquet (Gold)
Exploration: DuckDB / DBeaver
Medallion architecture:
Bronze: raw data, one dataset per source
Silver: cleaned data, deduplicated across sources
Gold: business aggregates (fraud per hour)
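
The three layers above can be illustrated with a minimal sketch in plain Python. It is not the project's code: the project uses Spark, Delta Lake and dbt, whereas this example uses only the standard library to show the idea. The field names (transaction_id, ts, amount, is_fraud), the sample rows, and the "keep the first record seen" deduplication rule are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical Bronze layer: raw rows, kept separately per source.
# Field names and values are illustrative assumptions, not the project's schema.
bronze = {
    "csv": [
        {"transaction_id": "t1", "ts": "2026-03-16T10:05:00", "amount": "120.50", "is_fraud": "0"},
        {"transaction_id": "t2", "ts": "2026-03-16T10:40:00", "amount": "9800.00", "is_fraud": "1"},
    ],
    "postgres": [
        {"transaction_id": "t2", "ts": "2026-03-16T10:40:00", "amount": "9800.00", "is_fraud": "1"},
        {"transaction_id": "t3", "ts": "2026-03-16T11:15:00", "amount": "42.00", "is_fraud": "0"},
    ],
    "kafka": [
        {"transaction_id": "t4", "ts": "2026-03-16T11:20:00", "amount": "5100.00", "is_fraud": "1"},
        {"transaction_id": "t1", "ts": "2026-03-16T10:05:00", "amount": "120.50", "is_fraud": "0"},
    ],
}


def to_silver(bronze_by_source):
    """Cast raw strings to typed values and deduplicate across sources.

    Assumed rule: the first record seen for a transaction_id wins.
    """
    seen = {}
    for source, rows in bronze_by_source.items():
        for row in rows:
            key = row["transaction_id"]
            if key in seen:
                continue  # duplicate coming from another source
            seen[key] = {
                "transaction_id": key,
                "ts": datetime.fromisoformat(row["ts"]),
                "amount": float(row["amount"]),
                "is_fraud": row["is_fraud"] == "1",
                "source": source,
            }
    return list(seen.values())


def to_gold(silver_rows):
    """Count transactions and frauds per hour."""
    buckets = defaultdict(lambda: {"transactions": 0, "frauds": 0})
    for row in silver_rows:
        hour = row["ts"].replace(minute=0, second=0, microsecond=0)
        buckets[hour]["transactions"] += 1
        buckets[hour]["frauds"] += int(row["is_fraud"])
    return dict(sorted(buckets.items()))


silver = to_silver(bronze)
gold = to_gold(silver)
for hour, stats in gold.items():
    print(hour.isoformat(), stats)
```

In a Spark-based pipeline the same steps would typically map to a union of the Bronze tables, a deduplication on the business key, and a group-by on the truncated timestamp; the exact keys and rules depend on the real schema.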
The entire stack runs locally via Docker Compose.

datalakehouse minio kafka PostgreSQL Python dbt airflow


Updated Mar 16, 2026
