Projects with this topic
PYthon Radio Astronomy anaLYSis and Image Synthesis
Cristian Vasu Data Portfolio / Scalable Machine Learning with SparkML - Census Income Classification
Built a complete machine learning pipeline in SparkML using the Adult Census dataset (~48k rows, 14 features). Implemented data preprocessing, feature encoding, cross-validation, and model training with Logistic Regression and Random Forest. Evaluated models with metrics such as AUC and F1-score. Reflected on scalability trade-offs and optimizations in distributed ML.
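The two metrics named above can be illustrated without Spark. The sketch below is a plain-Python stand-in, not SparkML's evaluator API. It assumes 0/1 labels (for example 1 for income >50K) and a probability-like score per row. It computes F1 from thresholded predictions and computes ROC AUC as the fraction of positive/negative pairs that the model ranks correctly.

```python
"""Plain-Python sketch of the AUC and F1 metrics (not SparkML's evaluators).

Assumption: labels are 0/1 and the model emits one probability-like score per row.
"""
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC = probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise loop is O(P * N), so it suits
    illustration rather than large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 0, 1, 0]`, scores `[0.9, 0.4, 0.35, 0.1]` and a 0.5 threshold (predictions `[1, 0, 0, 0]`), F1 is 2/3 and AUC is 0.75. At scale, a rank-based formulation that sorts the scores once avoids the quadratic pair loop.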
End-to-end design of a Hadoop-based ecosystem for healthcare data at scale (50 TB, IoT streams, medical imaging). Proposed a 10-node cluster architecture integrating HDFS, Spark, Hive, NiFi, Kafka, and Docker with HIPAA-compliant security (Kerberos, TLS, Apache Ranger). Delivered a proof-of-concept Docker deployment and professional proposal document.
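A quick sizing check shows why 50 TB calls for a multi-node cluster. The arithmetic below is a hedged sketch, not the proposal's actual sizing. It makes three assumptions: HDFS uses its default replication factor of 3, all 10 nodes serve as DataNodes, and 25% of disk is kept free as headroom, a value chosen only for illustration.

```python
"""Back-of-the-envelope HDFS sizing for a 50 TB logical dataset.

Assumptions (illustrative, not from the proposal): replication factor 3
(the HDFS default), every node is a DataNode, 25% free-space headroom.
"""


def raw_storage_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk consumed once every block is stored `replication` times."""
    return logical_tb * replication


def disk_per_node_tb(logical_tb: float, nodes: int, replication: int = 3,
                     headroom: float = 0.25) -> float:
    """Disk each DataNode needs so stored data fills at most (1 - headroom) of it."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw = raw_storage_tb(logical_tb, replication)
    return raw / nodes / (1 - headroom)


if __name__ == "__main__":
    print(raw_storage_tb(50))        # 150 (TB of raw disk across the cluster)
    print(disk_per_node_tb(50, 10))  # 20.0 (TB per node)
```

Under these assumptions each node needs about 20 TB of disk. If some of the 10 nodes are reserved for master services such as the NameNode, the per-DataNode figure rises accordingly.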
Analyzed two decades of historical weather station data (1920–1940) using Hadoop MapReduce. Filtered operable stations, computed descriptive statistics (min, max, mean, median), and produced reports and graphs. Designed modular MRJobs to chain tasks together for scalable processing.
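The filter-then-aggregate flow above fits naturally into a single map/shuffle/reduce pass. The sketch below simulates that pass in plain Python. The input layout (`station_id,year,temperature`), the `9999` missing-value sentinel and the `MIN_READINGS` operability rule are hypothetical, not taken from the project. The sketch also shows why the median is the awkward statistic. A combiner can pre-aggregate min, max and (sum, count) for the mean, but the median needs every value for a station in the reducer.

```python
"""Plain-Python simulation of map -> shuffle -> reduce for per-station statistics.

Hypothetical assumptions: lines look like "station_id,year,temperature",
9999 marks a missing reading, and a station is "operable" if it has at
least MIN_READINGS valid readings in 1920-1940.
"""
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Iterator, List, Tuple

MISSING = 9999      # hypothetical sentinel for a missing reading
MIN_READINGS = 3    # hypothetical operability threshold


def mapper(line: str) -> Iterator[Tuple[str, float]]:
    """Emit (station_id, temperature) for valid readings in 1920-1940."""
    station, year, temp = line.strip().split(",")
    if 1920 <= int(year) <= 1940 and float(temp) != MISSING:
        yield station, float(temp)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(station: str, temps: List[float]) -> Iterator[Tuple[str, dict]]:
    """Skip non-operable stations; otherwise emit descriptive statistics.

    The median needs the full list of values, so unlike min, max and
    (sum, count) it cannot be partially aggregated on the map side.
    """
    if len(temps) < MIN_READINGS:
        return
    yield station, {
        "min": min(temps),
        "max": max(temps),
        "mean": mean(temps),
        "median": median(temps),
    }


def run(lines: Iterable[str]) -> Dict[str, dict]:
    """Run map, shuffle and reduce in one process and collect the results."""
    pairs = (kv for line in lines for kv in mapper(line))
    results: Dict[str, dict] = {}
    for station, temps in shuffle(pairs).items():
        for key, stats in reducer(station, temps):
            results[key] = stats
    return results
```

On a real cluster the same mapper and reducer logic would run as distributed Hadoop tasks, for example as chained steps of an mrjob job, instead of in one process.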