Projects with this topic
Cristian Vasu Data Portfolio / Scalable Machine Learning with SparkML - Census Income Classification
Built a complete machine learning pipeline in SparkML using the Adult Census dataset (~48k rows, 14 features). Implemented data preprocessing, feature encoding, cross-validation, and model training with Logistic Regression and Random Forest. Evaluated models with metrics such as AUC and F1-score. Reflected on scalability trade-offs and optimizations in distributed ML.
Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:
Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.
Streaming (Spark + Kafka): Built a streaming pipeline to consume Kafka topics, process messages in real time, and visualize outputs.
Deployed using Docker and Jupyter for reproducibility.
Tumult Core is a collection of composable components for implementing algorithms to perform differentially private computations.
Tumult Analytics is a Python library for privately computing aggregate queries on tabular data. It is built atop the Tumult Core library.
Deploying PySpark Jobs on Azure HDInsight Spark Cluster (CI/CD)
Funnel provides an easy-to-use, easy-to-read framework for creating very complex data selections over pandas DataFrames.
"Cloud container data analytics, statistical modeling, and machine learning on distributed databases". "A free open-source alternative to SPSS, SAS, MATLAB, PowerBI, Tableau and Alteryx". Runs on Linux, Windows, macOS, and in the cloud via containers.
LaTeX statistics sas spss matlab Python R spark cloud gcp Oracle azure Amazon Web S... Kubernetes containers Docker ML machine lear... regression clustering TiDB Yugabyte MySQL MariaDB SQL sparkr pyspark RStudio - KNIME Anal... Apache Spark... PyTorch MXNet Chainer keras gluon Scikit-learn... ONNX MLOps - Anac... NumPy Ipython) StatsModels pytest dask Koalas API -... Tornado - Py... Altair Bokeh Jupyter Voila Plotly/Dash matplotlib Seaborn - C#... SASPy - R: T... ggplot2 shiny dash Sparklyr BlueSky Stat... Jamovi - Int... vs code Vim - Tableau TabPy Tableau Buil... Python) - PL... SQL Developer PostgreSQL MySQL/MariaDB pgAdmin4 dbeaver MySQL Workbench Spark SQL Delta Lake Angular 2+ React .NET Core JavaScript (JS) Typescript (TS) Blazor Razor html5 CSS3 AWS EC2 Servers docker-compose podman Red Hat Ente... Oracle Linux fedora centos Ubuntu (WSL 2) debian Kestrel nginx Apache web s... jira Git Gitlab CI/CD... Code Climate... Ansible helm Terraform Cloudera Dat... nifi blender godot MS Office
Yelp open dataset explorer using spark and cassandra