Explore projects
New machine learning algorithms based on the minimum nescience principle
DP3 is an algorithm for distributed and shared-memory parallel frequent itemset mining.
Sentiment analysis with NLTK on a Twitter stream for a given word. Includes configuration scripts for MongoDB and Twitter streaming.
Stack Exchange releases "data dumps" of all its publicly available content roughly every three months via archive.org.
This project is an example of, and a framework for, building ETL pipelines for this data with Apache Spark and Java.
Amit Kamat / Map-Reduce-Ukraine
MIT License
This project aggregates trending data from Ukraine-based Twitter accounts. The raw aggregated data is cleansed with big data methods before analysis. The purpose of this project is to familiarize myself with the workings of Hadoop's HDFS and Map-Reduce infrastructure.
Giacomo Marciani / flink-app
MIT License
Scaffolding for data stream processing applications, leveraging Apache Flink.
Giacomo Marciani / mapreduce-app
MIT License
Scaffolding for Map/Reduce applications, leveraging Apache Hadoop.
Mike Spadaru / taskmaster
MIT License
Taskmaster is a lightweight open-source software framework that aims to simplify the distribution of big data processing and analysis tasks across multiple worker nodes.