Explore projects
Mike Spadaru / taskmaster
MIT License
Taskmaster is a light-weight open-source software framework that aims to simplify distribution of big data processing and analysis tasks over multiple worker nodes.
Giacomo Marciani / mapreduce-app
MIT License
Scaffolding for Map/Reduce applications, leveraging Apache Hadoop.
Giacomo Marciani / flink-app
MIT License
Scaffolding for data stream processing applications, leveraging Apache Flink.
Amit Kamat / Map-Reduce-Ukraine
MIT License
This project aggregates trending data from Ukraine-based Twitter accounts. The raw aggregated data is cleansed before analysis using big data methods. The purpose of this project is to familiarize myself with the workings of Hadoop, specifically HDFS and the Map-Reduce infrastructure.
Stack Exchange releases "data dumps" of all its publicly available content roughly every three months via archive.org.
This project is an example and a framework for building ETL for this data with Apache Spark and Java.
Sentiment analysis with NLTK on a Twitter stream filtered by a given word. Includes configuration scripts for MongoDB and Twitter streaming.
DP3 is an algorithm for distributed and shared-memory parallel frequent itemset mining.
New machine learning algorithms based on the minimum nescience principle
Miguel Andreu / hadoop-premier-league
GNU General Public License v3.0 only
This project was an exercise for the Master in Big Data Engineering and Data Science at "Universidad Autónoma de Madrid". See the readme.md for more information.
Darko Britvec / Geospatial Distributed Index - Spark Streaming
Apache License 2.0
Spatial join of geospatial data from Kafka streams using Apache Spark (Spark Streaming).
Practical assignment for the Big Data Processing module (Spark and Scala) of the fifth BD & ML Bootcamp at Keepcoding.
Workshop given by Jesús Méndez (https://pe.linkedin.com/in/jmendezgal) and Antonio Cachuán (https://linkedin.com/in/antoniocachuan/) covering Apache Druid, getting certified in GCP, and our Data Engineering Program.
Application of Machine Learning for Identification and Prediction of Success of Crowdfunding Projects
Analysis of aviation data from ASOS (https://mesonet.agron.iastate.edu/request/download.phtml) to highlight patterns.
Big Data workshop led by Jimmy Farfán, instructor of the online course "Desarrollo de Aplicaciones de Big Data en Hadoop". For more information or any questions, you can find us on Facebook as Data Hack Formation.
rychly-edu / theses / dist-forensic-digital-data-repo
Apache License 2.0
Distributed storage for digital forensic data with data/metadata repository, API for queries and incoming/outgoing data, indexing, plug-in system for yet unsupported data-types, etc.
Daniel Snider / crawler
GNU Affero General Public License v3.0
A Python app for scanning large data sets of URLs for a given signature and storing the results in an Elasticsearch index. Useful for CERTs and security researchers, and possibly others.
Execute Hadoop and Spark applications on the BigData@Polito cluster with a single command
Neuroscience Lab / BNDF
Apache License 2.0
Structured big data framework based on Apache Spark for storing and manipulating large-scale, multi-channel neurophysiological recording data.