Explore projects
Final degree project for the Computer Engineering program at the Facultad Politécnica of the Universidad Nacional de Asunción.
Davide Peressoni / Approximate silhouette
Apache License 2.0
Implementation of Altieri, F., Pietracaprina, A., Pucci, G., & Vandin, F. (2021). Scalable distributed approximation of internal measures for clustering evaluation. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM) (pp. 648-656). Society for Industrial and Applied Mathematics.
Clustering analysis project on 3 datasets. The results will be used to write an article for the Octo blog.
Theoretical Chemistry Jena / Quantum Chemistry / ConClusion
GNU Affero General Public License v3.0
Cluster analysis of conformer ensembles
containalytics / containalytics
GNU Affero General Public License v3.0
"Cloud container data analytics, statistical modeling, and machine learning on distributed databases". "A free open-source alternative to SPSS, SAS, MATLAB, PowerBI, Tableau and Alteryx". Runs on Linux, Windows, MacOS, and in the cloud via containers.
Collection of completed data-mining work (university course) in Python
Clustering documents to get an overview of a corpus
Orange-OpenSource / documentare / documentare-simdoc
GNU General Public License v2.0 or later
Library and tools for similarity measurement, classification and clustering of digital content, and segmentation of images from digitized documents
Luis Miguel Mejía Suárez / documents-cluster
Apache License 2.0
This repo presents performance comparisons between a serial, an MPI-based, and a Spark-based implementation of a document clustering algorithm
Elena Tea / DPCfam
Other
This repository contains the implementation of the DPC-based algorithm described in Russo, E.T., Laio, A. & Punta, M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics 22, 121 (2021). https://doi.org/10.1186/s12859-021-04013-x. Note that the implementation was written for the purpose of analysing, on a traditional workstation (8GB RAM, 4-8 cores), query datasets with up to 5000 proteins, such as those analysed in the reference paper.
DPCfam Workstation version. Runs on Linux-based systems. Developed and tested on Ubuntu 18. DPCfamW uses the moodycamel::ConcurrentQueue library ( https://github.com/cameron314/concurrentqueue ), freely available provided it is cited (Simplified BSD license). This version replicates the pipeline used to analyze UniRef50 (v. 2017_07) as in Unsupervised protein family classification by Density Peak clustering, Russo ET, 2020, PhD Thesis ( http://hdl.handle.net/20.500.11767/116345 ), but with smaller datasets. The largest dataset we analysed is the TESTproteins_cd50.fasta dataset we provide in this package. Due to memory bounds, we do not guarantee that the analysis of larger datasets is achievable with this version.
Portfolio / EDA + Water Samples Clusters
Apache License 2.0
Search for patterns in river water samples data
QHPC / EECluster
BSD 2-Clause "Simplified" License
EECluster is a software tool for managing the energy-efficient allocation of cluster resources. EECluster uses a Hybrid Genetic Fuzzy System as the decision-making mechanism. See pirweb.edv.uniovi.es/eecluster.
Implementation of fuzzy k-means (with extragrades) clustering in Rust