AZSCIENCENET

An effective algorithm has been developed for clustering large amounts of data
2021-01-07 15:33:00 / NEW PUBLICATIONS

The article “Parallel batch k-means for big data clustering”, co-authored by staff of the Institute of Information Technology of ANAS, has been published in the prestigious journal “Computers & Industrial Engineering”. The journal is indexed in the Web of Science, is ranked in the Q1 quartile, and has an Impact Factor of 4.135.

The authors of the article are Academician Rasim Alguliyev, vice-president of ANAS and director of the Institute of Information Technology; Ramiz Aliguliyev, head of department and corresponding member of ANAS; and Lyudmila Sukhostat, leading researcher and associate professor.

The article, prepared within the framework of SOCAR's grant project on the analysis of large amounts of data, proposes a parallel batch k-means algorithm. The algorithm is designed to analyze large data sets within the limits of a computer's memory and computing resources. It first divides the data into batches sized according to the available memory and computing resources, and then clusters the batches in parallel using the k-means algorithm. The results of the parallel stage are then pooled and clustered again. The cluster centers obtained in this last stage are taken as the cluster centers of the whole data set, and the original data points are assigned to clusters according to these centers. On the value of the objective function, the proposed algorithm is sometimes a few percent worse than the standard k-means algorithm. The analysis has shown that this shortcoming can be addressed, and ways of doing so will be explored in future studies. Statistical analysis of the results showed that the proposed algorithm is quite stable and, in most cases, even performs better than the k-means algorithm.
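The four stages described above (split into batches, cluster the batches in parallel, re-cluster the pooled results, assign the original points) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `batch_size` and `workers` parameters, the fixed iteration count, and the use of a thread pool are all assumptions made for illustration. The published algorithm may differ in how batches are sized, how centers are initialized, and how the parallel stage is scheduled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns a list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random initial centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group;
        # keep the old center if a group ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def parallel_batch_kmeans(points, k, batch_size, workers=4):
    """Hypothetical sketch of the batch / parallel / re-cluster scheme."""
    # Stage 1: split the data into batches (batch_size stands in for
    # "what fits in memory"; the sizing rule here is an assumption).
    batches = [points[i:i + batch_size]
               for i in range(0, len(points), batch_size)]

    # Stage 2: cluster every batch independently. Threads keep the sketch
    # portable; real speedups would need processes or separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_batch = list(pool.map(lambda b: kmeans(b, min(k, len(b))), batches))

    # Stage 3: pool the batch-level centers and cluster them again.
    pooled = [c for centers in per_batch for c in centers]
    final_centers = kmeans(pooled, k)

    # Stage 4: assign every original point to its nearest final center.
    labels = [nearest(p, final_centers) for p in points]
    return final_centers, labels
```

A call such as `parallel_batch_kmeans(points, k=2, batch_size=50)`, with `points` given as a list of coordinate tuples, returns the final centers and one cluster label per point. Because only batch-level centers reach the final clustering stage, the resulting objective value can differ slightly from that of k-means run on the full data set, which is consistent with the few-percent gap mentioned in the article.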