K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data

Polina Lemenkova

2019

Abstract

This paper applies cluster analysis with the k-means algorithm, implemented in R, to geological data. The research object is the Mariana Trench, a hadal trench located in the western Pacific Ocean. The study evaluates the similarity of the geological data by analysing their attributes. The original observation data set contained samples varying in the following parameters: geology (sediment thickness), tectonics (location on the tectonic plates), volcanism (igneous volcanic areas), bathymetry (depth ranges) and geomorphology (slope steepness and aspect). The data pool was divided into clusters using the k-means algorithm in order to detect similarities. Clustering was chosen as the main statistical method because it detects similar groups within the original data set by unsupervised classification. Technically, the research was performed using the R language and its statistical libraries. The main R libraries are {cluster} and {factoextra}; minor libraries include {ggplot2}, {FactoMineR}, {openxlsx}, {carData}, {rio}, {car} and {flashClust}. Cluster counts from two to seven were tested, and the optimal number was determined to be five. The results are presented as visualized computations: a correlation matrix of the factors; a pairwise comparison of the factors showing their correlation; and a pairwise comparative analysis showing the influence of the variables, namely the correlation of sediment thickness with slope angle and the correlation of volcanic igneous areas with slope angle and aspect degree. The variables affecting geomorphology are slope angle, sediment thickness, aspect degree, bathymetry and volcanism. The paper includes listings of the R code so that the algorithms can be repeated in similar research.
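The workflow summarized in the abstract (a correlation matrix of the variables, followed by k-means runs for two to seven clusters) can be illustrated with a minimal sketch in base R. This is not the paper's listing: the data frame `obs` is synthetic, its column names and distributions are hypothetical stand-ins for the variables named above, and the base `kmeans()` function is used here so that the example runs without additional packages.

```r
# Synthetic stand-in data: these are NOT the paper's observations.
# Column names and distributions are assumptions made for illustration.
set.seed(42)
n <- 100
obs <- data.frame(
  sediment_thickness = rnorm(n, mean = 100, sd = 30),
  slope_angle        = rnorm(n, mean = 40, sd = 10),
  aspect_degree      = runif(n, min = 0, max = 360),
  depth              = rnorm(n, mean = -6000, sd = 1500),
  igneous_area       = rpois(n, lambda = 3)
)

# Standardize the columns so that no variable dominates the
# Euclidean distances used by k-means.
x <- scale(obs)

# Pairwise correlation matrix of the variables.
cor_matrix <- cor(obs)

# Fit k-means for k = 2..7 and record the total within-cluster
# sum of squares for each k.
ks <- 2:7
fits <- lapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25, iter.max = 50)
})
wss <- vapply(fits, function(fit) fit$tot.withinss, numeric(1))
names(wss) <- ks

print(round(cor_matrix, 2))
print(wss)

# The paper reports five clusters as optimal for its own data; here
# k = 5 is simply taken as given to show how cluster sizes are read off.
km5 <- fits[[which(ks == 5)]]
print(table(km5$cluster))
```

Plotting `wss` against `ks` gives the usual elbow plot for comparing cluster counts. With {factoextra} installed, `fviz_nbclust(x, kmeans, method = "wss")` draws a comparable plot and `fviz_cluster(km5, data = x)` visualizes the resulting groups; whether the paper used these exact calls is not stated in the abstract.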

Year: 2019

Journal: INTERNATIONAL JOURNAL OF INFORMATICS AND APPLIED MATHEMATICS

DOI: https://dx.doi.org/10.5281/zenodo.3457771

Citation: Polina Lemenkova (2019). K-means Clustering in R Libraries {cluster} and {factoextra} for Grouping Oceanographic Data. INTERNATIONAL JOURNAL OF INFORMATICS AND APPLIED MATHEMATICS, 2(1), 1-26 [10.5281/zenodo.3457771].

All authors: Polina Lemenkova

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/968064
