Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset

Authors

  • Zeynel Cebeci, Department of Biometry and Genetics, Faculty of Agriculture, Çukurova University, 01330 Adana
  • Figen Yıldız, Department of Biometry and Genetics, Faculty of Agriculture, Çukurova University, 01330 Adana

DOI:

https://doi.org/10.24925/turjaf.v5i4.315-320.1056

Keywords:

Data preprocessing, Discretization, Unsupervised discretization, Egg quality traits

Abstract

Discretization is a data pre-processing task that transforms continuous variables into discrete ones so that data mining algorithms such as association rule extraction and classification trees can be applied. In this study we empirically compared the performance of the equal width intervals (EWI), equal frequency intervals (EFI) and K-means clustering (KMC) methods in discretizing 14 continuous variables in a chicken egg quality traits dataset. We found that these unsupervised discretization methods can decrease the training error rates and increase the test accuracies of classification tree models. By comparing the training errors and test accuracies of models built with the C5.0 classification tree algorithm, we also found that the EWI, EFI and KMC methods produced broadly similar results. Among the rules used for estimating the number of intervals, the Rice rule gave the best result with EWI but not with EFI. The Freedman-Diaconis rule with EFI and the Doane rule with both EFI and EWI performed slightly better than the other rules.
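
The three unsupervised methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the language (Python), the function names, the sample egg weights, the quantile-based K-means initialization and the rank-based handling of ties in EFI are all assumptions made for illustration. The number of intervals is taken from the Rice rule, k = ceil(2 * n^(1/3)).

```python
import math
import statistics


def rice_bins(n):
    """Rice rule for the number of intervals: k = ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


def equal_width(values, k):
    """EWI: split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical
        return [0] * len(values)
    # The maximum value would fall at index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """EFI: assign labels by rank so each interval holds about n/k values.

    Simplification: tied values may land in different intervals.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    return labels


def kmeans_1d(values, k, iters=100):
    """KMC: one-dimensional Lloyd's algorithm with quantile-based start."""
    srt = sorted(values)
    n = len(srt)
    # Assumed initialization: centers at the midpoints of k rank slices.
    centers = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(statistics.fmean(members) if members
                               else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical egg weights in grams (not data from the study).
weights = [50.0, 52.0, 54.0, 55.0, 57.0, 58.0, 60.0, 63.0, 66.0, 70.0]
k = rice_bins(len(weights))
ew_labels = equal_width(weights, k)
ef_labels = equal_frequency(weights, k)
km_labels, km_centers = kmeans_1d(weights, k)
```

Each function returns integer interval labels from 0 to k - 1, which could then be passed as categorical features to a classification tree. Other rules such as Freedman-Diaconis or Doane would only change how k is chosen.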

Published

05.04.2017

How to Cite

Cebeci, Z., & Yıldız, F. (2017). Unsupervised Discretization of Continuous Variables in a Chicken Egg Quality Traits Dataset. Turkish Journal of Agriculture - Food Science and Technology, 5(4), 315–320. https://doi.org/10.24925/turjaf.v5i4.315-320.1056

Section

Animal Production