A universal global measure of univariate and bivariate data utility for confidentialised unit record files
Most existing data utility measures for confidentialised unit record files have shortcomings: they are limited to continuous variables, to univariate utility assessment and/or to local information loss measurement. This seminar will present a new user-centred global data utility measure and several integrated local univariate and bivariate data utility measures, all based on a benchmarking approach. Information loss and data utility in the model are calculated using various statistical tests and association measures, such as the two-sample Kolmogorov-Smirnov test, the chi-square test (Cramér's V), the ANOVA F test (eta squared), the Kruskal-Wallis H test (epsilon squared), Spearman's rank correlation coefficient (rho) and Pearson's correlation coefficient (r). The next important steps in global data utility assessment should be to develop an R package or other programme code that measures global data utility automatically, and to establish the relationship between the univariate, bivariate and multivariate data utility of confidentialised data.
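The abstract does not say how the individual test statistics are turned into utility scores. As a purely illustrative sketch, assume that univariate utility is scored as 1 minus the two-sample Kolmogorov-Smirnov statistic D, and bivariate utility as 1 minus the absolute change in Cramér's V between the original and the confidentialised file. Both scoring rules and all function names below (`ks_statistic`, `cramers_v`, `univariate_utility`, `bivariate_utility`) are hypothetical and are not the measure presented in the seminar. The sketch uses only the Python standard library.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Empirical CDFs are step functions that jump only at observed values,
    # so checking every pooled value is enough to find the maximum gap.
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)  # share of sample_a values <= x
        fb = bisect_right(b, x) / len(b)  # share of sample_b values <= x
        d = max(d, abs(fa - fb))
    return d


def cramers_v(x, y):
    """Cramér's V for two categorical variables given as paired lists."""
    n = len(x)
    cells = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0  # no association can be measured with a single category
    return math.sqrt(chi2 / (n * (k - 1)))


def univariate_utility(original, confidentialised):
    # Hypothetical rule: 1.0 means identical empirical distributions.
    return 1.0 - ks_statistic(original, confidentialised)


def bivariate_utility(orig_x, orig_y, conf_x, conf_y):
    # Hypothetical rule: 1.0 means the association strength is unchanged.
    return 1.0 - abs(cramers_v(orig_x, orig_y) - cramers_v(conf_x, conf_y))


if __name__ == "__main__":
    # Toy data: ages perturbed by noise, one employment status altered.
    orig_age = [23, 35, 35, 41, 52, 60]
    conf_age = [25, 35, 35, 40, 50, 60]
    sex = ["F", "F", "M", "M", "F", "M"]
    orig_emp = ["Y", "Y", "N", "N", "Y", "N"]
    conf_emp = ["Y", "N", "N", "N", "Y", "N"]
    print(round(univariate_utility(orig_age, conf_age), 4))  # 0.8333
    print(round(bivariate_utility(sex, orig_emp, sex, conf_emp), 4))  # 0.7071
```

In this toy example the perturbed ages give D = 1/6, and altering one employment record weakens a perfect association (V = 1) to V = √0.5. An actual benchmarking measure would also need to handle the other listed statistics (eta squared, epsilon squared, rho, r) and decide how to aggregate them into a global score, which this sketch does not attempt.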