After the completion of this course the student will be able to:
- perform exploratory large-scale data analysis using descriptive statistics and software packages such as SPSS, Stata, R, or WEKA
- find estimates, perform hypothesis tests, and construct confidence intervals for several population parameters
- use analysis of variance and analysis of contingency tables
- use simple and multiple regression models
- demonstrate basic knowledge of data mining concepts
- understand and apply the basic principles of data preprocessing, clustering, classification, pattern discovery, and association rule discovery to big data
In addition to the above:
- perform statistical analyses for continuous and discrete data, and present and interpret results using routines from statistical packages
- recognize and interpret statistical procedures in articles from current and suitable literature
- apply data mining concepts and algorithms to large-scale biomedical data
Prerequisites
An introductory course in Probability.
An introductory course in Statistics would help but is not a prerequisite.
It is recommended that students have at least a basic knowledge of Data Structures and Algorithms.
Course contents
Collection, classification, and presentation of data arising in public health and clinical studies
Random variables and useful distribution models
Point estimation, hypothesis testing (parametric and non-parametric tests) and confidence intervals
Analysis of variance
Analysis of contingency tables
Correlation and regression analysis
Multiple regression, logistic regression
Data preprocessing and data compression
Classification (Naive Bayes, k-NN, classification and regression trees)
Clustering Algorithms (partitioning, hierarchical, density-based, grid-based, model-based, outlier analysis)
Association rule discovery algorithms
Bayesian networks, neural networks
Text and Web Mining
Spatial and Temporal Data Mining
Sequence Data Mining
Evaluation of Data Analytics
Bibliography
- Daniel, W.W. and Cross, C.L. (2012). Biostatistics: a foundation for analysis in the health sciences (10th Edition). Wiley Global Education http://informatika.uvlf.sk/subory/prezentacie%20zas/book%201.pdf
- Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, 2nd edition. New York: Springer-Verlag.
- Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London – New York.
- Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991). Statistical Analysis of Reliability Data. Chapman & Hall, London
- Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye (2011). Probability and Statistics for Engineers and Scientists (ninth edition) http://folk.ntnu.no/jenswerg/40CEFd01.pdf
- Ghosh, J.K., Delampady, M. and Samanta, T. (2006). An Introduction to Bayesian Analysis: Theory and Methods. Springer.
- Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
- Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, May 2011.
- Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 2007
- Margaret Dunham, Data Mining Introductory and Advanced Topics, 2003, Pearson Education
- David J. Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT Press, Fall 2000
- Armitage, P., Berry, G. and Matthews, J.N.S. (2002). Statistical Methods in Medical Research. 4th Edition. Blackwell Science (ISBN 0-632-05257-0)
- Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC Texts in Statistical Science 1990, ISBN: 0412276305
- Bland M, An Introduction to Medical Statistics, Oxford Medical Publications, 2000 (ISBN: 0192632698)
- Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E., Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, Springer 2012 (ISBN 978-1-4614-1353-0)
- Wassertheil-Smoller Sylvia, Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals, Springer 2004 (ISBN: 0-387-40292-6)
Teaching and learning methods
Lectures, problem solving.
Assessment and grading methods
Written exams (70%), assignments (30%)
Schedule of lectures
Introduction, probabilities, random variables and distributions (Daskalaki)
Point estimation, hypothesis testing (parametric and non-parametric) and confidence intervals (Economou – Malefaki)
ANOVA and contingency tables (Malefaki)
Week 4 – 5
Simple, multiple and logistic regression (Daskalaki)
Week 6 – 7
Survival Analysis (Economou)
Introduction to Bayesian Statistics (Malefaki)
Data preprocessing for Big Data Analytics (Megalooikonomou)
Data Classification, Data Clustering, Association Rule Discovery, Evaluation of Data
Spatial, Temporal and Text Data Mining (Megalooikonomou)
Bayesian Networks and Neural Networks (Sakellaropoulos)