Code EA8
Type Elective
Semester A
ECTS credits 5
Teaching Staff V. Megalooikonomou, S. Daskalaki, P. Economou, S. Malefaki, G. Sakellaropoulos

Learning outcomes

After completing this course, the student will be able to:

  • perform exploratory large-scale data analysis using descriptive statistics and software packages such as SPSS, Stata, R, or WEKA
  • find estimates, perform hypothesis tests, and construct confidence intervals for several population parameters
  • use analysis of variance and analysis of contingency tables
  • use simple and multiple regression models
  • demonstrate basic knowledge of data mining concepts
  • understand and apply the basic principles of data preprocessing, clustering, classification, pattern discovery, and association rule discovery to big data

Competences

In addition to the above:

  • to perform statistical analyses for continuous and discrete data; present and interpret results using routines from statistical packages
  • to recognize and interpret statistical procedures in articles from current and suitable literature
  • to apply data mining concepts and algorithms to large-scale biomedical data

Prerequisites

An introductory course in Probability.
An introductory course in Statistics would help but is not a prerequisite.
It is recommended that students have at least a basic knowledge of Data Structures and Algorithms.

Course contents

Collection, classification, and presentation of data arising in public health and clinical studies
Random variables and useful distribution models
Point estimation, hypothesis testing (parametric and non-parametric tests) and confidence intervals
Analysis of variance
Analysis of contingency tables
Correlation and regression analysis
Multiple regression, logistic regression
Survival analysis
Bayesian statistics
Data preprocessing and data compression
Classification (Naive Bayes, k-NN, Classification and regression trees)
Clustering Algorithms (partitioning, hierarchical, density-based, grid-based, model-based, outlier analysis)
Association rule discovery algorithms
Bayesian networks, neural networks
Text and Web Mining
Spatial and Temporal Data Mining
Sequence Data Mining
Evaluation of Data Analytics

Recommended reading

  1. Daniel, W.W. and Cross, C.L. (2012). Biostatistics: a foundation for analysis in the health sciences (10th Edition). Wiley Global Education http://informatika.uvlf.sk/subory/prezentacie%20zas/book%201.pdf
  2. Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, 2nd edition. New York: Springer-Verlag.
  3. Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London – New York.
  4. Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991). Statistical Analysis of Reliability Data. Chapman & Hall, London
  5. Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye (2011). Probability and Statistics for Engineers and Scientists (ninth edition) http://folk.ntnu.no/jenswerg/40CEFd01.pdf
  6. Ghosh, J.K., Delampady, M. and Samanta, T. (2006). An Introduction to Bayesian Analysis: Theory and Methods. Springer.
  7. Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
  8. Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, May 2011.
  9. Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 2007
  10. Margaret Dunham, Data Mining Introductory and Advanced Topics, 2003, Pearson Education
  11. David J. Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT Press, Fall 2000
  12. Armitage, P., Berry, G. and Matthews, J.N.S. (2002). Statistical Methods in Medical Research. 4th Edition. Blackwell Science (ISBN 0-632-05257-0)
  13. Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC Texts in Statistical Science 1990, ISBN: 0412276305
  14. Bland M, An Introduction to Medical Statistics, Oxford Medical Publications 2000 (ISBN: 0192632698)
  15. Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E., Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, Springer 2012 (ISBN 978-1-4614-1353-0)
  16. Wassertheil-Smoller Sylvia, Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals, Springer 2004 (ISBN: 0-387-40292-6)

Teaching and learning methods

Lectures, problem solving.

Assessment and grading methods

Written exams (70%), assignments (30%)

Schedule of lectures

Week 1

Introduction, probabilities, random variables and distributions (Daskalaki)

Week 2

Point estimation, hypothesis testing (parametric and non-parametric) and confidence intervals (Economou – Malefaki)

Week 3

ANOVA and contingency tables (Malefaki)

Week 4 – 5

Simple, multiple and logistic regression (Daskalaki)

Week 6 – 7

Survival Analysis (Economou)

Week 8

Introduction to Bayesian Statistics (Malefaki)

Week 9

Data preprocessing for Big Data Analytics (Megalooikonomou)

Week 10 – 11

Data Classification, Data Clustering, Association Rule Discovery, Evaluation of Data Analytics (Megalooikonomou)

Week 12

Spatial, Temporal and Text Data Mining (Megalooikonomou)

Week 13

Bayesian Networks and Neural Networks (Sakellaropoulos)