BIOSTATISTICS

Code	A3
Type	Compulsory
Semester	A
ECTS credits	4
Teaching Staff	V. Megalooikonomou, S. Daskalaki, P. Economou, S. Malefaki, G. Sakellaropoulos

Learning outcomes

After the completion of this course the student will be able to:

perform exploratory analysis of large-scale data using descriptive statistics and software packages such as SPSS, Stata, R and/or WEKA
find estimates, perform hypothesis tests and construct confidence intervals for several population parameters
use analysis of variance and analysis of contingency tables
use simple and multiple regression models
explain basic data mining concepts
understand and apply the basic principles of data preprocessing, clustering, classification, pattern discovery and association rule discovery to big data

Competences

In addition to the above:

to perform statistical analyses for continuous and discrete data; present and interpret results using routines from statistical packages
to recognize and interpret statistical procedures in articles from current and suitable literature
to apply data mining concepts and algorithms to large-scale biomedical data

Prerequisites

An introductory course in Probability.
An introductory course in Statistics would help but is not a prerequisite.
It is recommended that students have at least a basic knowledge of Data Structures and Algorithms.

Course contents

collection, classification, and presentation of data arising in public health and clinical studies
random variables and useful distribution models
point estimation, hypothesis testing (parametric and non-parametric tests) and confidence intervals
analysis of variance
analysis of contingency tables
correlation and regression analysis
multiple regression, logistic regression
survival analysis
Bayesian statistics
Data preprocessing and data compression
Classification (Naive Bayes, k-NN, classification and regression trees)
Clustering Algorithms (partitioning, hierarchical, density-based, grid-based, model-based, outlier analysis)
Association rule discovery algorithms
Bayesian networks, neural networks
Text and Web Mining
Spatial and Temporal Data Mining
Sequence Data Mining
Evaluation of Data Analytics

Recommended reading

Daniel, W.W. and Cross, C.L. (2012). Biostatistics: a foundation for analysis in the health sciences (10th Edition). Wiley Global Education http://informatika.uvlf.sk/subory/prezentacie%20zas/book%201.pdf
Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, 2nd edition. New York: Springer-Verlag.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London – New York.
Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991). Statistical Analysis of Reliability Data. Chapman & Hall, London
Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye (2011). Probability and Statistics for Engineers and Scientists (9th edition). http://folk.ntnu.no/jenswerg/40CEFd01.pdf
Ghosh, J.K., Delampady, M. and Samanta, T. (2006). An Introduction to Bayesian Analysis: Theory and Methods. Springer.
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, May 2011.
Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 2007
Margaret Dunham, Data Mining Introductory and Advanced Topics, 2003, Pearson Education
David J. Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT Press, Fall 2000
Armitage, P., Berry, G. and Matthews, J.N.S. (2002). Statistical Methods in Medical Research. 4th Edition. Blackwell Science (ISBN 0-632-05257-0)
Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC Texts in Statistical Science 1990, ISBN: 0412276305
Bland M, An Introduction to Medical Statistics, Oxford Medical Publications 2000 (ISBN: 0192632698)
Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E., Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, Springer 2012 (ISBN 978-1-4614-1353-0)
Wassertheil-Smoller Sylvia, Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals, Springer 2004 (ISBN: 0-387-40292-6)

Teaching and learning methods

Lectures, problem solving.

Assessment and grading methods

Written exams (70%), assignments (30%)

Schedule of lectures

Week 1

Introduction, probabilities, random variables and distributions (Daskalaki)

Week 2

Point estimation, hypothesis testing (parametric and non-parametric) and confidence intervals (Economou – Malefaki)

Week 3

ANOVA and contingency tables (Malefaki)

Week 4 – 5

Simple, multiple and logistic regression (Daskalaki)

Week 6 – 7

Survival Analysis (Economou)

Week 8

Introduction to Bayesian Statistics (Malefaki)

Week 9

Data preprocessing for Big Data Analytics (Megalooikonomou)

Week 10 – 11

Data Classification, Data Clustering, Association Rule Discovery, Evaluation of Data Analytics (Megalooikonomou)

Week 12

Spatial, Temporal and Text Data Mining (Megalooikonomou)

Week 13

Bayesian Networks and Neural Networks (Sakellaropoulos)