Prof. Ioannis Pavlidis (firstname.lastname@example.org).
Office hours: Friday 12pm-1pm (HBSC 306)
Dinesh Majeti (email@example.com).
Office hours: Tuesday, Thursday 11am-12pm (HBSC 302)
The course covers statistical methods in human and technology studies or experiments. It starts by contrasting hypothesis-driven research supported by statistical inference with rigorous deduction based on first principles, in order to delineate the current from the past mode of science and motivate the subject. It then proceeds step by step, building the student’s background in the statistical tools of the trade. In its last two weeks, the course culminates by connecting sound research methods to sound researchers’ attitudes, delving into the teachings of Stoic philosophy.
Although the introduction and methodological sections of papers differ from discipline to discipline (e.g., algorithms vs. assays), the results sections of papers should look similar, according to currently accepted best practices. The data should be produced according to appropriate study/experimental designs and should be subjected to relevant statistical tests. There is no such thing as statistics for computer scientists or statistics for biologists; statistics is the same for everybody. However, certain disciplines tend to use some tools more than others, and instruction needs to be tailored to the differing educational backgrounds. In computer science in particular, the awakening to standard analysis of study/experimental results has been slow; most of this analysis used to be carried out heuristically. This has changed in the last few years, and several computer science disciplines have already adopted statistical methods as the standard in results analysis, while others are bound to follow sooner or later. Among the computer science communities at the forefront of this movement are the Human-Computer Interaction and Computer Vision communities. The Statistical Methods course aims to cover this need and is paced to take into account the typical background of graduate students in computer science. It is very practical in its orientation (no proofs), emphasizing the understanding of concepts and the ability to apply the right design or test to the right problem.
The main part of the course starts with the delineation between continuous and discrete variables and the enormous implications this carries for the selection of tests. Then, it proceeds with the description of distributions, probabilities, and error types that are fundamental to the construction of the t-tests, ANOVA tests, and non-parametric tests. In the second stage, the course visits regression in its various forms, completing the coverage of significance and association tests used in almost all scientific papers. The treatment of data collection comes next, although one would expect it earlier. The reason for this delay is the emphasis on quality control methods in field data collection, which requires knowledge of statistical testing and association. In the last stage of the course’s main part, we visit the various experimental designs, including newer methods, the so-called Mixed Methods. Before starting to analyze data, one needs to know according to which principle to collect these data in order to address her/his hypothesis; for this, s/he needs to pick the right design. Even impeccable testing will not save the day if the researcher picked the wrong study or experimental design (garbage in, garbage out). Hence, towards the end of the course the student acquires a 30,000-foot view of the scientific process, solidifying her/his ability to design, collect, and test.
The emergence of the statistical design of studies/experiments and the statistical analysis of results coincides with the spread of interdisciplinary research projects. These projects involve many people from many disciplines and last for a long time. An example of such a booming discipline in Computer Science is Wireless Health, where computer scientists, medical doctors, and social scientists are involved. Because heuristic analysis of results is no longer an option (where iffy outcomes may be claimed as improvements), it is entirely possible that the team finds after several years and millions of dollars that they wasted their time and their resources. This motivates the closing sessions on stoic philosophy for this course.
The course has three homework assignments to reinforce the understanding of the three main segments of the course and a short one-page essay to cover the culminating lecture on research attitudes. In place of an exam, the course has a semester-long project, where a problem is defined for the class, and then each student is required to come up with a study design, collect and quality-control data, and perform tests, putting everything in the form of a term paper. In 2014 the theme of the course was career quantification of computer science professors and its reflection on departments, based on openly available publication and funding data. The question was whether objective performance data was in step with the departmental ranking reported by the U.S. News Report. Project themes may change from year to year to keep things interesting.
3 x 15 % Homework
Observations in science; induction, deduction, and hypothesis; absolute and comparative experiments; drawing inference
Concept vs. variables; types of variables (categorical vs. interval); location, dispersion, and other measures; histograms and boxplots
Discrete (Uniform, Binomial, Poisson) vs. continuous (Normal) distributions; sampling distributions (χ², t, and F); α and β; Type I and Type II errors; p values; power
Hypothesis tests on μ, p, and σ²; assumptions and violations (tests of normality); pooled t test; paired t test
Analysis of variance (ANOVA); assumptions and violations; analyzing non-normal data; corrections for multiple comparisons
Homework #1 Due, Homework #2 Out
Descriptive statistics; Cohen’s kappa; Kolmogorov–Smirnov, Mann–Whitney, and Wilcoxon tests
Types of regression analysis (maximum likelihood, least squares, logistic, Poisson); non-linear least squares regression
Homework #2 Due, Homework #3 Out
Data collection in qualitative and quantitative research; quality control; sample size; random, non-random, and mixed sampling
Error control designs; treatment designs; combination designs; sampling designs
Completely randomized designs; randomized block designs; Latin square designs; factorial designs
Homework #3 Due
Response surface designs; split-plot designs; designs with repeated measures
New methods for interdisciplinary research (qualitative + quantitative); what and when; mixed designs (convergent parallel design; explanatory sequential design; embedded design)
Emotion vs. rationality in dealing with success and adversity in longitudinal projects
Historical paradigms to follow or avoid
Essays and Project Reports Due
Boddy, R. and Smith, G. Statistical Methods in Practice for Scientists and Technologists. Wiley, 2009.
Freund, R. J., W. J. Wilson, and D. L. Mohr. Statistical Methods. 2010.
Hinkelmann, Klaus, ed. Design and Analysis of Experiments, Special Designs and Applications. Vol. 3. John Wiley & Sons, 2011.
Friedman, Lawrence M., Curt Furberg, and David L. DeMets. Fundamentals of Clinical Trials. Vol. 4. New York: Springer, 2010.