COURSES
END OF SEMESTER COMMENTS
★
★
★
★
★
The lectures were easy to understand, and the professor explained everything carefully, including our doubts. The presentations saved on Blackboard serve as additional references. The TA explained the concepts and code very well. The assignments were graded very strictly, i.e., 1 mark off if one line says 16 and another says 14 (by mistake), and -1 for report quality. I think it would have been much better if the deduction were -0.5 rather than -1. Thanks for all the help. |
★
★
★
★
★
Learnt a lot. Thanks to the professor and TAs. |
★
★
★
★
★
Could have been a little more lenient on the grading. |
★
★
★
★
★
Just an awesome class, but I felt that the course load (hw) was a bit much. |
★
★
★
★
★
The class was very informative and challenging. I learned a good deal from this course. Excited to be taking Ubiquitous Computer next fall under Dr. Pavlidis! |
★
★
★
★
★
Professor Pavlidis and the TAs Vitalii and Shaila taught the course in an organized manner, providing a learning environment that encouraged participation and discussion. Weekly homeworks allowed for steady practice of the information gained from lecture. Labs were extremely helpful in completing homework assignments. Professor Pavlidis provided interesting datasets from the medical field and educational research. Although I do wish I had participated in the in-person course, given the unique personalities of the professor and TAs, the online form of the course was extremely convenient and useful. |
★
★
★
★
★
One of the best classes I have ever taken. Professor: Dr. Ioannis T Pavlidis is the best professor in my opinion. He is always ready to clear your doubts. Getting responses from him is easy, even on weekends, which makes him unique. If you ask him any question at any moment, he is happy and eager to answer. His knowledge of the field is boundless. He always listens to the feedback provided by the students and makes the necessary changes in class delivery if students need them (as long as they are reasonable, of course). Definitely, A LOT to learn from him. TAs: Vitalii and Shaila are the best at their level. For the practical session, you can always reach out to Vitalii; he is always ready to guide you in the right direction. For any grading or project-related queries, Shaila is always ready to give you logically correct feedback. With their help, you will definitely learn how to write and organize your code and report. Class quality and workload: Definitely, you will have a huge workload, as nothing comes easily. If you have time and want to invest it rather than waste it, surely go for this class. You will enjoy it a lot and will learn a lot. After finishing this class, you will 100% feel that it was useful and that all the hard work (ofc smart work is needed 😇😇) you have done gave you fruitful results. Kudos 🙌🏻🙌🏻🙌🏻 to one of the best teams I have ever worked with. |
KEY INFORMATION
- Prof. Ioannis Pavlidis, Office Hours: 3-4 pm on Fridays @ TEAMS
- Vitalii Zhukov, Office Hours: 2-3 pm on Mondays @ TEAMS
- 10% Participation
- 50% (5 x 10%) Homework
- 40% Project
COURSE OUTLINE
- Topics to Cover: Situating Statistics and Machine Learning in Data Science; observations and variables; types of measurements for variables; distributions; numerical descriptive statistics; exploratory data analysis; bivariate data; data collection
- Topics to Cover: Probability; discrete probability distributions; continuous probability distributions; sampling distributions
- Homework #1 Out
- Topics to Cover: Hypothesis testing; estimation; sample size; assumptions
- Assignment of Projects
- Topics to Cover: Inferences on the population mean; inferences on a proportion; inferences on the variance of one population; assumptions
- Homework #1 Due
- Homework #2 Out
- Topics to Cover: Inferences on the difference between means using independent samples; inferences on variances; inferences on means for dependent samples; inferences on proportions; assumptions
- Topics to Cover: Analysis of variance; linear model; assumptions; specific comparisons; random models; unequal sample sizes; analysis of means
- Homework #2 Due
- Homework #3 Out
- Project, milestone #1 Due 3/07/2021
- Topics to Cover: The regression model; estimation of parameters; inferences for regression; correlation; regression diagnostics
- Topics to Cover: The multiple regression model; estimation of coefficients; inferential procedures; correlations; special models; multicollinearity; variable selection; detection of outliers
- Topics to Cover: The dummy variable model; unbalanced data; models with dummy and interval variables; weighted least squares; correlated errors
- Homework #3 Due
- Homework #4 Out
- Topics to Cover: Logistic regression
- Project, milestone #2 Due
- Topics to Cover: Factorial experiments
- Topics to Cover: Block design; repeated measures designs
- Homework #4 Due
- Homework #5 Out
- Topics to Cover: One sample; two independent samples; more than two samples; rank correlation; the bootstrap
- Homework #5 Due 5/03/2021
- Project, milestone #3 Due 5/05/2021
- Project Reports Due
WEEKLY GRADES AND STUDENT COMMENTS
★
★
★
★
★
Today's lecture was very helpful as always. The high level information combined with the explanation from the professor was helpful to understand the information, and the example using the MPG data helped me visualize the concept. Additionally, the deeper discussion regarding the project was extremely beneficial for me. |
★
★
★
★
★
well understood. |
★
★
★
★
★
Great explanation. |
★
★
★
★
★
The class was very interesting and practical experience is very good. |
★
★
★
★
★
I had trouble following along with today's lecture for some reason. I think maybe it was just the speed you went through the slides; it felt like a lot to take in. The Rstudio session was also rushed, but that was understandable since class ran long with the questions. I'm still a little lost on how to do the last portion of the project, but I'm hopeful that we will go into more detail about it in the next lecture, or maybe it will just become clearer as we progress with lecture and the Rstudio practice. |
★
★
★
★
★
Clarifications of the doubts in the class are really helpful and the class is really interesting. |
★
★
★
★
★
Today's lecture was particularly helpful with the lengthy discussion of the figures for the next part of the project. Additionally, the theory for logistic regression was very helpful to understand the concept |
★
★
★
★
★
Things are coming together now. Looks like the 3rd part of the project will be both challenging and fun. |
★
★
★
★
★
Having difficulty doing the homework even after reviewing the lecture video a couple of times. |
★
★
★
★
★
The class felt a little rushed, but maybe that's just due to it being short. I appreciated the time you took to go over the third and final part of the project. I hope that we can go into more depth as the class progresses; I feel like the second portion of the project wasn't covered enough. The example Rstudio session was well done. Going in I wasn't sure what we were doing, and by the end I understood enough to do the in-class exercise with some difficulty. |
★
★
★
★
★
no comments |
★
★
★
★
★
The whole lecture was very interesting and also easy to understand. |
★
★
★
★
★
It was good. |
★
★
★
★
★
Overall it was a good lecture, but it was somewhat hard to follow. I couldn't make some of the connections you were making and that may have been helped with examples or maybe not going quite as quickly through the material. |
★
★
★
★
★
Understood dummy variables. Need to review the second half of lecture again. Class exercise reinforced some of the concepts. |
★
★
★
★
★
The class was interesting, and the related assignments that followed made me revise the previous class, which helped me get a clear idea of those topics. |
★
★
★
★
★
Lecture was good and helpful. |
★
★
★
★
★
Everything's clear and easy to understand and implement. |
★
★
★
★
★
This lecture helped me understand the underlying theory of a couple of different approaches for linear models, but I had trouble grasping the difference between linear regression and linear modeling in terms of theory. However, the distinction in terms of application to different factor levels was helpful. |
★
★
★
★
★
Vitalii's demo was a little rushed and it was a tad hard to relate it to the exercise. |
★
★
★
★
★
I really like the way the class is going including the practical implementation of the topics covered and the time you give at the end of the lecture for the exercise try out. It seems really interesting. |
★
★
★
★
★
It was something new to me, and I think both professor and Vitalii did a great job on the topic! |
★
★
★
★
★
I'm very comfortable with the lecture and also the practical session provided by the TA. It is very clear and I could clear all my doubts during the session too. |
★
★
★
★
★
I got a little lost at the end with the C(p) portion, but overall it was a great class. I appreciated the time spent talking about the second part of the project and it clarified a few things I was stuck on. |
★
★
★
★
★
Everything is clear and easy to get. |
★
★
★
★
★
Started understanding multivariate regression and ability to deduce response variable based on their combination. |
★
★
★
★
★
Thanks for the examples and the great explanation by Vitalii. |
★
★
★
★
★
The lecture information was helpful and provided enough detail to help me understand the theory behind linear regression without becoming too complicated. The portion of class dedicated to discussion of the second project milestone was also highly appreciated. |
★
★
★
★
★
The class is pretty interesting with parallel practical work. |
★
★
★
★
★
This was probably the best class to date. I appreciated you going over the project first and taking the time to explain what the plots were. The practice was particularly helpful for the exercise today. Sometimes it feels like this is a learn-to-use-Rstudio course more than a statistics course. For example, the project felt very much like a test of how well I could use Rstudio, but today's exercise didn't feel that way. I also appreciated the clear expectations for the second part of the project, so thank you. |
★
★
★
★
★
At the end of the class, I'm really happy as I have information about the ANOVA test and how to implement it on some data. For me, it's a valuable class. I appreciate it. |
★
★
★
★
★
Thank you for the extension and the hints provided during the class. Is there a chance we can get the answers/explanation for homework 1? |
★
★
★
★
★
Classes are so far so good. |
★
★
★
★
★
I had some confusion about the first project and what was expected from us. The class mostly cleared it up, but having the expectations explicitly outlined in the syllabus or on Blackboard would've been helpful, maybe for future classes or for the second assignment if it's not too much trouble. One other thing: in a previous lecture (the lecture on Feb. 05) you stated that the third plot was the intersection of the first and second, which made the plot clarification given today confusing, since I don't believe that is actually the intersection (I could be mistaken). In any case, it would've been helpful to get the clarification with more time to fix the plot, again probably something for future classes. Overall the lecture today was good and I learned a lot. I really appreciated the detail you both went into during the Rstudio portion of the class. Thank you! |
★
★
★
★
★
I think I understood this one the best. Good class! |
★
★
★
★
★
Overall I think today was a great class. I was a little lost on the discussion about the project and that has me worried that I am somehow behind. But overall I think the length of the class has been better the last two meetings and really appreciate the time in class to do the mini-assignment. |
★
★
★
★
★
It would be helpful if you could explain more about Gephi for the upcoming project. |
★
★
★
★
★
Very helpful, as you're giving homework related to the previous class. |
★
★
★
★
★
The class is very interesting with a parallel practical and hands-on approach. |
★
★
★
★
★
The pace of this lecture was good for me, and the information provided on Gephi and R for the project tasks was very concise and helpful. |
★
★
★
★
★
Great lecture and tutorial. Would it be possible to see the answers for the exercises posted after the due date? I think it would help with the homework a lot. |
★
★
★
★
★
As I learn the theory of statistical methods and implement them in RStudio, I'm getting more confident using R programming. |
★
★
★
★
★
Today's lecture was very helpful to provide a high-level view of common test statistics and how they are computed and applied. Additionally, the information about the project was helpful for me as well as the R tutorial. |
★
★
★
★
★
This class was much simpler to follow than previous portions |
★
★
★
★
★
Good lecture. |
★
★
★
★
★
This class was particularly good. I liked that there was time at the end to work on the weekly assignment so we could ask questions if needed. |
★
★
★
★
★
Today's class was very useful. Dr. Pavlidis's lecture moves at a reasonable pace (for me) and tends to focus on important, high-level information without going into too much detail about the underlying statistics/mathematics. Additionally, the R tutorials continue to be helpful for me and I find they are paced well. |
★
★
★
★
★
It's been good to learn more about the subject, as I get to learn new things. Explaining in more depth, for instance with more examples, would make it easier to understand; other than that, everything's fine. Thank you. |
★
★
★
★
★
Like the material being covered, the real world data analysis project, and practical help being provided by Vitali! I am coming back to school after a while and am a bit rusty with Math. Will covering the stats course at Khan Academy give me enough background to get a solid foundation for this course? |
★
★
★
★
★
Informative! |
★
★
★
★
★
The class would be interesting if the professor tells us more examples about the theoretical concepts like drawing things on the screen to give us more clarity. |
★
★
★
★
★
The class was nice and the professor explained everything very nicely. The TA also demonstrated the materials nicely. |
★
★
★
★
★
I would enjoy it if more in-person classes were organised; a parallel hands-on experience would work great in the meetings. |
★
★
★
★
★
Please reduce the pace of your explanation and try to conduct at least one in-person class a month to revise all the topics covered so that we can have more interaction. |
★
★
★
★
★
Much better today, but the practical portion was still far too quick. I couldn't keep up and it was hard to follow. Please try to slow down with the examples; I like to try to understand how the code works, and it's hard to do that when you skip ahead. |
★
★
★
★
★
Kindly slow down the pace and use as many examples as possible to demonstrate in the practical part of the class. |
★
★
★
★
★
The lab portion of the class was extremely rushed so it was hard to get anything useful from it. |
★
★
★
★
★
Please provide examples. |
★
★
★
★
★
The duration of the class is long, it will become less interesting as time passes. It would be better if the duration of the class is one and a half-hour each on two weekdays. |
★
★
★
★
★
I like the way you teach in class. But the pace of the practical class is fast. |
★
★
★
★
★
Thank you for your lecture. |
★
★
★
★
★
The R programming part was very fast. It would be better to cover the basics of each part slowly than to go fast through the whole code. |
★
★
★
★
★
Class is helpful for both theory and code sections. Thank you! |
★
★
★
★
★
The exercises you worked through at times went too quickly for me, but outside of that the class was very useful. The class itself feels a bit long to be honest, it would've been nice to have two days at one and a half hours or two hours, three just seems like a lot for one time in my opinion, but that could just be me. I really did learn a lot and it was very interesting and I'm excited to learn more. |
★
★
★
★
★
Bit of audio problem, but otherwise great lecture. Liked the practical aspect and hands on coding. |
★
★
★
★
★
For the break, I would suggest objective time frames so we know how much time we have. Something like "we will take a break until [insert exact time]" would be helpful. Other than that, the lecture was very helpful |
★
★
★
★
★
The lecture was informative and a good review on some material I learned in previous courses. |
★
★
★
★
★
I've had no previous experience in R so this was a very good and useful introduction for me. |
★
★
★
★
★
First lecture was very clear and concise with its goals and content. The R introduction was also extremely helpful and structured very well. |
KEY INFORMATION
Class Meetings
Friday 4:00 – 7:00 pm @ Teams
Course Instructor
Prof. Ioannis Pavlidis
Office hours: 3-4 pm on Fridays (@ Teams)
Course TA
Mohammed Emtiaz Ahmed
Office hours: 12-2 pm on Thursdays (@ Teams)
Course Description - COSC 6323
The course covers statistical methods in human and technology studies or experiments, from where the bulk of scientific and engineering data originate. The course starts by situating statistics in the context of data science. Special emphasis is placed on the relationship of statistics to machine learning. Then, instruction proceeds in a stepwise manner building the student’s background in the statistical tools of the trade, without which an MS thesis or PhD dissertation cannot be complete. The course culminates with sessions on experimental design, one of the cornerstones of modern data science.
Although the introduction and methodological sections of scientific papers differ from discipline to discipline (e.g., algorithms vs. assays), the results sections of papers should conform to a universal pattern, according to currently accepted best practices. The produced data should be derived according to appropriate study/experimental designs and should be subjected to relevant statistical tests. There is no such thing as statistics for computer scientists or statistics for biologists; statistics is the same for everybody. However, certain disciplines tend to use some tools more than others, and instruction needs to be tailored according to students’ educational backgrounds. In computer science in particular, the adoption of statistical analysis of experimental results has been slow. This has changed in the last few years: several computing disciplines have already adopted statistical methods as the analytic standard, while others are bound to follow sooner or later. Among the computer science communities at the forefront of this movement are the Human-Computer Interaction and Computer Vision communities. The Statistical Methods course aims to cover this need and is paced taking into account the typical background of graduate students in computer science. It is very practical in its orientation (no proofs), emphasizing the understanding of concepts and the ability to choose the right design or apply the right test.
The first part of the course starts with the delineation between continuous and discrete variables and the enormous implication that this carries for the selection of tests. Then, it proceeds with the description of distributions, probabilities, and error types that are fundamental to the construction of the t-tests, ANOVA tests, and nonparametric tests. Next, the course visits regression in its various forms, completing the coverage of significance and association tests used in almost all scientific papers. Emphasis is placed on multiple regression and linear modeling, a powerful and elegant method to examine the effect of multiple factors in a research problem; it is heavily used nowadays in MS and PhD research. The treatment of symbolic data and the tools of last resort, that is, nonparametric methods, complete the course’s first part.
The course’s second part covers various experimental designs. Before students start analyzing data, they need to know according to which principle to collect these data in order to address their hypothesis; for this, they need to pick the right experimental design. Even perfect analysis will not save the day if the investigator picked the wrong experimental design (i.e., garbage in, garbage out). Hence, at the end of the course’s second part, students acquire a 30,000-foot view of the scientific and engineering process, solidifying their ability to design, collect, and test data.
The course has four homework assignments to reinforce the understanding of the concepts and methods. In place of a final exam, the course has a semester-long project, where a problem is defined for the class, and then each group of students is required to come up with a study design, collect and quality-control data, and perform tests, putting everything in the form of a term paper. The homeworks are individual assignments while the project is a group assignment; each project group typically consists of 2-3 students.
The students need to know R and RStudio in order to process and plot the data. R is becoming one of the most useful tools for computer scientists in the data analytics business. The instructors provide the students with online educational material and organize an R tutorial class. Importantly, the last hour of each three-hour class session is devoted to R programming, where the students code the theoretical principles covered earlier in the session.
Gradebook
10% Participation
4 x 12.5% Homework
40% Project
COURSE OUTLINE
Lesson 1: Data, Statistics, and Data Science 1/17/2020
Situating Statistics and Machine Learning in Data Science; observations and variables; types of measurements for variables; distributions; numerical descriptive statistics; exploratory data analysis; bivariate data; data collection
Lesson 2: Probabilities and Sampling Distributions 1/24/2020
Probability; discrete probability distributions; continuous probability distributions; sampling distributions
Homework #1 Out
Lesson 3: Principles of Inference 1/31/2020
Hypothesis testing; estimation; sample size; assumptions
Assignment of Projects
Lesson 4: Inferences on a Single Population 2/7/2020
Inferences on the population mean; inferences on a proportion; inferences on the variance of one population; assumptions
Homework #1 Due on 2/7/2020
Homework #2 Out on 2/7/2020
Lesson 5: Inferences for Two Populations 2/14/2020
Inferences on the difference between means using independent samples; inferences on variances; inferences on means for dependent samples; inferences on proportions; assumptions
Lesson 6: Inferences for Two or More Means 2/21/2020
Analysis of variance; linear model; assumptions; specific comparisons; random models; unequal sample sizes; analysis of means
Homework #2 Due on 2/21/2020
Homework #3 Out on 2/21/2020
Lesson 7: Linear Regression 3/6/2020
The regression model; estimation of parameters; inferences for regression; correlation; regression diagnostics
Lesson 8: Multiple Regression 3/27/2020
The multiple regression model; estimation of coefficients; inferential procedures; correlations; special models; multicollinearity; variable selection; detection of outliers
Lesson 9: Linear Models 4/3/2020
The dummy variable model; unbalanced data; models with dummy and interval variables; weighted least squares; correlated errors
Homework #3 Due on 4/3/2020
Homework #4 Out on 4/3/2020
Lesson 10: Categorical Data 4/10/2020
Hypothesis test for a multinomial population; goodness of fit; contingency tables; loglinear model
Lesson 11: Nonparametric Methods 4/17/2020
One sample; two independent samples; more than two samples; rank correlation; the bootstrap
Lesson 12: Experimental Designs 4/24/2020
Randomized designs; paired comparison designs; randomized complete block designs; Latin square designs; Greco-Latin square designs; balanced incomplete block designs; two-factor factorial designs; general factorial designs
Homework #4 Due on 4/27/2020
Project Reports Due on 4/29/2020
References
[1] Horton, N. J., and K. Kleinman. Using R and RStudio for Data Management, Statistical Analysis, and Graphics. CRC Press, 2015.
[2] Freund, R. J., W. J. Wilson, and D. L. Mohr. Statistical Methods. 2010.
[3] Montgomery, D. C. Design and Analysis of Experiments. Ninth Edition. John Wiley & Sons, 2017.
It's actually good.
Overall the class was good, the topics felt very clear. For the practice session, I also was having trouble understanding the block design in terms of commands. I didn't quite understand how Rstudio treats the design as a block without having to specify which variable is your blocking variable.