See past projects here
Submit the link to your project to the professor the day before it is due
Submit your paper to the usual Dropbox link
For your final project, I’m asking you to create an exploration of a dataset of your choice. This project should provide people with ways to explore the dataset in a way that will guide them towards learning something about relationships and patterns in the data. The final project will have two graded parts - a Shiny app for end-users and a final paper where you tell us just what you did. Your app and paper should explore two or more particular features of the data you find interesting, and provide an end-user a way to derive new and meaningful inferences from the data within the bounds of the controls you provide them with.
Your app should 1) present the dataset you are looking at, with a full explanation of what it is (this can be a front page, a tabset, or whatever you would like), 2) provide users at least 3 ways to explore different aspects of the data, 3) provide model fits and statistical tests where they would help the user, and 4) offer useful visual representations that are easy for the user to understand. It should be organized so that a user has no problem navigating around, and should not just be one giant pane of overwhelming data and analyses.
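As a rough illustration of that organization, here is a minimal sketch of a tabbed Shiny app. It is only one possible layout, not a required template. The built-in mtcars dataset, the tab titles, and the input and output names (xvar, preview, scatter, fit) are all placeholders assumed for the example, and it shows just one of the three or more exploration views your app needs.

```r
library(shiny)

# Placeholder data: the built-in mtcars dataset stands in for your own.
ui <- navbarPage(
  "Exploring mtcars",
  # Tab 1: explain what the dataset is and show a preview.
  tabPanel(
    "About the data",
    p("Describe the source and contents of your dataset here."),
    tableOutput("preview")
  ),
  # Tab 2: one way to explore the data, with a model fit.
  tabPanel(
    "Explore",
    sidebarLayout(
      sidebarPanel(
        selectInput("xvar", "Predictor", choices = c("wt", "hp", "disp"))
      ),
      mainPanel(
        plotOutput("scatter"),
        verbatimTextOutput("fit")
      )
    )
  )
)

server <- function(input, output, session) {
  output$preview <- renderTable(head(mtcars))

  # Refit the linear model whenever the user picks a new predictor.
  model <- reactive({
    lm(reformulate(input$xvar, response = "mpg"), data = mtcars)
  })

  output$scatter <- renderPlot({
    plot(mtcars[[input$xvar]], mtcars$mpg, xlab = input$xvar, ylab = "mpg")
    abline(model())
  })

  output$fit <- renderPrint(summary(model()))
}

# Build the app object; call runApp(app) to launch it locally.
app <- shinyApp(ui, server)
```

Splitting the description and each exploration view into separate tabs is one way to avoid a single overwhelming pane.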
You will present the app to both your classmates and visitors in a presentation Zoom session. There, I will set you each up with your own breakout room where people can visit you and talk to you about the app. You will have a running app on http://shinyapps.io (or other server) to which you can give them a link. Half of the class will present for the first half-hour, the other half for the second half-hour, so you can visit each other’s projects!
In addition to your professor and TA visiting you and trying out your app, you will be required to submit the code and data for the app as a zipped archive. Note - You might want to use the goodpractice library and the styler library to make sure your code is readable and conforms to good style-guide principles. You will submit this as you usually submit homework.
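For instance, styler can restyle code for you. The snippet below is a small sketch that restyles a deliberately messy line of code held in a string; the variable names messy and styled are just for illustration.

```r
library(styler)

# A deliberately cramped line of code, stored as text.
messy <- "fit=lm(mpg~wt,data=mtcars)"

# style_text() returns the restyled code (tidyverse style by default)
# without touching any files.
styled <- style_text(messy)
print(styled)

# To restyle a script in place, style_file() takes a file path,
# and style_dir() restyles every R file in a directory.
```

Note that style_file() and style_dir() overwrite files, so it is worth committing or backing up your code before running them.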
For your app and presentation, you will be graded on:
1. Ease of use
2. Interpretability of what a user learns from the app
3. Thorough exploration of the data
4. Quality of visualizations
5. Validity of analyses
6. Quality and readability of code
7. Presentation of app to an end-user
8. Ability to answer questions about the data and results
Extra credit is available to those who want to publicize their app more
widely.
Each of the above will be graded on a 5 point scale.
1 = Student shows little ability to execute this concept.
2 = Student shows a flawed attempt at execution of this step, or minimal effort.
3 = Student shows understanding of the concept, and is able to achieve a minimal satisfactory outcome.
4 = Student shows understanding, and presents a well-crafted execution of the concept.
5 = Student has achieved mastery of the concept, with a deep and compelling execution.
The final paper should be broken up into something along the lines of the following sections. You may feel free to adapt this flexibly given your unique data set and set of problems and questions. But this is a general guide, particularly if you are lost.
As a quick note, you might want to massage your code chunks a bit in your presentation so we don’t see code, error messages, warnings, etc. Remember warning=FALSE, message=FALSE, echo=FALSE, and more are all your friends. For more - including how to resize graphs and such - see here.
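One common way to apply these options to the whole document is to set them once as defaults. This is a sketch assuming the call is placed in a setup chunk near the top of your R Markdown file; the figure dimensions are arbitrary example values.

```r
# Set default chunk options for every chunk that follows.
knitr::opts_chunk$set(
  echo = FALSE,     # hide the code itself
  warning = FALSE,  # hide warnings
  message = FALSE,  # hide messages (e.g., from loading packages)
  fig.width = 6,    # default figure width in inches
  fig.height = 4    # default figure height in inches
)
```

Any single chunk can still override these defaults in its own header, for example to show one piece of code or to enlarge one figure.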
If you want to output your statistical results as a table, use knitr::kable or kableExtra.
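As a small sketch of what that can look like, the example below turns the coefficient table of a linear model into a formatted table. The model on the built-in mtcars data and the caption text are placeholders, not part of the assignment.

```r
library(knitr)

# Placeholder model: fuel efficiency as a function of car weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the coefficient matrix (estimates, standard errors,
# t values, p values) and convert it to a data frame.
coef_table <- as.data.frame(coef(summary(fit)))

# Render it as a table, rounded to 3 digits.
tbl <- kable(coef_table, digits = 3,
             caption = "Coefficients for mpg modeled by weight")
tbl
```

In a knitted document the table renders in the output format of the report instead of as raw console output.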
If you want to build tables that you fill in with text, I recommend this markdown table generator, as markdown tables can be tricky.
You might want to use the goodpractice library and the styler library to make sure your code is readable and conforms to good style-guide principles.
Last, here’s a markdown cheatsheet - although RStudio’s help has a good set of materials as well.
Remember, I’m looking for high-quality, well-commented, well-thought-out, well-styled code in addition to gorgeous visualizations that convey a message and unambiguous analyses (where possible). As a guide to the whole paper, here’s my rubric. Once you finish your paper, assess yourself. Heck, if you even want to write a short section where you assess yourself against this rubric and justify why we should give you a certain grade according to this rubric, feel free!
Expectations | 4 (Exemplary) | 3 (Accomplished) | 2 (Developing) | 1 (Beginning) |
---|---|---|---|---|
Description of data | The student provides the source of the dataset. The description includes background on how the data were collected, with a focus on details of the data collection that would be relevant to how they answer their question (e.g., understanding of sampling design that may be relevant to meeting assumptions of statistical tests). The student provides exploratory summary statistics or visualisations that help the reader understand the scope, content, and coverage of the data. | The student provides the source of the dataset. The description includes background on how the data were collected, with a focus on details of the data collection that would be relevant to how they answer their question (e.g., understanding of sampling design that may be relevant to meeting assumptions of statistical tests). The student provides exploratory summary statistics and visualizations. | The student provides the source and brief description of the dataset. The student provides exploratory summary statistics or visualizations. | Student gives us name and source of data set and what type of data is in it. |
Explanation and justification of question(s) | There is a single focal question that is testable given the data. There are additional questions that are subsets or follow ups of the focal question. These questions will also be testable given the data. The student has described the rationale behind the question, providing context for how they came up with this question. | There is a single focal question that is testable given the data. The student has described the rationale behind the question, providing context for how they came up with this question. | There is a single focal question that is testable given the data. The student has described the rationale behind the question. | There is a single focal question that relates to the data. The rationale for the question is unclear, however. |
Description of workflow | The student provides a verbal description of the workflow used to answer their question. What steps did they take to answer their question? This can include everything from data tidying to visualization to analysis. Justification for the workflow is included in this description. | The student provides a verbal description of the workflow used to answer their question. What steps did they take to answer their question? This can include everything from data tidying to visualization to analysis. | The student provides a verbal description of the workflow of the analyses used to answer their question. | The student provides a broad-based verbal description of the workflow of their process, but is not able to break it down into specific steps. The description reads as an abstract rather than a concrete set of actions. |
Selection and justification of statistical methods | The statistical methods used are appropriate and answer the question posed. A justification of statistical method choice includes a clear statement of the underlying model used, with a description of the data generating process and error generating process. The student has clearly described the assumptions of the test or tests that they used and provided support that these assumptions have been met (e.g., verbal descriptions, additional statistical tests, data visualisations). | The statistical methods used are appropriate and answer the question posed. The student has clearly described the assumptions of the test or tests that they used and provided support that these assumptions have been met (e.g., verbal descriptions, additional statistical tests, data visualisations). | The statistical methods used are appropriate and answer the question posed. | The student proposes statistical methods, but how they relate to the question being asked is unclear. |
Code quality | The student adheres to an R style guide (e.g., http://adv-r.had.co.nz/Style.html). The code is easy to read and well commented, allowing an external reviewer to understand why certain steps are being done. This code is modular, with complex problems being broken down into small, human readable, and logically discrete steps. Functions are used instead of repeated code chunks where appropriate. The report has been written in R Markdown. | The code is easy to read and well commented, allowing an external reviewer to understand why certain steps are being done. This code is modular, with complex problems being broken down into small, human readable, and logically discrete steps. Functions are used instead of repeated code chunks where appropriate. | The code is easy to read and well commented, allowing an external reviewer to understand why certain steps are being done. | The student has made an effort to make the code readable via good commenting practices. |
Presentation of results | Data visualisations are clearly relevant to the questions being asked and models being tested. Figures are appropriately captioned and can be interpreted with minimal additional context (i.e., can stand alone). Each visualisation conveys information that is related to the questions described in the introduction. Axes are well labeled, legends are clear, and color schemes make key points easily understandable to the reader. Minutiae such as font sizes and visual aesthetics show clear attention to detail. | Data visualisations are clearly relevant to the questions being asked and models being tested. Figures are appropriately captioned and can be interpreted with minimal additional context (i.e., can stand alone). Each visualisation conveys information that is related to the questions described in the introduction. | Data visualisations are clearly relevant to the questions being asked and models being tested. Each visualisation conveys information that is related to the questions described in the introduction. | Visualizations convey information related to analyses and questions in a clear manner, but are difficult to interpret. |
Discussion/evaluation of results | The conclusions are derived logically from the results and data visualisations. The student has examined and evaluated limitations in the analysis and has proposed ways to overcome these limitations. The student is able to synthesize multiple different results into a single strong conclusion. | The conclusions are derived logically from the results and data visualisations. The student has multiple conclusions drawn from analyses, but does not bring them together into a single strong point. | The conclusions are derived logically from the results and data visualisations. | The conclusions are derived from the results and data visualisations, but do not connect clearly and cleanly. Some analyses are ignored. |
Quality of writing | Spelling and grammar count. Sections written so that they can be clearly understood by the reader. The student’s prose flows cleanly and clearly. Sentences are complete, organized, and easy to understand. Clarity of communication is key. | Spelling and grammar count. Sections written so that they can be clearly understood by the reader. Clarity of communication is key. | Spelling and grammar count. Sections written so that they can be clearly understood by the reader. | Spelling and grammar are not great, but the writing is still clear. |
So, each section of the rubric is out of 4.
Note: Using libraries not taught in class will be +1 point per library. Please note when you are using a library that is new to you.
This is from before we moved to Shiny apps. It needs a bit more organizational cohesion, and it was written in bullet points, which got big points off (write! and write well!), but overall it is not a bad paper.