MATH 29264 – Advanced Functions and Data Modelling / MATH 29264 Data Modelling Project
Overview:
In this project, you will take on the role of secondary health science researchers to demonstrate your understanding of the data modelling techniques studied in class. As secondary researchers, you will analyze data collected on male body fat and draw some conclusions based on this data.
Learning Outcomes:
At the end of this assignment, you will have completed (in part) the following learning outcomes:
- Modelled a variety of practical problems in the health sciences, using various functions.
- Summarized data and its meaning using statistical terminology, charts and tables.
- Calculated measures of center and variation for variables of a given data set.
- Fitted a model to data using linear, log-linear and non-linear regression analysis.
Submission Format:
- Your project must be completed using Microsoft Excel.
- Each question should be on its own worksheet (except where otherwise indicated), labelled by part and question number (e.g. Part 1-2, Part 2-3, etc.)
- All graphs should be properly titled and fully labelled
- All statistical values/probabilities should be clearly labelled and calculated using Excel formulas
Project Requirements:
- You are expected to complete this project individually. Working with other students, or submitting work that is not yours will result in a mark of zero and a breach of academic integrity, with appropriate sanction, according to Sheridan’s Academic Integrity policy.
- You are expected to complete this project punctually. Late Project submissions are subject to a deduction of 10% per business day for up to 5 days.
- You are expected to complete this project professionally. Your assignment should be laid out clearly and neatly, with appropriate headings, titles, and labels, and your conclusions should be presented in a way that is easy to understand.
Setup (3 marks):
- Download the data set entitled Male_Bodyfat_Data.xls from SLATE
- This file contains data on the percent body fat, age, weight, height, and 10 body circumference measurements for 1099 adult men
- Identify the last three digits of your student number (on your ONECard)
- If your student number ends in 000, start at case 1000 (NOTE: not row 1000)
- If your student number ends in any other number (001 – 999), go to the case (NOT the row) that corresponds with those three digits
- COPY ALL COLUMNS of information from your starting case row, ), along with all the information for the next 99 cases (so you should have 100 cases in total)
- E.g. if your student number ends in 009, you would copy cases 009-108.
- E.g. if your student number ends in 547, you would copy cases 547-646.
- E.g. if your student number ends in 000, you would copy cases 1000-1099.
- Open a NEW EXCEL WORKBOOK and paste the data into your new workbook on Sheet1
- You MUST copy and paste the data into a new workbook. You CANNOT complete your assignment in the workbook that you downloaded from SLATE – doing so constitutes a breach of academic integrity and will result in a mark of ZERO (0).
- Rename this sheet to “Raw Data”.
PART 1: Organizing and Presenting Data (12 marks)
Constructing statistical charts and graphs using Microsoft Excel
- (3 marks) Construct a frequency table with 5-12 classes the male heights listed in the data. Use Excel’s Pivot Table/Pivot Chart feature to create the frequency table.
- (3 marks) On the same sheet as part 1, construct a frequency histogram for the data. Be sure to properly label the axes, give the graph a proper title, and eliminate any gaps between the bars.
- (3 marks) Repeat step 1 again for the male ages in the data. This should be on a different sheet.
- (3 marks) On the same sheet as part 3, construct a pie chart for the data. Be sure to properly label the axes, give the graph a proper title, and add an appropriate legend to the chart.
PART 2: Analyzing Data (6 marks)
Calculating summary statistics using Microsoft Excel
- On a new sheet, copy and paste two different measurements from the data (excluding age, height, and percent body fat) – be sure to label each column appropriately.
- (3 marks) On the same sheet, calculate the following summary statistics for the first additional measurement (you must use Excel formulas and functions – do not enter values manually):
- Median
- Mode (be sure to account for multiple modes, if applicable)
- Range
- (3 marks) On the same sheet, calculate the following summary statistics for the first additional measurement (you must use Excel formulas and functions – do not enter values manually):
- Mean
- Standard Deviation
- Coefficient of Variation
PART 3: Modelling Data using Correlation and Regression (15 marks)
Creating scatter plots and running regression analysis using Microsoft Excel
- On a new sheet, copy and paste the Percent Body Fat data, as well as the two additional measurements you used in part 2 – be sure to label each column appropriately.
- (3 marks) Create a scatter plot of Percent Body Fat versus the first additional measurement. The model should be set up so that the additional measurement is a predictor of Percent Body Fat. Add a linear trendline and include the equation of the line of best fit, and the model’s R2 value.
- (3 marks) Repeat step 2 comparing Percent Body Fat versus the second additional measurement. Again, the second additional measurement should be the predictor of Percent Body Fat.
- (3 marks) Calculate Pearson’s correlation coefficient (using Excel’s formulas and functions) for each of the two models established in question 2 and question 3. Which additional measurement that you selected is the better predictor for percent body fat in an adult male?
- (3 marks) For the variable that was the better predictor of Percent Body Fat, create another scatter plot, as you did in steps 2 and 3. However, this time, establish an exponential trendline for your data. Be sure to include the equation of the exponential trendline on your scatter-plot.
- (3 marks) For the same variable, calculate the R2 value for the exponential model you established in step 5. Be careful here, as this is not as simple as the linear model – you will need to follow the log-linear approach we studied in class, as the R2 value on the exponential trendline will be incorrect.