Learning Objectives

After completing this assignment, students should be able to:

  • use joins to combine tables in SQL
  • understand the basic rules of tidy data
  • implement quality control for data entry in spreadsheets


Lecture Notes

  1. Joins
  2. Tidy Data
  3. Data Entry


  1. -- Basic Join --

    Write a query that returns the year, month, and day for each individual captured as well as its genus and species names. This can be accomplished by joining the species table to the surveys table using the species_id column in both tables. Save this query as species_captures_by_data.

    [click here for output]
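
The join pattern described above can be tried on a tiny in-memory database before running it against the real data. This is a minimal sketch using Python's built-in sqlite3 module; the rows are made up for illustration, and the record_id column and the species codes are assumptions rather than details given in this assignment.

```python
import sqlite3

# Toy in-memory database; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER,
                      month INTEGER, day INTEGER, species_id TEXT);
INSERT INTO species VALUES ('DM', 'Dipodomys', 'merriami'),
                           ('DO', 'Dipodomys', 'ordii');
INSERT INTO surveys VALUES (1, 1977, 7, 16, 'DM'),
                           (2, 1977, 7, 16, 'DO'),
                           (3, 1977, 7, 17, NULL);
""")

# Match each survey row to its species row on the shared species_id column.
basic_join_rows = conn.execute("""
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species ON surveys.species_id = species.species_id
ORDER BY surveys.record_id
""").fetchall()

for row in basic_join_rows:
    print(row)
```

Note that the third survey row has no species_id, so the (inner) join leaves it out of the result.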
  2. -- Multi-table Join --

    The plots table in the Portal database can be joined to the surveys table by joining plot_id to plot_id and the species table can be joined to the surveys table by joining species_id to species_id.

    The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the population dynamics of all of the species at the site, taking into account the different experimental manipulations. Write a query that returns the year, month, day, genus, and species of every individual as well as the plot_id and plot_type of the plot it was captured on. Save this query as species_plot_data.

    [click here for output]
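
Joining three tables just chains a second JOIN onto the first. The sketch below shows the shape of such a query on made-up rows, again using Python's built-in sqlite3 module; record_id and the species codes are assumed for the example, and the real tables have more columns.

```python
import sqlite3

# Toy in-memory database; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);
CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER,
                      month INTEGER, day INTEGER,
                      plot_id INTEGER, species_id TEXT);
INSERT INTO plots VALUES (1, 'Control'), (2, 'Long-term Krat Exclosure');
INSERT INTO species VALUES ('DM', 'Dipodomys', 'merriami'),
                           ('DS', 'Dipodomys', 'spectabilis');
INSERT INTO surveys VALUES (1, 1977, 7, 16, 1, 'DM'),
                           (2, 1977, 8, 19, 2, 'DS');
""")

# surveys is the hub: it links to species via species_id and to plots via plot_id.
multi_join_rows = conn.execute("""
SELECT surveys.year, surveys.month, surveys.day,
       species.genus, species.species,
       plots.plot_id, plots.plot_type
FROM surveys
JOIN species ON surveys.species_id = species.species_id
JOIN plots ON surveys.plot_id = plots.plot_id
ORDER BY surveys.record_id
""").fetchall()

for row in multi_join_rows:
    print(row)
```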
  3. -- Filtered Join --

    You are curious about what other kinds of animals get caught in the Sherman traps used to census the rodents. Write a query that returns a list of the genus, species, and taxa (from the species table) for non-rodent individuals that are caught on the Control plots. Non-rodents are indicated in the taxa column of the species table. You are only interested in which species are captured, so make this list unique (only one line for each species). Save this query as non_rodents_on_controls.

    [click here for output]
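
Filtering a join and collapsing duplicates can be sketched with WHERE and SELECT DISTINCT. The example below runs on made-up rows with Python's built-in sqlite3 module; the taxa values 'Rodent' and 'Bird', the species codes, and record_id are assumptions for the toy data, so check the actual values in the species table before reusing the pattern.

```python
import sqlite3

# Toy in-memory database; all rows and taxa labels are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);
CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT,
                      species TEXT, taxa TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY,
                      plot_id INTEGER, species_id TEXT);
INSERT INTO plots VALUES (1, 'Control'), (2, 'Long-term Krat Exclosure');
INSERT INTO species VALUES ('DM', 'Dipodomys', 'merriami', 'Rodent'),
                           ('AB', 'Amphispiza', 'bilineata', 'Bird');
INSERT INTO surveys VALUES (1, 1, 'AB'), (2, 1, 'AB'),
                           (3, 1, 'DM'), (4, 2, 'AB');
""")

# WHERE keeps only non-rodent captures on Control plots;
# DISTINCT then reduces repeated captures to one line per species.
distinct_rows = conn.execute("""
SELECT DISTINCT species.genus, species.species, species.taxa
FROM surveys
JOIN species ON surveys.species_id = species.species_id
JOIN plots ON surveys.plot_id = plots.plot_id
WHERE species.taxa != 'Rodent' AND plots.plot_type = 'Control'
ORDER BY species.genus, species.species
""").fetchall()

print(distinct_rows)
```

Without DISTINCT, the same query would return one row per capture, so the bird caught twice on the Control plot would appear twice.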
  4. -- Detailed Join --

    We want to do an analysis comparing the size of individuals on the Control plots to the Long-term Krat Exclosures. Write a query that returns the year, genus, species, weight and the plot_type for all cases where the plot type is either Control or Long-term Krat Exclosure. Be sure to choose only rodents and exclude individuals that have not been identified to species. Remove any records where the weight is missing. Save this query as size_comparison_controls_vs_krat_exclosures.

    [click here for output]
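
Two of the filters in this exercise, matching one of several plot types and dropping missing weights, can be sketched with IN and IS NOT NULL. This is an illustration on made-up rows using Python's built-in sqlite3 module; the taxa value 'Rodent', the third plot type, the species codes, and record_id are assumptions, and the sketch deliberately leaves out how unidentified species are marked because that depends on the real data.

```python
import sqlite3

# Toy in-memory database; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);
CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT,
                      species TEXT, taxa TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY, year INTEGER,
                      plot_id INTEGER, species_id TEXT, weight REAL);
INSERT INTO plots VALUES (1, 'Control'),
                         (2, 'Long-term Krat Exclosure'),
                         (3, 'Rodent Exclosure');
INSERT INTO species VALUES ('DM', 'Dipodomys', 'merriami', 'Rodent'),
                           ('DO', 'Dipodomys', 'ordii', 'Rodent');
INSERT INTO surveys VALUES (1, 1977, 1, 'DM', 40.0),
                           (2, 1977, 2, 'DM', NULL),
                           (3, 1978, 2, 'DO', 48.0),
                           (4, 1978, 3, 'DM', 42.0);
""")

# IN matches either plot type; IS NOT NULL drops records with no weight
# (comparing with "= NULL" would not work, because NULL never equals anything).
size_rows = conn.execute("""
SELECT surveys.year, species.genus, species.species,
       surveys.weight, plots.plot_type
FROM surveys
JOIN species ON surveys.species_id = species.species_id
JOIN plots ON surveys.plot_id = plots.plot_id
WHERE plots.plot_type IN ('Control', 'Long-term Krat Exclosure')
  AND species.taxa = 'Rodent'
  AND surveys.weight IS NOT NULL
ORDER BY surveys.record_id
""").fetchall()

for row in size_rows:
    print(row)
```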
  5. -- Aggregated Join --

    Write a query that displays the total number of rodent individuals sampled on each plot_type. Save this query as individuals_per_plot_type.

    [click here for output]
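
Counting rows per group after a join combines JOIN with GROUP BY and COUNT. The sketch below uses made-up rows and Python's built-in sqlite3 module; the taxa labels, species codes, and record_id are assumptions for the toy data.

```python
import sqlite3

# Toy in-memory database; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, plot_type TEXT);
CREATE TABLE species (species_id TEXT PRIMARY KEY, taxa TEXT);
CREATE TABLE surveys (record_id INTEGER PRIMARY KEY,
                      plot_id INTEGER, species_id TEXT);
INSERT INTO plots VALUES (1, 'Control'), (2, 'Long-term Krat Exclosure');
INSERT INTO species VALUES ('DM', 'Rodent'), ('DO', 'Rodent'), ('AB', 'Bird');
INSERT INTO surveys VALUES (1, 1, 'DM'), (2, 1, 'DO'),
                           (3, 1, 'AB'), (4, 2, 'DM');
""")

# Filter to rodents first, then count the remaining captures per plot type.
count_rows = conn.execute("""
SELECT plots.plot_type, COUNT(*) AS individuals
FROM surveys
JOIN species ON surveys.species_id = species.species_id
JOIN plots ON surveys.plot_id = plots.plot_id
WHERE species.taxa = 'Rodent'
GROUP BY plots.plot_type
ORDER BY plots.plot_type
""").fetchall()

print(count_rows)
```

The bird captured on the Control plot is removed by the WHERE clause before grouping, so it does not contribute to the Control count.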
  6. -- Improving Messy Data --

    A lot of real data isn’t very tidy, largely because most scientists are never taught how to structure their data so that it is easy to analyze.

    Download a messy version of some of the Portal Project data. Note that there are multiple tabs in this spreadsheet.

    Think about what could be improved about this data. In a text file (to be turned in as part of the assignment):

    1-5. Describe five things about this data that are not tidy and how you could fix each of those issues.

    6. Could this data easily be imported into a database in its current form?

    7. Do you think it’s a good idea to enter the data like this and clean it up later, or to have a good data structure for analysis by the time data is being entered? Why?

  7. -- Data entry validation in Excel --

    You’re starting a new study of small mammals at the NEON site at Ordway-Swisher. Create a spreadsheet in Excel for data entry. It should have four columns: Year, Site, Species, and Mass.

    Set the following data validation criteria to prevent obviously wrong data from being entered:

    1. Year must be an integer between 2015 and 2025.
    2. Site should be one of the following: A1, A2, B1, B2.
    3. Species should be one of the following: Dipodomys spectabilis, Dipodomys ordii, Dipodomys merriami.
    4. Mass should be a decimal greater than or equal to 0 and less than or equal to 500, since mass is measured in grams in this study and nothing heavier than half a kilogram will fit into your Sherman traps. Change the error message for this validation criterion to explain why the data is invalid and what the valid values are.

    Save this file as yourname_ordway_mammal_data.xlsx.
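
To make the four rules concrete, here is a minimal sketch of the same checks written as a Python function. It is not a substitute for setting up data validation in Excel, which is what the exercise asks for; the function name validate_row and the wording of the messages are hypothetical.

```python
# Allowed values taken from the validation criteria listed above.
VALID_SITES = {"A1", "A2", "B1", "B2"}
VALID_SPECIES = {
    "Dipodomys spectabilis",
    "Dipodomys ordii",
    "Dipodomys merriami",
}


def is_number(value, allow_float):
    """True for int (and optionally float) values, excluding booleans."""
    allowed = (int, float) if allow_float else (int,)
    return isinstance(value, allowed) and not isinstance(value, bool)


def validate_row(year, site, species, mass):
    """Return a list of error messages; an empty list means the row is valid."""
    errors = []
    if not (is_number(year, allow_float=False) and 2015 <= year <= 2025):
        errors.append("Year must be an integer between 2015 and 2025.")
    if site not in VALID_SITES:
        errors.append("Site must be one of A1, A2, B1, B2.")
    if species not in VALID_SPECIES:
        errors.append("Species must be one of the three listed Dipodomys species.")
    if not (is_number(mass, allow_float=True) and 0 <= mass <= 500):
        errors.append(
            "Mass must be between 0 and 500 grams; nothing heavier than "
            "half a kilogram fits in a Sherman trap."
        )
    return errors


print(validate_row(2020, "A1", "Dipodomys ordii", 45.5))
print(validate_row(2030, "C3", "Mus musculus", 750))
```

The first call passes every rule and returns an empty list; the second violates all four rules and returns four messages.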