Chapter 8 Data Format Standard

The minimal tidy data format that all packages need to incorporate looks like the following. This is a bare-bones minimum. Additional data columns are possible and welcomed, but, for inter-operability, the following are required.

x
date location location_type location_code location_code_type data_type value
2020-01-21 Washington state 53 fips_code cases_total 1
2020-01-21 Washington state 53 fips_code deaths_total 0
2020-01-22 Washington state 53 fips_code cases_total 1
2020-01-22 Washington state 53 fips_code deaths_total 0
2020-01-23 Washington state 53 fips_code cases_total 1
|2020-01-23 |Washington |state |53 |fips_code |deaths_total | 0|

The data columns are as follows:

  • date - The date in YYYY-MM-DD form
  • location - The name of the location as provided by the data source. The counties dataset provides county and state. They are combined and separated by a ,, and can be split by tidyr::separate(), if you wish.
  • location_type - The type of location using the covid19R controlled vocabulary. Nested locations are indicated by multiple location types being combined with a `_
  • location_code - A standardized location code using a national or international standard. In this case, FIPS state or county codes. See https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code and https://en.wikipedia.org/wiki/FIPS_county_code for more
  • location_code_type The type of standardized location code being used according to the covid19R controlled vocabulary. Here we use fips_code
  • data_type - the type of data in that given row. Includes total_cases and total_deaths, cumulative measures of both.
  • value - number of cases of each data type