Chapter 8 Data Format Standard
The minimal tidy data format that all packages need to incorporate looks like the following. This is a bare-bones minimum. Additional data columns are possible and welcomed, but, for inter-operability, the following are required.
The data columns are as follows:
- date - The date in YYYY-MM-DD form
- location - The name of the location as provided by the data source. The counties dataset provides county and state. They are combined and separated by a
,, and can be split by
tidyr::separate(), if you wish.
- location_type - The type of location using the covid19R controlled vocabulary. Nested locations are indicated by multiple location types being combined with a `_
- location_code - A standardized location code using a national or international standard. In this case, FIPS state or county codes. See https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code and https://en.wikipedia.org/wiki/FIPS_county_code for more
- location_code_type The type of standardized location code being used according to the covid19R controlled vocabulary. Here we use
- data_type - the type of data in that given row. Includes
total_deaths, cumulative measures of both.
- value - number of cases of each data type