Introduction to the covid19swiss Dataset

The covid19swiss R package provides a tidy-format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) pandemic outbreak in the cantons of Switzerland and the Principality of Liechtenstein (FL).

Data structure

The covid19swiss dataset includes the following fields:

date - the timestamp of the case, a Date object
location - the Cantons of Switzerland and the Principality of Liechtenstein (FL) abbreviation code
location_type - description of the location, either Canton of Switzerland or the Principality of Liechtenstein
location_code - a canton index code for merging geometry data from the rnaturalearth package, available only for the cantons of Switzerland
location_code_type - the name of the code field in the rnaturalearth package for the Switzerland map
data_type - the type of case
value - the number of cases corresponding to the date and data_type fields
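As a rough sketch of how location_code is meant to be used as a join key, the example below matches toy rows to a lookup table keyed by gn_a1_code. The canton_shapes table and its canton_name column are hypothetical stand-ins for the geometry data that would come from rnaturalearth, and all values are made up for illustration:

```r
library(dplyr)

# Toy rows in the covid19swiss layout (values are made up for illustration)
toy_cases <- data.frame(
  location      = c("GE", "ZH", "FL"),
  location_code = c("CH.GE", "CH.ZH", NA),
  data_type     = "cases_total",
  value         = c(10L, 20L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a canton geometry table keyed by gn_a1_code
canton_shapes <- data.frame(
  gn_a1_code  = c("CH.GE", "CH.ZH"),
  canton_name = c("Geneva", "Zurich"),
  stringsAsFactors = FALSE
)

# Keep every canton shape and attach the matching case counts;
# the FL row has no location_code, so it does not match any shape
toy_map <- canton_shapes %>%
  left_join(toy_cases, by = c("gn_a1_code" = "location_code"))

toy_map
```

Starting from the geometry table keeps one row per canton shape, which is usually what a map layer needs.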

The data_type field takes the following values:

tested_total - cumulative number of tests performed as of the date
cases_total - cumulative confirmed Covid-19 cases as of the current date
hosp_new - new hospitalizations on the current date
hosp_current - current number of hospitalized patients as of the current date
icu_current - number of hospitalized patients in ICUs as of the current date
vent_current - number of hospitalized patients requiring ventilation as of the current date
recovered_total - cumulative number of patients recovered as of the current date
deaths_total - cumulative deaths due to Covid-19 as of the current date
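Because most of these series are cumulative, daily counts have to be derived by differencing. The sketch below shows one possible way to do that with dplyr on toy data; the cases_new column name is an assumption for illustration and is not a field of the dataset:

```r
library(dplyr)

# Toy cumulative series in the long layout (values are made up)
toy <- data.frame(
  date      = as.Date("2020-03-01") + 0:3,
  location  = "GE",
  data_type = "cases_total",
  value     = c(1L, 3L, 6L, 10L),
  stringsAsFactors = FALSE
)

# Difference the cumulative values within each location, in date order
toy_daily <- toy %>%
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cases_new = value - lag(value)) %>%
  ungroup()

toy_daily$cases_new
#> [1] NA  2  3  4
```

The first day has no previous value to subtract, so its daily count is NA.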

The data is organized in a long format:

library(covid19swiss)

head(covid19swiss)
#>         date location         location_type location_code location_code_type
#> 1 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 2 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 3 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 4 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 5 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 6 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#>      data_type value
#> 1 tested_total     4
#> 2  cases_total    NA
#> 3     hosp_new    NA
#> 4 hosp_current    NA
#> 5  icu_current    NA
#> 6 vent_current    NA

It is straightforward to transform the data into a wide format with the pivot_wider function from the tidyr package:

library(tidyr)

covid19swiss_wide <- covid19swiss %>% 
  pivot_wider(names_from = data_type, values_from = value)

head(covid19swiss_wide)
#> # A tibble: 6 x 13
#>   date       location location_type location_code location_code_t… tested_total
#>   <date>     <chr>    <chr>         <chr>         <chr>                   <int>
#> 1 2020-01-24 GE       Canton of Sw… CH.GE         gn_a1_code                  4
#> 2 2020-01-25 GE       Canton of Sw… CH.GE         gn_a1_code                  8
#> 3 2020-01-26 GE       Canton of Sw… CH.GE         gn_a1_code                 11
#> 4 2020-01-27 GE       Canton of Sw… CH.GE         gn_a1_code                 18
#> 5 2020-01-28 GE       Canton of Sw… CH.GE         gn_a1_code                 27
#> 6 2020-01-29 GE       Canton of Sw… CH.GE         gn_a1_code                 54
#> # … with 7 more variables: cases_total <int>, hosp_new <int>,
#> #   hosp_current <int>, icu_current <int>, vent_current <int>,
#> #   recovered_total <int>, deaths_total <int>
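Going the other way is similar. As a minimal sketch on toy data (the values are made up and only two of the data_type columns are included), pivot_longer from tidyr can restore the long layout:

```r
library(tidyr)

# Toy wide table with two of the data_type columns (values are made up)
toy_wide <- data.frame(
  date         = as.Date("2020-03-01") + 0:1,
  location     = "GE",
  tested_total = c(100L, 150L),
  cases_total  = c(5L, NA),
  stringsAsFactors = FALSE
)

# Stack the measurement columns back into data_type / value pairs
toy_long <- pivot_longer(toy_wide,
                         cols = c(tested_total, cases_total),
                         names_to = "data_type",
                         values_to = "value")

toy_long
```

Missing values are kept as NA rows by default; passing values_drop_na = TRUE would drop them instead.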

Query and summarise the data

The following examples demonstrate simple methods for querying and summarising the data with the dplyr and tidyr packages.

Cases summary by canton

The first example demonstrates how to query the total confirmed, recovered, and death cases by canton as of September 8th, 2020:

library(dplyr)

covid19swiss %>%
  filter(date == as.Date("2020-09-08"),
         data_type %in% c("cases_total", "recovered_total", "deaths_total")) %>%
  select(location, value, data_type) %>%
  pivot_wider(names_from = data_type, values_from = value) %>%
  arrange(-cases_total)
#> # A tibble: 26 x 3
#>    location cases_total recovered_total
#>    <chr>          <int>           <int>
#>  1 VD              8070              NA
#>  2 GE              7310              NA
#>  3 ZH              6652              NA
#>  4 TI              3565             929
#>  5 BE              2698              NA
#>  6 VS              2458             320
#>  7 AG              2260              NA
#>  8 FR              1920             164
#>  9 SG              1341              NA
#> 10 BS              1254            1154
#> # … with 16 more rows

Note: some fields, such as recovered_total or tested_total, are not available for some cantons and are marked as missing values (i.e., NA).
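To see how much is missing before relying on a field, one option is to count NA values per data_type. This is a small sketch on toy data with made-up values; the n_locations and n_missing column names are chosen here for illustration:

```r
library(dplyr)

# Toy long-format rows for a single date (values are made up)
toy <- data.frame(
  location  = rep(c("GE", "ZH", "TI"), each = 2),
  data_type = rep(c("cases_total", "recovered_total"), times = 3),
  value     = c(10L, NA, 20L, NA, 8L, 3L),
  stringsAsFactors = FALSE
)

# Count how many locations report each field and how many are missing
missing_by_type <- toy %>%
  group_by(data_type) %>%
  summarise(n_locations = n(),
            n_missing   = sum(is.na(value)))

missing_by_type
```

In this toy table, recovered_total is missing for two of the three locations, while cases_total is complete.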

Calculating rates for Canton of Geneva

In the next example, we will filter the dataset for the Canton of Geneva and calculate the following metrics:

Positive rate - \(\frac{\text{Total confirmed}}{\text{Total tested}}\)
Recovery rate - \(\frac{\text{Total recovered}}{\text{Total confirmed}}\)
Death rate - \(\frac{\text{Total deaths}}{\text{Total confirmed}}\)

covid19swiss %>% dplyr::filter(location == "GE",
                               date == as.Date("2020-04-10")) %>%
  dplyr::select(data_type, value) %>%
  tidyr::pivot_wider(names_from = data_type, values_from = value) %>%
  dplyr::mutate(positive_tested = round(100 * cases_total / tested_total, 2),
                death_rate = round(100 * deaths_total / cases_total, 2),
                recovery_rate = round(100 * recovered_total / cases_total, 2)) %>%
  dplyr::select(positive_tested, recovery_rate, death_rate) 
#> # A tibble: 1 x 3
#>   positive_tested recovery_rate death_rate
#>             <dbl>         <dbl>      <dbl>
#> 1            23.7          10.1       3.83

Values are percentages.

Separating Switzerland and the Principality of Liechtenstein

The raw data include both Switzerland and the Principality of Liechtenstein. Separating the data by country can be done by using the location field:

switzerland <- covid19swiss %>% filter(location != "FL")

head(switzerland)
#>         date location         location_type location_code location_code_type
#> 1 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 2 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 3 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 4 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 5 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#> 6 2020-01-24       GE Canton of Switzerland         CH.GE         gn_a1_code
#>      data_type value
#> 1 tested_total     4
#> 2  cases_total    NA
#> 3     hosp_new    NA
#> 4 hosp_current    NA
#> 5  icu_current    NA
#> 6 vent_current    NA

liechtenstein <- covid19swiss %>% filter(location == "FL")

head(liechtenstein)
#>         date location                 location_type location_code
#> 1 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 2 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 3 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 4 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 5 2020-02-27       FL Principality of Liechtenstein          <NA>
#> 6 2020-02-27       FL Principality of Liechtenstein          <NA>
#>   location_code_type    data_type value
#> 1         gn_a1_code tested_total     3
#> 2         gn_a1_code  cases_total    NA
#> 3         gn_a1_code     hosp_new    NA
#> 4         gn_a1_code hosp_current    NA
#> 5         gn_a1_code  icu_current    NA
#> 6         gn_a1_code vent_current    NA
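Once the Swiss rows are separated, a national figure can be approximated by summing over cantons. The sketch below uses toy data with made-up values; the total and n_reporting column names are assumptions for illustration. Cantons with missing values are skipped, so the sum should be read as a lower bound:

```r
library(dplyr)

# Toy long-format rows for a single date (values are made up)
toy <- data.frame(
  date      = as.Date("2020-03-01"),
  location  = c("GE", "ZH", "TI", "FL"),
  data_type = "cases_total",
  value     = c(10L, 20L, NA, 1L),
  stringsAsFactors = FALSE
)

# Drop Liechtenstein, then sum the canton values per date and data_type;
# n_reporting records how many cantons contributed a non-missing value
swiss_total <- toy %>%
  filter(location != "FL") %>%
  group_by(date, data_type) %>%
  summarise(n_reporting = sum(!is.na(value)),
            total       = sum(value, na.rm = TRUE),
            .groups     = "drop")

swiss_total
```

Reporting n_reporting next to the total makes it visible when a sum covers only some of the cantons.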