The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

More details are available here, and a CSV version of the package dataset is available here.

A summary dashboard is available here.

Source: Centers for Disease Control and Prevention’s Public Health Image Library

Important Note

As this is an ongoing situation, the data format may change frequently. Please check the package news for updates about those changes.

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the GitHub version (refreshed on a daily basis):

# install.packages("devtools")
devtools::install_github("covid19r/coronavirus")

Usage

The package contains a single dataset - coronavirus:

library(coronavirus) 

data("coronavirus")

This coronavirus dataset has the following fields:

head(coronavirus) 
#>   Province.State Country.Region Lat Long       date cases      type
#> 1                   Afghanistan  33   65 2020-01-22     0 confirmed
#> 2                   Afghanistan  33   65 2020-01-23     0 confirmed
#> 3                   Afghanistan  33   65 2020-01-24     0 confirmed
#> 4                   Afghanistan  33   65 2020-01-25     0 confirmed
#> 5                   Afghanistan  33   65 2020-01-26     0 confirmed
#> 6                   Afghanistan  33   65 2020-01-27     0 confirmed
tail(coronavirus) 
#>       Province.State Country.Region     Lat     Long       date cases      type
#> 64569       Zhejiang          China 29.1832 120.0934 2020-04-08     2 recovered
#> 64570       Zhejiang          China 29.1832 120.0934 2020-04-09     3 recovered
#> 64571       Zhejiang          China 29.1832 120.0934 2020-04-10     0 recovered
#> 64572       Zhejiang          China 29.1832 120.0934 2020-04-11     1 recovered
#> 64573       Zhejiang          China 29.1832 120.0934 2020-04-12     2 recovered
#> 64574       Zhejiang          China 29.1832 120.0934 2020-04-13     1 recovered
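
As a quick sanity check on the fields shown above, the following minimal sketch reports the row count, the date range, and the distinct case types of a data frame with this layout. The helper name describe_cases is hypothetical and not part of the package; it only assumes that the date and type columns exist.

```r
# Hypothetical helper (not part of the coronavirus package):
# summarise the basic shape of a data frame that has `date` and `type` columns.
describe_cases <- function(df) {
  list(
    rows       = nrow(df),                              # number of observations
    date_range = range(df$date),                        # first and last date
    types      = sort(unique(as.character(df$type)))    # distinct case types
  )
}
```

With the package loaded, describe_cases(coronavirus) should return the same three items for the full dataset.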

Here is an example that summarizes total cases by country and type (top 20):

library(dplyr)

summary_df <- coronavirus %>% group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20) 
#> # A tibble: 20 x 3
#> # Groups:   Country.Region [13]
#>    Country.Region type      total_cases
#>    <chr>          <chr>           <int>
#>  1 US             confirmed      580619
#>  2 Spain          confirmed      170099
#>  3 Italy          confirmed      159516
#>  4 France         confirmed      137875
#>  5 Germany        confirmed      130072
#>  6 United Kingdom confirmed       89570
#>  7 China          confirmed       83213
#>  8 China          recovered       78039
#>  9 Iran           confirmed       73303
#> 10 Spain          recovered       64727
#> 11 Germany        recovered       64300
#> 12 Turkey         confirmed       61049
#> 13 Iran           recovered       45983
#> 14 US             recovered       43482
#> 15 Italy          recovered       35435
#> 16 Belgium        confirmed       30589
#> 17 France         recovered       28001
#> 18 Netherlands    confirmed       26710
#> 19 Switzerland    confirmed       25688
#> 20 Canada         confirmed       25679
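
Because the totals above are obtained by summing the cases column, each row appears to hold a daily count rather than a running total. Under that assumption, a cumulative series for one country could be sketched as follows. The helper name cumulative_cases and its country_name argument are hypothetical, not part of the package.

```r
library(dplyr)

# Hypothetical helper (not part of the coronavirus package):
# turn daily counts into running totals per case type for one country.
cumulative_cases <- function(df, country_name) {
  df %>%
    filter(Country.Region == country_name) %>%
    # Add up provinces/states so there is one row per type and date
    group_by(type, date) %>%
    summarise(daily_cases = sum(cases)) %>%
    ungroup() %>%
    # Order by date within each type before accumulating
    arrange(type, date) %>%
    group_by(type) %>%
    mutate(cumulative_cases = cumsum(daily_cases)) %>%
    ungroup()
}
```

For example, cumulative_cases(coronavirus, "Italy") would return one row per date and case type for Italy, with the running total in cumulative_cases.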

Summary of new cases during the past 24 hours by country and type (as of 2020-04-13):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country = Country.Region, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 185 x 4
#> # Groups:   country [185]
#>    country        confirmed death recovered
#>    <chr>              <int> <int>     <int>
#>  1 US                 25306  1509     10494
#>  2 United Kingdom      4364   718      -322
#>  3 France              4205   574       532
#>  4 Turkey              4093    98       511
#>  5 Spain               3268   547      2336
#>  6 Italy               3153   566      1224
#>  7 Russia              2558    18       179
#>  8 Peru                2265    23       844
#>  9 Germany             2218   172      4000
#> 10 Iran                1617   111      2089
#> # … with 175 more rows
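
The output above contains a negative daily value (-322 recovered cases for the United Kingdom). The source does not explain it; one plausible reading, stated here as an assumption, is that it reflects a correction in the upstream data. A minimal sketch for listing such rows before further analysis follows. The helper name negative_counts is hypothetical and not part of the package.

```r
library(dplyr)

# Hypothetical helper (not part of the coronavirus package):
# list observations whose daily count is negative, most negative first.
negative_counts <- function(df) {
  df %>%
    filter(cases < 0) %>%
    select(Country.Region, Province.State, date, type, cases) %>%
    arrange(cases)
}
```

Calling negative_counts(coronavirus) would show which countries, dates, and case types are affected, so they can be reviewed or excluded as needed.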

Data Sources

The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following sources:

World Health Organization (WHO): https://www.who.int/
DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
