The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.
More details are available here, and a CSV version of the package dataset is available here.
A summary dashboard is available here.
As this is an ongoing situation, frequent changes in the data format may occur. Please visit the package news for updates about those changes.
Install the CRAN version:
install.packages("coronavirus")
Install the GitHub version (refreshed on a daily basis):
# install.packages("devtools")
devtools::install_github("covid19r/coronavirus")
The package contains a single dataset, coronavirus.
The coronavirus dataset has the following fields:
library(coronavirus)
data("coronavirus")

head(coronavirus)
#>   Province.State Country.Region Lat Long       date cases      type
#> 1                   Afghanistan  33   65 2020-01-22     0 confirmed
#> 2                   Afghanistan  33   65 2020-01-23     0 confirmed
#> 3                   Afghanistan  33   65 2020-01-24     0 confirmed
#> 4                   Afghanistan  33   65 2020-01-25     0 confirmed
#> 5                   Afghanistan  33   65 2020-01-26     0 confirmed
#> 6                   Afghanistan  33   65 2020-01-27     0 confirmed

tail(coronavirus)
#>       Province.State Country.Region     Lat     Long       date cases      type
#> 64569       Zhejiang          China 29.1832 120.0934 2020-04-08     2 recovered
#> 64570       Zhejiang          China 29.1832 120.0934 2020-04-09     3 recovered
#> 64571       Zhejiang          China 29.1832 120.0934 2020-04-10     0 recovered
#> 64572       Zhejiang          China 29.1832 120.0934 2020-04-11     1 recovered
#> 64573       Zhejiang          China 29.1832 120.0934 2020-04-12     2 recovered
#> 64574       Zhejiang          China 29.1832 120.0934 2020-04-13     1 recovered
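The output above shows one row per location, date, and case type. A quick way to see which case types are present and what date range each one covers is sketched below. It runs on a small made-up data frame (the name toy and all of its values are hypothetical) that only borrows the column names from the output above; swapping in coronavirus should work the same way, assuming the columns keep these names.

```r
library(dplyr)

# Toy rows reusing the column names shown above; all values are made up
toy <- data.frame(
  Province.State = "",
  Country.Region = c("A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2020-01-22", "2020-01-23", "2020-01-22",
                   "2020-01-22", "2020-01-23", "2020-01-23")),
  cases = c(0L, 2L, 0L, 1L, 3L, 1L),
  type = c("confirmed", "confirmed", "death",
           "confirmed", "confirmed", "recovered"),
  stringsAsFactors = FALSE
)

# One row per case type: first date, last date, and number of rows
coverage_df <- toy %>%
  group_by(type) %>%
  summarise(first_date = min(date),
            last_date = max(date),
            n_rows = n())

coverage_df
```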
Here is an example that summarizes total cases by region and type (top 20):
library(dplyr)

# Total cases per country and case type, largest first
summary_df <- coronavirus %>%
  group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20)
#> # A tibble: 20 x 3
#> # Groups: Country.Region [13]
#> Country.Region type total_cases
#> <chr> <chr> <int>
#> 1 US confirmed 580619
#> 2 Spain confirmed 170099
#> 3 Italy confirmed 159516
#> 4 France confirmed 137875
#> 5 Germany confirmed 130072
#> 6 United Kingdom confirmed 89570
#> 7 China confirmed 83213
#> 8 China recovered 78039
#> 9 Iran confirmed 73303
#> 10 Spain recovered 64727
#> 11 Germany recovered 64300
#> 12 Turkey confirmed 61049
#> 13 Iran recovered 45983
#> 14 US recovered 43482
#> 15 Italy recovered 35435
#> 16 Belgium confirmed 30589
#> 17 France recovered 28001
#> 18 Netherlands confirmed 26710
#> 19 Switzerland confirmed 25688
#> 20 Canada confirmed 25679
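Because the totals above come from summing the cases column, that column appears to hold daily counts rather than running totals. Under that assumption, a running total per country and type can be sketched with cumsum(). The data frame toy and its values are made up for illustration; only the column names come from the dataset.

```r
library(dplyr)

# Toy daily counts for two hypothetical countries; values are made up
toy <- data.frame(
  Country.Region = c("A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2020-01-22", "2020-01-23", "2020-01-24",
                   "2020-01-22", "2020-01-23", "2020-01-24")),
  cases = c(1L, 2L, 0L, 5L, 3L, 4L),
  type = "confirmed",
  stringsAsFactors = FALSE
)

# Running total of daily counts within each country and case type
cumulative_df <- toy %>%
  group_by(Country.Region, type) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cumulative_cases = cumsum(cases)) %>%
  ungroup()

cumulative_df
```

Sorting by date inside each group before calling cumsum() matters: the running total is only meaningful when the rows are in chronological order.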
Summary of new cases during the past 24 hours by country and type (as of 2020-04-13):
library(tidyr)

# New cases on the most recent date, one column per case type
coronavirus %>%
  filter(date == max(date)) %>%
  select(country = Country.Region, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 185 x 4
#> # Groups: country [185]
#> country confirmed death recovered
#> <chr> <int> <int> <int>
#> 1 US 25306 1509 10494
#> 2 United Kingdom 4364 718 -322
#> 3 France 4205 574 532
#> 4 Turkey 4093 98 511
#> 5 Spain 3268 547 2336
#> 6 Italy 3153 566 1224
#> 7 Russia 2558 18 179
#> 8 Peru 2265 23 844
#> 9 Germany 2218 172 4000
#> 10 Iran 1617 111 2089
#> # … with 175 more rows
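The table above includes a negative daily value (-322 recovered for the United Kingdom). The source does not explain such values; one possible reading, which is only an assumption, is that they reflect corrections in the upstream data. A minimal sketch for listing rows with negative counts, so they can be reviewed before any summing, is shown below on a made-up data frame (toy and its values are hypothetical).

```r
library(dplyr)

# Toy daily counts with one negative entry; values are made up
toy <- data.frame(
  Country.Region = c("A", "A", "B", "B"),
  date = as.Date(c("2020-04-12", "2020-04-13",
                   "2020-04-12", "2020-04-13")),
  cases = c(10L, -3L, 4L, 0L),
  type = "recovered",
  stringsAsFactors = FALSE
)

# Keep only rows whose daily count is below zero, most negative first
negative_df <- toy %>%
  filter(cases < 0) %>%
  arrange(cases)

negative_df
```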
The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: