
In this post I make an epidemic curve in R with the ggplot2 and incidence2 packages, using the Queensland COVID-19 data that are available from the Queensland Government Open Data resource.

A more detailed walkthrough of producing epidemic curves in R can be found on the Epidemiologist R Handbook website. This handbook has been developed by epidemiologists transitioning to R.

The epidemic curve (or “epi curve”) is a core epidemiological chart typically used to visualise temporal patterns of illness onset among a cluster or epidemic of cases. In this workflow the incidence2 and ggplot2 packages are used to efficiently produce aggregated counts for specific time periods in the epidemic and present the results in a histogram.

A number of other packages are used to process the data and help with the presentation.
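The post does not list the packages explicitly, so the following is a sketch inferred from the functions called in the code further down (`read_csv`, `mutate`, `dmy`, `incidence`, `str_glue`, and the ggplot2 layers). It is an assumption about the set-up, not the author's original list.

```r
# Packages inferred from the function calls used in this post (an assumption)
library(readr)       # read_csv()
library(dplyr)       # %>%, mutate(), select(), recode()
library(lubridate)   # dmy()
library(stringr)     # str_glue()
library(ggplot2)     # scale_fill_brewer(), guides(), labs(), theme_minimal(), theme()
library(incidence2)  # incidence() and its plot() method
```

Loading `tidyverse` would cover most of these in one call; the individual packages are shown here to make clear which function comes from where.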


Get the data from the Open Data resource

  • read the CSV from the Open Data resource
  • create a date index variable by parsing the notification date
  • apply the incidence function to the data, with a weekly interval and a grouping variable (source of infection)
  • rename the variables
  • recode the source of infection variable
qld_covid <- read_csv("https://www.data.qld.gov.au/dataset/7b90d88e-4f1f-4770-b721-5d91ca36c514/resource/1dbae506-d73c-4c19-b727-e8654b8be95a/download/opendata_qld_covidcase_loc.csv")
qld_covid <- qld_covid %>% mutate(date_index = dmy(NOTIFICATION_DATE))

epi_covid <- incidence(x = qld_covid, 
                       date_index = date_index,
                       interval = "week",
                       groups = SOURCE_INFECTION)

epi_covid <- epi_covid %>% select(date_index, source=SOURCE_INFECTION, count)
epi_covid$source <- recode(epi_covid$source, `Locally acquired - contact of confirmed case and/or in a known cluster` = "Locally acquired - contact of confirmed case")
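To make the aggregation step more concrete, here is a rough sketch of the same idea done by hand with dplyr and lubridate on a few made-up rows. The toy data are hypothetical (not taken from the Queensland file), the day/month/year date format is assumed from the use of `dmy()` above, and weeks are assumed to start on Monday. It only illustrates weekly counting by group and is not how incidence2 is implemented.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  NOTIFICATION_DATE = c("01/03/2021", "02/03/2021", "09/03/2021", "10/03/2021"),
  SOURCE_INFECTION  = c("Overseas acquired", "Overseas acquired",
                        "Locally acquired", "Overseas acquired")
)

toy_weekly <- toy %>%
  # parse day/month/year strings into Date values
  mutate(date_index = dmy(NOTIFICATION_DATE),
         # round each date down to the Monday that starts its week
         week_start = floor_date(date_index, unit = "week", week_start = 1)) %>%
  # one row per week and source of infection, with the number of cases
  count(week_start, SOURCE_INFECTION, name = "count")

print(toy_weekly)
```

Each row of `toy_weekly` corresponds to one bar segment in the epidemic curve: a week, a source of infection, and a count.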

The final epicurve

epicurve <- plot(epi_covid,
                 centre_dates = FALSE,
                 color = "black",
                 date_format = "%b %Y\n(Week %W)") +
    scale_fill_brewer(palette = "RdGy") +
    guides(fill = guide_legend(ncol = 2, title.position = "top", title = "Source of infection")) +
    # titles and axis labels; the subtitle reports the date range of the data
    labs(
        title = "Weekly notification counts of COVID-19 in Queensland",
        subtitle = str_glue(
            "Case date range between {earliest_date} and {latest_date}.",
            earliest_date = format(min(as.Date(epi_covid$date_index), na.rm = TRUE), format = "%d %b %Y"),
            latest_date = format(max(as.Date(epi_covid$date_index), na.rm = TRUE), format = "%d %b %Y")),
        x = "Week of notification",
        y = "Number of cases") +
    theme_minimal() +
    theme(legend.position = "bottom", plot.subtitle = element_text(face = "italic", size = 9))

# display the plot
epicurve
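The `date_format` and subtitle strings in the plot code use strptime-style conversion codes. As a small base R sketch, each code can be checked on its own; the example date is made up for illustration, and the month abbreviation depends on the locale of the R session.

```r
# Hypothetical example date, chosen only for illustration
d <- as.Date("2021-03-08")

# %W: week of the year (00-53), with Monday as the first day of the week
week_label <- format(d, "%W")

# %d: day of the month, %Y: four-digit year
day_label  <- format(d, "%d")
year_label <- format(d, "%Y")

# %b: abbreviated month name (locale dependent)
month_label <- format(d, "%b")

# the x-axis label format used in the plot: month and year, then the week number on a new line
axis_label <- format(d, "%b %Y\n(Week %W)")
cat(axis_label, "\n")
```

Because `%W` counts weeks from the first Monday of the year, days before that Monday fall in week 00, which is worth keeping in mind when reading labels around the new year.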

Final Epicurve