5 min read

Mapping point data

by

In this tutorial, we illustrate how to generate maps using point data. Typically, individual points will correspond to the location of a reported case. We will illustrate several approaches including: basic point maps, representing point density, and using spatial clustering to approximate spatial structures.

The data: a simulated Ebola outbreak in Freetown

This example uses the latest devel version of the http://www.repidemicsconsortium.org/outbreaks/ package, which can be installed by typing:


devtools::install_github("reconhub/outbreaks")

Note that this requires devtools to be installed with its dependencies.

Once installed, we load the package and have a quick look at the data:


library(outbreaks)

class(ebola_sim)
## [1] "list"
names(ebola_sim)
## [1] "linelist" "contacts"

head(ebola_sim$linelist)
##   case_id generation date_of_infection date_of_onset date_of_hospitalisation date_of_outcome
## 1  d1fafd          0              <NA>    2014-04-07              2014-04-17      2014-04-19
## 2  53371b          1        2014-04-09    2014-04-15              2014-04-20            <NA>
## 3  f5c3d8          1        2014-04-18    2014-04-21              2014-04-25      2014-04-30
## 4  6c286a          2              <NA>    2014-04-27              2014-04-27      2014-05-07
## 5  0f58c4          2        2014-04-22    2014-04-26              2014-04-29      2014-05-17
## 6  49731d          0        2014-03-19    2014-04-25              2014-05-02      2014-05-07
##   outcome gender           hospital       lon      lat
## 1    <NA>      f  Military Hospital -13.21799 8.473514
## 2    <NA>      m Connaught Hospital -13.21491 8.464927
## 3 Recover      f              other -13.22804 8.483356
## 4   Death      f               <NA> -13.23112 8.464776
## 5 Recover      f              other -13.21016 8.452143
## 6    <NA>      f               <NA> -13.23443 8.468572

## column 10 = longitude
## column 11 = latitude
lonlat <- ebola_sim$linelist[, c(10, 11)]
head(lonlat)
##         lon      lat
## 1 -13.21799 8.473514
## 2 -13.21491 8.464927
## 3 -13.22804 8.483356
## 4 -13.23112 8.464776
## 5 -13.21016 8.452143
## 6 -13.23443 8.468572
nrow(lonlat)
## [1] 5888

ebola_sim is a list whose first component is a data.frame containing the linelist of a simulated outbreak. Each row is a different reported case, with individual spatial coordinates provided as longitude and latitude. We isolate these coordinates in a 2-columns data.frame.

Tip to use your own data: you can read data from a spreadsheet into R and use the procedure described here, making sure you indicate the right columns for longitude and latitude (in our example, columns 10 and 11). The most common formats will be:

  • .xlsx: use the function read.xlsx from the xlsx package
  • .csv: use the function read.csv from base R
  • .txt: use the function read.table from base R

Basic point map

The packages ggmap and ggplot2 can be used to visualise the distribution of cases on a map. The function get_map can be tailored (see ?get_map) to use different zoom levels and different map backgrounds. Once the background map has been created, normal ggplot2 features can be used to overlay graphics on the map:


library(ggplot2)
library(ggmap)

## spatial locations
base_points <- ggplot(data = lonlat, aes(x = lon, y = lat))

## make basic map
base <- ggmap(get_map(lonlat, zoom = 13), base_layer = base_points) +
        labs(x = "", y = "")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=8.469638,-13.233806&zoom=13&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false

## visualise map and spatial distribution
base + geom_point(col = "red", alpha = .3)

Here, transparency (the alpha parameter) is used to give an idea of point density when close locations result in partly overlapping symbols. In the following part we see how this can be improved using density estimation or spatial clustering.

Visualising point density

Kernel density estimation can be used to visualise the density of points (here, a proxy for the number of cases) on a map; this can be done easily using ggplot2’s geom_density_2d feature:


base + geom_point(col = "red", alpha = .3) + geom_density_2d()

The amount of smoothing used in the density estimation is determined by a parameter h; for instance, for less smoothing (adding some customisation for colors and line types):


base + geom_point(col = "black", alpha = .15) +
       geom_density_2d(color = "red", h = 0.005)

One can also represent case density using a heatmap:


heat_colors <- scale_fill_gradientn("Case density", colors = c("white", "gold", "red", "darkred"), guide = FALSE)

base + stat_density_2d(aes(fill = ..level..), geom = "polygon", alpha = .3) +
       heat_colors

Using spatial clustering

Point data may be non-randomly distributed, in which case it would make sense to identify clusters of infections. This can be achieve using various algorithms. Here, we illustrated the use of the Kmeans approach. This method uses spatial coordinates and a pre-defined number of clusters to identify spatial groups, so as to minimize the overall distance between cases and their group centroid.

We illustrate this procedure for an arbitrary number of groups set to 60 (this number can be varied to suit the needs of a specific dataset). This feature uses the devel version of epimaps, which can be installed using install_github("reconhub/epimaps").


library(epimaps)
clust <- spatial_clusters(lonlat, 60)
clust
## 
##   /// spatial_clusters object //
## 
##   // $lonlat: [matrix] lon/lat for 5888 point locations
##   // $clusters_coords: [matrix] lon/lat/size for clusters
##   // $clusters: [factor] cluster membership for 5888 point locations
##   // $voronoi: [deldir] voronoi tesselation for clusters
plot(clust)

We can add a Voronoi tesselation which will indicate which points belong to which cluster (see ?plot.spatial_clusters for options):

## basic plot

plot(clust, show_tiles = TRUE)



## more customisation

plot(clust, show_tiles = TRUE, show_clusters = FALSE,
     show_points = TRUE, pt_alpha = 0.1)



## different representation, more granularity

new_clust <- spatial_clusters(lonlat, 500)

plot(new_clust, show_tiles = TRUE, show_clusters = FALSE,
     show_points = TRUE, show_density = TRUE, zoom = 14,
     dens_bw = 0.01, pt_col = "black", pt_alpha = 0.1)
## Warning: Removed 883 rows containing non-finite values (stat_density2d).
## Warning: Removed 883 rows containing missing values (geom_point).
## Warning: Removed 238 rows containing missing values (geom_segment).