Mapping Spatial Data in R

August 16, 2017   

This is a brief demonstration of common data manipulation and mapping techniques using spatial analysis tools in R. The goal here is to go from knowing nothing about shapefiles, to being able to create meaningful and attractive maps. For this exercise, I use crime data from Portland, Oregon that was provided by the National Institute of Justice for their crime prediction competition. The data and R scripts for this exercise are available at its Github repository.

Loading Spatial Objects from Shape Files

Spatial data usually comes in the form of shapefiles, which can be loaded using the readOGR function from the rgdal package.

port <- readOGR(dsn = "data", layer = "Police_Districts_Portland")
crime <- readOGR(dsn = "data", layer = "NIJ_Nov2016_Crime")

The port object contains information describing the geographic area of the 60 policing districts in Portland. The crime object contains data relating to every crime reported in Portland during the month of November 2016.

Exploring Spatial Objects

If you’re used to regular R dataframes, you’ll find that spatial objects (represented by S4 objects) are a little weird. But this allows a lot of flexibility. The first thing to know is that each spatial object belongs to one of many different “classes”. The most elementary type of spatial data are Points, which denote single point locations. Lines are a set of points connected by straight line segments that can be used to denote things like rivers or state boundaries. Polygons are shapes that denote area, and a Grid is a collection of cells organized into a regular grid. Running the class() command shows that the port object is a Spatial Polygons Data Frame and the crime object is a Spatial Points Data Frame. We can plot spatial objects easily using base R plot functions.

plot(port); title(main = list("Portland Police Districts (Polygon Object)", cex=0.8))
plot(crime); title(main = list("November 2016 Crime Reports (Points Object)", cex=0.8))

The map on the left is made up of 60 polygons, each representing a policing district within Portland. The Polygon class is especially suited to describing regions or zones. The plot on the right shows a scattering of points that each describe the location of a particular crime reported in November 2016. The Points class is particularly suited to describing exact locations.

Structure of Spatial Objects

The summary() command is a useful way of exploring a spatial object.

Object of class SpatialPointsDataFrame
              min     max
coords.x1 7608809 7717689
coords.x2  651489  727287
Is projected: TRUE 
proj4string :
[+proj=lcc +lat_1=44.33333333333334 +lat_2=46 +lat_0=43.66666666666666
+lon_0=-120.5 +x_0=2500000 +y_0=0 +ellps=GRS80 +units=ft +no_defs]
Number of points: 16834
Data attributes:
                CATEGORY           occ_date       census_tra    
 BURGLARY           :   95   2016/11/04:  645   Min.   :     0  
 MOTOR VEHICLE THEFT:  270   2016/11/03:  625   1st Qu.:  2000  
 OTHER              :13889   2016/11/05:  607   Median :  4700  
 STREET CRIMES      : 2580   2016/11/08:  607   Mean   :  6609  
                             2016/11/06:  602   3rd Qu.:  8201  
                             2016/11/01:  599   Max.   :980000  
                             (Other)   :13149                   

The first line of the summary() output confirms that the crime object is a Spatial Points Data Frame. A spatial object is made up of a number of different “slots” that, together, completely describe the geographical coordinates of the shape relevant to its class (polygons in this case). The exact number and types of slots differ according to the class of the object.

The next few lines in the output above summarize the contents of the Coordinates slot, which holds the geographic coordinates of each crime. The displayed values are known as the “bounding box,” which gives the four points (or “corners”) that denote the spatial extent of the data. The spatial extent is basically the corners of the smallest rectangle that contains all the crime points. These values can be accessed with the bbox() command.

              min     max
coords.x1 7608809 7717689
coords.x2  651489  727287

The next lines describe the projection characteristics of the crime object. A projection is an arbitrary coordinate reference system that provides a basis upon which to measure distances between objects in spatial data. All geographic objects are based on some form of coordinate system, and there are many different types of projections. The proj4string slot contains a string describing the projection information of the crime object’s coordinates. We’ll see below that some packages require us to change the projection system to one of the more common systems. The next line states that there were 16,834 recorded crime incidents in November 2016.

In addition to the geographic information, spatial objects also have additional data attributes that are contained in the data slot, which is actually a traditional R data frame. The output above shows that the data Data Frame has three columns - one describing the Category of each crime, one stating the date on which the crime occurred and one stating the census tract of each crime. Each row in the dataframe corresponds to a particular crime. The number of rows is therefore equal to the total number of crimes. Each slot can be accessed on its own via the @ operator, as follows.

       CATEGORY   occ_date census_tra
1 STREET CRIMES 2016/11/01       5200
2 STREET CRIMES 2016/11/01       7202
3 STREET CRIMES 2016/11/01      10600
4 STREET CRIMES 2016/11/01       1101
5 STREET CRIMES 2016/11/01       2000
6 STREET CRIMES 2016/11/01       1201

Manipulating spatial data

It turns out that some of the crimes contained in crime occur outside the city of Portland defined by the port polygons. We can clip the crimes that occur outside the city using simple base R functions. Plotting the clipped and non-clipped data illustrates the point.

crime_clp <- crime[port, ] # "Clip" crime to stay in same area

plot(port); points(crime); title(main = list("Original Crime Data", cex=0.8))
plot(port); points(crime_clp); title(main = list("Clipped Crime Data", cex=0.8))