Last updated:


Prepare your data

You will need the following packages:

library(sf) 
library(ggplot2)
library(RColorBrewer)
library(plyr)
library(tibble)
library(grid)


Look up appropriate EPSG code for the region you want to map at https://epsg.io/

For this example, we will be using EPSG:2263 and plotting New York City school district borders as well as the locations of various NYC high schools.

To view information for EPSG:2263, run:

st_crs("EPSG:2263")


Get shapefile from NYC OpenData. The school district shapefile data can be found here: [https://data.cityofnewyork.us/Education/School-Districts/r8nu-ymqj]

Click on the “Export” tab and under “Download Geospatial Data”, click the “Shapefile” hyperlink. This should prompt a download.


Unzip the file and you should have a folder with multiple files inside.

Next we will read the map:

school <- st_read("./data/School Districts")


Plot in ggplot

We will use the RColorBrewer to get our color codes for the 32 school districts.

bp <- brewer.pal(5, "RdYlBu") #get 5 colors from the RdYlBu palette
col <- colorRampPalette(bp) #generates function to create more colors from bp
mycolors <- sample(col(32)) #generate 32 colors and randomly reorder the color codes


Then use ggplot to plot the map and color by school district:

school_map <- ggplot(data=school_sp, aes(fill=as.factor(school_dis)))+
  geom_sf(color="black",show.legend = FALSE)+
  scale_fill_manual(values = mycolors)+
  theme_void()
school_map


Your map should look like this (the colors might be slightly different depending on how the color order was randomized):


Lets plot all the high schools onto the map. Download the csv file from: [https://data.cityofnewyork.us/Education/2019-DOE-High-School-Directory/uq7m-95z8/about_data]

To export the data, click the “Export” button and export all the data as a csv file.


Lets read the csv file into R as a dataframe. Since there are many columns, we will select only those that we are interested in (i.e. Latitude and Longitude column):

hs <- read.csv("./data/2019_DOE_High_School_Directory_20240207.csv") %>%
  select(Latitude,Longitude)


Use the st_as_sf() function to convert the dataframe into a shapefile (sf) object.

hs_sp <- st_as_sf(hs,coords = c("Longitude","Latitude"), crs="+proj=longlat +datum=WGS84") #tell R to look at the columns named "Longitude" and "Latitude" for the longitude and latitude. Note that the order of calling "Longitude" before "Latitude" matters


Create a new object that only contains the geometry value of the shapefile.

clean_hs <- hs_sp %>%
  select(geometry)


Plot using ggplot:

hs_map <- ggplot()+
  geom_sf(data=school_sp,aes(fill=as.factor(school_dis)),color="black",show.legend = FALSE)+
  scale_fill_manual(values = mycolors)+
  geom_sf(data=clean_hs, color="black", size=0.5)+
  theme_minimal()
hs_map


Your map should look like this: