With summer finally arriving, I wanted to find out where a good place for enjoying a nice chilled drink (alcoholic or non-alcoholic) outside would be in my hometown Edinburgh. So I combined an open data set about chair permits with some geo-coding and created an interactive map of places with outside seating in Edinburgh.

Background and Project Description

During the last few years, UK government has been working on open sourcing its data and Edinburgh City Council is no exception. At https://edinburghopendata.info, you can find a list of data sets containing information about many aspects of public life (event though some files could admittedly do with some updating). For example this page hosts a file containing details about chair and table permits for the year 2014. Luckily, an up-to-date version can be found here. Note that while the file structure is structurally the same for both files, the headers are different, so if you want to look at the historic data, you will need to adapt the code below accordingly. The file contains names and addresses of premises which have permission to put out chairs as well as some additional information. This file forms the basis of this project, which is divided into four parts:

  • get and load the permit file
  • use the open street map API to get the latitude and longitudes for each establishment as well as premise category
  • clean and bin the premise categories
  • plot the premises on a map using folium

Without further ado, let’s get started. The full notebook can be found on my GitHub.

Step 0: Setting up

First, we import the libraries.

import pandas as pd
import requests
import wget

import folium
from folium.plugins import MarkerCluster

Step 1: Getting Data

We use wget to download the file and read it into a pandas data frame. Make sure to set the encoding since the file contains special characters (lots of Cafes on the list).

filename = wget.download("http://www.edinburgh.gov.uk/download/downloads/id/11854/tables_and_chairs_permits.csv")

df0 = pd.read_csv(filename, encoding = "ISO-8859-1")
df0.head()
Permit Premises Name Premises Address Start date End date Ext Table Area
0 5.0 The Standing Order 62-66 George Street 01/04/2019 31/03/2020 22:00 6.0m x 1.5m & 6.0m x 1.2m
1 6.0 Bushwick Bar and Grill 7-11 East London Street 01/06/2019 01/09/2019 22:00 7.85m x 1.8m and 2.5m x 1.8m
2 7.0 La Barantine 202 Bruntsfield Place 01/01/2019 31/12/2019 NaN 2.0m x 1.2m
3 10.0 The Inverleith 10 Bowhill Terrace 18/04/2019 31/08/2019 NaN 4.1m x 3.1m
4 12.0 Angels Share Hotel 7-11 Hope Street 01/05/2019 30/04/2020 22:00 4.8m x 1.4m and 1.5m x 1.4m and 1.5m x 1.4m

A quick look at the data reveals that there are a few duplicates in the data. They are mainly due to multiple permits with different start and end dates. A good way of cleaning would be to filter on dates, but frankly I don’t care that much at this point, so I just keep the premise names and addresses and drop the duplicates. (Note: The file also contains information about the table area, which I might revisit at some point in the future). After dropping the duplicates, we are left with 389 rows with premise names and addresses.

# dropping duplicate entries
df1 = df0.loc[:, ['Premises Name', 'Premises Address']]
df1 = df1.drop_duplicates()

# in 2012: 280
print(df1.shape[0])
389

A remark on the side: In the summer 2014, there were only 280 premises with a chair and table permit. Open air culture is indeed taking off and this is the data to proof it :)

Step 2: Getting latitudes and longitudes for each premise

If we want to visualize premises on a map, addresses are not enough, we need GPS coordinates. There are different APIs, which allow you to query for an address and will return latitudes and longitudes (a process called geocoding. On possibility is to use the Google Maps API, but it comes with caveats. The OpenStreetMap API provides the same functionality, but is free to use and the results are decent enough for my purpose.

We use the pandas map function to obtain the API response for each row. After querying the API, we drop all rows, where we did get not get a response. Again, I am not too botherered about the few premises (about 20) which I am loosing, there are plenty left.

def query_address(address):
    """Return response from open streetmap.
    
    Parameter:
    address - address of establishment
    
    Returns:
    result - json, response from open street map
    """
    
    url = "https://nominatim.openstreetmap.org/search"
    parameters = {'q':'{}, Edinburgh'.format(address), 'format':'json'}
    
    response = requests.get(url, params=parameters)
    # don't want to raise an error to not stop the processing
    # print address instead for future inspection
    if response.status_code != 200:
        print("Error querying {}".format(address))
        result = {}
    else:
        result = response.json()
    return result
df1['json'] = df1['Premises Address'].map(lambda x: query_address(x))

# drop empty responses
df2 = df1[df1['json'].map(lambda d: len(d)) > 0].copy()
print(df2.shape[0])
374

Looking at the json fields in the response, we find that in addition to the coordinates, the API also returns a field named ‘type’, which contains the type of premise at this address. I add this information to the data frame together with the coordinates.

# extract relevant fields from API response (json format)
df2['lat'] = df2['json'].map(lambda x: x[0]['lat'])
df2['lon'] = df2['json'].map(lambda x: x[0]['lon'])
df2['type'] = df2['json'].map(lambda x: x[0]['type'])

The most frequent premise types are cafes, pubs, restaurants, tertiary and houses:

df2.type.value_counts()[:5]
cafe          84
pub           69
restaurant    66
tertiary      33
house         27
Name: type, dtype: int64

Step 3: Assigning Premise Categories

I am mostly interested in distinguishing between two types of premises: the ones that sell coffee and are more likely to be open during the day (like coffee shops and bakeries) and the ones that sell beer and are more likely to open in the evenings (like pubs and restaurants). I therefore want to sort my premises into three categories:

  • Category 1: day-time places (coffee shops, bakeries, delis, ice-cream)
  • Category 2: pubs, restaurants, fast-food and bars
  • Category 3: everything else

To do this I have two sources of information: the premise name and the type returned by OpenStreetMap. Looking at the data, we find that the type is good first indicator, but also that many places are labeled incorrectly or not at all. I therefor apply a two-step approach: i) Assign the category based on the OpenStreetMap type ii) Clean up the data using its name, where this steps overwrites step i). To clean up the data, I decided to overrule the OpenStreetMap classification if the premise name contains certain key elements (such as ‘cafe’, ‘coffee’ or similar for coffee shops and ‘restaurant’, ‘inn’ or similar for restaurant and pubs). This misclassifies for example Cafe Andaluz as coffee shop, but works decently well in most cases. Particularly it seems to most keep to the pattern of classifying as coffee shops places, which are likely to be open during the day, so it works for my purpose. Of course, with fewer than 400 entries, one could manually go through the list and assign the correct category to each and every one of the entries. However, I am interested in creating a process, which can be easily transfered to other places, therefore a manual intervention specifically taylored to Edinburgh’s scenery is not suitable.

Step 3a: Assigning Premise Categories According to OpenStreetMap Type

def define_category(mytype):
    if mytype in ['cafe', 'bakery', 'deli', 'ice_cream']:
        category = 1
    elif mytype in ['restaurant', 'pub', 'bar', 'fast_food']:
        category = 2
    else:
        category = 3
    return category

# assign category according to OpenStreetMap type
df2['category'] = df2['type'].map(lambda mytype: define_category(mytype))

Step 3b: Overwriting Categories According to Premise Name

def flag_premise(premisename, category):
    """Flag premise according to its name.
    
    Parameter: 
    premisename - str
    
    Returns:
    ans - boolean
    """
    prem = str(premisename).lower()
    if ((category == 'coffeeshop' and ('caf' in prem 
                                       or 'coffee' in prem 
                                       or 'Tea' in str(premisename) 
                                       or 'bake' in prem 
                                       or 'bagel' in prem 
                                       or 'roast' in prem))
         or 
        (category == 'restaurant' and ('restaurant' in prem 
                                       or 'bar ' in prem 
                                       or 'tavern' in prem 
                                       or 'cask' in prem 
                                       or 'pizza' in prem
                                       or 'whisky' in prem
                                       or 'kitchen' in prem
                                       or 'Arms' in str(premisename)
                                       or 'Inn' in str(premisename) 
                                       or 'Bar' in str(premisename)))):
        ans = True
    else:
        ans = False
    return ans

# flag coffee shops and restaurants according to their names
df2['is_coffeeshop'] = df2['Premises Name'].map(lambda x: flag_premise(x, category='coffeeshop'))
df2['is_restaurant'] = df2['Premises Name'].map(lambda x: flag_premise(x, category='restaurant'))

A quick inspection shows that the reassignement seems reasonable:

# show some differences between classification by name and by type returned by the API
df2.loc[(df2.is_coffeeshop) & (df2.type != 'cafe'), ['Premises Name', 'type']].head(10)
Premises Name type
19 One20 Wine Café bar
27 Café Habana theatre
38 Southern Cross Café fast_food
73 Café Rouge restaurant
95 Café Andaluz restaurant
114 The Manna House Bakery tertiary
115 Crumbs Café house
146 Cafe Renroc residential
185 Hard Rock Café tertiary
227 Snax Café house

I reassign the category for the premises flagged as restaurant or coffee-shop. Should a premise have been flagged as both, the coffee shop category takes precedence:

# re-set category if flagged as restaurant or coffeeshop through name
df2.loc[df2.is_restaurant, 'category'] = 2
df2.loc[df2.is_coffeeshop, 'category'] = 1

Step 4: Visualization

Finally, we use Python’s Folium package to visualize our results as markers on a map. Adding the individual points to MarkerClusters allows us to summarize the symbols into groups if too many symbols are in the same region. Creating a separate cluster for each category allows us to use the LayerControl option to toggle each of the categories individually. We use the ‘fa’ prefix to use the font-awesome (instead of the standard glyphicon) symbols.

# central coordinates of Edinburgh
EDI_COORDINATES = (55.953251, -3.188267)
  
# create empty map zoomed in on Edinburgh
map = folium.Map(location=EDI_COORDINATES, zoom_start=12)

# add one markercluster per type to allow for individual toggling
coffeeshops = MarkerCluster(name='coffee shops').add_to(map)
restaurants = MarkerCluster(name='pubs and restaurants').add_to(map)
other = MarkerCluster(name='other').add_to(map)

# add coffeeshops to the map
for chairs in df2[df2.category == 1].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])], 
                  popup=chairs[1]['Premises Name'],
                 icon=folium.Icon(color='green', icon_color='white', icon='coffee', angle=0, prefix='fa'))\
    .add_to(coffeeshops)
    
# add pubs and restaurants to the map
for chairs in df2[df2.category == 2].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])], 
                  popup=chairs[1]['Premises Name'],
                 icon=folium.Icon(color='blue', icon='glass', prefix='fa'))\
    .add_to(restaurants)
    
# add other to the map
for chairs in df2[df2.category == 3].iterrows():
    folium.Marker(location=[float(chairs[1]['lat']), float(chairs[1]['lon'])], 
                  popup=chairs[1]['Premises Name'],
                 icon=folium.Icon(color='gray', icon='question', prefix='fa'))\
    .add_to(other)
    
# enable toggling of data points
folium.LayerControl().add_to(map)    
    
display(map)

Supplementary Step 5: Saving the map to png

I wanted to have a screen shot of the map to be able to imbed the static version into my Medium post (which does not accept the dynamic version). The best way to get a static version (which is not just taking a screenshot) which I have found is to save the map in HTML format and then use Selenium to save a screenshot of the HTML. This is how this can be done (credits to this stackoverflow post for the Selenium part).

Note: In order to get the following to work you need to install the geckodriver. Download the file from here and put it into /usr/bin/local (for Linux machines).

import os
import time
from selenium import webdriver

# save map
fn = 'beergarden_happiness_map.html'
tmpurl = 'file:///{path}/{mapfile}'.format(path=os.getcwd(),mapfile=fn)
map.save(fn)

# download screenshot of map
delay = 5
browser = webdriver.Firefox()
browser.get(tmpurl)
# give the map tiles some time to load
time.sleep(delay)
browser.save_screenshot('{mapname}.png'.format(mapname=fn.split('.')[0]))
browser.quit()

Summary

In this post we downloaded an open data set containing chair and table permits from the Edinburgh Council. We then used the Open Street Map API to obtain the types and GPS positions for the premises based on their address. After some additional data cleaning based on the premise names, we binned the premises into the three categories “coffee shop”, “pub/restaurant” and “other” and plotted them on an interactive map, which we saved in HTML format and subsequently converted to png format.

Conclusion

We now have a working beer garden and open air coffee shop map of Edinburgh and can enjoy the summer sitting outside with a nice iced-coffee or an ice-cold beer - Prost! :)

Map put to good use: A chilled post-work drink in the sun :)

Leave a Comment