
Reverse Geocoding Coordinates Using the OpenCage API

How to reverse geocode a list of coordinates at scale using the OpenCage API.

SIMS members frequently deal with large sets of coordinate data collected from assessments, registrations, and other mobile data collection processes. Those points may be mapped for visualization or analyzed to gain deeper insight into the dataset. But you may find yourself in a situation where you need to enrich the coordinate data by pulling additional attributes about each point. This guide covers how to use Python to pass a list of coordinates and get back information for each, quickly and at scale.

Getting Started

This guide assumes that you have Python correctly configured on your computer. If you aren’t sure, see this guide to check.

We’ll be making use of a Python package called Geopy, a powerful geocoding client which will handle the heavy lifting of communicating with the geo database, and OpenCage, an open data API. We’ll also use the popular Pandas package for reading the csv we pass to the script.

To install Geopy and Pandas, open your terminal and run pip install geopy pandas (if you have either of these packages already, your computer will simply skip the installation). It’s always a good idea to install packages in project-specific virtual environments, but that is beyond the scope of this guide. If you’re not familiar with virtual environments, a number of articles are available with a quick internet search.

Next, you’ll need to get an API key from OpenCage. Free accounts allow up to 25,000 calls per month, which should be more than enough for most use cases. Should the operation you’re supporting require more than that, the paid tiers are reasonable. To get your API key, sign up on OpenCage’s site, select Geocoding API, and save the string.

Lastly, we’re assuming you have a csv file with your coordinates broken up into two separate columns labeled Latitude and Longitude (case sensitive). If you’d like to follow along with this tutorial with some dummy data, you can download this file I created as an example. If you’re using your own spreadsheet, you can have as many other columns as you want (no need to remove anything), as long as you’ve correctly labeled the coordinate columns as described above.
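Before spending API calls, it can help to confirm the file really has the two columns described above. Here is a minimal sketch using only the standard library. The function name find_coordinate_problems and the sample rows are made up for illustration, and the range check (latitude between -90 and 90, longitude between -180 and 180) is an extra sanity check we are assuming you want.

```python
import csv
import io

REQUIRED_COLUMNS = ("Latitude", "Longitude")  # names this guide expects, case sensitive


def find_coordinate_problems(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # start=2 because row 1 of the file is the header
    for line_number, row in enumerate(reader, start=2):
        try:
            lat = float(row["Latitude"])
            lon = float(row["Longitude"])
        except (TypeError, ValueError):
            problems.append(f"row {line_number}: coordinates are not numeric")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"row {line_number}: coordinates out of range")
    return problems


# Illustrative data only: the second point has an impossible latitude
sample = "City,Latitude,Longitude\nAuckland,-36.8485,174.7633\nNowhere,123.4,10.0\n"
print(find_coordinate_problems(sample))
```

To check your own file, you could read its contents with open(...).read() and pass the text to the function.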

Building the Script

We’re ready to create the script. Create a new folder on your computer for this project, and save a new Python file inside of it. You should also save a copy of your coordinate data csv in the same folder.

To follow along with this code, you can download the completed file here, and here it is for reference:

import csv
import pandas as pd
from geopy.geocoders import OpenCage

# insert your OpenCage API key, leaving single quotes
geocoder = OpenCage('<YOUR_API_KEY_HERE>')

# load locations with pandas
location_data = 'largest_cities_coordinates.csv'
data = pd.read_csv(location_data)

# create placeholder list and counter
list_admin_info = []
counter = 0

# loop over each location and extract whatever available admin data API returns
for index, row in data.iterrows():
    try:
        temp_dict = {}
        latitude = row['Latitude']
        longitude = row['Longitude']
        location = geocoder.reverse((latitude, longitude), exactly_one=True)
        temp_dict['index'] = counter + 1
        temp_dict['lat'] = latitude
        temp_dict['lon'] = longitude
        temp_dict['admin1'] = location.raw['components'].get('state', 'N/A')
        temp_dict['admin2'] = location.raw['components'].get('city', 'N/A')
        temp_dict['admin3'] = location.raw['components'].get('neighbourhood', 'N/A')
        temp_dict['confidence'] = location.raw.get('confidence', 'N/A')
        list_admin_info.append(temp_dict)
    except Exception:  # skip rows that fail, but still allow Ctrl+C to stop the script
        pass
    finally:
        counter += 1

# save output to new csv - change name as needed
keys = list_admin_info[0].keys()
a_file = open("output.csv", "w")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(list_admin_info)
a_file.close()

Let’s walk through each part of the script above:

Import the packages

import csv
import pandas as pd
from geopy.geocoders import OpenCage

The first three lines make our modules accessible to the script. import csv refers to a module that’s part of the Python Standard Library, and so it does not need to be installed separately. We alias pandas as pd, which is standard convention. And from the geopy module, we grab the available OpenCage as a component of the geocoders submodule.

Link OpenCage API

geocoder = OpenCage('<YOUR_API_KEY_HERE>')

Paste your API key here, leaving the quotes. Your code should look something like geocoder = OpenCage('df05hdfd8hl1dhzpp4')
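If you plan to share the script with colleagues, you may prefer not to paste the key into the file at all. One common alternative, sketched below, is to read it from an environment variable. The variable name OPENCAGE_API_KEY and the helper load_api_key are our own choices for this example, not something OpenCage or Geopy requires.

```python
import os


def load_api_key(env_var="OPENCAGE_API_KEY"):
    """Read the API key from an environment variable (the variable name is our own choice)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the script")
    return key
```

With that in place, the line in the script would become geocoder = OpenCage(load_api_key()), and the key itself never appears in the file you share.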

Open and link the locations

location_data = 'largest_cities_coordinates.csv'
data = pd.read_csv(location_data)

If you’re following along with the dummy data file provided above, you can leave the file name in the first line as it is. Otherwise, change it to match the name of your file. The second line uses Pandas to read the file.

Create a placeholder list and counter

list_admin_info = []
counter = 0

The first line creates an empty list. We’ll later append our data to the list in order to make it easier to save to our output file. The counter is used to create an index in our output file. Optionally, you can use the counter with a print() statement to keep tabs on where in the list you are as the script is running. This can be helpful when batch processing a large dataset in order to understand how much longer the script will run.
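As a rough sketch of that optional progress message, the helper below returns a line of text every 100 rows and on the final row. The name report_progress and the every=100 default are assumptions for illustration. In the real script, total would be len(data) and the call would sit inside the for loop.

```python
def report_progress(counter, total, every=100):
    """Return a progress message every `every` rows (and on the last row), else None."""
    done = counter + 1  # counter starts at 0, so add 1 for a human-friendly count
    if done % every == 0 or done == total:
        return f"Processed {done} of {total} rows ({done / total:.0%})"
    return None


total_rows = 250  # stand-in value; in the real script this would be len(data)
for counter in range(total_rows):
    message = report_progress(counter, total_rows)
    if message:
        print(message)
```

Printing only every so often keeps the terminal readable when the input has thousands of rows.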

Create a for loop

There are a couple of types of loops in Python (as in many programming languages), but here we’re using a simple for loop. Everything inside of it (the indented code) runs once for each row in the CSV, stopping after the last row.

for index, row in data.iterrows():
    try:
        temp_dict = {}
        latitude = row['Latitude']
        longitude = row['Longitude']
        location = geocoder.reverse((latitude, longitude), exactly_one=True)
        temp_dict['index'] = counter + 1
        temp_dict['lat'] = latitude
        temp_dict['lon'] = longitude
        temp_dict['admin1'] = location.raw['components'].get('state', 'N/A')
        temp_dict['admin2'] = location.raw['components'].get('city', 'N/A')
        temp_dict['admin3'] = location.raw['components'].get('neighbourhood', 'N/A')
        temp_dict['confidence'] = location.raw.get('confidence', 'N/A')
        list_admin_info.append(temp_dict)
    except Exception:  # skip rows that fail, but still allow Ctrl+C to stop the script
        pass
    finally:
        counter += 1

When we passed location_data to pd.read_csv(), pandas read the file into a DataFrame, which is a two-dimensional data structure of rows and columns. To loop through it, we use the iterrows() method, which needs two loop variables because it yields a pair for every row: the index and the row’s data. We only reference the row data, which is why only row appears inside the loop.

Inside the try block, we create a temporary dictionary to hold the keys and values of what we want to extract. For each run of the loop, the script will grab the latitude and longitude of each row, then pass them to the geocoder. The exactly_one=True argument means we only want each API call to return the top result.

Next, each line that assigns to temp_dict creates a key, named by what you see inside the brackets, and gives it a value. The first three create a numbered index, save the latitude, and save the longitude, so that when we view the output file we can see which point each result refers to.

Then we establish keys for admin1 through admin3. The .get() that is appended to each allows us to “safely” search for keys in the location.raw results, meaning if it doesn’t find that key, it will instead use our fallback value, which we set to N/A. In addition to the admin values, we also add the confidence value, an integer from 0 to 10 that OpenCage uses to indicate how precise the match is (higher is more precise, and 0 means it could not be determined). This can be useful after you get your output csv to check for any possible issues.
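To see the difference .get() makes, here is a small standalone sketch. The components dictionary is a trimmed copy of a few values from the Auckland example response in this guide, and "county" is simply a key that happens to be missing from it.

```python
# A few values borrowed from the "components" section of the Auckland example response
components = {
    "city": "Auckland",
    "country_code": "nz",
    "neighbourhood": "Aotea Arts Quarter",
    "state": "Auckland",
}

print(components.get("state", "N/A"))   # key exists, so its value is returned
print(components.get("county", "N/A"))  # key is absent, so the fallback is returned

# Square brackets raise KeyError for a missing key instead of returning a fallback
try:
    components["county"]
except KeyError as error:
    print("KeyError:", error)
```

Without .get(), a single point lacking a neighbourhood would raise an error and the whole row would be skipped by the except block.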

Lastly, we append each dictionary back to the list_admin_info variable we established before the for loop.

As referenced above, OpenCage API has a confidence value, so rather than simply return an error when it doesn’t find an exact match, it returns the best possible one. For that reason, I haven’t experienced many errors when running this code, but it’s still a good idea to include the except block. We are only including pass inside of it, which basically means “skip that row”, but in practice, you may want to build this block out more with either a print statement, or—even better—create a new dictionary that saves some message saying that it couldn’t find the coordinates. That way, your output file has a record of which coordinates were skipped.
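One way to build out the except block along those lines is sketched below. It is not the script from this guide: reverse_geocode_rows, fake_reverse, and the status column are hypothetical names, and fake_reverse stands in for the API call so the example runs offline. In the real script, reverse_fn would wrap geocoder.reverse() and return location.raw['components'].

```python
def reverse_geocode_rows(rows, reverse_fn):
    """Look up each (lat, lon) pair; keep a row in the output even when the lookup fails."""
    results = []
    for position, (latitude, longitude) in enumerate(rows, start=1):
        record = {"index": position, "lat": latitude, "lon": longitude}
        try:
            components = reverse_fn(latitude, longitude)
            record["admin1"] = components.get("state", "N/A")
            record["status"] = "ok"
        except Exception as error:  # catch lookup failures, but not Ctrl+C
            record["admin1"] = "N/A"
            record["status"] = f"failed: {error}"
        results.append(record)
    return results


def fake_reverse(latitude, longitude):
    """Hypothetical stand-in for the API call so the sketch runs offline."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude out of range")
    return {"state": "Auckland"}


for record in reverse_geocode_rows([(-36.8485, 174.7633), (123.4, 10.0)], fake_reverse):
    print(record)
```

Because every input row produces an output row, the output file lines up one to one with the input, and the status column tells you which points to revisit.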

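The finally block then increments the counter by 1, whether or not the lookup succeeded, so the index always matches the row’s position in the input file.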

More on the OpenCage location data

When you see location.raw['components'], the script is drilling into the nested structure of the data that we get back from the API. OpenCage offers a bunch of information beyond what we’re using here. If you want to grab more than just the admin information, you can add those as additional keys. As an example of what other data is available, here is what is returned for coordinates in Auckland, New Zealand:

{
	"annotations": {
		"DMS": {
			"lat": "36° 50' 54.64968'' S",
			"lng": "174° 45' 47.86596'' E"
		},
		"MGRS": "60HUE0057819597",
		"Maidenhead": "RF73jd16oi",
		"Mercator": {
			"x": 19454561.126,
			"y": -4392386.109
		},
		"OSM": {
			"edit_url": "https://www.openstreetmap.org/edit?way=1099299063#map=17/-36.84851/174.76330",
			"note_url": "https://www.openstreetmap.org/note/new#map=17/-36.84851/174.76330&layers=N",
			"url": "https://www.openstreetmap.org/?mlat=-36.84851&mlon=174.76330#map=17/-36.84851/174.76330"
		},
		"UN_M49": {
			"regions": {
				"AUSTRALASIA": "053",
				"NZ": "554",
				"OCEANIA": "009",
				"WORLD": "001"
			},
			"statistical_groupings": [
				"MEDC"
			]
		},
		"callingcode": 64,
		"currency": {
			"alternate_symbols": [
				"NZ$"
			],
			"decimal_mark": ".",
			"disambiguate_symbol": "NZ$",
			"html_entity": "$",
			"iso_code": "NZD",
			"iso_numeric": "554",
			"name": "New Zealand Dollar",
			"smallest_denomination": 10,
			"subunit": "Cent",
			"subunit_to_unit": 100,
			"symbol": "$",
			"symbol_first": 1,
			"thousands_separator": ","
		},
		"flag": "🇳🇿",
		"geohash": "rckq2gftzgnt2wpjnzwr",
		"qibla": 261.2,
		"roadinfo": {
			"drive_on": "left",
			"road": "unnamed road",
			"road_type": "construction",
			"speed_in": "km/h"
		},
		"sun": {
			"rise": {
				"apparent": 1689363180,
				"astronomical": 1689357660,
				"civil": 1689361440,
				"nautical": 1689359520
			},
			"set": {
				"apparent": 1689312060,
				"astronomical": 1689317520,
				"civil": 1689313740,
				"nautical": 1689315660
			}
		},
		"timezone": {
			"name": "Pacific/Auckland",
			"now_in_dst": 0,
			"offset_sec": 43200,
			"offset_string": "+1200",
			"short_name": "NZST"
		},
		"what3words": {
			"words": "lamp.dose.friday"
		}
	},
	"bounds": {
		"northeast": {
			"lat": -36.8485122,
			"lng": 174.7637091
		},
		"southwest": {
			"lat": -36.8486322,
			"lng": 174.7632906
		}
	},
	"components": {
		"ISO_3166-1_alpha-2": "NZ",
		"ISO_3166-1_alpha-3": "NZL",
		"ISO_3166-2": [
			"NZ-AUK"
		],
		"_category": "road",
		"_type": "road",
		"city": "Auckland",
		"continent": "Oceania",
		"country": "New Zealand",
		"country_code": "nz",
		"neighbourhood": "Aotea Arts Quarter",
		"postcode": "1010",
		"road": "unnamed road",
		"road_type": "construction",
		"state": "Auckland",
		"state_code": "AUK",
		"suburb": "City Centre"
	},
	"confidence": 9,
	"formatted": "unnamed road, City Centre, Auckland 1010, New Zealand",
	"geometry": {
		"lat": -36.8485138,
		"lng": 174.7632961
	}
}

That’s a lot! So, for example, if you wanted to also get the country code, you would just add temp_dict['iso'] = location.raw['components'].get('country_code', 'N/A'). The rest of the script will still work, as our csv builder at the bottom of the script (see Parse our list of dictionaries and save to CSV section below) can parse a variable number of columns.
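If you find yourself adding several fields, it may be tidier to list them in one place. The sketch below is one possible arrangement, not part of the original script: FIELDS and extract_fields are names invented here, the output column names are our own, and sample_raw is a trimmed copy of the Auckland response above.

```python
# Output column name -> key inside location.raw['components'].
# Column names (left) are our own choice; keys (right) appear in the sample response.
FIELDS = {
    "admin1": "state",
    "admin2": "city",
    "admin3": "neighbourhood",
    "iso": "country_code",
    "postcode": "postcode",
}


def extract_fields(raw, fields=FIELDS):
    """Build one output row from a raw result dictionary."""
    components = raw.get("components", {})
    row = {column: components.get(key, "N/A") for column, key in fields.items()}
    # Values outside "components" need their own path, e.g. the timezone name
    row["timezone"] = raw.get("annotations", {}).get("timezone", {}).get("name", "N/A")
    row["confidence"] = raw.get("confidence", "N/A")
    return row


# Trimmed copy of the Auckland response shown above
sample_raw = {
    "annotations": {"timezone": {"name": "Pacific/Auckland"}},
    "components": {
        "city": "Auckland",
        "country_code": "nz",
        "neighbourhood": "Aotea Arts Quarter",
        "postcode": "1010",
        "state": "Auckland",
    },
    "confidence": 9,
}
print(extract_fields(sample_raw))
```

Inside the loop, you would call extract_fields(location.raw) and add the index, lat, and lon keys before appending the row to list_admin_info.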

Parse our list of dictionaries and save to CSV

keys = list_admin_info[0].keys()
a_file = open("output.csv", "w")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(list_admin_info)
a_file.close()

Once the for loop reaches the end of the list of coordinates, list_admin_info holds one dictionary for each row that was successfully geocoded.

Set keys as the header

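We need a header row naming each column in the CSV output. With the script as written, those are index, lat, lon, admin1, admin2, admin3, and confidence.

list_admin_info[0].keys() looks at the first dictionary inside the list (remember Python indexes starting at zero, so [0] actually means the first dictionary), and the .keys() method returns its keys. Because every dictionary in the list has the same keys, the first one is enough to define the header. Note that if no row was geocoded successfully, the list is empty and this line raises an IndexError.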

Create writeable file

We have Python generate a new CSV file called output.csv. As long as you run the script from inside the project folder, the new file will be generated there too. open() creates the file (overwriting any existing file with that name), "output.csv" gives it a name, and "w" opens it for writing rather than reading.
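A slightly more defensive way to open the file is sketched below, as an alternative rather than a correction. It uses a with block, which closes the file automatically, and passes newline="", which the csv module documentation recommends when opening a file for a writer (without it, some systems, notably Windows, can show a blank line between rows). The sample rows and the temporary folder are placeholders so the sketch can run anywhere.

```python
import csv
import os
import tempfile

# Placeholder rows standing in for list_admin_info
rows = [
    {"index": 1, "lat": -36.8485, "lon": 174.7633, "admin2": "Auckland"},
    {"index": 2, "lat": 35.6762, "lon": 139.6503, "admin2": "Tokyo"},
]

# Written to a temporary folder here so the sketch does not touch your project files
output_path = os.path.join(tempfile.mkdtemp(), "output.csv")

# The with block closes the file automatically, even if writing fails partway
with open(output_path, "w", newline="", encoding="utf-8") as a_file:
    dict_writer = csv.DictWriter(a_file, fieldnames=list(rows[0].keys()))
    dict_writer.writeheader()
    dict_writer.writerows(rows)

# Read the file back to confirm what was written
with open(output_path, newline="", encoding="utf-8") as a_file:
    print(a_file.read())
```

In the real script you would replace output_path with "output.csv" and rows with list_admin_info.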

Use DictWriter to loop over the list of dictionaries

We create a dict_writer variable to hold an instance of the csv module’s DictWriter class, which takes the file we just opened and the keys we established above as arguments. dict_writer.writeheader() creates the header row, and dict_writer.writerows(list_admin_info) loops over the dictionaries inside list_admin_info, writing a new row for each. Finally, we close the file. Open up the folder where the script is and you should see output.csv with the admin information for each point. The index column records each point’s row position in the original file, so you can use it to match the results back to your original spreadsheet. Avoid simply pasting the columns alongside the original data unless the row counts match, because any rows that raised an error were skipped.
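Matching results back to the original rows by the index column could look something like the sketch below. It assumes both files have been read into lists of dictionaries (for example with csv.DictReader). The function name merge_by_index and the sample rows are invented for illustration.

```python
def merge_by_index(original_rows, geocoded_rows):
    """Attach geocoded values to the original rows using the 1-based index column."""
    lookup = {int(row["index"]): row for row in geocoded_rows}
    merged = []
    for position, original in enumerate(original_rows, start=1):
        match = lookup.get(position, {})  # empty when that row was skipped
        combined = dict(original)
        for key in ("admin1", "admin2", "admin3", "confidence"):
            combined[key] = match.get(key, "N/A")
        merged.append(combined)
    return merged


# Illustrative data: the second input row has no geocoded match
original = [
    {"City": "Auckland", "Latitude": "-36.8485", "Longitude": "174.7633"},
    {"City": "Unknown", "Latitude": "999", "Longitude": "999"},
    {"City": "Tokyo", "Latitude": "35.6762", "Longitude": "139.6503"},
]
geocoded = [
    {"index": "1", "admin1": "Auckland", "admin2": "Auckland", "admin3": "N/A", "confidence": "9"},
    {"index": "3", "admin1": "Tokyo", "admin2": "Tokyo", "admin3": "N/A", "confidence": "9"},
]
for row in merge_by_index(original, geocoded):
    print(row)
```

Rows with no match keep their original columns and get N/A in the new ones, so nothing shifts out of alignment.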

Check Result

Your output.csv file should appear in the same folder as where you ran the script from. Here’s the output using the dummy data shared at the beginning of this guide.
