
Geocoding Addresses Using the Google Maps API

Use the Google Maps API to batch process large lists of addresses into point coordinates.

Ideally, the locations of people’s homes, community centers, national society branches, hospitals, or any other specific point would have coordinates that facilitate mapping, but the reality is that many datasets store locations as plain text strings. If you plug those addresses into a search engine or mapping platform, you can find the coordinates, but what if you need to do this at scale?

Geocoding, the process of converting an address into coordinates, can be done with a variety of APIs. The larger an API's database of addresses, the more likely it is to find a match, and that's why this guide uses Google. As you'll see below, there are considerations to keep in mind before choosing Google, like cost and privacy, but in the real-world example I walk through here, the vast scope of Google's database helped with trickier addresses.

Scenario

This tutorial documents a real-world task that was requested as part of the 2023 Türkiye Earthquake response. SIMS was given a spreadsheet with addresses of ATMs throughout the country, but no organization had mapped all of them. The dataset included more than 10,000 addresses, so manually copying and pasting each address into Google wasn't feasible.

To speed up the process, I used the Google Maps API to process the data quickly, with only a couple dozen lines of Python code.

Considerations before starting

Geocoding options

As mentioned above, using Google for anything, whether that's search, email, Android, Chrome, or in this case maps, involves certain tradeoffs. We're relying on a proprietary database rather than something like OpenStreetMap, a decision that runs against the SIMS ethos of using open-source and widely accessible tools. And because this is a Google product, there can be costs involved, depending on the scale at which you're working. There are plenty of other geocoding APIs available, but when I tested a few of them for this task, Google matched addresses at nearly triple the rate of the others. That said, depending on the country where you're working, other geo data sources may be more accurate and cheaper, so you should scope out all of your options before committing to Google.

If you decide to use another API, the process here may look slightly different, but depending on your level of familiarity with Python, you may be able to adapt this code to your needs.

Google Developer account

Regardless of the API you choose, you'll need an API key (sometimes called a token). Keys let the API owner meter and track your usage of their servers and control what you have permission to access. As we're using Google for this tutorial, we'll need a Google Maps API key.

  1. Go to the Google developer console and log in or sign up for a new account.
  2. At the top of the page, if this is a new account, you may see a welcome offer of credit to use. If you see that, click Activate. It’s always nice to have some credit!
  3. Click on “Select a project” and then “New Project”.
  4. Give the project a name. For this example, we’ll call it “SIMS Tutorial”.
  5. Once the project is created, go to it by again using “Select a project” then clicking on what you just created.
  6. On the project page, use the navigation pane on the left to select “APIs & Services”, then “Library”. In the search pane, type “Geocode” and hit enter. You should see “Geocoding API” in the search results. Click on the card for it.
  7. On the next page, select “Enable”
  8. If this is a new account, you will be prompted to enter billing information. Google offers a certain amount of free API usage each month before charging, so a small job like this one is unlikely to cost you anything, and if you activated the welcome credit in the step above, you'll have even more headroom. To give you a sense of scale, this task should only “cost” a few dollars of credit. Note: Because API usage can cost money, it's important to keep the API key you create in the next step secret and not share it with others!
  9. Once you've updated your billing, click “APIs & Services”, then “Credentials”. On this screen, click “Create Credentials”, then select “API Key”.
  10. Copy the API key and save it for later. If you plan on dealing with lots of API keys, I recommend storing them securely. There are numerous strategies for this, including saving them as environment variables in your bash profile on your local machine, a service like Blackbox, or even a password manager. Whatever you do, do not share the key or commit it to any Git repositories.
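If you keep the key in your bash profile, for example as a line like export GOOGLE_MAPS_API_KEY="your-key-here", the script can read it at runtime instead of having it pasted into the file. This is a minimal sketch; GOOGLE_MAPS_API_KEY is a hypothetical variable name of our choosing, not something Google requires:

```python
import os


def load_api_key(var_name="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored in an environment variable.

    GOOGLE_MAPS_API_KEY is a hypothetical name used for this example;
    use whatever name you chose when saving the key.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key to Google
        raise RuntimeError(f"Set the {var_name} environment variable before running the script.")
    return key
```

With this in place, the client line in the script could become gmaps = googlemaps.Client(key=load_api_key()), so the key never appears in the code you share.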

Local requirements

Before we get going, we need to validate that a few things are already installed on your computer, including Python, the PIP package manager, and a couple of packages that will integrate into the script.

Verify that Python and PIP are installed

See this guide to make sure your Python interpreter is set up correctly and that you have the package manager PIP installed.

Install dependencies

There are two packages that we're going to use for the script. The first is created by Google itself and handles the communication between our local device and Google's servers. The second is Pandas, a popular tool for working with and manipulating data. Ideally, you'd complete this tutorial within a virtual environment, which installs these libraries into an isolated environment for this project instead of system-wide and provides a number of benefits. Covering how to do this is a bit outside the scope of this tutorial, but if you're interested in pursuing this type of support for SIMS in the future, I'd recommend reading up on the process.

So for now, let’s just install these libraries normally, outside of a virtual environment. Open up your command line tool and type: pip install googlemaps pandas and hit enter. You should see a bunch of action as it installs these two packages before giving you a confirmation.
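To confirm the install worked, you can ask Python whether it can find both packages. This optional check, which is not part of the tutorial script, uses the standard library's importlib.util.find_spec(). If it reports a package as missing even though pip said it installed, pip may have installed into a different Python interpreter than the one you're running:

```python
import importlib.util


def missing_packages(names=("googlemaps", "pandas")):
    """Return the names from `names` that this Python interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("googlemaps and pandas are installed.")
```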

Building the script

We’re now ready to build the script. Open up your preferred code editor (there are dozens of free options, but the most popular is VS Code) and create a new file called geocoder.py, saving it to a new folder on your computer.

For this walkthrough, we’re going to use a smaller version of the actual file that had the locations of the ATMs. This is to save time (the 10,000+ row file took about 15 minutes to process) and API usage. The slimmer version of the CSV can be downloaded here. Save the file inside the same folder where you saved the script above.
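If you'd like to try the script on your own data before spending API credit on the whole list, one option is to cut a small sample first. This is a minimal sketch using Pandas' head(); the function name, the file names you pass in, and the 25-row default are placeholders rather than anything from the original tutorial files:

```python
import pandas as pd


def write_sample(source_csv, sample_csv, n_rows=25):
    """Copy the first n_rows of source_csv into sample_csv for a cheap test run."""
    data = pd.read_csv(source_csv)
    sample = data.head(n_rows)
    # index=False stops Pandas from adding an extra unnamed index column
    sample.to_csv(sample_csv, index=False)
    return len(sample)
```

For example, write_sample("all_addresses.csv", "sample.csv") would write the first 25 rows of a hypothetical all_addresses.csv to sample.csv.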

To follow along with this code, you can download the completed file here, and here it is for reference:

import googlemaps
import pandas as pd
import csv

# save this script in the same folder as your CSV file and run it from that folder,
# then update the name of the file below
location_data = 'file_name_here.csv'
data = pd.read_csv(location_data)

# paste your API key below, leaving the quotes in place
gmaps = googlemaps.Client(key="<YOUR API KEY HERE>")

# collects one {'lat': ..., 'lng': ...} dictionary per address, in CSV order
list_coordinates = []

# shows progress in the terminal as each address is processed
counter = 0

for place in data['ADDRESS']:

    temp_dict = {}
    geocode_result = gmaps.geocode(place)
    try:
        # take the coordinates of the first result Google returns
        temp_dict['lat'] = geocode_result[0]['geometry']['location']['lat']
        temp_dict['lng'] = geocode_result[0]['geometry']['location']['lng']
    except (IndexError, KeyError):
        # no match (empty result list): fall back to 0, 0 ("Null Island")
        temp_dict['lat'] = 0
        temp_dict['lng'] = 0
    list_coordinates.append(temp_dict)
    counter += 1
    print(counter)

# write the results to output.csv
# newline="" stops the csv module from adding blank rows on Windows
keys = list_coordinates[0].keys()
a_file = open("output.csv", "w", newline="")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(list_coordinates)
a_file.close()

Import the packages

import googlemaps
import pandas as pd
import csv
  • Lines 1 and 2 import the two dependencies we installed with PIP in the previous section.
  • csv is a built-in module that allows us to work with CSV files, which we’ll export the results to at the end of the script.

Load the addresses

location_data = 'file_name_here.csv'
data = pd.read_csv(location_data)
  • Our addresses are stored in the CSV file downloaded above. Change file_name_here to match the name of the file you saved to your working folder. We save the file name to a variable called location_data.
  • We create a variable called data, which uses the Pandas read_csv() function to parse the CSV file, passing in the location_data variable.
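A mismatched column header is an easy mistake to make with your own files, and it only shows up as a KeyError once the loop starts. The sketch below, a hypothetical helper rather than part of the original script, reads the CSV with Pandas and checks for the ADDRESS column up front. An in-memory io.StringIO stands in for the downloaded file; note that addresses containing commas must be quoted in a CSV, as spreadsheet programs do when they export:

```python
import io

import pandas as pd


def load_addresses(csv_source, column="ADDRESS"):
    """Read a CSV and return its address column as a Pandas Series.

    Raises a clear error if the column header doesn't match.
    """
    data = pd.read_csv(csv_source)
    if column not in data.columns:
        raise KeyError(f"No '{column}' column found; columns are: {list(data.columns)}")
    return data[column]


# An in-memory CSV standing in for the downloaded file
sample = io.StringIO('ADDRESS\n"Washington, DC"\n"Ankara, Türkiye"\n')
addresses = load_addresses(sample)
print(list(addresses))  # ['Washington, DC', 'Ankara, Türkiye']
```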

Create the Google Maps client

This is where the googlemaps package helps us. We could use an HTTP package like the aptly named requests, but Google's package does the hard work for us. googlemaps.Client() builds the client, which handles sending data to and receiving data from Google's servers. The only argument we need to give it is key, which is where you'll paste the API key from the setup process above. Insert the key inside the quotes so that it looks something like this (Google API keys begin with AIza, and the client rejects keys that don't):

gmaps = googlemaps.Client(key="AIzaSyD0g8fg3hhh239fxevq3dfsdadf91")

With the client created, we now have access to its methods. The one we'll use is, appropriately, called geocode(): it takes an address as plain text and returns a list of matching results, each with a number of data points about that location. We'll call geocode() inside the for loop below.
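Before running the whole loop, it can help to try geocode() on a single address and look at what comes back. The sketch below takes the client as a parameter, so you can pass it the real gmaps client. FakeClient is a hypothetical stand-in with a canned, trimmed response so the example runs without an API key or network access; the only behavior it relies on is that geocode() returns a list of result dictionaries, as the sample response later in this guide shows:

```python
def describe_matches(client, address):
    """Call client.geocode() once and summarize what came back."""
    results = client.geocode(address)  # a list; empty if Google found no match
    print(f"{len(results)} match(es) for {address!r}")
    for result in results:
        print(" -", result["formatted_address"])
    return results


class FakeClient:
    """Hypothetical stand-in for googlemaps.Client that returns a canned response."""

    def geocode(self, address):
        if address == "Washington, DC":
            return [{
                "formatted_address": "Washington, DC, USA",
                "geometry": {"location": {"lat": 38.9071923, "lng": -77.0368707}},
            }]
        return []


describe_matches(FakeClient(), "Washington, DC")
# With the real client: describe_matches(gmaps, "Washington, DC")
```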

Create empty list

list_coordinates = [] creates an empty list. Later in the script, a for loop runs through each address automatically, sending it to Google and receiving coordinates back. This list holds that data until we're ready to write it to a new CSV file. Think of it as a temporary bucket for our results.

Create a counter

counter = 0 creates a variable that tracks progress. For long-running scripts like this one, it helps to see where the script is in the process: without printing something to the terminal window, it's hard to tell how close the loop is to finishing. How the counter works will make more sense in the next step.

Create a for loop

There are a couple of types of loops in Python (and in many programming languages), but here we're using a simple for loop. Everything inside it (the indented code) runs once for each value in the ADDRESS column of the CSV, and the loop ends after the last row.

for place in data['ADDRESS']:

    temp_dict = {}
    geocode_result = gmaps.geocode(place)
    try:
        temp_dict['lat'] = geocode_result[0]['geometry']['location']['lat']
        temp_dict['lng'] = geocode_result[0]['geometry']['location']['lng']
    except (IndexError, KeyError):
        temp_dict['lat'] = 0
        temp_dict['lng'] = 0
    list_coordinates.append(temp_dict)
    counter += 1
    print(counter)

Establish loop

Line 1 establishes the for loop, and place is the loop variable. The name you choose here doesn't matter; it simply refers to the current element of whatever we're looping over, which in this case is each address in the CSV file. Earlier, we created a variable called data, and here we use bracket notation, data['ADDRESS'], to select the column we want. If you open the CSV file in a spreadsheet program like Excel, you'll see that “ADDRESS” is the column header.

Create dictionary

Line 3 creates a blank dictionary. Notice that when we created the list_coordinates variable, we used empty square brackets, which creates a list. temp_dict gets curly braces, which creates a dictionary. Dictionaries are a Python data type that holds key-value pairs, with each key separated from its value by a colon. {'city': 'Washington DC'} is a simple dictionary.
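Here is the same idea with the two keys the script fills in, using the Washington, DC coordinates from the sample response further down:

```python
# An empty dictionary, like temp_dict at the start of each loop
temp_dict = {}

# Assigning to a new key adds a key-value pair
temp_dict["lat"] = 38.9071923
temp_dict["lng"] = -77.0368707

print(temp_dict)         # {'lat': 38.9071923, 'lng': -77.0368707}
print(temp_dict["lat"])  # 38.9071923
print(list(temp_dict))   # ['lat', 'lng']
```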

Utilize geocode() method and assign results to dictionary

Line 4 creates a variable to hold the result we get back from Google. gmaps.geocode() calls the geocode() method on the Google Maps client we created earlier, and place is the loop variable holding the current address. To better understand what [0]['geometry']['location']['lat'] and [0]['geometry']['location']['lng'] are doing, let's look at what a single response from the API looks like. Below is what is returned when I pass “Washington, DC” into the geocode() method:

[
    {
        "address_components": [
            {
                "long_name": "Washington",
                "short_name": "Washington",
                "types": [
                    "locality",
                    "political"
                ]
            },
            {
                "long_name": "District of Columbia",
                "short_name": "District of Columbia",
                "types": [
                    "administrative_area_level_2",
                    "political"
                ]
            },
            {
                "long_name": "District of Columbia",
                "short_name": "DC",
                "types": [
                    "administrative_area_level_1",
                    "political"
                ]
            },
            {
                "long_name": "United States",
                "short_name": "US",
                "types": [
                    "country",
                    "political"
                ]
            }
        ],
        "formatted_address": "Washington, DC, USA",
        "geometry": {
            "bounds": {
                "northeast": {
                    "lat": 38.9958641,
                    "lng": -76.909393
                },
                "southwest": {
                    "lat": 38.7916449,
                    "lng": -77.119759
                }
            },
            "location": {
                "lat": 38.9071923,
                "lng": -77.0368707
            },
            "location_type": "APPROXIMATE",
            "viewport": {
                "northeast": {
                    "lat": 38.9958641,
                    "lng": -76.909393
                },
                "southwest": {
                    "lat": 38.7916449,
                    "lng": -77.119759
                }
            }
        },
        "place_id": "ChIJW-T2Wt7Gt4kRKl2I1CJFUsI",
        "types": [
            "locality",
            "political"
        ]
    }
]

That's a lot of stuff! But we only need the coordinates. For a large area like an entire city, in addition to information about the administrative hierarchy (address_components), we also get back a single representative point (location) plus two boxes (bounds and viewport), each described by its northeast and southwest corners. What we want is the key called location, which has another dictionary nested inside it with lat and lng as keys and the coordinates as values.

Let's go back to the script and break down what we saw earlier: geocode_result[0]['geometry']['location']['lat']. The googlemaps package converts the JSON response into nested Python lists and dictionaries, so we drill down through its levels one at a time. Think of this like navigating through a folder structure:

  • geocode_result is the variable that holds the response. The JSON data above is an example of what Google returns for one run of the for loop.
  • [0] selects the first result. geocode() returns a list of matches (note the square brackets on the first and last lines), and [0] picks the first item in that list, which is a dictionary.
  • ['geometry'] is the next level to drill into. Inside that first result there are five top-level keys: address_components, formatted_address, geometry, place_id, and types. We want geometry.
  • ['location'] is the next key we want from what's nested inside geometry.
  • ['lat'] drills down to our final value. We repeat this on the next line for ['lng'].
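You can try the drill-down without calling the API by pasting a trimmed copy of the Washington, DC response above into a Python variable. This sketch keeps only the keys the script reads and takes each level one step at a time:

```python
# Trimmed copy of the Washington, DC response shown above
geocode_result = [
    {
        "formatted_address": "Washington, DC, USA",
        "geometry": {
            "location": {"lat": 38.9071923, "lng": -77.0368707},
            "location_type": "APPROXIMATE",
        },
    }
]

first_match = geocode_result[0]     # the first (and here only) result in the list
geometry = first_match["geometry"]  # in a full response, also holds bounds and viewport
location = geometry["location"]     # {'lat': ..., 'lng': ...}

lat = location["lat"]
lng = location["lng"]
print(lat, lng)  # 38.9071923 -77.0368707

# The step-by-step version matches the one-line version used in the script
assert lat == geocode_result[0]["geometry"]["location"]["lat"]
```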

To review this part of the loop:

for place in data['ADDRESS']:

    temp_dict = {}
    geocode_result = gmaps.geocode(place)
    try:
        temp_dict['lat'] = geocode_result[0]['geometry']['location']['lat']
        temp_dict['lng'] = geocode_result[0]['geometry']['location']['lng']

On each run of the loop, we create a temporary dictionary, send the next address from the CSV to Google, and store the response in geocode_result. Then, as long as no error occurs, we navigate through the response to grab the latitude and longitude and assign those numbers as values in the temporary dictionary.

If the “Washington, DC” search example above had been one of the runs of the for loop, it would have assigned the coordinates to the temp_dict like this: {'lat': 38.9071923, 'lng': -77.0368707}.

Try and except blocks

try and its partner except are used to handle exceptions, which are errors a script may encounter while running. The try block holds what we want the script to do. If that code raises an error that the except block handles, then rather than crashing and stopping the whole loop, Python jumps to the except block and runs that instead.

    except (IndexError, KeyError):
        temp_dict['lat'] = 0
        temp_dict['lng'] = 0

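In our geocoding, the most likely exception is an IndexError, raised when Google can't find a particular address: geocode() returns an empty list, so there is no [0] to select. When that happens, we simply assign 0 for the latitude and longitude (the point known as “Null Island”!), which we can then deal with in Excel later on. Note that errors from the request itself, such as a network failure, would happen on the geocode() line, which sits outside the try block, so they would still stop the script.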
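A bare except: catches every kind of error, including ones you didn't anticipate, which can hide real bugs. The sketch below is a variation on the script rather than the original code: it wraps the lookup in a hypothetical helper function and names only the two exceptions this lookup can realistically raise:

```python
def coordinates_or_null_island(geocode_result):
    """Return (lat, lng) from a geocode() response, or (0, 0) if there is no match."""
    try:
        location = geocode_result[0]["geometry"]["location"]
        return location["lat"], location["lng"]
    except (IndexError, KeyError):
        # IndexError: Google returned an empty list, i.e. no match for the address
        # KeyError: a result was missing a field we expected
        return 0, 0


print(coordinates_or_null_island([]))  # (0, 0)
print(coordinates_or_null_island(
    [{"geometry": {"location": {"lat": 38.9071923, "lng": -77.0368707}}}]
))  # (38.9071923, -77.0368707)
```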

Append temporary dictionary to a list

After either finding the coordinates for the address (or not finding them and assigning zeros), we have a filled-in temporary dictionary that looks something like this: {'lat': 38.9071923, 'lng': -77.0368707}. Before moving on to the next address, we need to save these coordinates. list_coordinates.append(temp_dict) takes the list_coordinates list we created before entering the for loop and uses the append() method to add temp_dict to the end of it. As we move down the list of addresses, this list keeps growing.

Update and print the counter

For a longer run of this script with lots of addresses, we'd like to see where we are in the process. counter += 1 takes the counter variable we created earlier (initially set to zero) and increases it by one. print(counter) then prints the new value in the terminal. Think of this as the script telling you which row of the CSV it just finished.
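With 10,000+ addresses, a bare count is more useful if it also shows the total. This is a small variation, not the author's original code: it uses enumerate() to count rows and len() for the total, and progress_message is a hypothetical helper name:

```python
def progress_message(done, total):
    """Format a progress line such as '250/10000 (2.5%)'."""
    percent = done / total * 100
    return f"{done}/{total} ({percent:.1f}%)"


addresses = ["Address A", "Address B", "Address C", "Address D"]
for counter, place in enumerate(addresses, start=1):
    # ...geocode `place` here, as in the main loop...
    print(progress_message(counter, len(addresses)))
```

In the tutorial script, the same idea would mean looping with enumerate(data['ADDRESS'], start=1) and passing len(data) as the total.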

Parse our list of dictionaries and save to CSV

Once the for loop reaches the end of the list of addresses, we will now have a list_coordinates variable with a bunch of dictionaries. If we were to look at that list, it would look something like this:

[
    {'lat': 38.9071923, 'lng': -77.54532134},
    {'lat': 39.5467852, 'lng': -77.78754645},
    {'lat': 38.6442215, 'lng': -77.45647310},
    {'lat': 38.4567666, 'lng': -77.04456766},
    ...
]

The code that writes the CSV file looks like this:

keys = list_coordinates[0].keys()
# newline="" stops the csv module from adding blank rows on Windows
a_file = open("output.csv", "w", newline="")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(list_coordinates)
a_file.close()

Set keys as the header

We need a header row for the two columns we're about to write to the CSV output. list_coordinates[0] looks at the first dictionary inside the list (remember that Python indexes start at zero, so [0] means the first item), and the .keys() method gives us its keys. Because every dictionary in the list has the same keys (lat and lng), the first one works as the header for all of them.
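To see exactly what ends up in the file, the same DictWriter steps can write to an in-memory io.StringIO buffer instead of output.csv. The two rows below are sample values, the second one standing in for an address Google couldn't match:

```python
import csv
import io

list_coordinates = [
    {"lat": 38.9071923, "lng": -77.0368707},
    {"lat": 0, "lng": 0},  # an address Google couldn't match
]

keys = list_coordinates[0].keys()        # the header comes from the first dictionary
buffer = io.StringIO()                   # stands in for output.csv
dict_writer = csv.DictWriter(buffer, keys)
dict_writer.writeheader()                # header row: lat,lng
dict_writer.writerows(list_coordinates)  # one row per dictionary

print(buffer.getvalue())
```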

Create writeable file

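We have Python create a new CSV file called output.csv. Because the file name has no folder path, Python creates it in the current working directory, which is the folder containing the script as long as you run the script from that folder (the same applies to reading your input CSV). open() opens the file, "output.csv" gives it a name, and "w" opens it in write mode, creating the file or overwriting it if it already exists.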

Use DictWriter to loop over the list of dictionaries

We create a dict_writer variable holding a csv.DictWriter object, which takes the file we just opened and the keys we established above as arguments. dict_writer.writeheader() writes the header row, and dict_writer.writerows(list_coordinates) writes one row per dictionary in list_coordinates, with its lat and lng values in the matching columns. Finally, we close the file. Open the folder where the script is and you should see output.csv with the coordinates. Because the script processed the addresses in order, you can simply copy and paste the two columns from output.csv into your original address file as two new columns.
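Copying and pasting works because the rows stay in the same order. If you'd rather do that step in code, here is a sketch with Pandas. It assumes the original file has one row per address and that output.csv has the same number of rows in the same order; attach_coordinates and the matched column are hypothetical names of our own:

```python
import pandas as pd


def attach_coordinates(addresses_csv, coordinates_csv, output_csv):
    """Add the lat/lng columns from coordinates_csv to addresses_csv, row by row."""
    addresses = pd.read_csv(addresses_csv)
    coordinates = pd.read_csv(coordinates_csv)
    if len(addresses) != len(coordinates):
        raise ValueError("Row counts differ; the files are not aligned.")
    # concat with axis=1 pairs rows by position (both use the default 0..n-1 index)
    combined = pd.concat([addresses, coordinates], axis=1)
    # Flag rows that fell back to Null Island (0, 0) so they can be reviewed by hand
    combined["matched"] = ~((combined["lat"] == 0) & (combined["lng"] == 0))
    combined.to_csv(output_csv, index=False)
    return combined
```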
