Getting Country Data from the GO Platform via the API

The GO Platform’s database can be a valuable resource for SIMS members when creating products. By referencing its data, we ensure consistency and quality. One issue we’ve seen arise during SIMS activations is inconsistent use of the host national society’s name. Getting this information, along with other attributes like the national society’s website, the country’s centroid, FDRS code, and more, helps connect our data with other systems and platforms.

Scenario

While creating a global map of national societies, my team needed to quickly generate a list of all national societies, along with the following for each:

  • Official name of the national society
  • Which IFRC region it is located in
  • National society’s website
  • GO country page ID

This is mostly static information, so simply running through this process once and saving the data to a CSV will be sufficient for most use cases. However, as I was creating web applications that needed to reference this information frequently, I also wanted this to be run as a standalone function periodically to check for changes. Furthermore, following this tutorial is a great way to become more familiar with the process of extracting data from the GO Platform’s database—there are dozens of tables of data that you might find useful, and starting here with the country table is perfect for beginners since the data structure is quite simple.

Local requirements

Before we get going, we need to verify that a few things are already installed on your computer: Python, the pip package manager, and the one additional package the script depends on.

Verify that Python and PIP are installed

See this guide to make sure your Python interpreter is set up correctly and that you have the package manager PIP installed.

Install dependencies

The only additional package that you’ll need here is called requests. To install it, open your command line tool and type pip install requests. Ideally, you’d complete this tutorial within a virtual environment, which installs libraries in an isolated environment on your computer and provides a number of benefits. Covering how to set one up is a bit outside the scope of this tutorial, but if you’re interested in pursuing this type of support for SIMS in the future, I’d recommend reading up on that process.

Building the script

We’re now ready to build the script. Open up your preferred code editor (there are dozens of free options, but the most popular is VS Code) and create a new file called get_ns_data.py, saving it to a new folder on your computer.

Open your new file in your editor. If you want to follow along with the completed script, you can download it here or reference it below:

import math
import csv

# script requires requests library
# prompt user in console to auto pip install it
try:
	import requests
except ImportError:
	import sys

	response = input("The required module 'requests' is not installed. Would you like to install it now? (yes/no): ")
	if response.lower() in ['yes', 'y']:
		import subprocess

		# install the module using pip
		subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])

		# try importing the module again after installation
		import requests
	else:
		print("The module 'requests' is required to run this script.")
		sys.exit(1)

url = 'https://goadmin.ifrc.org/api/v2/country/'
r = requests.get(url).json()

current_page = 1
page_count = int(math.ceil(r['count'] / 50))
print(f"THE PAGE COUNT TOTAL IS: {page_count}")

output = []

while current_page <= page_count:
	print(f'Getting data from: {current_page}')
	for result in r['results']:
		temp_dict = {}
		temp_dict['country_name'] = result['name']
		temp_dict['iso2'] = result['iso']
		temp_dict['iso3'] = result['iso3']
		temp_dict['go_id'] = result['id']
		temp_dict['society_name'] = result['society_name']
		temp_dict['society_url'] = result['society_url']
		temp_dict['fdrs_url'] = result['url_ifrc']
		output.append(temp_dict)
	
	if r['next']:
		next_page = requests.get(r['next']).json()
		r = next_page
		current_page += 1
	else:
		break
	
keys = output[0].keys()
a_file = open("output.csv", "w", newline="")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(output)
a_file.close()

Import the packages

import math
import csv

# script requires requests library
# prompt user in console to auto pip install it
try:
	import requests
except ImportError:
	import sys

	response = input("The required module 'requests' is not installed. Would you like to install it now? (yes/no): ")
	if response.lower() in ['yes', 'y']:
		import subprocess

		# install the module using pip
		subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])

		# try importing the module again after installation
		import requests
	else:
		print("The module 'requests' is required to run this script.")
		sys.exit(1)

As mentioned in the scenario section above, this is an unusual way of handling imports because I wanted to make this more straightforward for non-Python users to run from their computer. If you feel comfortable running pip install on your computer or inside a virtual environment, you can simply replace the above code with this:

import math
import csv
import requests
  • math and csv are part of the Python standard library (which means they come bundled inside your Python installation), but they still need to be imported separately.
  • requests is a popular library for sending HTTP requests. We’ll be interacting with the GO Platform’s database with methods from this library.

Define URL and GET request

url = 'https://goadmin.ifrc.org/api/v2/country/'
r = requests.get(url).json()
  • url establishes the endpoint of the resource we want. A database like the GO Platform’s divides its data into tables (e.g. emergency, surge_alerts, etc.), and we are querying the country table here.
  • r is defined with the requests library’s get method, which sends a GET request to the URL above; chaining .json() parses the response body into Python data structures (dictionaries and lists). A slightly more defensive version of this call is sketched below.
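
If the request fails, for example because the API is temporarily unreachable, requests.get will raise an exception or hand back an error response. Here is a minimal sketch of a slightly more defensive version of the same call; the 30-second timeout is an arbitrary assumption on my part, not something the GO API requires:

# send the GET request with a timeout and fail loudly on HTTP errors
response = requests.get(url, timeout=30)  # 30 seconds is an arbitrary choice
response.raise_for_status()               # raises an exception on 4xx/5xx responses
r = response.json()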

Create page flipper and empty list

current_page = 1
page_count = int(math.ceil(r['count'] / 50))
print(f"THE PAGE COUNT TOTAL IS: {page_count}")

output = []
  • current_page is a simple variable we create to flip through the pages in the API. We’ll see more about what this is doing later in the script.
  • page_count helps the user see how many pages to expect back from the API. Many servers paginate results to reduce load, meaning results are spread out across multiple URLs. By taking the total count and dividing by 50 (the number of results the GO Platform returns per page), then rounding up, we get the total number of pages to expect (see the worked example after this list). The print on the following line returns that value to the end user in their console.
  • output is an empty list. The loop in the next section will run through each result and build a dictionary, and each one of those dictionaries will be appended to this list for processing later.
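
To make that arithmetic concrete, here is what the calculation works out to for the sample response shown in the next section, where count is 281 (the count variable below just stands in for r['count']):

import math

count = 281                              # value of r['count'] in the sample response
page_count = int(math.ceil(count / 50))  # ceil(5.62) -> 6
print(page_count)                        # 6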

Loop through results

Each result in the country table will now be returned in JSON format. To understand what the data looks like, here’s the root data and the first entry (Afghanistan):

{
    "count": 281,
    "next": "https://goadmin.ifrc.org/api/v2/country/?limit=50&offset=50",
    "previous": null,
    "results": [
        {
            "iso": "AF",
            "iso3": "AFG",
            "society_url": "http://www.arcs.org.af/",
            "region": 2,
            "key_priorities": null,
            "inform_score": null,
            "id": 14,
            "url_ifrc": "https://www.ifrc.org/national-societies-directory/afghan-red-crescent",
            "record_type": 1,
            "record_type_display": "Country",
            "bbox": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            60.503889000065236,
                            29.377476867128088
                        ],
                        [
                            74.87943118675915,
                            29.377476867128088
                        ],
                        [
                            74.87943118675915,
                            38.48893683918417
                        ],
                        [
                            60.503889000065236,
                            38.48893683918417
                        ],
                        [
                            60.503889000065236,
                            29.377476867128088
                        ]
                    ]
                ]
            },
            "centroid": {
                "type": "Point",
                "coordinates": [
                    67.709953,
                    33.93911
                ]
            },
            "independent": true,
            "is_deprecated": false,
            "fdrs": "DAF001",
            "society_name": "Afghan Red Crescent Society",
            "name": "Afghanistan",
            "overview": null,
            "translation_module_original_language": "en"
        },
...

JSON data is stored as keys and values. For example, the iso3 is a key here, and AFG is the value. You can also nest additional sets of keys and values inside a value. For instance, the bbox key has type and coordinates nested inside.
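
Once the response has been parsed with .json(), those nested keys can be read with ordinary dictionary and list indexing. As a quick illustration, assuming r holds the parsed first page from the request above:

first = r['results'][0]                  # the Afghanistan entry in the sample above
print(first['iso3'])                     # AFG
print(first['centroid']['coordinates'])  # [67.709953, 33.93911]
print(first['bbox']['type'])             # Polygon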

Now let’s walk through the code for the loop:

while current_page <= page_count:
	print(f'Getting data from: {current_page}')
	for result in r['results']:
		temp_dict = {}
		temp_dict['country_name'] = result['name']
		temp_dict['iso2'] = result['iso']
		temp_dict['iso3'] = result['iso3']
		temp_dict['go_id'] = result['id']
		temp_dict['society_name'] = result['society_name']
		temp_dict['society_url'] = result['society_url']
		temp_dict['fdrs_url'] = result['url_ifrc']
		output.append(temp_dict)
	
	if r['next']:
		next_page = requests.get(r['next']).json()
		r = next_page
		current_page += 1
	else:
		break
  • while current_page <= page_count isn’t, strictly speaking, the cleanest way of creating our loop. A for loop runs over a known collection of items until it runs out, while a while loop like this runs until a condition you specify is no longer true. Simply following the next links until there are none left would suffice here (see the alternative sketch after this list), but I use while to illustrate how current_page and page_count work as variables. In your own use, feel free to simplify.
  • print uses an f-string to combine that text with the current page number. This script should not take more than a few seconds to run, but a print statement like this can serve end users as a sort of progress bar. As each page is flipped, it will print “Getting data from: 1”, then “Getting data from: 2”, and so on.
  • for result in r['results'] establishes an iterator with result (which we will use to reference each entry as the loop runs). r refers back to the requests.get(url).json() line above the loop, which is where the raw, unfiltered data is stored as our script interacts with the database, and the ['results'] notation tells it to look at that particular key. The other keys we could access at the root level are count, next, and previous. See the first few lines of the JSON printout at the top of this section to help visualize what this refers to.
  • temp_dict = {} establishes an empty dictionary for each run of the loop, waiting for us to populate it. On the following lines, wherever you see temp_dict followed by square brackets with some text inside quotes, we are creating a new key and naming it. Then, = result followed by square brackets and some text inside quotes sets that key to the corresponding value from the API. For example, temp_dict['country_name'] = result['name'] first creates a key in the dictionary called country_name, then sets it equal to whatever the API’s name value is. Note that for many of the other temp_dict lines I’ve used the same string on both sides; in those cases, the key name in the API was descriptive enough for our purposes (e.g. society_name). For country_name, though, the API’s key name on its own wouldn’t make clear what it refers to, but that’s what the GO developers named it.
  • output.append(temp_dict) takes the seven keys and their values we established in the preceding lines and tacks them onto the list called output we created earlier. That means the list will, over time, start to look like this:
[
	{'country_name': 'Afghanistan', 'iso2': 'AF', 'iso3': 'AFG', 'go_id': 14, 'society_name': 'Afghan Red Crescent Society', 'society_url': 'http://www.arcs.org.af/', 'fdrs_url': 'https://www.ifrc.org/national-societies-directory/afghan-red-crescent'}, 
	{'country_name': 'Albania', 'iso2': 'AL', 'iso3': 'ALB', 'go_id': 15, 'society_name': 'Albanian Red Cross', 'society_url': 'http://www.kksh.org.al/', 'fdrs_url': 'https://www.ifrc.org/national-societies-directory/albanian-red-cross'},
...
]
  • if r['next'] handles pagination. When the script has gotten through the first 50 entries on page 1, it checks whether the next key holds a URL or is null. That URL is similar to the one we specified at the top of the script, but modified to skip the first 50 entries, e.g. https://goadmin.ifrc.org/api/v2/country/?limit=50&offset=50. If it exists, there is another page available: we fetch that URL in place of the original one, then run through the loop all over again. We also increment the current_page value so our print statement tells us what page we’re on.
  • else simply ends the loop if there is no r['next'] value.
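
For comparison, here is a minimal sketch of the simpler pagination pattern mentioned above: just keep following the next links until the API stops providing one. It reuses url and requests from earlier, produces the same output list, and doesn’t need current_page or page_count (the variable name page is my own, not part of the GO API):

output = []
page = requests.get(url).json()

while True:
	for result in page['results']:
		output.append({
			'country_name': result['name'],
			'iso2': result['iso'],
			'iso3': result['iso3'],
			'go_id': result['id'],
			'society_name': result['society_name'],
			'society_url': result['society_url'],
			'fdrs_url': result['url_ifrc'],
		})
	if not page['next']:
		break
	page = requests.get(page['next']).json()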

Save results to CSV

keys = output[0].keys()
a_file = open("output.csv", "w", newline="")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(output)
a_file.close()

At the end of the loop, we will have an output variable that is a list with a bunch of dictionaries inside. Let’s save it to a CSV file on our computer.

  • keys references the keys in each dictionary. output is currently a list of dictionaries, but we want to save this to a CSV to make it easier to read and integrate with other tools like Excel. output[0] specifies the first dictionary in that list of dictionaries (in our case, this means it is just looking at the Afghanistan dictionary), and .keys() specifies that we just want the keys. These keys (e.g. country_name, iso2, etc.) will become the column headers in our CSV. Why only look at the first entry to get those keys? Because each dictionary in our list of dictionaries has the same seven keys.
  • a_file opens a file called output.csv. You can rename this to something more descriptive if you’d like, such as GO_countries.csv. The w as the second argument opens the file for writing, and newline="" keeps the csv module from inserting blank rows between entries on Windows. If there is no output.csv (or whatever you choose to call it) in the directory where you are running this, it will be created; if it already exists, it will be overwritten each time you run the script. (A with statement would close the file for us automatically; see the sketch after this list.)
  • dict_writer creates an instance of the DictWriter class from the csv module. The two arguments specify the file it will write results to and the keys to use as column names.
  • dict_writer.writeheader() is a method (associated with the instance we created above) that writes the header row of our CSV. Remember that the header, or first line of the spreadsheet, is made up of the keys we grabbed from the first dictionary.
  • dict_writer.writerows(output) is another method that writes out each value from output. It essentially runs a loop similar to the one we built to grab data from the database, except it handles everything in the background: it creates a row for each dictionary and saves the values to the cells under the appropriate columns.
  • a_file.close() finishes writing to the file and closes it so that it can be safely opened by the end user in their own spreadsheet application.
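
As mentioned above, the same save step can be written with a with statement, which closes the file automatically even if an error occurs partway through. A minimal sketch of that variant, reusing the keys and output variables from the script:

with open("output.csv", "w", newline="") as a_file:
	dict_writer = csv.DictWriter(a_file, keys)
	dict_writer.writeheader()
	dict_writer.writerows(output)
# no explicit close() needed; the with block handles it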

Note that if you aren’t getting a file generated, it might mean you didn’t save the script where you thought you did. The CSV is written to the same folder/directory the script is run from, so if you are running this from your desktop, the CSV will appear on the desktop too. If in doubt, you can have the script print the full path it wrote to, as in the snippet below.
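
A small addition you could make at the end of the script, using os.path.abspath from the standard library, to print exactly where the CSV ended up:

import os

# print the full path of the CSV so there's no doubt where it was written
print(f"Saved results to: {os.path.abspath('output.csv')}")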

Conclusion

You will now have a CSV file that looks like this:

Snapshot of a CSV output from this process.

Since the GO database includes geographic information for territories and regions (not just countries), we also see some columns with missing data. That’s because some entries in the database exist to power components of the GO Platform but aren’t actually countries. For example, you can see the “Africa Region” there, but no iso2, iso3, society URL, or FDRS URL. If you want to use this data elsewhere but only want countries that have a national society to appear, you can filter out blanks on the iso2 column, either in your spreadsheet application or in the script itself, as sketched below.
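
If you’d rather handle that in the script, a minimal sketch that drops the region-level entries before the CSV is written (it simply filters on the iso2 value; the variable name countries_only is my own):

# keep only entries that have an ISO2 code, i.e. actual countries
countries_only = [row for row in output if row['iso2']]

# then pass countries_only instead of output to dict_writer.writerows()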

As mentioned above, a process to download this data in such an involved way may seem like overkill for a dataset that isn’t going to change that often. However, this guide walks you through a process that can be tweaked to query other parts of the GO Platform’s API. Happy coding!
