The GO Platform’s database can be a valuable resource for SIMS members when creating products. By referencing its data, we ensure consistency and quality. One issue we’ve seen arise during SIMS activations is inconsistent use of the host national society’s name. Getting this information, along with other attributes like the national society’s website, the country’s centroid, its FDRS code, and more, helps connect our data with other systems and platforms.
Scenario
While creating a global map of national societies, my team needed to quickly generate a list of all national societies, along with the following for each:
- Official name of the national society
- Which IFRC region it is located in
- National society’s website
- GO country page ID
This is mostly static information, so simply running through this process once and saving the data to a CSV will be sufficient for most use cases. However, as I was creating web applications that needed to reference this information frequently, I also wanted this to run as a standalone function periodically to check for changes. Furthermore, following this tutorial is a great way to become more familiar with the process of extracting data from the GO Platform’s database: there are dozens of tables of data that you might find useful, and starting here with the country table is perfect for beginners since the data structure is quite simple.
Local requirements
Before we get going, we need to validate that a few things are already installed on your computer, including Python, the PIP package manager, and a couple of packages that will integrate into the script.
Verify that Python and PIP are installed
See this guide to make sure your Python interpreter is set up correctly and that you have the package manager PIP installed.
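If you’d rather confirm this from Python itself, here’s a quick optional sketch (the filename check_setup.py is just an example and not part of the tutorial’s script):

# check_setup.py -- optional helper to confirm your local setup
import sys
import importlib.util

# print the interpreter version so you can confirm Python 3 is being used
print(f"Python version: {sys.version}")

# pip ships with most modern Python installations; this just confirms it is importable
if importlib.util.find_spec("pip") is None:
    print("pip was not found; see the guide above for installing it.")
else:
    print("pip is available.")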
Install dependencies
The only additional package that you’ll need here is called requests. To install it, open your command line tool and type pip install requests. Ideally, you’d complete this tutorial within a virtual environment on your computer. That means you install these libraries in an isolated environment, which provides a number of benefits. Covering how to do this is a bit outside the scope of this tutorial, but if you’re interested in pursuing this type of support for SIMS in the future, I’d recommend reading up on that process.
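If you do want to try it, a virtual environment is normally created from your command line tool (for example with python -m venv), but Python’s standard library exposes the same functionality. A minimal, optional sketch that creates one in a folder called .venv (the folder name is just a convention, not a requirement):

# optional: create a virtual environment in a folder called ".venv"
# (equivalent to running "python -m venv .venv" in your command line tool)
import venv

venv.create(".venv", with_pip=True)  # with_pip=True makes sure pip is available inside the environment
print("Virtual environment created in .venv; activate it before running pip install requests.")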
Building the script
We’re now ready to build the script. Open up your preferred code editor (there are dozens of free options, but the most popular is VS Code) and create a new file called get_ns_data.py, saving it to a new folder on your computer.
Open your new file in your editor. If you want to follow along with the completed script, you can download it here or reference it below:
import math
import csv
# script requires the requests library;
# prompt the user in the console to auto pip install it
try:
    import requests
except ImportError:
    import sys
    import subprocess
    response = input("The required module 'requests' is not installed. Would you like to install it now? (yes/no): ")
    if response.lower() in ['yes', 'y']:
        # install the module using pip
        subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])
        # try importing the module again after installation
        import requests
    else:
        print("The module 'requests' is required to run this script.")
        sys.exit(1)
url = 'https://goadmin.ifrc.org/api/v2/country/'
r = requests.get(url).json()
current_page = 1
page_count = int(math.ceil(r['count'] / 50))
print(f"THE PAGE COUNT TOTAL IS: {page_count}")
output = []
while current_page <= page_count:
    print(f'Getting data from: {current_page}')
    for result in r['results']:
        temp_dict = {}
        temp_dict['country_name'] = result['name']
        temp_dict['iso2'] = result['iso']
        temp_dict['iso3'] = result['iso3']
        temp_dict['go_id'] = result['id']
        temp_dict['society_name'] = result['society_name']
        temp_dict['society_url'] = result['society_url']
        temp_dict['fdrs_url'] = result['url_ifrc']
        output.append(temp_dict)
    if r['next']:
        next_page = requests.get(r['next']).json()
        r = next_page
        current_page += 1
    else:
        break
keys = output[0].keys()
a_file = open("output.csv", "w")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(output)
a_file.close()
Import the packages
import math
import csv
# script requires the requests library;
# prompt the user in the console to auto pip install it
try:
    import requests
except ImportError:
    import sys
    import subprocess
    response = input("The required module 'requests' is not installed. Would you like to install it now? (yes/no): ")
    if response.lower() in ['yes', 'y']:
        # install the module using pip
        subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])
        # try importing the module again after installation
        import requests
    else:
        print("The module 'requests' is required to run this script.")
        sys.exit(1)
As mentioned in the scenario section above, this is an unusual way of handling imports because I wanted to make this more straightforward for non-Python users to run from their computer. If you feel comfortable running pip install on your computer or inside a virtual environment, you can simply replace the above code with this:
import math
import csv
import requests
- math and csv are part of the Python standard library (which means they come bundled inside your Python installation), but they still need to be imported separately.
- requests is a popular library for sending HTTP requests. We’ll be interacting with the GO Platform’s database with methods from this library.
Define URL and GET request
url = 'https://goadmin.ifrc.org/api/v2/country/'
r = requests.get(url).json()
- url establishes the endpoint of the resource we want. A database like the GO Platform will divide data into tables (e.g. emergency, surge_alerts, etc.), and we are querying the country table here.
- r is defined with a requests library method called get, which accepts the URL above, and .json() has the data come back in that format.
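The one-liner above works fine, but network requests can fail quietly. If you’d like the script to stop with a clear error when the server is unreachable or returns an error code, a slightly more defensive sketch (the 30-second timeout is just an example value) looks like this:

# a more defensive version of the request; the one-liner above also works
url = 'https://goadmin.ifrc.org/api/v2/country/'
response = requests.get(url, timeout=30)  # give up if the server does not answer within 30 seconds
response.raise_for_status()               # raise an error on 4xx/5xx responses instead of continuing silently
r = response.json()                       # parse the JSON body, same as before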
Create page flipper and empty list
current_page = 1
page_count = int(math.ceil(r['count'] / 50))
print(f"THE PAGE COUNT TOTAL IS: {page_count}")
output = []
- current_page is a simple variable we create to flip through the pages in the API. We’ll see more about what this is doing later in the script.
- page_count helps the user see how many pages they can expect to get back from the API. Many servers will split results into pages to reduce load, meaning results will be spread out across multiple URLs. By counting the results and dividing by 50 (the number of results the GO Platform database returns at a time), we find the total number of pages to expect. The print on the following line returns that value to the end user in their console.
- output is an empty list. The loop in the next section will run through each result and build a dictionary, and each one of those dictionaries will be appended to this list for processing later.
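To make the arithmetic concrete, the country table currently reports a count of 281 (you can see that key in the JSON sample in the next section), so the calculation works out like this:

# worked example of the page_count calculation, using the count of 281 shown in the next section
import math

print(281 / 50)                  # 5.62 -- a little more than five full pages
print(math.ceil(281 / 50))       # 6 -- ceil rounds up so the partially filled last page is not missed
print(int(math.ceil(281 / 50)))  # 6 -- int() just guarantees a whole number for the loop comparison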
Loop through results
Each result in the country table will now be returned in JSON format. To understand what the data looks like, here’s the root data and the first entry (Afghanistan):
{
"count": 281,
"next": "https://goadmin.ifrc.org/api/v2/country/?limit=50&offset=50",
"previous": null,
"results": [
{
"iso": "AF",
"iso3": "AFG",
"society_url": "http://www.arcs.org.af/",
"region": 2,
"key_priorities": null,
"inform_score": null,
"id": 14,
"url_ifrc": "https://www.ifrc.org/national-societies-directory/afghan-red-crescent",
"record_type": 1,
"record_type_display": "Country",
"bbox": {
"type": "Polygon",
"coordinates": [
[
[
60.503889000065236,
29.377476867128088
],
[
74.87943118675915,
29.377476867128088
],
[
74.87943118675915,
38.48893683918417
],
[
60.503889000065236,
38.48893683918417
],
[
60.503889000065236,
29.377476867128088
]
]
]
},
"centroid": {
"type": "Point",
"coordinates": [
67.709953,
33.93911
]
},
"independent": true,
"is_deprecated": false,
"fdrs": "DAF001",
"society_name": "Afghan Red Crescent Society",
"name": "Afghanistan",
"overview": null,
"translation_module_original_language": "en"
},
...
JSON data is stored as keys and values. For example, iso3 is a key here, and AFG is its value. You can also nest additional sets of keys and values inside a value. For instance, the bbox key has type and coordinates nested inside.
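In Python, you reach those nested values by chaining square brackets. A small optional sketch using the Afghanistan entry shown above:

# reaching into nested JSON values with chained square brackets
import requests

r = requests.get('https://goadmin.ifrc.org/api/v2/country/').json()
result = r['results'][0]                  # the first entry, which is Afghanistan in the sample above
print(result['iso3'])                     # "AFG"
print(result['bbox']['type'])             # "Polygon"
print(result['centroid']['coordinates'])  # [67.709953, 33.93911], i.e. longitude then latitude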
Now let’s walk through the code for the loop:
while current_page <= page_count:
    print(f'Getting data from: {current_page}')
    for result in r['results']:
        temp_dict = {}
        temp_dict['country_name'] = result['name']
        temp_dict['iso2'] = result['iso']
        temp_dict['iso3'] = result['iso3']
        temp_dict['go_id'] = result['id']
        temp_dict['society_name'] = result['society_name']
        temp_dict['society_url'] = result['society_url']
        temp_dict['fdrs_url'] = result['url_ifrc']
        output.append(temp_dict)
    if r['next']:
        next_page = requests.get(r['next']).json()
        r = next_page
        current_page += 1
    else:
        break
- while current_page <= page_count isn’t, strictly speaking, the cleanest way of creating our loop. A for loop would run until it can’t find any more data to iterate over, while a while loop like this runs until a certain condition you specify is no longer true. A for loop would suffice here, but I use while just to illustrate how we can use current_page and page_count as variables. In your own use, feel free to simply use a for loop (there’s a sketch of that variant at the end of this section).
- print uses an f-string to concatenate that text with the page number. This script should not take more than a few seconds to run, but having a print statement like this can be useful to end users as a sort of progress bar. As each page is flipped, it will print out “Getting data from: 1”, then “…from: 2”, and so on.
- for result in r['results'] establishes an iterator with result, which we will use to reference each run of the loop. r refers back to the requests.get(url).json() line above the for loop, which is where the actual raw, unfiltered data is stored as our script interacts with the database, and the ['results'] notation tells it to look at that particular key. The other keys we could access at the root level are count, next, and previous. See the first few lines of the JSON printout at the top of this section to help visualize what this is referring to.
- temp_dict = {} establishes an empty dictionary for each run of the loop. That means there’s an empty dictionary waiting for us to populate it. On the following lines, wherever you see temp_dict followed by square brackets with some text inside quotes, we are establishing a new key and naming it. Then, each = result followed by square brackets and some text inside quotes sets that key to the matching value from the API. For example, temp_dict['country_name'] = result['name'] first creates a key in the dictionary called country_name, then sets it equal to whatever the value called name is in the API. Note that for many other temp_dict lines I’ve used the same string for both; in those cases, that’s because I found the name of the key in the API to be descriptive enough for our purposes (e.g. society_name). In the case of country_name, though, the API’s name key alone isn’t clear about what it refers to, but that’s what the GO developers named it.
- output.append(temp_dict) takes the seven keys and their values we established in the preceding lines and tacks them onto the list called output we created earlier. That means the list will, over time, start to look like this:
[
{'country_name': 'Afghanistan', 'iso2': 'AF', 'iso3': 'AFG', 'go_id': 14, 'society_name': 'Afghan Red Crescent Society', 'society_url': 'http://www.arcs.org.af/', 'fdrs_url': 'https://www.ifrc.org/national-societies-directory/afghan-red-crescent'},
{'country_name': 'Albania', 'iso2': 'AL', 'iso3': 'ALB', 'go_id': 15, 'society_name': 'Albanian Red Cross', 'society_url': 'http://www.kksh.org.al/', 'fdrs_url': 'https://www.ifrc.org/national-societies-directory/albanian-red-cross'},
...
]
- if r['next'] helps with pagination. When the script has gotten through the first 50 entries on page 1, it looks for whether or not there’s a next key in the JSON. This would be a URL, similar to the one we specified at the top of the script, but slightly modified to skip the first 50 entries from page one. That URL, if it exists, means there is another page available, and would look something like this: https://goadmin.ifrc.org/api/v2/country/?limit=50&offset=50. If it does exist, we request that new URL in place of the original one, then run through the loop all over again. We also increment the current_page value so our print statement tells us what page we’re on.
- else simply ends the loop if there is no r['next'] value.
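As mentioned in the walkthrough above, here is what the same pagination looks like rewritten with a for loop. It behaves identically to the while version; it just iterates over the expected page numbers directly:

# the same loop expressed with a for loop over the expected page numbers
import math
import requests

url = 'https://goadmin.ifrc.org/api/v2/country/'
r = requests.get(url).json()
page_count = int(math.ceil(r['count'] / 50))
output = []

for current_page in range(1, page_count + 1):
    print(f'Getting data from: {current_page}')
    for result in r['results']:
        output.append({
            'country_name': result['name'],
            'iso2': result['iso'],
            'iso3': result['iso3'],
            'go_id': result['id'],
            'society_name': result['society_name'],
            'society_url': result['society_url'],
            'fdrs_url': result['url_ifrc'],
        })
    if r['next']:
        r = requests.get(r['next']).json()  # fetch the next page before the next pass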
Save results to CSV
keys = output[0].keys()
a_file = open("output.csv", "w")
dict_writer = csv.DictWriter(a_file, keys)
dict_writer.writeheader()
dict_writer.writerows(output)
a_file.close()
At the end of the loop, we will have an output variable that is a list with a bunch of dictionaries inside. Let’s save it to a CSV file on our computer.
- keys references the keys in each dictionary. output is currently a list of dictionaries, but we want to save this to a CSV to make it easier to read and integrate with other tools like Excel. output[0] specifies the first dictionary in that list of dictionaries (in our case, this means it is just looking at the Afghanistan dictionary), and .keys() specifies that we just want the keys. These keys (e.g. country_name, iso2, etc.) will become the column headers in our CSV. Why only look at the first entry to get those keys? Because each dictionary in our list of dictionaries has the same seven keys.
- a_file opens a file called output.csv. You can rename this to something more descriptive if you’d like, such as GO_countries.csv. The w passed as the second argument gives the script permission to write to the file. If there is no output.csv (or whatever you choose to call it) in the directory where you are running this, the script will create it. If it does exist already, it will be overwritten each time you run the script.
- dict_writer creates an instance of the DictWriter class from the csv module. The two arguments specify, first, the file it will send results to and, second, the keys to use as column names.
- dict_writer.writeheader() is a method (which we’ve associated with the instance we created above) that writes the header for our CSV. Remember that the header, or first line of the spreadsheet, is made up of the keys we grabbed from the first dictionary.
- dict_writer.writerows(output) is another method that transcribes each value from output. That means it essentially runs a loop similar to the one we created to grab data from the database, except it handles all of it for us in the background. It creates a row for each dictionary and saves the values to the cells under the appropriate columns.
- a_file.close() finishes the editing process on the file and closes it so that it can be safely opened by the end user in their own spreadsheet application.
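If you prefer to let Python close the file for you, the same four lines can be written with a with block (a context manager). Passing newline='' also prevents the blank rows that some spreadsheet applications show on Windows. A minimal equivalent, assuming keys and output are defined as above:

# equivalent CSV writing using a "with" block, which closes the file automatically
with open("output.csv", "w", newline="") as a_file:
    dict_writer = csv.DictWriter(a_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(output)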
Note that if you aren’t getting a file generated, it might mean the script isn’t saved where you thought it was. The CSV is written to the folder/directory you run the script from, so make sure you know where that is. If you are running this from your desktop, then the CSV will appear on the desktop too.
Conclusion
You will now have a CSV file with one row per entry and seven columns: country_name, iso2, iso3, go_id, society_name, society_url, and fdrs_url.
Since the GO database includes geographic information for territories and regions (not just countries), we also see some columns where there is missing data. That’s because some values in the database exist in order to power components of the GO Platform, but aren’t actually countries. For example, you can see the “Africa Region” there, but no iso2, iso3, society URL or FDRS URL. If you want to use this data elsewhere but only want countries that have a national society to appear, you can use a filter on the iso2 column to filter out blanks.
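If you’d rather drop those non-country rows inside the script instead of filtering in a spreadsheet, one optional tweak is to filter the output list on the iso2 value just before writing the CSV:

# optional: keep only entries that have an ISO2 code, dropping regions and other placeholder rows
countries_only = [row for row in output if row['iso2']]
# then pass countries_only (instead of output) to dict_writer.writerows()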
As mentioned above, a process to download this data in such an involved way may seem like overkill for a dataset that isn’t going to change that often. However, this guide walks you through a process that can be tweaked to query other parts of the GO Platform’s API. Happy coding!
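As a starting point for that kind of reuse, the only parts of the script that are truly specific to the country table are the URL and the fields pulled from each result. A hedged sketch of one way to generalize the download step into a standalone function (the endpoint you pass in is up to you; check the GO API documentation for the table names you need):

# a generalized sketch: download every row from a paginated GO API table
import requests

def fetch_all(url):
    """Follow the 'next' links of a GO API table and return every result as a list of dictionaries."""
    results = []
    while url:
        page = requests.get(url).json()
        results.extend(page['results'])
        url = page['next']  # None on the last page, which ends the loop
    return results

# usage with the country table from this tutorial
countries = fetch_all('https://goadmin.ifrc.org/api/v2/country/')
print(f"Downloaded {len(countries)} entries")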