Import Google Earth placemarks (KML) into GeoPandas

Get the placemarks of all folders of a KML file into a GeoDataFrame using minidom (Python)

Update: See my Jupyter Notebook for a better version.

While playing with Python and GeoPandas, I wanted to import my Google Earth placemarks into a GeoDataFrame. This was more complicated than expected. In theory, GeoPandas can open and parse KML (the format used by Google Earth) with a single call:

geopandas.read_file("travel.kml")

GeoPandas uses the fiona library for this task. But this only imports the first folder, and my placemarks are organised in many folders. After trying several modules specialising in KML, it turned out to be easier to parse the file myself with minidom from the Python standard library; KML is XML anyway. I also created a column with the folder path, since most of my placemarks are unnamed and the folder names are the most useful information. This is not particularly fast, but it works.
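To illustrate why plain XML parsing is enough here, the following minimal sketch runs minidom on a tiny inline document with nested folders. The sample KML and all names in it are made up for illustration and are not taken from my file; a real Google Earth export contains many more tags.

```python
from xml.dom.minidom import parseString

# Hypothetical sample standing in for a real Google Earth export
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <name>travel.kml</name>
  <Folder>
    <name>Europe</name>
    <Folder>
      <name>Austria</name>
      <Placemark><name>Vienna</name></Placemark>
    </Folder>
  </Folder>
  <Folder>
    <name>Asia</name>
    <Placemark><name>Tokyo</name></Placemark>
  </Folder>
</Document>
</kml>"""

sample_dom = parseString(SAMPLE_KML)

# getElementsByTagName searches the whole tree, so nested folders are found too
sample_folders = sample_dom.getElementsByTagName("Folder")
sample_placemarks = sample_dom.getElementsByTagName("Placemark")

# The first <name> below a folder (in document order) is the folder's own name here
folder_names = [f.getElementsByTagName("name")[0].firstChild.data for f in sample_folders]
placemark_names = [p.getElementsByTagName("name")[0].firstChild.data for p in sample_placemarks]

print(folder_names)
print(placemark_names)
```

Every folder and every placemark is reachable this way, regardless of nesting depth. The full script for a real file follows.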

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from xml.dom.minidom import parse

# Open the KML file

dom = parse('travel.kml')

# Define a function to get the path of a placemark

def subfolders(node):
    """Return the path of node as "/folder/subfolder" (relies on the global dom)."""
    if node.parentNode == dom.documentElement:
        return ""
    foldername = node.getElementsByTagName("name")[0].firstChild.data
    return subfolders(node.parentNode) + "/" + foldername

# Parse the DOM of the KML
# For each Placemark, get a tuple of name, lat, long, foldername and path
# Append the tuple to a list of tuples

entries = []
placemarks = dom.getElementsByTagName("Placemark")

for placemark in placemarks:
    # Google Earth writes <longitude>/<latitude> tags (typically inside <LookAt>)
    longitude = float(placemark.getElementsByTagName("longitude")[0].firstChild.data)
    latitude = float(placemark.getElementsByTagName("latitude")[0].firstChild.data)
    try:
        name = placemark.getElementsByTagName("name")[0].firstChild.data
    except (IndexError, AttributeError):  # no <name> tag, or an empty one
        name = ""
    parent = placemark.parentNode
    foldername = parent.getElementsByTagName("name")[0].firstChild.data
    path = subfolders(parent)
    entries.append((name, latitude, longitude, foldername, path))  # List of tuples


df = pd.DataFrame(entries, columns=('name', 'latitude', 'longitude', 'folder', 'path'))
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude, crs="EPSG:4326"))
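The loop above assumes that every placemark carries longitude and latitude tags and that the first name tag found below a folder is the folder's own. A more defensive variant could look like the sketch below. The helper names (own_name, folder_path, lon_lat) and the sample document are my own inventions for illustration; the fallback relies on the KML convention that a Point stores its position as "lon,lat[,alt]" in a coordinates tag.

```python
from xml.dom.minidom import parseString

def own_name(node):
    """Return the text of the node's direct <name> child, or "" if it has none."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
            return child.firstChild.data if child.firstChild else ""
    return ""

def folder_path(node):
    """Join the names of all enclosing <Folder> elements, outermost first."""
    parts = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.tagName == "Folder":
            parts.append(own_name(node))
        node = node.parentNode
    return "/" + "/".join(reversed(parts)) if parts else ""

def lon_lat(placemark):
    """Return (longitude, latitude) as floats, or None if no position is found."""
    lon = placemark.getElementsByTagName("longitude")
    lat = placemark.getElementsByTagName("latitude")
    if lon and lat and lon[0].firstChild and lat[0].firstChild:
        return float(lon[0].firstChild.data), float(lat[0].firstChild.data)
    # Fall back to <Point><coordinates>lon,lat[,alt]</coordinates></Point>
    for point in placemark.getElementsByTagName("Point"):
        coords = point.getElementsByTagName("coordinates")
        if coords and coords[0].firstChild:
            values = coords[0].firstChild.data.strip().split(",")
            return float(values[0]), float(values[1])
    return None

# Hypothetical sample: the second placemark is unnamed and has no <LookAt>
sample = parseString(
    "<kml><Document><name>travel.kml</name>"
    "<Folder><name>Europe</name><Folder><name>Austria</name>"
    "<Placemark><name>Vienna</name>"
    "<LookAt><longitude>16.37</longitude><latitude>48.21</latitude></LookAt>"
    "</Placemark>"
    "<Placemark>"
    "<Point><coordinates>13.04,47.8,0</coordinates></Point>"
    "</Placemark>"
    "</Folder></Folder></Document></kml>"
)

rows = []
for pm in sample.getElementsByTagName("Placemark"):
    position = lon_lat(pm)
    if position is None:
        continue  # skip placemarks without a usable position
    lon, lat = position
    parent = pm.parentNode
    rows.append((own_name(pm), lat, lon, own_name(parent), folder_path(parent)))

for row in rows:
    print(row)
```

The resulting rows have the same column order as the entries list above, so they can be passed to pd.DataFrame in the same way. Unlike subfolders, folder_path does not depend on a global variable and only counts Folder elements.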

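Now we can have a look at our GeoDataFrame with gdf.head(), save it to CSV with gdf.to_csv("travel.csv") and open the CSV again with gdf = gpd.read_file("travel.csv"). Note that a CSV file stores the geometry as plain text, so depending on the GeoPandas and GDAL versions the geometry column may have to be rebuilt from the longitude and latitude columns after reading.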
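If the geometry does not survive the CSV round trip, one way around it is to write only the attribute columns and rebuild the points after reading. This is a minimal sketch with a made-up two-row table and a temporary file instead of my real travel.csv; the column names match the DataFrame built above.

```python
import os
import tempfile

import geopandas as gpd
import pandas as pd

# Hypothetical stand-in for the real GeoDataFrame
small_gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "latitude": [48.2, 52.5], "longitude": [16.4, 13.4]},
    geometry=gpd.points_from_xy([16.4, 13.4], [48.2, 52.5]),
    crs="EPSG:4326",
)

# Write the attribute columns only; latitude and longitude already hold the position
csv_path = os.path.join(tempfile.mkdtemp(), "travel.csv")
small_gdf.drop(columns="geometry").to_csv(csv_path, index=False)

# Read the table with pandas and rebuild the point geometry
table = pd.read_csv(csv_path)
restored = gpd.GeoDataFrame(
    table,
    geometry=gpd.points_from_xy(table.longitude, table.latitude),
    crs="EPSG:4326",
)
print(restored)
```

A format such as GeoPackage or GeoJSON keeps geometry and coordinate reference system together, so it may be the simpler choice when the file is only meant to be read back into GeoPandas.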

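Now we could, for example, plot the placemarks on a simple world map and draw a convex hull around them. The convex hull is the smallest convex polygon that contains all points, which is a tighter fit than a rectangular bounding box.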

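# Use the natural earth dataset as basemap
# (gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0;
# on newer versions, download the Natural Earth file and pass its path to gpd.read_file)
natworld = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))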

# For the convex hull, I need a geometry of all placemarks combined
combined = gdf.dissolve()

# Plot
fig, ax = plt.subplots(figsize=(10,5))
natworld.plot(ax=ax, color="darkgrey", edgecolor="lightgrey")
gdf.plot(ax=ax, color="blue", marker=".")
combined.convex_hull.plot(ax=ax, color="none", edgecolor="red")

The result:

Save the figure:

fig.savefig("bounding-box.png")
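The file name says bounding box, but what the plot shows is a convex hull. The following sketch compares the two on three made-up points; the coordinates are chosen only so that the areas are easy to check by hand.

```python
import geopandas as gpd

# Three hypothetical points forming a triangle
pts = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=gpd.points_from_xy([0, 4, 2], [0, 0, 3]),
    crs="EPSG:4326",
)

# Combine all points into one geometry, as in the plot above
merged = pts.dissolve()

hull = merged.convex_hull.iloc[0]  # smallest convex polygon around the points
box = merged.envelope.iloc[0]      # axis-aligned rectangle around the points

# Areas are in squared degrees here and only serve to compare the two shapes
print(hull.area, box.area)
```

To draw the rectangle instead of the hull, combined.envelope.plot(...) can be used in place of combined.convex_hull.plot(...) in the plotting code.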