Graph storage systems can be pretty hard to grok, especially if you are, like me, used to relational database systems.
Recently I had the opportunity to examine a suite of technologies related to graph storage and I felt the need to archive what I discovered here.
A certain fork of NetworkX allows you to export your graph into the Resource Description Framework (RDF). RDF is old and is traditionally serialized as XML. The advantage of RDF is that it 1. allows you to define your own namespaces and 2. can be persisted in a relational database thanks to a triplestore.
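To make the export step concrete, here is a minimal sketch of what "graph to RDF" boils down to: every edge becomes one subject-predicate-object triple. It uses plain tuples instead of the NetworkX fork (with stock NetworkX you could get similar input from G.edges(data=True)), and it writes N-Triples rather than RDF/XML because that format is one line per triple. The base IRI and the function name edges_to_ntriples are made up for illustration.

```python
# Hypothetical base IRI for this sketch; not taken from the original project.
BASE = "http://example.org/graph/"


def edges_to_ntriples(edges, base=BASE):
    """Turn (source, relation, target) edges into N-Triples lines."""
    lines = []
    for source, relation, target in edges:
        # Each term is written as a full IRI in angle brackets,
        # and every statement ends with " ."
        lines.append(f"<{base}{source}> <{base}{relation}> <{base}{target}> .")
    return lines


# A tiny labeled, directed graph expressed as an edge list.
edges = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
ntriples = edges_to_ntriples(edges)
print("\n".join(ntriples))
```

This only handles simple identifiers without spaces or special characters; a real exporter would also need to escape IRIs and handle literal values.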
I found namespaces to be the hardest thing to understand about RDF, mostly because they are so free-form and because they come straight out of the XML world. But the advantage of a flexible namespace is that it is very easy to model complex interactions between resources, even resources that don’t belong in your own namespace. One thing I discovered is that the government is very fond of namespaces between its departments.
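What finally made namespaces click for me is that a prefix is just shorthand for the front part of a long IRI. The sketch below expands prefixed names into full IRIs and builds one triple that mixes two namespaces. The foaf namespace is the real FOAF vocabulary; the ex namespace and the expand helper are hypothetical and exist only for this example.

```python
# Prefix table: short prefix -> namespace IRI.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",   # the real FOAF vocabulary
    "ex": "http://example.org/mygraph/",    # hypothetical namespace for this sketch
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name like 'foaf:knows' into a full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# My own resources (ex:) linked by a predicate I did not define (foaf:).
triple = tuple(expand(term) for term in ("ex:alice", "foaf:knows", "ex:bob"))
print(triple)
```

The point is that the predicate lives in someone else's namespace, yet it slots into the triple exactly like my own terms do.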
After exporting my NetworkX graph to RDF, I was able to persist it using the MySQL triplestore component. This handy feature creates the tables and indexes you need. Aside from providing it with the connection string to your MySQL instance, you don’t work with the database at all. That’s because the RDF stack also describes how you should go about querying for data.
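If you are curious what a triplestore on top of a relational database might look like, here is a rough sketch. It uses SQLite from the Python standard library as a stand-in for MySQL so it runs anywhere, and the table layout, index names and the objects_of helper are my own assumptions, not the schema the actual component creates.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL instance in this sketch.
conn = sqlite3.connect(":memory:")

# One row per triple; the primary key doubles as a subject-first index.
conn.execute(
    """
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
    """
)
# Extra indexes so lookups by predicate or by object do not scan the table.
conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")
conn.execute("CREATE INDEX idx_obj ON triples (object)")

rows = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
conn.commit()


def objects_of(subject, predicate):
    """Return every object linked from subject via predicate, sorted."""
    cursor = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in cursor]


print(objects_of("ex:alice", "foaf:knows"))
```

Seeing the graph flattened into a single three-column table helped me, as a relational person, understand why you normally never touch these tables directly.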
SPARQL is the query language the W3C standardized for RDF graphs. Properly implemented, SPARQL can answer some very interesting questions about the stored graph.
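To give a feel for how SPARQL asks questions, here is a toy sketch of its core idea: you write triple patterns with variables, and the engine finds every way to fill in those variables from the stored triples. This is not a SPARQL engine, just a few lines of plain Python that mimic a basic graph pattern. The data and the match and query helpers are invented for the example.

```python
TRIPLES = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
}


def match(pattern, triples, binding=None):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    binding = binding or {}
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns, triples):
    """Join several patterns, the way a SPARQL basic graph pattern does."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for new in match(pattern, triples, old)]
    return bindings


# Roughly the same question as this SPARQL query:
#   PREFIX : <http://example.org/>
#   SELECT ?friend WHERE { :alice :knows ?x . ?x :knows ?friend }
result = query([("alice", "knows", "?x"), ("?x", "knows", "?friend")], TRIPLES)
friends = sorted(b["?friend"] for b in result)
print(friends)
```

The shared variable ?x is what turns two separate lookups into a friend-of-a-friend traversal, which is the kind of question that gets awkward fast with hand-written SQL joins.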
For my POC, I wrote a simple application that you can play with here. It includes a simple REST layer for obtaining nodes and edges and a D3-based interface for exploring the graph.
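For the curious, the kind of payload a REST layer could hand to a D3 front end looks something like the sketch below. It reshapes triples into the nodes-and-links JSON structure that D3 force layouts are commonly fed. The field names and the to_node_link helper are my assumptions for this example and are not lifted from the app itself.

```python
import json


def to_node_link(edges):
    """Reshape (source, relation, target) edges into a nodes/links dictionary."""
    # Collect every distinct endpoint so each node is listed exactly once.
    node_ids = sorted({node for source, _, target in edges for node in (source, target)})
    return {
        "nodes": [{"id": node_id} for node_id in node_ids],
        "links": [
            {"source": source, "target": target, "label": relation}
            for source, relation, target in edges
        ],
    }


graph = to_node_link([("alice", "knows", "bob"), ("bob", "knows", "carol")])
payload = json.dumps(graph)  # what an endpoint could return as its response body
print(payload)
```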
Here are some screenshots of the app in all its glory.