Entry G21: Diameter
Having encountered the limit of connections between node pairs in Entry G19, I couldn’t resist taking a closer look. I’ve been trying to calculate the diameter of a graph pretty much since I started trying to run metrics on graphs.
The Problem
Diameter is the longest shortest path between any connected pair of nodes in the graph. This is simpler than it sounds. Fortunately, we just covered shortest paths in the Entry G20: of the potentially multiple paths to get from one node to the other, the shortest path has the fewest steps.
Shortest Path
If you didn’t like the map example from Entry 20, try this revision of the expanded egonet from Entry G18:
I traced out three paths from the node outlined in green to the node outlined in blue:
- Green: this is the shortest path, it only takes two steps to get from the green node to the blue one
- Purple: this path goes through our origin node (the one we built the egonet around). It’s tempting to go through this node when looking for the shortest path, but since we traverse four steps using this particular pair, it would be the incorrect answer
- Red: this is the longest path I could think of. Not sure why we’d ever want to use this path unless we were trying to get lost. There are twelve steps
Diameter
So for diameter, we look at all the shortest paths in the graph and figure out which one is the longest. In other words, using the most efficient path between any pair of nodes, what is the maximum number of steps we can take.
For our particular example using the egonet, we know that the longest possible path is four steps (the distance we expanded out times two). However, keep in mind that when including all relationships in the egonet, the diameter may be shorter (as seen above connecting the green and blue nodes).
For our example, there is in fact a shortest path of four, so that is the diameter of the egonet (I duplicated the relationship between the H2 and H1 nodes in the lower right to make it more clear that both shortest paths between the pair of nodes are four steps).
The Solution
The Entry G19c notebook already revealed that the diameter of Marvel Universe Social Network unimodal model is 5, but I created a paired down version of the code to return just the diameter in the Entry G21 notebook. It’s almost exactly the same as the Entry G20 notebook code, but I removed the limit:1
parameter and searched on all Heroes instead of just villains.
So we went from this:
MATCH (h:Hero)
call apoc.path.spanningTree(h, {labelFilter:'/Villain', minLevel: 1, limit: 1})
YIELD path
RETURN h.name as name, labels(h)[-1] as type, length(path) as min_distance
to this:
MATCH (h:Hero)
call apoc.path.spanningTree(h, {labelFilter:'>Hero', minLevel: 1})
YIELD path
RETURN h.name as name, labels(h)[-1] as type, length(path) as min_distance, count(path) as total_ct
Note that I took out the maxLevel
parameter because we want to know the farthest distance between connected pairs regardless of how far that is. If the graph is sparsely connected or has low assortativity, this could be a problem as the paths could be long and strung out. An example of a long path would be this illustration from Max De Marzi’s Fraud Detection slideshare, which has a distance of 26 from one end to the other:
I also added back in count(path)
to the RETURN
statement. If we leave off count(path)
we end up with duplicate rows, which represent the target node. Since we don’t return the name of the node we connect to, only the name of the node we start at, these rows look and act like duplicates.
By returning the count, we group the rows together, removing the duplicates and also get to see how many people each hero connects to at the full diameter. There are 210 nodes that have the full diameter length of five. Here we can see a sample of them and how many nodes they connect to at that distance.
Up Next
Mean Distance Between Connected Nodes