Entry G8: Components

3 minute read

Now that I have a general feel for the graph database with counts and density, I want to look at components.

The notebooks where I did my code for this entry can be found on my github page. I created three notebooks, one for each graph model. These notebooks contain the code for Entries G6, G7, and G8.

Note, after I created a multigraph with all three graph models, the code changed significantly. You can read Entry 16 for the results of these changes, but that entry is a supplement to this one, not a replacement.

Components

A component is just a set of connected nodes. I’ll be looking at three measures:

  • Component count
  • Component size
  • Component percent

Some quick facts about components:

  • A single graph can have many components, although there is usually a largest component containing over 90% of the graph (page 305 of Networks has a great chart showing basic metrics for 27 different graphs from 4 different industries)
  • There is by definition no path between any pair of nodes in different components
  • There are strongly connected components (i.e, cliques, which I talked about in Entry G5) and weakly connected components (any single path between nodes)

Here’s an example of a graph with two components from the simple bimodal example from Entry G3:

There is also something called k-components. This is the same concept as a regular component, but per page 180 of Networks “a set of nodes such that each is reachable from each of the others by at least k node-independent paths.”

Here’s an example from a subset of the Marvel Universe Social Network:

  • Blue = 1-component
  • Purple = 5-component
  • Green = 2-component
  • Red = 3-component

Component count

For this measure, all I want to know is how many components are in the graph.

The Marvel Universe Social Network has 22 components. However, 18 of those are isolated nodes. The question then becomes, do I want to count isolates as their own component?

My first thought was that the isolates are already accounted for as isolates in one of the earlier metrics. And this is 100% true for the unimodal model of the Marvel Universe Social Network. However, when looking at the metrics for the bimodal model of the graph, the 18 isolates are no longer isolated because those heroes are connected to the comic they appear in. As such, I’d need to include them in the components.

Decision: Isolates show nodes that aren’t connected to any other nodes. As such, I can safely remove components with a size of 1 from the results without losing information.

Component size

This is just the number of nodes in each component. For the unimodal model of the Universe Universe Social Network, there were four components (leaving out the 18 isolates as decided above).

As you can see in the table, this graph has the common giant component as referenced in the beginning of the entry.

Component ID Node Count
0 6403
239 9
92 7
3504 2

Component percent

This measure takes the component size one step further. We take the number of nodes in each component and divide by the the number of nodes in the graph. This gives us the size of each component within the full graph.

For the four components of the Marvel Universe Social Network unimodal model, we end up with this:

Component ID Node Count Node Percent
0 6,403 99.44%
239 9 0.14%
92 7 0.11%
3504 2 0.03%

Next Up

Global Metric Measures

Resources