Entry G13: Weighted Degree Comparison

Like Entry G12, this is a redo of part of Entry G10. This entry addresses the weighted degrees. If you need a reminder of what weighted relationships are, see Entry G4: Modeling Relationships.

Unlike Entry G12, I only needed one notebook to examine the weighted degrees. It can be found in the G13 notebook.

The changes from the Entry G10 notebooks to this one include:

Used a multigraph instead of three separate graph instances
Included OPTIONAL MATCH in the summary statistic pattern match queries
Put results into DataFrames for easier comparison
Added Comic to Comic summary statistics and distribution charts

Used a Multigraph

Just like with Entry G12, using the multigraph allowed me to query the same information against each of my graph models and easily compare the results. This allowed me to discover that the data in the Mixed Model loaded incorrectly.

There were over 100 relationships that were unaccounted for in the Mixed Model once I summed the weighted relationships. Since the Mixed Model is a stepping stone to the Unimodal Model, I knew that some of the relationships must have errored out during the load. I cleared out the Mixed Model graph with MATCH (n) DETACH DELETE n and reloaded the graph using the code from Entry G11’s notebook. This fixed the discrepancy. It also fixed the discrepancy noted in Entry G12 for the number of KNOWS relationships in the Unimodal and Mixed Models.
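
The kind of check that surfaces this sort of discrepancy can be sketched in plain Python. This is only an illustration, not the code from the notebook: the rows are made-up toy data, and the function name total_weight_by_model is hypothetical. The idea is to sum the relationship weights per model and compare the totals.

```python
from collections import defaultdict

# Toy rows of (model, source, target, weight). The model names follow the
# post; the rows themselves are made up for illustration.
rows = [
    ("Unimodal", "A", "B", 3),
    ("Unimodal", "B", "C", 1),
    ("Mixed", "A", "B", 3),
    # ("Mixed", "B", "C", 1) is deliberately left out to mimic a failed load
]

def total_weight_by_model(rows):
    """Sum the relationship weights for each model."""
    totals = defaultdict(int)
    for model, _source, _target, weight in rows:
        totals[model] += weight
    return dict(totals)

totals = total_weight_by_model(rows)
# A non-zero difference means some weighted relationships are unaccounted for.
missing = totals["Unimodal"] - totals["Mixed"]
print(totals, missing)
```

If the two totals should agree and do not, reloading the affected model is a reasonable next step.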

Now, just because I can run a query against all three models doesn’t mean I should. If you look in the G13 notebook, you’ll notice that I ran Hero to Hero relationships for all three models, but only ran Comic to Comic relationships for the Bimodal and Mixed Models. This is because all Comic information was removed from the Unimodal Model when we projected it.
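
One way to keep track of which comparison applies to which model is a small lookup table. The sketch below is an assumption about how this could be organized, not how the notebook does it; the names APPLICABLE_MODELS and planned_runs are hypothetical.

```python
# Which comparisons make sense for which models, following the paragraph above.
APPLICABLE_MODELS = {
    "Hero to Hero": ["Bimodal", "Mixed", "Unimodal"],
    "Comic to Comic": ["Bimodal", "Mixed"],  # the Unimodal Model has no Comic data
}

def planned_runs(applicable):
    """Expand the lookup table into (comparison, model) pairs to run."""
    return [
        (comparison, model)
        for comparison, models in applicable.items()
        for model in models
    ]

runs = planned_runs(APPLICABLE_MODELS)
for comparison, model in runs:
    print(f"{comparison}: {model}")
```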

Included OPTIONAL MATCH

Just as explained in Entry G12 and earlier in Entry G10, using OPTIONAL MATCH instead of the more restrictive MATCH allows the query to find isolate nodes (nodes that have no relationships to other nodes). This gives a more complete picture when examining the summary statistics and distribution charts.
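
To see why including isolates matters, here is a rough plain-Python analogy. It does not run Cypher. It only mimics the difference between counting nodes that have at least one relationship (MATCH-like) and counting every node (OPTIONAL MATCH-like). The node names, weights, and the function weighted_degrees are made up for illustration.

```python
from statistics import mean

nodes = ["A", "B", "C", "D"]            # D has no relationships (an isolate)
edges = [("A", "B", 2), ("B", "C", 5)]  # (source, target, weight)

def weighted_degrees(nodes, edges, include_isolates):
    """Sum the edge weights touching each node.

    With include_isolates=True, every node starts at 0, so isolates appear
    in the result. With False, only nodes that have an edge appear.
    """
    degree = {n: 0 for n in nodes} if include_isolates else {}
    for source, target, weight in edges:
        degree[source] = degree.get(source, 0) + weight
        degree[target] = degree.get(target, 0) + weight
    return degree

match_like = weighted_degrees(nodes, edges, include_isolates=False)
optional_like = weighted_degrees(nodes, edges, include_isolates=True)

# The isolate pulls the minimum and the average down.
print(min(match_like.values()), mean(match_like.values()))
print(min(optional_like.values()), mean(optional_like.values()))
```

The second pair of statistics reflects every node in the graph, which is the more complete picture described above.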

Put Results in DataFrames

I can’t stress enough how helpful it is to have the results in a DataFrame instead of spread out across multiple cells. As an added bonus, the formatting is the same every time, instead of sometimes putting all the results on one line and sometimes putting each result on its own line. Also, the font is easier to read.

Same line:

[{'degree_min': 1, 'degree_max': 111, 'degree_avg': 8.0, 'degree_stdev': 6.0}]

Multiple lines:

[{'degree_min': 1,
'degree_max': 111,
'degree_avg': 8.0,
'degree_stdev': 6.0}]

DataFrame:
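
As a minimal sketch of the DataFrame form, assuming pandas is available and the query result arrives as a list of dictionaries like the ones above (the row label "example" is a placeholder, not a name from the notebook):

```python
import pandas as pd

# The same result shown above, as returned in list-of-dict form.
result = [{'degree_min': 1, 'degree_max': 111, 'degree_avg': 8.0, 'degree_stdev': 6.0}]

# One row per result set; the index label is a hypothetical placeholder.
df = pd.DataFrame(result, index=["example"])
print(df)
```

Each additional model's result can be added as another row, which keeps the statistics lined up in the same columns for comparison.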

Added Comic to Comic

I threw in the Comic to Comic weighted relationships mostly because I could. It does give me a second sample to examine without having to create another multigraph of graph models from a new dataset.

Up Next

Global Counts Comparison

Resources

Julie Fisher
