Entry G2: Create a Neo4j Database

3 minute read

To begin any data science project, I need a data to play with. For the graph project I’m going to start with the Marvel Universe Social Network available on Kaggle. I picked this because it’s publicly available, stored as .csv files, and easily fits into both bimodal and unimodal graph models.

Get Database software

I work with Neoj4 databases at work, so that’s what I’ll use here too. This decision was based on my familiarity with the product as well as the fact that Neo4j has an open source Community Edition and free Desktop edition.

I’ll be using the Desktop edition for the examples. To follow along, download the version of the free Neo4j Desktop edition appropriate for your computer and follow the directions.

When you’ve got it loaded and fired up, you should have a page like this, but with an empty My Projects area:

Create an Empty Database

Now we need a new, empty graph that we can load the Marvel data into. This is easy and straight forward:

1. Make sure you’re on database page

This is the database stack icon in the far left.

2. Click the Add button

It’s located in the upper right of the My Project area.

3. Choose Local DBMS

4. Enter your database information

You’ll need to give your database a name, set a password, and choose a Neo4j version

5. Admire your new database

Setup the database

Next we need to prepare the database for data.

1. Click on your new database

This opens the Options for the database. It defaults to the Details page.

2. Navigate to Plugins

We want to add some plugins to make working with the data easier. To do this, choose the Plugins tab. You’ll see the four available plugins

3. Install the desired plugins

  1. Expand the library you want
  2. Click the Install button

I generally use the APOC library and Graph Data Science Library

  • APOC: this library holds a lot of useful, optimized functions that make writing queries easier
  • Graph Data Science Library: this library holds a lot of useful, optimized functions that are specifically designed for data science and analytic purposes

Side note You can add/remove/change the options at any time (now or after loading the data). However, if the database is running it takes longer to add plugins because the database has to restart for every library you install.

Import data

Now that our database is ready, we can import data.

Tomaz Bratanic, who often posts about graph and Neo4j on his Graph People blog, kindly hosts the network on his github page, which is easier to connect to than the Kaagle page, so we’ll import the data from there.

1. Start the database

Click Start to start the database

2. Open the database

Click Open to fire up the browser based interface.

You’ll notice that once the database is running, the options change slightly; Start has changed to Stop and Open is now selectable.

3. Navigate to Neo4j Browser

You should now have a Neo4j Desktop Browser window open. The initial page usually looks something like this (I’m using Neo4j Desktop version 1.4.1 and database version 4.1.3):

4. Set the schema

Since we’re using the Marvel Universe Social Network, we want two node labels: “Hero” and “Comic”. To do this, we’ll use the Neoj4 Broswer command line.

Just input the following Cypher code

CALL apoc.schema.assert( {},
{Comic:['name'],Hero:['name']});

5. Load the data

Use the following query to pull the data directly from Tomaz’s github page

CALL apoc.load.csv('https://raw.githubusercontent.com/tomasonjo/neo4j-marvel/master/data/edges.csv') yield map as row WITH row
MERGE (h:Hero {name:row.hero})
MERGE (c:Comic {name:row.comic})
MERGE (h)-[:APPEARS_IN]->(c);

Up Next

Choosing a Graph Model (schema)

Resources