Friday, July 29, 2022

New graph layouts with networkx and matplotlib

There have been many posts online about generating networkx and matplotlib graphs, however I ran into troubles where my graphs didn't quite agree well with default layouts that networkx provides. Here are some examples:

Spring layout
Spectral layout

I've tried them all but in many cases node data was hard to read and links were either too large or too small. I'm working on a crafting system for a game, and want to be able to visualize recipes and item progression through the game. I want to detect useless items, or uncraftable items, as well as a system to help me balance the item crafting with the passage of time. My graph is a DiGraph that has two kinds of nodes - buildings that contain recipes, and items that can go through a building to change its form. E.g. ore -> Forge -> iron. In addition, buildings take items to be built - so there are two kinds of paths - recipe path, and construction path. Since most buildings are built from some sort of wood/stone combo there are many construction paths  coming out of wood and stone. That is partially why both spring and spectral layouts tend to be clustered around wood and stone - those are the most connected nodes. Dropping those links doesn't help at all:

Spring layoutSpectral layout

So I started making my own layout. Networkx's layout generators are relatively straight-forward in their API - given a graph, output a dictionary that maps nodes to their positions. Mapping all of the nodes to (0,0) for example can be done with:

pos = dict(zip(G, np.zeros((len(G), 2))))

Where G is the Graph. Since I want to show a clear relationship between nodes connecting to each other I decided to calculate nodes' position based on paths from one node to another. I tend to read left-to-right so I wanted to lay the recipes out left-to-right. We start with some source nodes - nodes that don't have any links coming to them - those are also going to be building nodes - and place them on the left most of the graph. Then we trace the connections, following only the recipe connections (not the construction ones). Each time we visit a child, we put it right of the parent, unless the spot is already taken, then we put it right, above the parent. Repeat until we are able to place all of the children. Let's convert this paragraph to code!
First, we start with some source nodes. Networkx lets you to check that with 

if len(list(G.predecessors(node))) == 0:    

However, remember that our buildings also take construction materials. So wood and stone would be a predecessor to all of the buildings. In my particular case I can just filter nodes that aren't buildingds, but for general case we can solve this by passing a lambda for ignoring non-source nodes while placing objects:

is_construction_link = lambda child: \
G.get_edge_data(child, node) is not None and \
('type' not in G.get_edge_data(child, node) or \
G.get_edge_data(child, node)['type'] != 'construction_resource')
if len(list(filter(is_construction_link, G.predecessors(node)))) == 0:
# ready to place nodes

You will likely want to change the way you check edge data, but that's my graph's data structure. From here we're ready to start placing nodes. We will need to keep track of the span for node's children, as well as an ability to move child nodes a bit up for a better alignment. We can achieve it using a doubly nested dictionary, but I created a helper class to simplify syntax a little bit and provide default values:

class Grid:
def __init__(self): = {}
def __getitem__(self, t):
col, row = t
if col in
if row in[col]:
return None
def __setitem__(self, t, value):
col, row = t, {})[col][row] = value

This makes grid to be referenced by a single square braket pair and will never complain about missing keys this allows us to quickly find an empty cell:

while grid[col, row] is not None:
row += 1

 So lets review the way we want to generate this layout:

  1. We start at the most left and place a source node there.
  2. For each child of the node we place them to the right. 
  3. Ideally we recenter parent node so that parent is aligned with the center of its children nodes
  4. Repeat (2.) until we don't have any more children
  5. Repeat (1.) until we don't have any more unvisited source nodes.

 The unvisited is one of the keys there, but it will depend on how you define your source node lambda. If you use DiGraph and the  G.predecessors(node check, trace will never be able to reach those nodes since they don't have anything that can reach them, but if you use a regular Graph that won't work, so I included the visited set check to make sure we don't have infinite loops. This check also helps if you have cycles in your recipes.Let's put the whole function together:

def flow_layout(G, is_source_node):
import numpy as np
import sys
pos = dict(zip(G, np.zeros((len(G), 2))))
grid = Grid()
visited = set()
def trace(node, col, row):
def set_pos(node, col, row): # helper function
pos[node][0] = col
pos[node][1] = row
grid[col, row] = node
if node not in visited: # place only unvisited nodes
while grid[col, row] is not None:
row += 1
from_row, to_row = (row, row)
child_min = sys.maxsize
for child in G.neighbors(node):
edge = G.get_edge_data(node, child)
if 'type' not in edge or edge['type'] != 'construction_resource':
child_from_row, child_to_row = trace(child, col+1, row)
if child_from_row < child_min:
child_min = child_from_row
if child_to_row > to_row:
to_row = child_to_row
if to_row >= child_min > from_row:
from_row = child_min
# children center parents here
new_row = int((from_row+ to_row)*0.5)
while grid[col, new_row] is not None:
new_row += 1
if new_row > to_row:
to_row = new_row
set_pos(node, col, new_row)
return from_row, to_row
else: # the node has been visited
# if it's a child node that's placed to the side - recenter it
# parents center children here
if col-1 > 1 and col-1 < pos[node][0] and row > pos[node][1]+1:
avg = int((row + pos[node][1])/2)
pcol = pos[node][0]
while grid[pcol, avg] is not None:
avg += 1
old = pos[node][1]
pos[node][1] = avg
grid[pcol, old] = None
grid[pcol, avg] = node
return (0,0)
row = 1
for node in G:
if is_source_node(G, node):
tmp = trace(node, 1, row)[1]
if tmp > row:
row = tmp
return pos

And the result we get is:

Old is New: Remote firewall administration without a network

 I have a sleepless two-month old, a toddler, and a full-time technical job, so I really don't want to do much home-IT when I don't ...