Graph Calculations

Graph Operations

Download

This was a project for a data structures class. It serves as a great argument for outlining before writing code. I wrote it originally with the graph modeled as an adjacency list, as thats what the assignment called for being output, and it worked well enough for the first search algorithm I implemented. However, when I got to the topological sort, things got dicey.

This program takes as input a graph definition, in which the edges are defined. For instance, a simple graph might look like this:
A B
A C
B D

This would create a graph with an edge from A to B, an edge from B to C, and an edge from B to D.

It outputs (to stdout) a depth-first search, a breadth-first search, and a topological sort.

To explain how these work related to my algorithms, here is a sample graph:

And this graph, reprsented in an input file, would be: show/hide

My goal was to store these in an adjacency list. An adjacency list for this graph would look like: show/hide

Since I was displaying it this way, it made sense to me (at the time) to store it this way.

To do that, I store two separate kind of objects: A List object, and a Node object.

Each of the verticies that has children (one at the beginning of the line in the file) is stored in a List object. These have a pointer for a 'next' List object, so they can be stored in a linked list.
So we have a linked list of Lists.
Each of these Lists has a pointer to the first vertex in its adjacency list. These are stored in Nodes. These also have a pointer for 'next'. So each List item has a linked list full of Nodes.
However, this is only a representation of the adjacency list. The actual verticies are represented as Vertex objects. Each List and Node has a pointer to a Vertex, representing the vertex stored in that object.
Since some of these algorithms depend on the verticies being marked as 'visited', each List or Node referring to the same vertex could not just have a pointer to a different Vertex object. As I build these lists, every time a new vertex identifier ('A', 'B', etc) is encountered, I create a new vertex object and add it to a list. I reference this list every time I create a new Node or List. This way, all Nodes or Lists with the same vertex are pointing to the same vertex object in memory.

This diagram shows the memory layout of my system. Note that each orange rectangle represents the same location in memory. So the "B" pointed to by the List object at the top is the same "B" pointed to by the Node after "A."

This allowed the searches to be implemented easily. All we have to do is:
1. Take a List object, and mark its vertex as visited, and output its vertex's label.
2. Follow the link to its first node, and start at 1 on it.
3. Follow the link to the next node, and so on.
4. When we are out of Nodes, we are done.

A similar procedure is followed for the breadth-first search:
1. Take a List object, mark its vertex as visited, and output its vertex label.
2. Traverse the list of Nodes attatched to it, adding each of them to a queue, and outputting their names.
3. When we run out of links, repeat from 1 with the first Node we put in the queue.
4. When the queue is empty, we are done.

These processes could have been improved in efficiency, possibly, by having each Node link to the List object related to it, or by using Vertex objects in place of List objects. In the first case, this would eliminate the need for a function to look up the List for a given Vertex. I did not feel this was a good tradeoff for an extra pointer space in the Node structure. I considered the second case, but I did not want to compromise the List-as-Parent design I had set up.

The real problem with this architecture became apparent when I implemented the topological sort. The topological sort calls for finding a node with no children,outputting it at the left, and then deleting that node from the graph.

There are two problems with this in my architecture. First, and more trivially, deleting the node. Since Nodes linked from one List item don't know anything about Nodes linked from another linked item, simply deleting one of them wasn't sufficient. I had to traverse the entire graph every time, deleting every reference to a particular node. This is awful, but relatively simple.
Finding a node to delete, on the other hand, proved to be hideously inefficient.
Here is how I do it:
Traverse the entire graph. For each node..
1. Traverse the list of Lists until you find the current Node. 2. If the node has no children (the link is NULL), or if it is not on the List, it has no children.
3. Otherwise, start on the next node.

This is approximately O(n²) when, with a more sane design, it could be O(1).

So, with a few minutes spent thinking about the data and algorithm design, I could have saved myself some ugly hacks and the time spent developing them.

Note that this is written in ANSI C, but was only tested on a Linux system.