Storing things

Memory in a computer is just a very long row of bytes. We'll get into technicalities and practicalities in the module on optimisation, but for now it is sufficient to know that each byte has an address. When we talk about data structures, we're really talking about how to organise a set of memory addresses so that they are convenient for us.

The most basic data structure is the array — conceptually, just a compact row of elements defined by a starting address s, a word size w, and a length N. To find the kth element in the array, simply access the memory at the address s+w*k. This is O(1), superfast.

But what if you need to remove an element of the array?

You'd probably end up having to copy the array, incurring O(N). Doing that once or twice is probably fine, most algorithms are anyway at least O(N). But suppose that you're removing all the elements of the array, one by one --- then we've got O(N2). 

Linked lists

An alternative structure for a row of elements is the linked list:

Diagram showing linked list data structures

To remove a given element, simply change one or two links: O(1) time complexity. However, to find the kth in a linked list, you have to start at the head and iterate through them until you reach k, with a cost of O(n).

Data structures and algorithms are tightly intertwined — usually an algorithm relies on a given datastructure and a datastructure is used for a certain type of algorithm. 

Stacks

Suppose you only need to access the end of a list? For instance, to explore a maze in an exhaustive depth-first search, one approach is to remember all intersections. When you reach a dead end, simply retrieve the previous intersection and choose another way. This can be conceptualised by pushing an intersection on a stack when an intersection is passed, and popping the most recent unexplored intersection whenever a dead end is reached. Both operations are O(1). 

The stack is the canonical Last In First Out (LIFO) data structure. 

Operations on a stack

Queues

The opposite of LIFO is FIFO. This structure is useful for e.g. storing a set of work items to be processed in the order of submission.

Operations on a queue

Other linked lists

The C++ Standard Template Library vector is array-like, but with O(1) cost to add elements to the beginning or end.

Dense matrices are typically built as arrays of arrays (or an array of links to arrays), but unstructured sparse matrices are often constructed as a linked list of elements.

 

Trees

Elements don't need to be stored in a linear sequence. Trees are a class of data structure with an enormous range of properties.

example of a tree

In the above example, node 1 is the root. The roots children are nodes 2 and 3. Nodes 1, 2, and 5 are internal nodes, while 3, 4, 6, 7 and 8 are leaf nodes

The depth of a node is the number of steps to the root. The height of a tree is the maximum depth. The height of a balanced tree is O(log  N).

A common type of tree is a binary tree, in which each node has at most 2 children.

Tree traversal

There are three logical ways to traverse an entire tree, starting from the root, giving three different orderings.

  • Pre-order: 1,2,4,5,6,7,8,3
  • In-order (dropping node 8 to make tree binary): 4,2,6,5,7,1,3
  • Post-order: 4,6,7,8,5,2,3,1 Links to an external site.