# Graphs, Kites and Darts

Non-periodic tilings with Penrose’s kites and darts

We continue our investigation of the tilings using Haskell with Haskell Diagrams. What is new is the introduction of a planar graph representation. This allows us to define more operations on finite tilings, in particular forcing and composing.

Previously in Diagrams for Penrose Tiles we implemented tools to create and draw finite patches of Penrose kites and darts (such as the samples depicted in figure 1). The code for this and for the new graph representation and tools described here can be found on GitHub https://github.com/chrisreade/PenroseKiteDart.

To describe the tiling operations it is convenient to work with the half-tiles: LD (left dart), RD (right dart), LK (left kite), RK (right kite) using a polymorphic type HalfTile (defined in a module HalfTile)

data HalfTile rep
= LD rep | RD rep | LK rep | RK rep   deriving (Show,Eq)

Here rep is a type variable for a representation to be chosen. For drawing purposes, we chose two-dimensional vectors (V2 Double) and called these Pieces.

type Piece = HalfTile (V2 Double)

The vector represents the join edge of the half tile (see figure 2) and thus the scale and orientation are determined (the other tile edges are derived from this when producing a diagram).

Finite tilings or patches are then lists of located pieces.

type Patch = [Located Piece]

Both Piece and Patch are made transformable so rotate, and scale can be applied to both and translate can be applied to a Patch. (Translate has no effect on a Piece unless it is located.)

In Diagrams for Penrose Tiles we also discussed the rules for legal tilings and specifically the problem of incorrect tilings which are legal but get stuck so cannot continue to infinity. In order to create correct tilings we implemented the decompose operation on patches.

The vector representation that we use for drawing is not well suited to exploring properties of a patch such as neighbours of pieces. Knowing about neighbouring tiles is important for being able to reason about composition of patches (inverting a decomposition) and to find which pieces are determined (forced) on the boundary of a patch.

However, the polymorphic type HalfTile allows us to introduce our alternative graph representation alongside Pieces.

# Tile Graphs

In the module Tgraph.Prelude, we have the new representation which treats half tiles as triangular faces of a planar graph – a TileFace – by specialising HalfTile with a triple of vertices (clockwise starting with the tile origin). For example

LD (1,3,4)       RK (6,4,3)
type Vertex = Int
type TileFace = HalfTile (Vertex,Vertex,Vertex)

When we need to refer to particular vertices from a TileFace we use originV (the first vertex – red dot in figure 2), oppV (the vertex at the opposite end of the join edge – dashed edge in figure 2), wingV (the remaining vertex not on the join edge).

originV, oppV, wingV :: TileFace -> Vertex

Tgraphs

The Tile Graphs implementation uses a type Tgraph which has a list of graph vertices and a list of tile faces.

data Tgraph = Tgraph { vertices :: [Vertex]
, faces    :: [TileFace]
}  deriving (Show)

For example, fool (short for a fool’s kite) is a Tgraph with 6 faces and 7 vertices, shown in figure 3.

fool = Tgraph { vertices = [1,2,3,4,5,6,7]
, faces = [RD (1,2,3),LD (1,3,4),RK (6,2,5)
,LK (6,3,2),RK (6,4,3),LK (6,7,4)
]
}

(The fool is also called an ace in the literature)

With this representation we can investigate how composition works with whole patches. Figure 4 shows a twice decomposed sun on the left and a once decomposed sun on the right (both with vertex labels). In addition to decomposing the right graph to form the left graph, we can also compose the left graph to get the right graph.

After implementing composition, we also explore a force operation and an emplace operation to extend tilings.

There are some constraints we impose on Tgraphs.

• No spurious vertices. Every vertex of a Tgraph face must be one of the Tgraph vertices and each of the Tgraph vertices occurs in at least one of the Tgraph faces.
• Connected. The collection of faces must be a single connected component.
• No crossing boundaries. By this we mean that vertices on the boundary are incident with exactly two boundary edges. The boundary consists of the edges between the Tgraph faces and exterior region(s). This is important for adding faces.
• Tile connected. Roughly, this means that if we collect the faces of a Tgraph by starting from any single face and then add faces which share an edge with those already collected, we get all the Tgraph faces. This is important for drawing purposes.

In fact, if a Tgraph is connected with no crossing boundaries, then it must be tile connected. (We could define tile connected to mean that the dual graph excluding exterior regions is connected.)

Figure 5 shows two excluded graphs which have crossing boundaries at 4 (left graph) and 13 (right graph). The left graph is still tile connected but the right is not tile connected (the two faces at the top right do not have an edge in common with the rest of the faces.)

Although we have allowed for Tgraphs with holes (multiple exterior regions), we note that such holes cannot be created by adding faces one at a time without creating a crossing boundary. They can be created by removing faces from a Tgraph without necessarily creating a crossing boundary.

Important We are using face as an abbreviation for half-tile face of a Tgraph here, and we do not count the exterior of a patch of faces to be a face. The exterior can also be disconnected when we have holes in a patch of faces and the holes are not counted as faces either. In graph theory, the term face would generally include these other regions, but we will call them exterior regions rather than faces.

In addition to the constructor Tgraph we also use

checkedTgraph:: [TileFace] -> Tgraph

which creates a Tgraph from a list of faces, but also performs checks on the required properties of Tgraphs. We can then remove or select faces from a Tgraph and then use checkedTgraph to ensure the resulting Tgraph still satisfies the required properties.

selectFaces, removeFaces  :: [TileFace] -> Tgraph -> Tgraph
selectFaces fcs g = checkedTgraph (faces g intersect fcs)
removeFaces fcs g = checkedTgraph (faces g \\ fcs)

Edges and Directed Edges

We do not explicitly record edges as part of a Tgraph, but calculate them as needed. Implicitly we are requiring

• No spurious edges. The edges of a Tgraph are the edges of the faces of the Tgraph.

To represent edges, a pair of vertices (a,b) is regarded as a directed edge from a to b. A list of such pairs will usually be regarded as a directed edge list. In the special case that the list is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we will refer to this as an edge list rather than a directed edge list.

The following functions on TileFaces all produce directed edges (going clockwise round a face).

  -- join edge - dashed in figure 2
joinE  :: TileFace -> (Vertex,Vertex)
-- the short edge which is not a join edge
shortE :: TileFace -> (Vertex,Vertex)
-- the long edge which is not a join edge
longE  :: TileFace -> (Vertex,Vertex)
-- all three directed edges clockwise from origin
faceDedges :: TileFace -> [(Vertex,Vertex)]

For the whole Tgraph, we often want a list of all the directed edges of all the faces.

graphDedges :: Tgraph -> [(Vertex,Vertex)]
graphDedges g = concatMap faceDedges (faces g)

Because our graphs represent tilings they are planar (can be embedded in a plane) so we know that at most two faces can share an edge and they will have opposite directions of the edge. No two faces can have the same directed edge. So from graphDedges g we can easily calculate internal edges (edges shared by 2 faces) and boundary directed edges (directed edges round the external regions).

internalEdges, boundaryDedges :: Tgraph -> [(Vertex,Vertex)]

The internal edges of g are those edges which occur in both directions in graphDedges g. The boundary directed edges of g are the missing reverse directions in graphDedges g.

We also refer to all the long edges of a Tgraph (including kite join edges) as phiEdges (both directions of these edges).

phiEdges :: Tgraph -> [(Vertex, Vertex)]

This is so named because, when drawn, these long edges are phi times the length of the short edges (phi being the golden ratio which is approximately 1.618).

# Drawing Tgraphs (Patches and VPatches)

The module Tgraph.Convert contains functions to convert a Tgraph to our previous vector representation (Patch) defined in TileLib so we can use the existing tools to produce diagrams.

makePatch :: Tgraph -> Patch

drawPatch :: Patch -> Diagram B -- defined in module TileLib

drawGraph :: Tgraph -> Diagram B
drawGraph = drawPatch . makePatch

However, it is also useful to have an intermediate stage (a VPatch = Vertex Patch) which contains both face (vertices) and vectors. This allows vertex labels to be drawn and for faces to be identified and retained/excluded after the vector information is calculated.

data VPatch  = VPatch {lVertices :: [Located Vertex]
,lHybrids :: [Located Hybrid]
}

A Vpatch has a list of located vertices and a list of located hybrids, where a Hybrid is a HalfTile with a dual representation of the face (vertices) and vector (join edge). We make VPatch transformable so it can also be an argument type for rotate, translate, and scale.

The conversion functions include

makeVPatch   :: Tgraph -> VPatch
dropVertices :: VPatch -> Patch -- discards vertex information
drawVPatch   :: VPatch -> Diagram B  -- draws labels as well

drawVGraph   :: Tgraph -> Diagram B
drawVGraph = drawVPatch . makeVPatch

One consequence of using abstract graphs is that there is no unique predefined way to orient or scale or position the patch arising from a graph representation. Our implementation selects a particular join edge and aligns it along the x-axis (unit length for a dart, philength for a kite) and tile-connectedness ensures the rest of the patch can be calculated from this.

We also have functions to re-orient a Vpatch and lists of VPatchs using chosen pairs of vertices. [Simply doing rotations on the final diagrams can cause problems if these include vertex labels. We do not, in general, want to rotate the labels – so we need to orient the Vpatch before converting to a diagram]

# Decomposing Graphs

We previously implemented decomposition for patches which splits each half-tile into two or three smaller scale half-tiles.

decompose :: Patch -> Patch

We now have a Tgraph version of decomposition in the module Tgraphs:

decomposeG :: Tgraph -> Tgraph

Graph decomposition is particularly simple. We start by introducing one new vertex for each long edge (the phiEdges) of the Tgraph. We then build the new faces from each old face using the new vertices.

As a running example we take fool (mentioned above) and its decomposition foolD

*Main> foolD = decomposeG fool

*Main> foolD
Tgraph { vertices = [1,8,3,2,9,4,5,13,10,6,11,14,7,12]
, faces = [LK (1,8,3),RD (2,3,8),RK (1,3,9)
,LD (4,9,3),RK (5,13,2),LK (5,10,13)
,RD (6,13,10),LK (3,2,13),RK (3,13,11)
,LD (6,11,13),RK (3,14,4),LK (3,11,14)
,RD (6,14,11),LK (7,4,14),RK (7,14,12)
,LD (6,12,14)
]
}

which are best seen together (fool followed by foolD) in figure 6.

# Composing graphs, and Unknowns

Composing is meant to be an inverse to decomposing, and one of the main reasons for introducing our graph representation. In the literature, decomposition and composition are defined for infinite tilings and in that context they are unique inverses to each other. For finite patches, however, we will see that composition is not always uniquely determined.

In figure 7 (Two Levels) we have emphasised the larger scale faces on top of the smaller scale faces.

How do we identify the composed tiles? We start by classifying vertices which are at the wing tips of the (smaller) darts as these determine how things compose. In the interior of a graph/patch (e.g in figure 7), a dart wing tip always coincides with a second dart wing tip, and either

1. the 2 dart halves share a long edge. The shared wing tip is then classified as a largeKiteCentre and is at the centre of a larger kite. (See left vertex type in figure 8), or
2. the 2 dart halves touch at their wing tips without sharing an edge. This shared wing tip is classified as a largeDartBase and is the base of a larger dart. (See right vertex type in figure 8)

[We also call these (respectively) a deuce vertex type and a jack vertex type later in figure 10]

Around the boundary of a graph, the dart wing tips may not share with a second dart. Sometimes the wing tip has to be classified as unknown but often it can be decided by looking at neighbouring tiles. In this example of a four times decomposed sun (sunD4), it is possible to classify all the dart wing tips as largeKiteCentres or largeDartBases so there are no unknowns.

If there are no unknowns, then we have a function to produce the unique composed graph.

composeG:: Tgraph -> Tgraph

Any correct decomposed graph without unknowns will necessarily compose back to its original. This makes composeG a left inverse to decomposeG provided there are no unknowns.

For example, with an (n times) decomposed sun we will have no unknowns, so these will all compose back up to a sun after n applications of composeG. For n=4 (sunD4 – the smaller scale shown in figure 7) the dart wing classification returns 70 largeKiteCentres, 45 largeDartBases, and no unknowns.

Similarly with the simpler foolD example, if we classsify the dart wings we get

largeKiteCentres = [14,13]
largeDartBases = [3]
unknowns = []

In foolD (the right hand graph in figure 6), nodes 14 and 13 are new kite centres and node 3 is a new dart base. There are no unknowns so we can use composeG safely

*Main> composeG foolD
Tgraph { vertices = [1,2,3,4,5,6,7]
, faces = [RD (1,2,3),LD (1,3,4),RK (6,2,5)
,RK (6,4,3),LK (6,3,2),LK (6,7,4)
]
}

which reproduces the original fool (left hand graph in figure 6).

However, if we now check out unknowns for fool we get

largeKiteCentres = []
largeDartBases = []
unknowns = [4,2]    

So both nodes 2 and 4 are unknowns. It had looked as though fool would simply compose into two half kites back-to-back (sharing their long edge not their join), but the unknowns show there are other possible choices. Each unknown could become a largeKiteCentre or a largeDartBase.

The question is then what to do with unknowns.

# Partial Compositions

In fact our composeG resolves two problems when dealing with finite patches. One is the unknowns and the other is critical missing faces needed to make up a new face (e.g the absence of any half dart).

It is implemented using an intermediary function for partial composition

partCompose:: Tgraph -> ([TileFace],Tgraph) 

partCompose will compose everything that is uniquely determined, but will leave out faces round the boundary which cannot be determined or cannot be included in a new face. It returns the faces of the argument graph that were not used, along with the composed graph.

Figure 9 shows the result of partCompose applied to two graphs. [These are force kiteD3 and force dartD3 on the left. Force is described later]. In each case, the excluded faces of the starting graph are shown in pale green, overlaid by the composed graph on the right.

Then composeG is simply defined to keep the composed faces and ignore the unused faces produced by partCompose.

composeG:: Tgraph -> Tgraph
composeG = snd . partCompose 

This approach avoids making a decision about unknowns when composing, but it may lose some information by throwing away the uncomposed faces.

For correct Tgraphs g, if decomposeG g has no unknowns, then composeG is a left inverse to decomposeG. However, if we take g to be two kite halves sharing their long edge (not their join edge), then these decompose to fool which produces an empty graph when recomposed. Thus we do not have g = composeG (decomposeG g) in general. On the other hand we do have g = composeG (decomposeG g) for correct whole-tile Tgraphs g (whole-tile means all half-tiles of g have their matching half-tile on their join edge in g)

Later (figure 21) we show another exception to g = composeG(decomposeG g) with an incorrect tiling.

We make use of

selectFacesVP    :: [TileFace] -> VPatch -> VPatch
removeFacesVP    :: [TileFace] -> VPatch -> VPatch
selectFacesGtoVP :: [TileFace] -> Tgraph -> VPatch
removeFacesGtoVP :: [TileFace] -> Tgraph -> VPatch

for creating VPatches from selected tile faces of a Tgraph or VPatch. This allows us to represent and draw a subgraph which need not be connected nor satisfy the no crossing boundaries property provided the Tgraph it was derived from had these properties.

# Forcing

When building up a tiling, following the rules, there is often no choice about what tile can be added alongside certain tile edges at the boundary. Such additions are forced by the existing patch of tiles and the rules. For example, if a half tile has its join edge on the boundary, the unique mirror half tile is the only possibility for adding a face to that edge. Similarly, the short edge of a left (respectively, right) dart can only be matched with the short edge of a right (respectively, left) kite. We also make use of the fact that only 7 types of vertex can appear in (the interior of) a patch, so on a boundary vertex we sometimes have enough of the faces to determine the vertex type. These are given the following names in the literature (shown in figure 10): sun, star, jack (=largeDartBase), queen, king, ace, deuce (=largeKiteCentre).

The function

force :: Tgraph -> Tgraph

will add some faces on the boundary that are forced (i.e new faces where there is exactly one possible choice). For example:

• When a join edge is on the boundary – add the missing half tile to make a whole tile.
• When a half dart has its short edge on the boundary – add the half kite that must be on the short edge.
• When a vertex is both a dart origin and a kite wing (it must be a queen or king vertex) – if there is a boundary short edge of a kite half at the vertex, add another kite half sharing the short edge, (this converts 1 kite to 2 and 3 kites to 4 in combination with the first rule).
• When two half kites share a short edge their common oppV vertex must be a deuce vertex – add any missing half darts needed to complete the vertex.

Figure 11 shows foolDminus (which is foolD with 3 faces removed) on the left and the result of forcing, ie force foolDminus on the right which is the same graph we get from force foolD.

foolDminus =
removeFaces [RD(6,14,11), LD(6,12,14), RK(5,13,2)] foolD

Figures 12, 13 and 14 illustrate the result of forcing a 5-times decomposed kite, a 5-times decomposed dart, and a 5-times decomposed sun (respectively). The first two figures reproduce diagrams from an article by Roger Penrose illustrating the extent of influence of tiles round a decomposed kite and dart. [Penrose R Tilings and quasi-crystals; a non-local growth problem? in Aperiodicity and Order 2, edited by Jarich M, Academic Press, 1989. (fig 14)].

In figure 15, the bottom row shows successive decompositions of a dart (dashed blue arrows from right to left), so applying composeG to each dart will go back (green arrows from left to right). The black vertical arrows are force. The solid blue arrows from right to left are (force . decomposeG) being applied to the successive forced graphs. The green arrows in the reverse direction are composeG again and the intermediate (partCompose) figures are shown in the top row with the ignored faces in pale green.

Figure 16 shows the forced graphs of the seven vertex types (with the starting graphs in red) along with a kite (top right).

These are related to each other as shown in the columns. Each graph composes to the one above (an empty graph for the ones in the top row) and the graph below is its forced decomposition. [The rows have been scaled differently to make the vertex types easier to see.]

# Adding Faces to a Tgraph

This is technically tricky because we need to discover what vertices (and implicitly edges) need to be newly created and which ones already exist in the Tgraph. This goes beyond a simple graph operation and requires use of the geometry of the faces. We have chosen not to do a full conversion to vectors to work out all the geometry, but instead we introduce a local representation of angles at a vertex allowing a simple equality test.

Integer Angles

All vertex angles are integer multiples of 1/10th turn (mod 10) so we use these integers for face internal angles and boundary external angles. The face adding process always adds to the right of a given directed edge (a,b) which must be a boundary directed edge. [Adding to the left of an edge (a,b) would mean that (b,a) will be the boundary direction and so we are really adding to the right of (b,a)]. Face adding looks to see if either of the two other edges already exist in the graph by considering the end points a and b to which the new face is to be added, and checking angles.

This allows an edge in a particular sought direction to be discovered. If it is not found it is assumed not to exist. However, this will be undermined, there are crossing boundaries . In this case there must be more than two boundary directed edges at the vertex and there is no unique external angle.

Establishing the no crossing boundaries property ensures these failures cannot occur. We can easily check this property for newly created graphs (with checkedTgraph) and the face adding operations cannot create crossing boundaries.

Touching Vertices and Crossing Boundaries

When a new face to be added on (a,b) has neither of the other two edges already in the graph, the third vertex needs to be created. However it could already exist in the Tgraph – it is not on an edge coming from a or b but from another non-local part of the Tgraph. We call this a touching vertex. If we simply added a new vertex without checking for a clash this would create a nonsense graph. However, if we do check and find an existing vertex, we still cannot add the face using this because it would create a crossing boundary.

Our version of forcing prevents face additions that would create a touching vertex/crossing boundary by calculating the positions of boundary vertices.

No conflicting edges

There is a final (simple) check when adding a new face, to prevent a long edge (phiEdge) sharing with a short edge. This can arise if we force an incorrect graph (as we will see later).

# Implementing Forcing

Our order of forcing prioritises updates (face additions) which do not introduce a new vertex. Such safe updates are easy to recognise and they do not require a touching vertex check. Surprisingly, this pretty much removes the problem of touching vertices altogether.

As an illustration, consider foolDMinus again on the left of figure 11. Adding the left dart onto edge (12,14) is not a safe addition (and would create a crossing boundary at 6). However, adding the right dart RD(6,14,11) is safe and creates the new edge (6,14) which then makes the left dart addition safe. In fact it takes some contrivance to come up with a Tgraph with an update that could fail the check during forcing when safe cases are always done first. Figure 17 shows such a contrived Tgraph formed by removing the faces shown in green from a twice decomposed sun on the left. The forced result is shown on the right. When there are no safe cases, we need to try an unsafe one. The four green faces at the bottom are blocked by the touching vertex check. This leaves any one of 9 half-kites at the centre which would pass the check. But after just one of these is added, the check is not needed again. There is always a safe addition to be done at each step until all the green faces are added.

Boundary information

The implementation of forcing has been made more efficient by calculating some boundary information in advance. This boundary information uses a type Boundary

data Boundary
= Boundary
{ bDedges     :: [(Vertex,Vertex)]
, bvFacesMap  :: Mapping Vertex [TileFace]
, bvLocMap    :: Mapping Vertex (Point V2 Double)
, allFaces    :: [TileFace]
, allVertices :: [Vertex]
, nextVertex  :: Vertex
} deriving (Show)

This records the boundary directed edges (bDedges) plus a mapping of the boundary vertices to their incident faces (bvFacesMap) plus a mapping of the boundary vertices to their positions (bvLocMap). It also keeps track of all the faces and vertices. The boundary information is easily incremented for each face addition without being recalculated from scratch, and a final graph with all the new faces is easily recovered from the boundary information when there are no more updates.

makeBoundary  :: Tgraph -> Boundary
recoverGraph  :: Boundary -> Tgraph

The saving that comes from using boundaries lies in efficient incremental changes to boundary information and, of course, in avoiding the need to consider internal faces. As a further optimisation we keep track of updates in a mapping from boundary directed edges to updates, and supply a list of affected edges after an update so the update calculator (update generator) need only revise these. The boundary and mapping are combined in a force state.

type UpdateMap = Mapping DEdge Update
type UpdateGenerator = Boundary -> [DEdge] -> UpdateMap
data ForceState = ForceState
{ boundaryState:: Boundary
, updateMap:: UpdateMap
}


Forcing then involves using a specific update generator (allUGenerator) and initialising the state, then using the recursive forceAll which keeps doing updates until there are no more, before recovering the final graph.

force:: Tgraph -> Tgraph
force = forceWith allUGenerator

forceWith:: UpdateGenerator -> Tgraph -> Tgraph
forceWith uGen = recoverGraph . boundaryState .
forceAll uGen . initForceState uGen

forceAll :: UpdateGenerator -> ForceState -> ForceState
initForceState :: UpdateGenerator -> Tgraph -> ForceState


In addition to force we can easily define

wholeTiles:: Tgraph -> Tgraph
wholeTiles = forceWith wholeTileUpdates 

which just uses the first forcing rule to make sure every half-tile has a matching other half.

We also have a version of force which counts to a specific number of face additions.

stepForceWith :: UpdateGenerator -> Int -> ForceState -> ForceState


This proved essential in uncovering problems of accumulated innaccuracy in calculating boundary positions (now fixed).

# Some Other Experiments

Below we describe results of some experiments using the tools introduced above. Specifically: emplacements, sub-Tgraphs, incorrect tilings, and composition choices.

# Emplacements

The finite number of rules used in forcing are based on local boundary vertex and edge information only. We may be able to improve on this by considering a composition and forcing at the next level up before decomposing and forcing again. This thus considers slightly broader local information. In fact we can iterate this process to all the higher levels of composition. Some graphs produce an empty graph when composed so we can regard those as maximal compositions. For example composeG fool produces an empty graph.

The idea now is to take an arbitrary graph and apply (composeG . force) repeatedly to find its maximally composed graph, then to force the maximal graph before applying (force . decomposeG) repeatedly back down to the starting level (so the same number of decompositions as compositions).

We call the function emplace, and call the result the emplacement of the starting graph as it shows a region of influence around the starting graph.

With earlier versions of forcing when we had fewer rules, emplace g often extended force g for a Tgraph g. This allowed the identification of some new rules. Since adding the new rules we have not yet found graphs with different results from force and emplace. [Although, the vertex labelling of the result will usually be different].

# Sub-Tgraphs

In figure 18 on the left we have a four times decomposed dart dartD4 followed by two sub-Tgraphs brokenDart and badlyBrokenDart which are constructed by removing faces from dartD4 (but retaining the connectedness condition and the no crossing boundaries condition). These all produce the same forced result (depicted middle row left in figure 15).

However, if we do compositions without forcing first we find badlyBrokenDart fails because it produces a graph with crossing boundaries after 3 compositions. So composeG on its own is not always safe, where safe means guaranteed to produce a valid Tgraph from a valid correct Tgraph.

In other experiments we tried force on Tgraphs with holes and on incomplete boundaries around a potential hole. For example, we have taken the boundary faces of a forced, 5 times decomposed dart, then removed a few more faces to make a gap (which is still a valid Tgraph). This is shown at the top in figure 19. The result of forcing reconstructs the complete original forced graph. The bottom figure shows an intermediate stage after 2200 face additions. The gap cannot be closed off to make a hole as this would create a crossing boundary, but the channel does get filled and eventually closes the gap without creating a hole.

# Incorrect Tilings

When we say a Tgraph g is a correct graph (respectively: incorrect graph), we mean g represents a correct tiling (respectively: incorrect tiling). A simple example of an incorrect graph is a kite with a dart on each side (called a mistake by Penrose) shown on the left of figure 20.

*Main> mistake
Tgraph { vertices = [1,2,4,3,5,6,7,8]
, faces = [RK (1,2,4),LK (1,3,2),RD (3,1,5)
,LD (4,6,1),LD (3,5,7),RD (4,8,6)
]
}

If we try to force (or emplace) this graph it produces an error in construction which is detected by the test for conflicting edge types (a phiEdge sharing with a non-phiEdge).

*Main> force mistake
Tgraph {vertices = *** Exception: doUpdate:(incorrect tiling)
Conflicting new face RK (11,1,6)
with neighbouring faces
[RK (9,1,11),LK (9,5,1),RK (1,2,4),LK (1,3,2),RD (3,1,5),LD (4,6,1),RD (4,8,6)]
in boundary
Boundary ...

In figure 20 on the right, we see that after successfully constructing the two whole kites on the top dart short edges, there is an attempt to add an RK on edge (1,6). The process finds an existing edge (1,11) in the correct direction for one of the new edges so tries to add the erroneous RK (11,1,6) which fails a noConflicts test.

So it is certainly true that incorrect graphs may fail on forcing, but forcing cannot create an incorrect graph from a correct graph.

If we apply decomposeG to mistake it produces another incorrect graph (which is similarly detected if we apply force), but will nevertheless still compose back to mistake if we do not try to force.

Interestingly, though, the incorrectness of a graph is not always preserved by decomposeG. If we start with mistake1 which is mistake with just two of the half darts (and also an incorrect tiling) we still get a similar failure on forcing, but decomposeG mistake1 is no longer incorrect. If we apply composeG to the result or force then composeG the mistake is thrown away to leave just a kite (see figure 21). This is an example where composeG is not a left inverse to either decomposeG or (force . decomposeG).

# Composing with Choices

We know that unknowns indicate possible choices (although some choices may lead to incorrect graphs). As an experiment we introduce

makeChoices :: Tgraph -> [Tgraph]

which produces $2^n$ alternatives for the 2 choices of each of $n$ unknowns (prior to composing). This uses forceLDB which forces an unknown to be a largeDartBase by adding an appropriate joined half dart at the node, and forceLKC which forces an unknown to be a largeKiteCentre by adding a half dart and a whole kite at the node (making up the 3 pieces for a larger half kite).

Figure 22 illustrates the four choices for composing fool this way. The top row has the four choices of makeChoices fool (with the fool shown embeded in red in each case). The bottom row shows the result of applying composeG to each choice.

In this case, all four compositions are correct tilings. The problem is that, in general, some of the choices may lead to incorrect tilings. More specifically, a choice of one unknown can determine what other unknowns have to become with constraints such as

• a and b have to be opposite choices
• a and b have to be the same choice
• a and b cannot both be largeKiteCentres
• a and b cannot both be largeDartBases

This analysis of constraints on unknowns is not trivial. The potential exponential results from choices suggests we should compose and force as much as possible and only consider unknowns of a maximal graph.

For calculating the emplacement of a graph, we first find the forced maximal graph before decomposing. We could also consider using makeChoices at this top step when there are unknowns, i.e a version of emplace which produces these alternative results (emplaceChoices)

The result of emplaceChoices is illustrated for foolD in figure 23. The first force and composition is unique producing the fool level at which point we get 4 alternatives each of which compose further as previously illustrated in figure 22. Each of these are forced, then decomposed and forced, decomposed and forced again back down to the starting level. In figure 23 foolD is overlaid on the 4 alternative results. What they have in common is (as you might expect) emplace foolD which equals force foolD and is the graph shown on the right of figure 11.

# Future Work

I am collaborating with Stephen Huggett who suggested the use of graphs for exploring properties of the tilings. We now have some tools to experiment with but we would also like to complete some formalisation and proofs. For example, we do not know if force g always produces the same result as emplace g. [Update (August 2022): We now have an example where force g strictly includes emplace g].

It would also be good to establish that g is incorrect iff force g fails.

We have other conjectures relating to subgraph ordering of Tgraphs and Galois connections to explore.

# Penrose Kite and Dart Tilings with Haskell Diagrams

Revised version (no longer the full program in this literate Haskell)

Infinite non-periodic tessellations of Roger Penrose’s kite and dart tiles.

As part of a collaboration with Stephen Huggett, working on some mathematical properties of Penrose tilings, I recognised the need for quick renderings of tilings. I thought Haskell diagrams would be helpful here, and that turned out to be an excellent choice. Two dimensional vectors were well-suited to describing tiling operations and these are included as part of the diagrams package.

This literate Haskell uses the Haskell diagrams package to draw tilings with kites and darts. It also implements the main operations of compChoicescompChoices and decompose which are essential for constructing tilings (explained below).

Firstly, these 5 lines are needed in Haskell to use the diagrams package:

{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE FlexibleContexts          #-}
{-# LANGUAGE TypeFamilies              #-}
import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

and we will also import a module for half tiles (explained later)

import HalfTile

These are the kite and dart tiles.

The red line marking here on the right hand copies, is purely to illustrate rules about how tiles can be put together for legal (non-periodic) tilings. Obviously edges can only be put together when they have the same length. If all the tiles are marked with red lines as illustrated on the right, the vertices where tiles meet must all have a red line or none must have a red line at that vertex. This prevents us from forming a simple rombus by placing a kite top at the base of a dart and thus enabling periodic tilings.

All edges are powers of the golden section $\phi$ which we write as phi.

phi::Double
phi = (1.0 + sqrt 5.0) / 2.0

So if the shorter edges are unit length, then the longer edges have length phi. We also have the interesting property of the golden section that $phi^2 = phi + 1$ and so $1/phi = phi-1$, $phi^3 = 2phi +1$ and $1/phi^2 = 2-phi$.

All angles in the figures are multiples of tt which is 36 deg or 1/10 turn. We use ttangle to express such angles (e.g 180 degrees is ttangle 5).

ttangle:: Int -> Angle Double
ttangle n = (fromIntegral (n mod 10))*^tt
where tt = 1/10 @@ turn

## Pieces

In order to implement compChoices and decompose, we need to work with half tiles. We now define these in the separately imported module HalfTile with constructors for Left Dart, Right Dart, Left Kite, Right Kite

data HalfTile rep = LD rep -- defined in HalfTile module
| RD rep
| LK rep
| RK rep

where rep is a type variable allowing for different representations. However, here, we want to use a more specific type which we will call Piece:

type Piece = HalfTile (V2 Double)

where the half tiles have a simple 2D vector representation to provide orientation and scale. The vector represents the join edge of each half tile where halves come together. The origin for a dart is the tip, and the origin for a kite is the acute angle tip (marked in the figure with a red dot).

These are the only 4 pieces we use (oriented along the x axis)

ldart,rdart,lkite,rkite:: Piece
ldart = LD unitX
rdart = RD unitX
lkite = LK (phi*^unitX)
rkite = RK (phi*^unitX)

Perhaps confusingly, we regard left and right of a dart differently from left and right of a kite when viewed from the origin. The diagram shows the left dart before the right dart and the left kite before the right kite. Thus in a complete tile, going clockwise round the origin the right dart comes before the left dart, but the left kite comes before the right kite.

When it comes to drawing pieces, for the simplest case, we just want to show the two tile edges of each piece (and not the join edge). These edges are calculated as a list of 2 new vectors, using the join edge vector v. They are ordered clockwise from the origin of each piece

pieceEdges:: Piece -> [V2 Double]
pieceEdges (LD v) = [v',v ^-^ v'] where v' = phi*^rotate (ttangle 9) v
pieceEdges (RD v) = [v',v ^-^ v'] where v' = phi*^rotate (ttangle 1) v
pieceEdges (RK v) = [v',v ^-^ v'] where v' = rotate (ttangle 9) v
pieceEdges (LK v) = [v',v ^-^ v'] where v' = rotate (ttangle 1) v

Now drawing lines for the 2 outer edges of a piece is simply

drawPiece:: Piece -> Diagram B
drawPiece = strokeLine . fromOffsets . pieceEdges

It is also useful to calculate a list of the 4 tile edges of a completed half-tile piece clockwise from the origin of the tile. (This is useful for colour filling a tile)

tileEdges:: Piece -> [V2 Double]
tileEdges (LD v) = pieceEdges (RD v) ++ map negated (reverse (pieceEdges (LD v)))
tileEdges (RD v) = tileEdges (LD v)
tileEdges (LK v) = pieceEdges (LK v) ++ map negated (reverse (pieceEdges (RK v)))
tileEdges (RK v) = tileEdges (LK v)

To fill whole tiles with colours, darts with dcol and kites with kcol we can use leftFillDK. This uses only the left pieces to identify the whole tile and ignores right pieces so that a tile is not filled twice.

leftFillDK:: Colour Double -> Colour Double -> Piece -> Diagram B
leftFillDK dcol kcol c =
case c of (LD _) -> (strokeLoop $glueLine$ fromOffsets $tileEdges c) # fc dcol (LK _) -> (strokeLoop$ glueLine $fromOffsets$ tileEdges c)
# fc kcol
_      -> mempty

To fill half tiles separately, we can use fillPiece which fills without drawing edges of a half tile.

fillPiece:: Colour Double -> Piece -> Diagram B
fillPiece col piece = drawJPiece piece # fc col # lw none

For an alternative fill operation  we can use fillDK which fills darts and kites with given colours and draws the edges with drawPiece.

fillDK:: Colour Double -> Colour Double -> Piece -> Diagram B
fillDK dcol kcol piece = drawPiece piece <> fillPiece col piece where
col = case piece of (LD _) -> dcol
(RD _) -> dcol
(LK _) -> kcol
(RK _) -> kcol


By making Pieces transformable we can reuse generic transform operations. These 4 lines of code are required to do this

type instance N (HalfTile a) = N a
type instance V (HalfTile a) = V a
instance Transformable a => Transformable (HalfTile a) where
transform t ht = fmap (transform t) ht

So we can also scale a piece  and rotate a piece by an angle. (Positive rotations are in the anticlockwise direction.)

scale:: Double -> Piece -> Piece
rotate :: Angle Double -> Piece -> Piece

## Patches

A patch is a list of located pieces (each with a 2D point)

type Patch = [Located Piece]

To turn a whole patch into a diagram using some function cd for drawing the pieces, we use

decomps pc = vsep 1 $fmap drawPatch$ take 5 $decompositions [pc at origin]  We have made use of the fact that we can create an infinite list of finer and finer decompositions of any patch, using: decompositions:: Patch -> [Patch] decompositions = iterate decompose We could get the n-fold decomposition of a patch as just the nth item in a list of decompositions. For example, here is an infinite list of decomposed versions of sun. suns = decompositions sun The coloured tiling shown at the beginning is simply 6 decompositions of sun displayed using leftFillDK sun6 = suns!!6 filledSun6 = patchWith (leftFillDK red blue) sun6 # lw ultraThin The earlier figure illustrating larger kites and darts emphasised from the smaller ones is also sun6 but this time drawn with experimentFig = patchWith experiment sun6 # lw thin where pieces are drawn with experiment:: Piece -> Diagram B experiment pc = emph pc <> (drawJPiece pc # dashingN [0.002,0.002] 0 # lw ultraThin) where emph pc = case pc of -- emphasise join edge of darts in red (LD v) -> (strokeLine . fromOffsets) [v] # lc red (RD v) -> (strokeLine . fromOffsets) [v] # lc red -- emphasise long edges for kites (LK v) -> (strokeLine . fromOffsets) [rotate (ttangle 1) v] (RK v) -> (strokeLine . fromOffsets) [rotate (ttangle 9) v] ## Compose Choices You might expect composition to be a kind of inverse to decomposition, but it is a bit more complicated than that. With our current representation of pieces, we can only compose single pieces. This amounts to embedding the piece into a larger piece that matches how the larger piece decomposes. There is thus a choice at each composition step as to which of several possibilities we select as the larger half-tile. We represent this choice as a list of alternatives. This list should not be confused with a patch. It only makes sense to select one of the alternatives giving a new single piece. The earlier diagram illustrating how decompositions are calculated also shows the two choices for embedding a right dart into either a right kite or a larger right dart. There will be two symmetric choices for a left dart, and three choices for left and right kites. Once again we work with located pieces to ensure the resulting larger piece contains the original in its original position in a decomposition. compChoices :: Located Piece -> [Located Piece] compChoices lp = case viewLoc lp of (p, RD vd)-> [ RD vd' at (p .+^ v') , RK vk at p ] where v' = (phi+1) *^ vd -- vd*phi^2 vd' = rotate (ttangle 9) (vd ^-^ v') vk = rotate (ttangle 1) v' (p, LD vd)-> [ LD vd' at (p .+^ v') , LK vk at p ] where v' = (phi+1) *^ vd -- vd*phi^2 vd' = rotate (ttangle 1) (vd ^-^ v') vk = rotate (ttangle 9) v' (p, RK vk)-> [ LD vk at p , LK lvk' at (p .+^ lv') , RK rvk' at (p .+^ rv') ] where lv' = phi*^rotate (ttangle 9) vk rv' = phi*^rotate (ttangle 1) vk rvk' = phi*^rotate (ttangle 7) vk lvk' = phi*^rotate (ttangle 3) vk (p, LK vk)-> [ RD vk at p , RK rvk' at (p .+^ rv') , LK lvk' at (p .+^ lv') ] where v0 = rotate (ttangle 1) vk lv' = phi*^rotate (ttangle 9) vk rv' = phi*^rotate (ttangle 1) vk rvk' = phi*^rotate (ttangle 7) vk lvk' = phi*^rotate (ttangle 3) vk As the result is a list of alternatives, we need to select one to make further composition choices. We can express all the alternatives after n steps as compNChoices n where compNChoices :: Int -> Located Piece -> [Located Piece] compNChoices 0 lp = [lp] compNChoices n lp = do lp' <- compChoices lp compNChoices (n-1) lp' This figure illustrates 5 consecutive choices for composing a left dart to produce a left kite. On the left, the finishing piece is shown with the starting piece embedded, and on the right the 5-fold decomposition of the result is shown. fiveCompChoices = hsep 1$ fmap drawPatch [[ld,lk'], multiDecomp 5 [lk']] where
-- two separate patches
ld  = (ldart at origin)
lk  = compChoices ld  !!1
rk  = compChoices lk  !!1
rk' = compChoices rk  !!2
ld' = compChoices rk' !!0
lk' = compChoices ld' !!1

Finally, at the end of this literate haskell program we choose which figure to draw as output.

fig::Diagram B
fig = filledSun6
main = mainWith fig

That’s it. But, What about composing whole patches?, I hear you ask. Unfortunately we need to answer questions like what pieces are adjacent to a piece in a patch and whether there is a corresponding other half for a piece. These cannot be done easily with our simple vector representations. We would need some form of planar graph representation, which is much more involved. That is another story.

Many thanks to Stephen Huggett for his inspirations concerning the tilings. A library version of the above code is available on GitHub

## Further reading on Penrose Tilings

As well as the Wikipedia entry Penrose Tilings I recommend two articles in Scientific American from 2005 by David Austin Penrose Tiles Talk Across Miles and Penrose Tilings Tied up in Ribbons.

There is also a very interesting article by Roger Penrose himself: Penrose R Tilings and quasi-crystals; a non-local growth problem? in Aperiodicity and Order 2, edited by Jarich M, Academic Press, 1989.

More information about the diagrams package can be found from the home page Haskell diagrams

# Multigrid Methods with Repa

## Introduction

We implement the multigrid and full multigrid schemes using the Haskell parallel array library Repa. Whilst this file is a literate Haskell program it omits some preliminaries. The full code can be found on GitHub at multiGrid.

Repa (short for ‘regular parallel arrays’) is a Haskell library for efficiently calculating with arrays functionally. It allows parallel computation to be expressed with a single primitive (computeP). This is possible because we only use immutable arrays so calculations are deterministic without any need to worry about locks/mutual exclusion/race conditions/etc.

The multigrid and full multigrid schemes are used for approximating solutions of partial differential equations but they are strategies rather than solvers. They can be used to drastically improve more basic solvers provided convergence and error analysis are considered for the basic solver.

We only consider linear partial differential equations and the Poisson equation in particular where linear partial differential equations can be summarised in the form

$A u = -f$

Here, $A$ is a linear operator (such as the laplace operator $\nabla^2$ or higher order or lower order partial derivatives or any linear combination of these), $f$ is given and $u$ is the function we want to solve for, and both are defined on a region $\Omega$. The multigrid scheme starts with a fine grid $\Omega^h$ (where $h$ is the grid spacing) on which we want to obtain an approximate solution by finite difference methods. The scheme involves the use of a coarser grid (e.g. $\Omega^{2h}$) and, recursively, a stack of coarser and coarser grids to apply error correction on approximations. The full multigrid scheme also uses coarser grids to improve initial approximations used by multigrid.

## Outline of Coarse Grid Error Correction

Assume we have a basic method to approximate solutions which we call ‘approximately solve’. Then the scheme for coarse grid correction of errors is

1. Take an initial guess $v_0$ on the fine grid ($\Omega^h$) and use it to approximately solve $A u = -f$ to get a new approximation $v_1$.

We want to estimate errors

$e = u - v_1$

We do not have $u$ to calculate them, but the errors satisfy

$A e = A u - A v_1 = - f - A v_1$

The negative of these values $A e$ are called the residuals ($r$) and these we can calculate.

2. Compute residuals $r = A v_1 + f$
3. Move to the coarse grid $\Omega^{2h}$ and (recursively) solve $A e = -r$

(This is a problem of the same form as we started with, but it requires solving with restrictions of $A$ and $r$ for the coarse grid.)

We use zeros as initial guess for errors and this recursion results in an approximation for errors $e^{2h}$,

4. Interpolate coarse errors ($e^{2h}$) into the fine grid to get $e^h$ and get a corrected guess

$v^h = v_1 + e^h$

5. Approximately solve $A u = -f$ again (on the fine grid), but now starting with $v^h$ to get a final approximation.

So a basic requirement of the multigrid scheme is to be able to move values to a coarser grid (restriction) and from a coarser grid to a finer grid (interpolation). We will write functions for these below. First we give a quick summary of Repa and operations we will be using, which experienced Repa users may prefer to skip.

## Background Repa Summary

Repa array types have an extent (shape) and an indication of representation as well as a content type. For example Array U DIM2 Double is the type of a two dimensional array of Doubles. The U indicates a ‘manifest’ representation which means that the contents are fully calculated rather than delayed. A delayed representation is expressed with D rather than U. The only other representation type we use explicitly is (TR PC5) which is the type of a partitioned, array resulting from a stencil mapping operation. Non-manifest arrays are useful for enabling the compiler to combine (fuse) operations to optimise after inlining code. Some operations require the argument array to be manifest, so to make a delayed array manifest we can use

• computeP which employes parallel evaluation on the array elements. The type of computeP requires that it is used within a monad. This is to prevent us accidentally writing nested calls of computeP.

The extent DIM2 abbreviates Z :. Int :. Int (where Z represents a base shape for the empty zero dimensional array). The integers give the sizes in each dimension. We can have higher dimensions (adding more with :.) but will only be using DIM2 here. Values of an extent type are used to express the sizes of an array in each dimensions and also used as indexes into an array where indexing starts at 0.

We only use a few other Repa operations

• fromListUnboxed takes an extent and a list of values to create a new manifest array.

• traverse takes 3 arguments to produce a new delayed array. The first is the (old) array. The second is a function which produces the extent of the new array when given the extent of the old array. The third is a function to calculate any item in the new array [when supplied with an operation (get) to retrieve values from the old array and an index in the new array for the item to calculate].

• szipWith is analgous to zipWith for lists, taking a binary operation to combine two arrays. (There is also a zipWith for arrays, but the szipWith version is tuned for partitioned arrays)

• smap is analgous to map for lists, mapping an operation over an array. (There is also a map for arrays, but the smap version is tuned for partitioned arrays)

• mapStencil2 takes a boundary handling option (we only use BoundClamp here), a stencil, and a two dimensional array. It maps (convolves) the stencil over the array to produce a new (partititioned) array.

There is a convenient way of writing down stencils which are easy to visualize. We will see examples later, but this format requires pragmas

> {-# LANGUAGE TemplateHaskell     #-}
> {-# LANGUAGE QuasiQuotes         #-}

## Interpolation (Prolongation) and Restriction

To interpolate from coarse to fine $\Omega^{2 h} \rightarrow \Omega^h$ we want a (linear) mapping that is full rank. E.g identity plus averaging neighbours.

We can restrict by injection $\Omega^h \rightarrow \Omega^{2 h}$ or take some weighting of neighbours as well. A common requirement is that this is also linear and a constant times the transpose of the interpolation when these are expressed as matrix operations.

## Repa examples of restriction and interpolation in 2D.

We will be using DIM2 grids and work with odd extents so that border points remain on the border when moving between grids. A common method for restriction is to take a weighted average of fine neighbours with each coarse grid point and we can easily express this as a stencil mapping. We calculate one sixteenth of the results from mapping this stencil

> restStencil :: Stencil DIM2 Double
> restStencil =   [stencil2|  1 2 1
>                             2 4 2
>                             1 2 1 |]

I.e we map this stencil then divide by 16 and take the coarse array of items where both the row and column are even. Taking the coarse grid items can be done with an array traversal

> {-# INLINE coarsen #-}
> coarsen :: Array U DIM2 Double -> Array D DIM2 Double
> coarsen !arr
>  = traverse arr   -- i+1 and j+1 to deal with odd extents for arr correctly
>             (\ (e :. i :. j) -> (e :. (i+1) div 2 :. (j+1) div 2))
>             (\ get (e :. i :. j) -> get (e :. 2*i :. 2*j))

Here the second argument for traverse – the function to calculate the new extent from the old – adds one before the division to ensures that odd extents are dealt with appropriately.
The inline pragma and bang pattern (!) on the argument are needed for good optimisation.

Coarsening after mapping the above stencil works well with odd extents but is not so good for even extents. For completeness we allow for even extents as well and in this case it is more appropriate to calculate one quarter of the results from mapping this stencil

> sum4Stencil :: Stencil DIM2 Double
> sum4Stencil = [stencil2|  0 0 0
>                           0 1 1
>                           0 1 1 |] 

We treat mixed even and odd extents this way as well so we can express restriction for all cases as

> {-# INLINE restrict #-}
> restrict :: Array U DIM2 Double -> Array D DIM2 Double
> restrict !arr
>   | odd n && odd m
>     = coarsen
>       $smap (/16) >$ mapStencil2 BoundClamp restStencil arr
>   | otherwise
>     = coarsen
>       $smap (/4) >$ mapStencil2 BoundClamp sum4Stencil arr
>   where _ :. m :. n = extent arr

For interpolation in the case of odd extents we want to distribute coarse to fine values according to the “give-to” stencil

1/4 1/2 1/4
1/2  1  1/2
1/4 1/2 1/4

This means that the fine value at the centre becomes the coarse value at the corresponding position (if you picture the coarse grid spaced out to overlay the fine grid). The fine values around this central value inherit proportions of the coarse value as indicated by the give-to stencil (along with proportions from other neighbouring coarse values).

For the even extent case we simply make four copies of each coarse value using a traverse

> {-# INLINE inject4 #-}
> inject4 :: Source r a => Array r DIM2 a -> Array D DIM2 a
> inject4 !arr
>  = traverse arr -- mod 2s deal with odd extents
>             (\ (e :. i :. j) -> (e :. 2*i - (i mod 2) :. 2*j - (j mod 2)))
>             (\get (e :. i :. j) -> get(e :. i div 2 :. j div 2))

Again, we just use the latter version to cover cases with mixed even and odd extents. There is no primitive for “give-to” stencils but it is easy enough to define what we want with a traverse.

> {-# INLINE interpolate #-}
> interpolate :: Array U DIM2 Double -> Array D DIM2 Double
> interpolate !arr
>   | odd m && odd n
>        = traverse arr
>                   (\ (e :. i :. j) -> (e :. 2*i - (i mod 2) :. 2*j - (j mod 2)))
>                   (\get (e :. i :. j) -> case () of
>                    _ | even i && even j     -> get(e :. i div 2 :. j div 2)
>                      | even i               -> (0.5)*(get(e :. i div 2 :. j div 2)
>                                                       + get(e :. i div 2 :. (j div 2)+1))
>                       -- odd i
>                      | even j               -> (0.5)*(get(e :. i div 2 :. j div 2)
>                                                       + get(e :. (i div 2)+1 :. j div 2))
>                       -- odd i and j
>                      | otherwise            -> (0.25)*(get(e :. i div 2 :. j div 2)
>                                                       + get(e :. i div 2 :. (j div 2)+1)
>                                                       + get(e :. (i div 2)+1 :. j div 2)
>                                                       + get(e :. (i div 2)+1 :. (j div 2)+1))
>                   )
>   | otherwise = inject4 arr
>  where _ :. n :. m = extent arr

The uses of mod 2 in the new size calculation are there to ensure that odd extents remain odd.

### An alternative interpolation calculation

As an aside, we found that the above interpolation for odd extents can be implemented with a stencil convolution using sum4Stencil (which we defined above) after applying inject4 and then dividing by 4.
So we could define (for odd extents)

interpolate' :: Monad m
=> Array U DIM2 Double -> m (Array (TR PC5) DIM2 Double)

interpolate' arr =
do fineArr <- computeP $inject4 arr return$ smap (/4)
$mapStencil2 BoundClamp sum4Stencil fineArr We have to make the intermediate fineArr manifest (using computeP) before mapping the stencil over it which is why a monad is used. The final array is not manifest but a structured array resulting from the stencil map. By not making this manifest, we allow the computation to be combined with other computations improving inlining optimisation opportunities. To illustrate the effect of interpolate', suppose the coarse array content is just a b c d e f g h i Then the injected array content will be a a b b c c a a b b c c d d e e f f d d e e f f g g h h i i g g h h i i After mapping the stencil and dividing by 4 we have (assuming top left is (0,0)) at (1,1) (a+b+d+e)/4 (odd row, odd column) at (2,1) (d+e+d+e)/4 = (d+e)/2 (even row, odd column) at (1,2) (b+b+e+e)/4 = (b+e)/2 (odd row, even column) at (2,2) (e+e+e+e)/4 = e (even row, even column) This even handles the bottom and right boundaries as in the original interpolation. Slightly more generally, after inject4 and for any w x y z then convolution with the stencil 0 0 0 0 w x 0 y z will produce the same as interpolating with the “give-to” stencil  z (z+y) y (z+x) (z+y+x+w) (y+w) x (x+w) w  Conversely after inject4, the give-to stencil has to have this form for a stencil convolution to produce the same results. Since the stencil version does require making an intermediate array manifest it is not clear at face value which is more efficient, so we will stick with interpolate. We note that for even extents, the combination of an interpolation followed by a restriction is an identity (inject4 followed by coarsen). For odd extents, the combination preserves internal values but not values on the boarder. This will not be a problem if boundary conditions of the problem are used to update boarders of the array. ## MultiGrid with Coarse Grid Correction Scheme The problem to solve for $u$ is $A u = -f$ where $A$ is a linear operator. This linear operator could be represented as a matrix or as a matrix multiplication, but this is not the most efficient way of implementing a solver. In the multigrid scheme described above we need knowledge of $A$ to implement an approximate solver, and also to calculate residuals. There are ways to implement these latter two operations with some mathematical analysis of the linear operator using finite differences. We will therefore be using an approximator and a residual calculator directly rather than a direct representation of $A$ as a parameter. Let us assume that the approximate solver is a single step improvement which we want to iterate a few times. We can write down the iterator (iterateSolver) > {-# INLINE iterateSolver #-} > iterateSolver !opA !steps !arrInit > = go steps arrInit > where > go 0 !arr = return arr > go n !arr > = do arr' <- computeP$ opA arr
>              go (n - 1) arr'

In the definition of multiGrid we make use of a function to create an array of zeros of a given shape (extent)

> {-# INLINE zeroArray #-}
> zeroArray  :: Shape sh => sh -> Array U sh Double
> zeroArray !sh = fromListUnboxed sh $replicate (size sh) 0.0 and a function to determine when we have the coarsest grid (the bases case for odd extents will be 3 by 3) > {-# INLINE coarsest #-} > coarsest :: Array U DIM2 Double -> Bool > coarsest !arr = m<4 where (Z :. _ :. m) = extent arr We also use some global parameters to indicate the number of iteration steps (to simplify expressions by passing fewer arguments) > steps1,steps2 :: Int > steps1 = 5 -- number of iterations for first step in multiGrid > steps2 = 5 -- number of iterations for last step in multiGrid To write down a first version of multiGrid, we temporarily ignore boundary conditions. We will also assume that the grid spacing is the same in each dimension and use just one parameter (h) as the grid spacing. {- version ignoring boundary conditions -} multiGrid approxOp residualOp h f uInit = if coarsest uInit then do computeP$ approxOp h f uInit
else
do v  <- iterateSolver (approxOp h f) steps1 uInit
r  <- computeP $residualOp h f v -- calculate fine residuals r' <- computeP$ restrict r                       -- move to coarser grid
err <- multiGrid approxOp residualOp (2*h) r' $zeroArray (extent r') vC <- computeP$ szipWith (+) v
$interpolate err -- correct with errors on fine grid iterateSolver (approxOp h f) steps2 vC -- solve again with improved approximation The parameters are 1. approxOp – an approximate solver to be iterated. 2. residualOp – a residual calculator 3. h – the grid spacing 4. f – a representation for the (negative of) the function on the right hand side of the equation 5. uInit – an array representing a suitable first guess to start from. Note that both approxOp and residualOp need to be passed h and f as well as an array when they are used. Also, the recursive call of multiGrid requres the two function parameters to work at each coarseness of grid. The grid spacing doubles and the array of residuals has to be converted to the coarser grid for the recursive call. Next we consider how we deal with the boundary conditions. We will only deal with the Dirichlet type of boundary conditions here. ## MultiGrid with Dirichlet boundary conditions For $u$ as above, Dirichlet boundary conditions have the form $u = g$ for some function $g$ on the boundary $\partial \Omega$. We will take the standard approach of representing the boundary conditions using a mask array (with 0’s indicating boundary positions and 1’s at non-boundary positions) along with a boundary values array which has 0’s at non-boundary positions. This will enable us to adapt these two arrays for different coarsenesses of grid. We are assuming we are working in just two dimensions so $u=u(x,y)$. We can thus assume two dimensional arrays represent $f$ and the boundary mask and boundary values. We can now define multiGrid with boundary conditions as:  multiGrid approxOp residualOp h f boundMask boundValues uInit = if coarsest uInit then do computeP$ approxOp h f boundMask boundValues uInit
else
do v  <- iterateSolver (approxOp h f boundMask boundValues) steps1 uInit
r  <- computeP $residualOp h f boundMask v -- calculate fine residuals boundMask' <- computeP$ coarsen boundMask       -- move to coarser grid
r' <- computeP $szipWith (*) boundMask'$ restrict r
let zeros = zeroArray (extent r')
err <- multiGrid approxOp residualOp (2*h) r' boundMask' zeros zeros
vC <- computeP $szipWith (+) v$ szipWith (*) boundMask
$interpolate err -- correct with errors on fine grid iterateSolver (approxOp h f boundMask boundValues) steps2 vC  In the base case (when the grid size is 3*3) there is only one value at the centre point which needs to be evaluated (assuming the surrounding 8 points are given by the boundary conditions). This will be solved exactly with a single step of the approximate solver. The recursive call uses zeros for both the boundValues (redundantly) as well the initial guess. We note that the residual calculation makes use of the mask but not the boundary values as these should be zero for residuals. The restriction of residuals for the coarse grid also requires applying the mask adapted for the coarse grid. The interpolation of errors uses the (fine version of the) mask but the boundary values will already be set in v which is added to the errors. ## Full MultiGrid Before looking at specific examples, we can now write down the algorithm for the full multigrid scheme which extends multigrid as follows. Before calling multiGrid we precalculate the initial guess using a coarser grid and interpolate the result. The precalculation on the coarser grid involves the same full multigrid process recursively on coarser and coarser grids. fullMG approxOp residualOp h f boundMask boundValues = if coarsest boundValues then do computeP$ approxOp h f boundMask boundValues boundValues
-- an approximation with  1 interior point will be exact using boundValues as initial
else do
-- recursively precalculate for the coarser level
f' <- computeP $restrict f boundMask' <- computeP$ coarsen boundMask
boundValues' <- computeP $coarsen boundValues v' <- fullMG approxOp residualOp (2*h) f' boundMask' boundValues' -- move to finer level v <- computeP$ szipWith (+) boundValues
$szipWith (*) boundMask$ interpolate v'
-- solve for finer level
multiGrid approxOp residualOp h f boundMask boundValues v

Note that in the base case we need an initial guess for the coarsest initial array. We simply use the boundValues array for this. The interpolation of v requires resetting the boundary values.

## Improved inlining using modules

The previous (higher order) versions of multiGrid and fullMG which take approxOp and residualOp as arguments can by substantially improved by importing the operations in a module and specialising the definitions for the imported functions. Thus we define a module which imports a separate module (defining approxOp and residualOp) and which exports

> multiGrid :: Monad m =>
>              Double
>              -> Array U DIM2 Double
>              -> Array U DIM2 Double
>              -> Array U DIM2 Double
>              -> Array U DIM2 Double
>              -> m (Array U DIM2 Double)

> multiGrid !h !f !boundMask !boundValues !uInit
>  = if coarsest uInit
>    then
>     do computeP $approxOp h f boundMask boundValues uInit > else > do v <- iterateSolver (approxOp h f boundMask boundValues) steps1 uInit > r <- computeP$ residualOp h f boundMask v
>        boundMask' <- computeP $coarsen boundMask > r' <- computeP$ szipWith (*) boundMask' $restrict r > let zeros = zeroArray (extent r') > err <- multiGrid (2*h) r' boundMask' zeros zeros > vC <- computeP$ szipWith (+) v
>                            $szipWith (*) boundMask >$ interpolate err
>        iterateSolver (approxOp h f boundMask boundValues) steps2 vC 

and the main full multigrid operation

> fullMG :: Monad m =>
>           Double
>           -> Array U DIM2 Double
>           -> Array U DIM2 Double
>           -> Array U DIM2 Double
>           -> m (Array U DIM2 Double)

> fullMG !h !f !boundMask !boundValues
>  = if coarsest boundValues
>      then do computeP $approxOp h f boundMask boundValues boundValues > else do > f' <- computeP$ restrict f
>             boundMask' <- computeP $coarsen boundMask > boundValues' <- computeP$ coarsen boundValues
>             v'  <- fullMG (2*h) f' boundMask' boundValues'
>             v   <- computeP $szipWith (*) boundValues >$ szipWith (*) boundMask
>                                $interpolate v' > multiGrid h f boundMask boundValues v These versions perform significantly better when optimised. ## Poisson’s Equation For the two dimensional case, where $u = u(x,y)$, Poisson’s equation has the form $\nabla^2 u = -f(x,y)$ for $(x,y) \in \Omega$ subject to the Dirichlet boundary condition $u(x,y) = g(x,y)$ for $(x,y) \in \partial \Omega$ So the linear operator here is the Laplace operator $\Delta = \nabla^2 = {\partial^2 \over\partial x^2} + {\partial^2 \over\partial y^2}$ We need to make use of finite difference analysis to implement the approximation step and the residual calculation for Poisson’s equation. Finite difference analysis with the central difference operator leads to the following (five point difference) formula approximating Poisson’s equation $u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = -h^2 f_{ij}$ Rewriting the above as $u_{ij} = \frac{1}{4} \{ u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} + h^2 f_{ij} \}$ gives us an algorithm (approxOp) for the approximating iteration step. This is known as the the Jacobi method. We map a suitable stencil to add the four neighbours of each item, then we add corresponding elements of the array f multiplied by $h^2$, then we divide by 4 and reset the boundary with the mask and values. > {-# INLINE approxOp #-} > approxOp :: Double > -> Array U DIM2 Double > -> Array U DIM2 Double > -> Array U DIM2 Double > -> Array U DIM2 Double > -> Array (TR PC5) DIM2 Double > approxOp !h !f !boundMask !boundValues !arr > = szipWith (+) boundValues >$ szipWith (*) boundMask
>          $smap (/4) >$ szipWith (+) (R.map (*hSq) f )
>          $mapStencil2 (BoundConst 0) > [stencil2| 0 1 0 > 1 0 1 > 0 1 0 |] arr > where hSq = h*h We have chosen to leave the resulting array in a non-manifest form so that inlining can combine this operation with any subsequent operations to optimise the combination. To calculate residuals from an estimation $v$, we can use the same formula as above to approximate $r = \nabla^2 v + f$. Rearranging we have $r_{ij} = \frac{1}{h^2} \{ v_{i-1,j} + v_{i+1,j} + v_{i,j-1} + v_{i,j+1} - 4 v_{ij} \} + f_{ij}$ This leads us to implement residualOp with a five point stencil that adds four neighbours and subtracts four times the middle item. After this we divide all items by $h^2$ and then add f and reset the boundary. > {-# INLINE residualOp #-} > residualOp :: Double > -> Array U DIM2 Double > -> Array U DIM2 Double > -> Array U DIM2 Double > -> Array (TR PC5) DIM2 Double > residualOp !h !f !boundMask !v > = szipWith (*) boundMask >$ szipWith (+) f
>          $smap (*hFactor) >$ mapStencil2 (BoundConst 0)
>             [stencil2|   0 1  0
>                          1 -4 1
>                          0 1  0 |] v
>          where hFactor = 1/(h*h)

As noted earlier, the boundary is reset simply with the mask. The boundary values should all be zero for residuals, so we can drop any resetting of boundary values as the mask will have done that already.

The above two functions (approxOP and residualOp) are put in a module PoissonOps to be imported by the module MultiGrid_Poisson.

## Setting up to run an example

This example is taken from (Iserles 2009 p.162). On the unit square

$f(x,y) = -\{x^2+y^2\}$

The Dirichlet boundary conditions are

$u(x,0) = 0 \ \ \ \text{and}\ \ \ u(x,1) = \frac{1}{2} x^2 \ \ \ \text{for}\ \ \ 0 \le x \le 1$

and

$u(0,y) = \sin {\pi x} \ \ \ \text{and}\ \ \ u(1,y) = e^{\pi x} \sin {\pi y} + \frac{y^2}{2} \ \ \ \text{for}\ \ \ 0 \le y \le 1$

We define the following to calculate values from the number of grids to be used.

> fineGridSpaces:: Int -> Int
> fineGridSpaces gridStackSize
>   = 2^gridStackSize

> fineGridShape:: Int -> DIM2
> fineGridShape gridStackSize
>   = (Z :. n :. n) where n = 1+fineGridSpaces gridStackSize

We will set a global value for the sides of the area (the unit square) and also set a stack size of 9 grids so the finest will have $2^9+1$ by $2^9+1$ (i.e. 513 by 513) grid points

> distance :: Double
> distance = 1.0        -- length of sides for square area
> gridStackSize :: Int
> gridStackSize = 9     -- number of grids to be used including finest and coarsest

> intervals = fineGridSpaces gridStackSize
> shapeInit = fineGridShape gridStackSize

> hInit :: Double
> hInit = distance / fromIntegral intervals -- initial finest grid spacing

The initial arrays are defined using

> boundMask :: Array U DIM2 Double
> boundMask =
>  fromListUnboxed shapeInit $concat > [edgeRow, > take ((intervals-1)*(intervals+1)) (cycle mainRow), > edgeRow > ] > where edgeRow = replicate (intervals+1) 0.0 > mainRow = 0.0: replicate (intervals-1) 1.0 Prelude.++ [0.0] > coordList :: [Double] > coordList = Prelude.map ((hInit*) . fromIntegral) [0..intervals] > boundValues :: Array U DIM2 Double > boundValues = > fromListUnboxed shapeInit$ concat
>      [Prelude.map (\j -> sin (pi*j)) coordList,
>       concat $Prelude.map mainRow$ tail $init coordList, > Prelude.map (\j -> exp pi * sin (pi*j) + 0.5*j^2) coordList > ] > where mainRow i = replicate intervals 0.0 Prelude.++ [0.5*i^2] > fInit :: Array U DIM2 Double > fInit = -- negative of RHS of Poisson Equation > fromListUnboxed shapeInit$ concat $Prelude.map row coordList > where row i = Prelude.map (item i) coordList > item i j = -(i^2 + j^2) > uInit :: Array U DIM2 Double > uInit = boundValues We are now in a position to calculate > test1 = multiGrid hInit fInit boundMask boundValues uInit and > test2 = fullMG hInit fInit boundMask boundValues as well as comparing with iterations of the basic approximation operation on its own > solverTest :: Monad m => Int -> m(Array U DIM2 Double) > solverTest n = iterateSolver (approxOp hInit fInit boundMask boundValues) n uInit We note that this particular example of Poisson’s equation does have a known exact solution $u(x,y) = e^{\pi x} \sin {\pi y} + \frac{1}{2} (xy)^2$ So we can calculate an array with values for the exact solution and look at the errors. We just look at the maximum error here. > exact :: Array U DIM2 Double > exact = > fromListUnboxed shapeInit$
>  concat $Prelude.map row coordList > where row i = Prelude.map (item i) coordList > item i j = exp (pi*i) * sin (pi*j) + 0.5*i^2*j^2 > maxError :: Monad m => m(Array U DIM2 Double) -> m (Double) > maxError test = > do ans <- test > err :: Array U DIM2 Double > <- computeP >$ R.map abs
>                $R.zipWith (-) ans exact > foldAllS max 0.0 err ## A note on errors The Jacobi method used for the approximation step is known to converge very slowly, but fullMG significantly improves convergence rates. The literature analysing multigrid and full multigrid shows that the errors are reduced rapidly if the approximation method smooths the high frequency errors quickly (error components with a wavelength less than half the grid size). Unfortunately the Jacobi method does not do this. Gauss-Seidel and SOR methods do smooth high frequency errors but their usual implementation relies on in-place updating of arrays which is problematic for parallel evaluation. However, although multiGrid with the Jacobi approxOp does not reduce errors rapidly, fullMG does (with only gridStackSize=7 and step1=step2=3). This shows that the improved initial guess using fullMG outweighs the slow reduction of errors in multiGrid. ## Some results Clearly this code is designed to be run on multi-cores to use the parallelism. However, even using a single processor, we can see the impressive performance of Repa with optimisations on some simple tests of the Poisson example (running on a 2.13 GHz Intel Core 2 Duo). Results for fullMG (steps1 = 5, steps2 = 5) Grid Stack Fine Grid cpu Time(MS) Max Error 6 65*65 22 2.1187627917047536e-3 7 129*129 83 5.321669730857792e-4 8 257*257 276 1.3324309721163274e-4 9 513*513 1417 3.332619984242058e-5 10 1025*1025 6017 8.332744698691386e-6 11 2049*2049 26181 2.0832828990791086e-6 So increasing the grid stack size by 1 roughly quadruples the size of the fine grid, increases runtime by a similar factor, and improves the error by a factor of 4 as well. As a comparison, multiGrid alone (with the Jacobi approxOp) has high errors and increasing grid stack size does not help. Results for multiGrid (steps1 = 5, steps2 = 5) Grid Stack cpu Time(MS) Max Error 6 14 1.116289597967647 7 49 1.1682431240072333 8 213 1.1912130219418806 9 1118 1.2018233367410431 10 4566 1.206879281001907 11 20696 1.209340584664126 To see the problems with the basic Jacobi method on its own we note that after 400 iteration steps (with grid stack size of 6) the max error is 5.587133822631921. Increasing the grid stack size just makes this worse. ### Compiler options As we had observed in previous work Repa Laplace and SOR the large amount of inline optimisation seems to trigger a ghc compiler bug warning. This is resolved by adding the compiler option -fsimpl-tick-factor=1000 (along with -O2). ## Some variations We also tried two variations on opproxOp. The first was to add in an extra relaxation step which weights the old value with the new value at each step using a weighting $\omega$ between 0 and 1. This slightly increases runtime, but does not improve errors for fullMG. It’s purpose is to improve smoothing of errors, but as we have seen although this may be significant for multiGrid it is not for fullMG. The improved starting points are much more significant for the latter. Note that successive over relaxing (SOR) with $\omega$ between 1 and 2 is not suitable for use with the Jacobi method in general as this is likely to diverge. omega :: Double omega = 2/3 {-# INLINE approxOp #-} approxOp :: Double -> Array U DIM2 Double -> Array U DIM2 Double -> Array U DIM2 Double -> Array U DIM2 Double -> Array (TR PC5) DIM2 Double approxOp !h !f !boundMask !boundValues !arr = szipWith weightedSum arr -- extra relaxation step$ szipWith (+) boundValues
$szipWith (*) boundMask$ smap (/4)
$szipWith (+) (R.map (*hSq) f )$ mapStencil2 (BoundConst 0)
[stencil2|   0 1 0
1 0 1
0 1 0 |] arr
where hSq = h*h
weightedSum old new = (1-omega) * old + omega * new

The second variation is based on the ‘modified 9 point scheme’ method discussed in (Iserles 2009 p.162). This has slower runtimes than the 5 point Jacobi scheme as it involves two stencil mappings (one across f as well as u). It had no noticable improvement on error reduction for fullMG.

-- modified 9 point scheme
approxOp ::  Double
-> Array U DIM2 Double
-> Array U DIM2 Double
-> Array U DIM2 Double
-> Array U DIM2 Double
-> Array (TR PC5) DIM2 Double

approxOp !h !f !boundMask !boundValues !arr
=   R.szipWith (+) boundValues
$R.szipWith (*) boundMask$ R.smap (/20)
$R.szipWith (+) (R.map (*hFactor)$ mapStencil2 (BoundConst 0)
[stencil2|   0 1 0
1 8 1
0 1 0 |] f )
$mapStencil2 (BoundConst 0) [stencil2| 1 4 1 4 0 4 1 4 1 |] arr where hFactor = h*h/2 This algorithm was obtained from rearranging the formula $\frac{-10}{3}u_{ij} + \frac{1}{6} \{ {u_{i-1,j-1} + 4 u_{i-1,j} + u_{i-1,j+1} + 4 u_{i,j-1} + 4 u_{i,j+1} + u_{i+1,j-1}+ 4 u_{i+1,j} + u_{i+1,j+1} \} } = \frac{h^2}{12} { \{ 8 f_{i,j} + f_{i-1,j} + f_{i,j-1} + f_{i,j+1} + f_{i+1,j+1} \} }$ ## Parallel Processing The important point to note is that this code is ready to run using parallel processing. Results using multi-core processors will be posted in a separate blog. ## Acknowledgements and References Iserles (2009) gives some analysis of the multigrid method and this book also provided the example used (and the modified 9 point scheme). There are numerous other finite methods books covering multigrid and some devoted to it. Online there are some tutorial slides by Briggs part1 and part2 and some by Stüben amongst many others. A paper by Lippmeier, Keller, and Peyton Jones discussing Repa design and optimisation for stencil convolution in Haskell can be found here. Iserles, Arieh. 2009. A First Course in Numerical Analysis of Differential Equations. Second edition. CUP. # Red-Black Neighbourhood Stencil Diagrams # Red-Black Neighbourhood Stencil Diagrams (for Laplace) In a previous blog Repa Laplace and SOR, I used Repa to implement a Laplace solver using the Red-Black scheme. The explanation of alternating stencils probably needed a diagram, so here it is. This diagram illustrates the shapes of the stencils for adding neighbours of red and black cells. It shows that two different stencils are needed (for odd and even rows) and these are swapped over for red and black. (The diagram was produced with Haskell Diagrams) # Repa Laplace and SOR # Using the Repa Library for Laplace’s Equation and SOR This describes the result of some experiments to enhance an existing elegant (haskell parallel array) laplace solver by using successive over-relaxation (SOR) techniques. The starting point was a laplace solver based on stencil convolution and using the Haskell Repa library (Regular Parallel Arrays) as reported in a paper by Lippmeier, Keller and Peyton-Jones. (There is also a 2011 published paper with the first two authors (Lippmeier and Keller 2011)) ## Background #### Laplace’s Equation and Iterative solvers The Laplace equation $\displaystyle {\partial^2 u\over\partial x^2} + {\partial^2 u\over\partial y^2} = 0$ is usually abbreviated to simply $\nabla^2 u = 0$, where $u(x,y)$ is a scalar potential (such as temperature) over a smooth surface. For numerical solutions we will restrict attention to a rectangular surface with some fixed boundary values for u. (The boundary is not necessarily the border of the rectangle, as we can allow for fixed values at internal regions as well). The problem is made discrete for finite difference methods by imposing a grid of points over the surface and here we will assume this is spaced evenly ($\Delta x$ in the $x$ direction and $\Delta y$ in the $y$ direction). Approximating solutions numerically amounts to approximating a fixedpoint for the grid values satisfying the boundary values and also, for non-boundary points (i,j) satisfying $\displaystyle u_{i,j} = { u_{i-1,j} + u_{i+1,j} + \beta^2 ( u_{i,j-1} + u_{i,j+1}) \over 2(1+\beta^2)}$ where $\beta = {\Delta x \over \Delta y }$ If we also assume $\Delta x = \Delta y$ then this simplifies to $\displaystyle u_{i,j} = { u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} \over 4}$ Iterating to find this fixed poiint from some suitable starting values is called relaxation. The simplest (Jacobi) method involves iterating the calculation of new values $u^{n+1}$ from previous values $u^n$ until the iterations are close enough to converging. $\displaystyle u^{n+1}_{i,j} = { u^n_{i-1,j} + u^n_{i+1,j} + u^n_{i,j-1} + u^n_{i,j+1} \over 4}$ This method, whilst very simple, can be extremely slow to converge. One improvement is given by the Gauss-Seidel method which assumes the next set of values in each iteration are calculated row by row in a scan across the columns allowing some new values to be used earlier during the same iteration step (changing $n$ to $n+1$ for calculated values in the previous column and in the previouus row). $\displaystyle u^{n+1}_{i,j} = { u^{n+1}_{i-1,j} + u^n_{i+1,j} + u^{n+1}_{i,j-1} + u^n_{i,j+1} \over 4}$ Unfortunately, this assumption about the order of evaluation of array elements interferes with opportunities to use parallel computation of the array elements (as we wish to do using Repa array primitives). #### Successive over-relaxation (SOR) Successive over-relaxation is a method used to speed up convergence of the iterations by using a weighted sum of the previous iteration with the new one at each relaxation step. Using $0 < \omega < 2$ as the weight we calculate first $u'$ (substituting $u'$ for $u^{n+1}$ in the above equation), then calculate $\displaystyle u^{n+1}_{i,j} = \omega u'_{i,j} + (1- \omega ) u^n_{i,j}$ Values of $\omega$ less than 1 will slow down the convergence (under-relaxation). A value of $\omega$ between 1 and 2 should speed up convergence. In fact the optimal value for $\omega$ will vary for each iteration, but we will work with constant values only here. It is also important to note that starting with values above 1 will generally cause oscillations (divergence) if just the Jacobi scheme is used, so successive over-relaxation relies on a speed up of the basic iteration such as that provided by the Gauss-Seidel scheme. #### The original stencil version using Repa In the paper a stencil mapping technique is used to get fast performance using the Repa array primitives. The code uses the Repa library version 3 The main relaxation step is expressed as relaxLaplace arr = computeP$ szipWith (+) arrBoundValue   -- boundary setting
$szipWith (*) arrBoundMask -- boundary clearing$ smap (/ 4)
$mapStencil2 (BoundConst 0) [stencil2| 0 1 0 1 0 1 0 1 0 |] arr We will be taking this as a starting point, so we review details of this (working from the bottom up). A stencil is described in a quoted form to express which neighbours are to be added (using zeros and ones) with the element being scanned assumed to be the middle item. Thus this stencil describes adding the north, west, east, and south neighbours for each item. The mapStencil2 is a Repa array primitive which applies the stencil to all elements of the previous iteration array (arr). It is also supplied with a border directive (Boundconst 0) which just says that any stencil item arising from indices outside the array is to be treated as 0. This stencil mapping will result in a non-manifest array, in fact, in a partitioned array split up into border parts and non-border parts to aid later parallel computation. After the stencil map there is a mapping across all the results to divide by 4 using smap (which improves over the basic repa array map for partitioned arrays). These calculations implement the Jacobi scheme. The subsequent two szipWith operations are used to reset the fixed boundary values. (The primitive szipWith again improves over the basic repa array zipWith by catering explicitly for partitioned arrays.) More specifically, szipWith(*) is used with an array (arrBoundMask) which has zero for boundary positions and 1 for other positions – thus setting the boundary items to 0. After this szipWith(+) is used with an array (arrBoundValues) which has the fixed initial values for the boundary items and zero for all other items. Thus the addition reinstates the initial boundary values. These array calculations are all delayed so a final computeP is used to force the parallel evaluations of items to produce a manifest array result. This technique allows a single pass with inlining and fusion optimisations of the code to be possible for the calculation of each element. Use of computeP requires the whole calculation to be part of a monad computation. This is a type constraint for the primitive to exclude the possibility of nested calls of computeP overlapping. A significant advantage of this stencil implementation is that programming directly with array indices is avoided (these are built into the primitives once and for all and are implemented to avoid array bound checks being repeated unnecessarily). Unfortunately, though, this implementation is using the Jacobi scheme for the iteration steps which is known to have very slow convergence. We would like to improve on this by using successive over-relaxation, but as pointed out earlier, this will not work with the Jacobi scheme. Furthermore, the Gauss-Seidel improvement will not help because it is based on retrieving values from an array still being updated. This is not compatible with functional array primitives and stencils and prevents simple exploitation of parallel array calculations. To the rescue – the red-black scheme for calculating iterations. ## Red-black Iterations This scheme is well known and based on the observation that on a red-black checkerboard, for any red square, the north, west, east, and south neighbours are all black and conversely neighbours of black squares are all red. This leads to the simple idea of calculating updates to all the red squares first, then using these updated values to calculate the new black square values. ### Implementation using stencils and repa #### The set up We split the original array into a red array and black array with simple traverse operations. We assume the original array (arr) has an even number of columns so the number of red and black cells are the same. As a convention we will take the (0,0) item to be red so we want, for red array (r) r(i,j) = arr(i, 2*j + (i mod 2)) where arr has i <- 0..n-1, j <- 0..2m-1 r has i <- 0..n-1, j <- 0..m-1 The traverse operation from the Repa library takes as arguments, the original array, a mapping to express the change in shape, and a lookup function to calculate values in the new array (when given a get operation for the original array and a coordinate (i,j) in the new array) > projectRed :: Array U DIM2 Double -> Array D DIM2 Double > projectRed arr = > traverse arr > (\ (e :. i :. j) -> (e :. i :. (j div 2))) > (\get (e :. i :. j) -> get (e :. i :. 2*j + (i mod 2))) Here (and throughout) we have restricted the type to work with two dimensional arrays of Double, although a more general type is possible. Notice also that the argument array is assumed to be manifest (U) and the result is a delayed array (D). Similarly for the black array (b) we want b(i,j) = arr(i, 2*j + ((i+1) mod 2)) with the same extent (shape) as for r, hence > projectBlack :: Array U DIM2 Double -> Array D DIM2 Double > projectBlack arr = > traverse arr > (\ (e :. i :. j) -> (e :. i :. (j div 2))) > (\ get (e :. i :. j) -> get(e :. i :. 2*j + ((i+1) mod 2))) We can also use these same functions to set up boundary mask and value arrays separately for red and black from the starting array (arrInit), initial mask (arrBoundMask), and boundary values (arrBoundValue) do redBoundMask <- computeP$ projectRed arrBoundMask
blackBoundMask  <- computeP $projectBlack arrBoundMask redBoundValue <- computeP$ projectRed arrBoundValue
blackBoundValue <- computeP $projectBlack arrBoundValue redInit <- computeP$ projectRed arrInit
blackInit       <- computeP $projectBlack arrInit This is part of a monad computation with each step using a computeP (parallel array computation) to create manifest versions of each of the arrays we need before beginning the iterations. These calculations are independent, so the sequencing is arbitrary. #### The finish At the end of the iterations we will reverse the split into red and black by combining using the traverse2 operation from the Repa library. We need arr(i,j) = r(i, j div 2) when even(i+j) = b(i, j div 2) otherwise  where arr has i <- 0..n-1 , j <- 0..2m-1 r has i <- 0..n-1 , j <- 0..m-1 b has i <- 0..n-1 , j <- 0..m-1 The traverse2 operation takes two arrays (here – of the same extent), a mapping to express the new extent (when given the two extents of the argument arrays), and a function to calculate values in the new array (when given the respective get operations for the original arrays (here – get1 and get2) and a coordinate (i,j) in the new array). > combineRB r b = > traverse2 r b > (\ (e :. i :. j) _ -> (e :. i :. 2*j)) > (\ get1 get2 (e :. i :. j) -> > (if even(i+j) then get1 else get2) (e :. i :. j div 2) > )  #### Stencils in the iteration step We use stencils as in the original, but when we consider a stencil on black cells which are to be combined as neighbours of the red cell r(i,j) we need the following shape, where b(i,j) corresponds to the middle item  0 1 0 1 1 0 0 1 0 BUT this is only when processing EVEN rows from the black array. For odd rows we need a different shape  0 1 0 0 1 1 0 1 0 That is, we need to apply one of two different stencils depending on whether the row is even or odd. Similarly in processing the red array to combine red neighbours of b(i,j) we need the same shaped stencils but the first shape above is used on odd rows of red cells and the second shape on even rows. We define and name the stencils as leftSt and rightSt > leftSt :: Stencil DIM2 Double > leftSt = [stencil2| 0 1 0 > 1 1 0 > 0 1 0 |] > rightSt :: Stencil DIM2 Double > rightSt = [stencil2| 0 1 0 > 0 1 1 > 0 1 0 |] Critically, we will need an efficient way to apply alternate stencils as we map across an array. At first, it may seem that we might have to rebuild a version of the primitive mapStencil2 to accommodate this, but that will get us into some complex array representation handling which is built into that primitive. On reflection, though, the combination of lazy evaluation and smart inlining/fusion optimisation by the compiler should allow us to simply apply both stencils on all elements and then choose the results we actually want. The laziness should ensure that the unwanted stencil applications are not actually calculated, and the administration of choosing should be simplified by compiler optimisations. To apply our stencils we will use the Repa array primitive mapStencil2 which takes as arguments, a border handling directive, a stencil, and a (manifest) array, to produce a resulting (delayed) array.  mapStencil2 :: Boundary Double -> Stencil DIM2 Double -> Array U DIM2 Double -> Array D DIM2 Double  The choice of stencil will depend on the position (actually the evenness of the row). As we cannot refer to the indices when using the stencil mapping operation we are led to using traverse2 after mapping both stencils to select results from the two arrays produced. Our alternate stencil mapping operation will have a similar type to mapStencil2 except that it expects two stencils rather than one. altMapStencil2 :: Boundary Double -> Stencil DIM2 Double -> Stencil DIM2 Double -> Array U DIM2 Double -> Array D DIM2 Double altMapStencil2 !bd !s1 !s2 !arr = traverse2 (mapStencil2 bd s1 arr) (mapStencil2 bd s2 arr) (\ e _ -> e) (\ get1 get2 (e :. i :. j) -> (if even i then get1 else get2) (e :. i :. j) ) This function needs to be inlined (with a pragma) and the bang annotations on arguments are there to improve optimisation opportunities. #### The iteration step The main relaxLaplace iteration step involves the following (where r and b are the previous red and black arrays) • Calculating a new version of r using stencils over b r1 = smap (/4)$ altMapStencil2 (BoundConst 0) leftSt rightSt b
• Over-relaxation of this taking weighted sums (using r and r1)
r2 = szipWith weightedSum r r1
where weightedSum old new = (1-omega)*old + omega*new
• Reset the boundaries to get the new version of r (r’)
r' = szipWith (+) redBoundValue      -- boundary resetting
$szipWith (*) redBoundMask r2 -- boundary clearing • Do similar steps for calculating b’ but starting with stencils over r’ (not r) and switching the left and right stencils This is combined in the following monad computation (where r and b are the old arrays). The monad is necessary because we want to use computeP to ensure that the final returned arrays are manifest.  do r' <- computeP$ relaxStep r b redBoundValue redBoundMask leftSt rightSt
b' <- computeP
$relaxStep b r' blackBoundValue blackBoundMask rightSt leftSt ... which uses relaxStep !arrOld !arrNbs !boundValue !boundMask !stencil1 !stencil2 = szipWith (+) boundValue$ szipWith (*) boundMask
$szipWith weightedSum arrOld$ smap (/4)
$altMapStencil2 (BoundConst 0) stencil1 stencil2 arrNbs weightedSum !old !new = (1-omega)*old + omega*new The first argument for relaxStep is the old (red or black) array we are calculating an update for, and the second argument is the neighbour array to use stencils on. The old array is needed for the over-relaxation step with weightedSum. It is also worth pointing out that we have the over-relaxation step being done before the two boundary resetting steps, but this could just as easily be done after the boundary resetting as it does not make a difference to the final array produced. #### Laplace with Red-Black and Stencils Finally the main function (solveLaplace) sets up the initial arrays and passes them to the function with the main loop (iterateLaplace) > solveLaplace:: > Monad m > => Int -- ^ Number of iterations to use. > -> Double -- ^ weight for over relaxing (>0.0 and <2.0) > -> Array U DIM2 Double -- ^ Boundary value mask. > -> Array U DIM2 Double -- ^ Boundary values. > -> Array U DIM2 Double -- ^ Initial state. Should have even number of columns > -> m (Array U DIM2 Double) > solveLaplace !steps !omega !arrBoundMask !arrBoundValue !arrInit > do redBoundMask <- computeP$ projectRed arrBoundMask
>    blackBoundMask  <- computeP $projectBlack arrBoundMask > redBoundValue <- computeP$ projectRed arrBoundValue
>    blackBoundValue <- computeP $projectBlack arrBoundValue > redInit <- computeP$ projectRed arrInit
>    blackInit       <- computeP $projectBlack arrInit > iterateLaplace steps omega redInit blackInit > redBoundValue blackBoundValue redBoundMask blackBoundMask where iterateLaplace !steps !omega !redInit !blackInit !redBoundValue !blackBoundValue !redBoundMask !blackBoundMask = go steps redInit blackInit where go 0 !r !b = computeP$ combineRB r b -- return final combined array
go n !r !b
= do r' <- computeP
$relaxStep r b redBoundValue redBoundMask leftSt rightSt b' <- computeP$ relaxStep b r' blackBoundValue blackBoundMask rightSt leftSt
go (n - 1) r' b'

{-# INLINE relaxStep #-}
relaxStep !arrOld !arrNbs !boundValue !boundMask !stencil1 !stencil2
= szipWith (+) boundValue
$szipWith (*) boundMask$ szipWith weightedSum arrOld
$smap (/4)$ altMapStencil2 (BoundConst 0) stencil1 stencil2 arrNbs

{-# INLINE weightedSum #-}
weightedSum !old !new = (1-omega)*old + omega*new

{-# INLINE iterateLaplace #-}

## Performance and a New Version

The number of calculations in an iteration of red-black is comparable to the original stencil implementation (with just the weighting operations addded in). Although there are two arrays to update, they are half the size of the original. We would expect optimised performance to be only fractionally slower than the original. The speedup in progress towards convergence can be dramatic (one iteration per 8 of the original for our test examples). So, in principle, this would be a big improvement. Unfortunately optimisation did not seem to be achieving the same speedups as the original and the code was roughly 12 times slower.

After much experimentation it looked as though the inner traverse2 operation of altMapStencil2 was inhibiting the optimisations (fusions with stencil mapping code and subsequent maps and zips).

A better performance was achieved by separating the stencil mapping from the traverse2 for alternate row selection and delaying the alternate row selection until after all the other operations. The new version drops altMapStencil2 and instead simply uses altRows

> altRows :: forall r1 r2 a . (Source r1 a, Source r2 a)
>         => Array r1 DIM2 a -> Array r2 DIM2 a -> Array D DIM2 a
>
> altRows !arr1 !arr2 =  -- assumes argument arrays with the same shape
>     traverse2 arr1 arr2
>               (\ e _ -> e)
>               (\ get1 get2 e@(_ :. i :. _) ->
>                   if even i then get1 e else get2 e
>               )
>
> {-# INLINE altRows #-}

The function relaxStep in the following revised version of iterateLaplace has altRows done last, with mapStencil2 applied first (using a different stencil in each alternative)

> iterateLaplace ::
>   Monad m
>   => Int
>   -> Double
>   -> Array U DIM2 Double
> 	-> Array U DIM2 Double
> 	-> Array U DIM2 Double
> 	-> Array U DIM2 Double
> 	-> Array U DIM2 Double
> 	-> Array U DIM2 Double
> 	-> m (Array U DIM2 Double)
>
> iterateLaplace !steps !omega !redInit !blackInit
>                !redBoundValue !blackBoundValue !redBoundMask !blackBoundMask
>      = go steps redInit blackInit
>        where
>          go 0 !r !b = computeP $combineRB r b -- return final combined array > go n !r !b > = do r' <- computeP >$ relaxStep r b redBoundValue redBoundMask leftSt rightSt
>                  b' <- computeP
>                        $relaxStep b r' blackBoundValue blackBoundMask rightSt leftSt > go (n - 1) r' b' > > {-# INLINE relaxStep #-} > relaxStep !arrOld !arrNbs !boundValue !boundMask !stencil1 !stencil2 > = altRows (f stencil1) (f stencil2) > where > {-# INLINE f #-} > f s = szipWith (+) boundValue >$ szipWith (*) boundMask
>                           $szipWith weightedSum arrOld >$ smap (/4)
>                           \$ mapStencil2 (BoundConst 0) s arrNbs
>
>          {-# INLINE weightedSum #-}
>          weightedSum !old !new = (1-omega)*old + omega*new
>
> {-# INLINE iterateLaplace #-}

This now seems to be only a little slower to run than the original stencil solution (about 4 times slower so about 3 times faster than the previous version of red-black). Thus, with an approximately 8-fold speed up in convergence, this does give an overall improvement.

In order to compile this version, it was necessary to use not just the ghc -O2 flag, but also -fsimpl-tick-factor=1000. The need for this flag was indicated by a ghc compiler bug message.

All the code above with birdfeet ( >) in this (incomplete) literate haskell document is essentially that in the module RedBlackStencilOpt.hs (which has some extra preliminaries and imports and checking of arguments in functions which were elided here for clarity). This code can be found here along with the module RedBlackStencil.hs which contains the first version. These can both be loaded by the wrapper newWrapper.hs which is just an adaptation of the original wrapper to allow for passing the extra $\omega$ parameter.

# Acknowledgements and References

There are many numerical methods books covering relevant background mathematics, but online, there are also lecture notes. In particular, on Computational Numerical Analysis of Partial Differential Equations by J.M.McDonough and on Numerical Solution of Laplace Equation by G.E.Urroz. T. Kadin has some tutorial notes with a diagram for the red-black scheme (and an implementation using Fortran 90). There is a useful Repa tutorial online and more examples on parallel array fusion are reported in (Lippmeier et al. 2012).

Lippmeier, Ben, and Gabriele Keller. 2011. “Efficient Parallel Stencil Convolution in Haskell.” In Proceedings of the 4th ACM Symposium on Haskell, 59–70. Haskell ’11. New York, NY, USA: ACM. doi:10.1145/2034675.2034684. http://doi.acm.org/10.1145/2034675.2034684.

Lippmeier, Ben, Manuel Chakravarty, Gabriele Keller, and Simon Peyton Jones. 2012. “Guiding Parallel Array Fusion with Indexed Types.” SIGPLAN Not. 47 (12) (September): 25–36. doi:10.1145/2430532.2364511. http://doi.acm.org/10.1145/2430532.2364511.