The second of six posts covering content from SHPC4001 and SHPC4002
“Just ignore the higher-order terms”
SHPC4001 Tutor
Once the dust of Fortran coding exercises had settled, we were quickly thrust into applications. Numerical computing is a remarkably broad field. SHPC4001 dedicated one lecture to solving first-order IVPs (initial value problems) numerically, and another to root-finding. This post will briefly cover post topics. For the sake of completeness, this post will also include some numerical methods that weren’t covered in the lecture (e.g. RK4). To get the best out of this post, some familiarity with calculus is required. First things first, let’s introduce differential equations.
Differential Equations
Differential equations (i.e equations that involve variables and their rates of change) are everywhere. There are many types of differential equations, but for the purpose of this post we will be focusing on ordinary differential equations (ODEs); these involve functions of a single independent variable. We will denote derivatives with primes, e.g \( y'(x), y^{\prime\prime}(x) \) etc, where (unless specified) the derivative is with respect to the variable in the function. A common example of an ODE is the decay equation \[ N'(t) = -\frac{N(t)}{\tau} \] which has the exact solution \( N(t) = N(0)e^{-t/\tau} \) for some initial value N(0) and decay rate \( \tau \). While it is fairly easy to arrive at this exact solution, many equations in physics are not so straightforward to solve exactly. This is where numerical methods come in; instead of explicitly attempting to find the exact solution, an algorithm is used to estimate the solution by repeatedly computing its approximate value (with the ideal aim of converging to the actual answer).
Euler Method
The simplest numerical method to approximate the solution of an IVP (i.e an ODE such that an initial value is specified) is the Euler method. Consider some derivative \( y'(t) = f(t,y(t)) \). Then we can approximate its numerical solution via \[ y(t + \Delta t) = y(t) + \Delta t f(t,y(t)) \] Alternatively, some sources write the equation in short-hand as \[ y_{n+1} = y_n + h f(t_n,y(t_n)) \] where \( h \equiv t_{n+1} – t_n \) is the difference between the two times, also known as the step size. The Euler method is essentially saying that the difference between the value of y(t) between a start and end time is the same as the width of the time interval multiplied by the derivative of the function at the start time. In practice, the Euler method is merely approximating the curve by extending the tangent line at the point \( (t_n,y(t_n)) \), as illustrated below:
Thus, at each step, the value of \( y_n \) is changed by the gradient of the tangent line (at \( t_n \)) multiplied by the width of the interval. The method is not very accurate, nor is it stable (particularly for “jumpy” or periodic functions), and it generally requires a very tiny step size \( h \) to yield any meaningful results. The particular method discussed above is called the forward Euler method as the method uses the derivative at \( t_n \) to estimate \( y_{n+1} \). Alternatively, if we wanted to estimate \( y_{n+1} \) using the derivative at \( t_{n+1} \), we arrive at the backward Euler method \[y_{n+1} = y_n + h f(t_{n+1},y(t_{n+1}))\] Both essentially do the same thing, and both are first-order (they converge with order 1 with respect to \( h \) ), single-step methods.
Returning to our example decay rate problem, we can arrive at the necessary calculation by performing the relevant substitution. Recall that \( f(t,y(t)) = y'(t) \), hence for our example \[ N'(t) = -\frac{N(t)}{\tau} \] we have \( f(t,y(t)) = -N(t)/\tau \). Thus we get (using \( \Delta t \) instead of \( h \) to emphasise the time step) \begin{align*} N(t+\Delta t) &= N(t)- \Delta t N(t)/\tau \\ &= N(t) \left(1- \Delta t/\tau\right)\end{align*}
Crank-Nicolson / Trapezoidal Method
We can combine the forward and backward Euler methods as follows: \[ y_{n+1} = y_n + \frac{h}{2} \left( f(t_n,y(t_n)) + f(t_{n+1},y(t_{n+1})) \right) \] This is the Crank-Nicolson method (a generalisation of the Trapezoidal Rule) and, as it uses a trapezium to estimate the integral rather than a single rectangle, it turns out to have second-order convergence. It is also important to note that this particular method is implicit (i.e it depends on values that have to be calculated). With more complicated derivatives, a system of equations must be solved.
Adams-Bashforth Method
To avoid having to solve systems of equations at every step, we can use explicit numerical methods (that is, solved using known quantities rather than quantities that have to be calculated). The Euler method takes a single step (from \( t_n \) to \( t_{n+1} \) ). What if we took another step? The Adams-Bashforth method is one of a family of linear multistep methods that use a linear combination of previous points and previous derivatives to estimate the current point (NB: these are different to Runge-Kutta methods, which attempt to estimate values in-between two points). The Euler method is the simplest linear multistep method as it uses one step (it is also a Runge-Kutta method by virtue of the fact that it only estimates a single point). The Adams-Bashforth method uses two steps: \[ y_{n+2} = y_{n+1} + \frac{3}{2}hf(t_{n+1},y(t_{n+1}))- \frac{1}{2}hf(t_n,y_n) \] As most first-order IVPs only give 1 initial value, the Euler method is typically used to obtain the second value needed for the Adams-Bashforth method.
Runge-Kutta Methods
Runge-Kutta methods attempt to estimate values in-between two points; so-called intermediate points. Like the vanilla Euler method, Runge-Kutta methods attempt to determine successive values by calculating gradients, but unlike the vanilla Euler method, Runge-Kutta methods explicitly calculate intermediate points in order to better account for the overall slope of the curve.
Midpoint Method
The forward and backward Euler methods estimate the derivative at the start and end of the interval respectively. But what if the function curves or veers dramatically from the start point to the end point? A better estimation can be made by finding the derivative at the midpoint, in the sense that this best captures the behaviour of the function within the interval. The question is how to estimate this midpoint?
We can find the midpoint using the Euler method, but this time over a half-interval: \[ y\left(t_n + \frac{h}{2}\right) \approx y(t_n) + \frac{h}{2}f(t_n,y(t_n)) \] Now we can use the derivative at the midpoint to obtain \( y_{n+1} \): \begin{align*} y(t_n + h) &\approx y(t_n) + h f(t_n + h / 2, y(t_n + h / 2)) \\ &= y(t_n) + h f(t + h/2, y(t_n) + h / 2 f(t_n,y(t_n)))\end{align*} The algorithm first calculates the value of \( y \) at the midpoint, then attempts to use the derivative at the midpoint to estimate \( y_{n+1} \) from \( y_n \). This particular method is also known as the modified Euler method, as well as Runge’s second order method. The Runge-Kutta methods can be described in a more orderly form by introducing the stage values \( k_i \).
The Forward Euler Method is an example of a one-stage (explicit) Runge-Kutta method: \[ k_1 = f(t_n,y_n) \\ y_{n+1} = y_n + h k_1 \] The midpoint method is a two-stage Runge-Kutta method: \[k_1 = f(t_n,y_n) \\
k_2 = f(t_n + \frac{1}{2}h, y_n + \frac{1}{2}h k_1)
y_{n+1} = y_n + h k_2 \]
RK4
RK4 is, as you may have guessed, a four-stage Runge-Kutta method. It estimates the slope of the solution curve at four locations; once at the start point (using the Euler method), twice at the midpoint, and lastly at the end point.
The equations behind the approximations are quite elegant: \[ k_1 = hf(x_n,y_n) \\
k_2 = hf(x_n + \frac{1}{2}h, y_n + \frac{1}{2}k_1) \\
k_3 = hf(x_n + \frac{1}{2}h, y_n + \frac{1}{2}k_2) \\
k_4 = hf(x_n + h, y_n + k_3) \\
y_{n+1} = y_n + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4) \] Notice how the midpoints are calculated twice; once with \( k_1 \) to get \( k_2 \), and again with \( k_2 \) to get \( k_3 \). RK4 puts more weight on these midpoints, while still considering the slopes at the extremes of the interval.
Higher-order Runge-Kutta methods
In the Runge-Kutta methods we have seen so far, the overall approach appears to be this “chain”-like calculation of \( k \) values, then using a linear combination of them to estimate \( y_{n+1} \). Can we extend such calculations indefinitely? Yes, we can. The pattern is remarkably sleek: \begin{align*}k_1 &= f(t_n,y_n) \\
k_2 &= f(t_n + c_2 h, y_n + h a_{21} k_1) \\
k_3 &= f(t_n + c_3 h, y_n + h (a_{31} k_1 + a_{32} k_2 )) \\
\vdots & \vdots \\
k_s &= f(t_n + c_s h, y_n + h(a_{s1} k_1 + a_{s2} k_2 + \ldots + a_{s,s-1} k_{s-1}))\\
y_{n+1} &= y_0 + h(b_1k_1 + b_2k_2 + \ldots + b_sk_s) \end{align*} where the coefficients \(a_{ij}, b_j \) and \( c_i \) are chosen empirically (e.g \( a_{31} = 0 \) for RK4). To better illustrate the pattern, we can use the compact notation known as a Butcher tableau to cleanly encode the entire set of equations: \begin{array} {c|cccccc}
0\\
c_2 & a_{21}\\
c_3 & a_{31} & a_{32} \\
\vdots & \vdots & & \ddots \\
c_s & a_{s1} & a_{s2} & \ldots & a_{s,s-1}\\
\hline & b_1 & b_2 & \ldots & b_{s-1} & b_s
\end{array} The Forward Euler Method in this new notation is:
\begin{array} {c|c}
0\\
\hline & 1
\end{array} and similarly for the Midpoint Method:
\begin{array} {c|cc}
0\\
\frac{1}{2} & \frac{1}{2} \\
\hline & 0 & 1
\end{array}
Root Finding with the Bisection Method
I will return to this topic in more detail in a later post, but for now let’s explore root finding using the bisection method. A root is any value \( x \) such that \( f(x) = 0 \). The bisection method is one of the simplest methods to find the root of a function. Here we compare two values (or rather their sign) at the ends of an interval. If the signs are opposite then the function must cross the \( x \)-axis somewhere in between. We then repeatedly decrease the size of the interval, checking for opposite signs, until we converge to within the desired tolerance. There are multiple ways of implementing the bisection method; this is a rough summary of the method I chose (in Python-esque pseudocode that probably won’t compile).
def bisection(a,b):
"""returns root located within c"""
while abs(a-b) > tolerance:
c = 0.5*(a+b)
#check if a root exists within [a,c] and adjust bounds accordingly
if f(a)*f(c) > 0:
a=c
else:
b=c
return 0.5*(a+b)
a = start; b = a + step; roots=[]
while b < end:
if f(a)*f(b) < 0:
roots.append(bisection(a,b))
a = b
b += step
Here the program covers the main interval [start,end]
by looking at subintervals of width [step]
. It is important to ensure that the step size is small enough to detect the presence of a root. The core idea behind the bisection method stems from the intermediate value theorem: if \(f(a) < 0 < f(b)\) then \(f(c) = 0\) for some value \(c \in (a,b) \) (this is one way of saying Bolzano’s theorem). In other words, if \(f(a)\) and \(f(b)\) have different signs (thus \(f(a) \times f(b) \) is negative), then there must exist some root in the interval \( (a,b) \).
The bisection method is not perfect. By its very definition, it fails if the interval [a,b] contains 2 roots (such as attempting to find the roots of \( y = x^2 \) in the interval [-1,1] ). More generally, the method fails if the interval contains an even number of roots, and returns a single root if the interval contains an odd number of roots. This is by virtue of the intermediate value theorem.
Summary
Numerical methods are often used to solve for differential equations that cannot be, or are difficult to solve analytically. Linear multistep methods and Runge-Kutta methods can be used to solve differential equations; the former does this by estimating the slope at multiple iterations, while the latter attempts to chain a series of intermediate points within some interval. The Euler method is the simplest method, but far too inaccurate for most uses. Stronger methods such as RK4 provide more robust estimates as they estimate the slope at multiple points. The bisection method can be used to find roots by analysing the signs of the endpoints of an interval; however, if the interval contains 2 roots, the algorithm fails, thus necessitating the use of small intervals.