## Epidemic modelling of COVID-19 in the UK using an SIR model

I used to do a bit of epidemic modelling during my computer science degree and decided to re-map some of the models using the latest UK data for Coronavirus found here.

(Wikipedia is obviously not the most reputable source of data in the world! however it also aligns to data from other sources e.g. worldometers and theguardian)

This blog was inspired from a similar model used to map the epidemic in Germany, found here

The model used is an SIR (Susceptible, Infected, Recovered) compartmental epidemic model based on the following three Ordinary Differential Equations (ODEs):

Where S is the number of Susceptible population, I is the number of Infected, R is the Recovered population, and N is the sum of these three. β is the contact rate (average number of contacts per person per time) and γ is the rate of recovery. We can also determine the reproductive number (R0 = β / γ), R0 represents the expected number of new infections from a single infection.

These differential equations govern the rate of change between the different compartments, and can be used to predict future development of an epidemic. Before we can predict the future of the epidemic, we first need to find the best fit of the parameters of the model (β and γ) given the number of confirmed cases observed so far. (See here for more information on the SIR model)

Firstly if we look at the current confirmed cases in the UK, we can plot a graph of how the epidemic has spread so far:

```# UK
Infected <- c(2,2,3,3,4,8,8,9,13,13,19,23,35,40,51,85,114,160,206,271,321,373,456,590,797,1061,1391,1543,1950,2626,3269)
Day <- 1:(length(Infected))
N <- 66000000 # pupulation of the UK

old <- par(mfrow = c(1, 2))
plot(Day, Infected, type ="b")
plot(Day, Infected, log = "y")
abline(lm(log10(Infected) ~ Day))
title("Confirmed infections COVID-19 in the UK", outer = TRUE, line = -2)
```

As we can see from both graphs, and in particular the right hand side log scale graph, the current spread has been exponential. At the time of writing (19th March 2020) the confirmed number of cases in the UK stands at 3,269. Please also note that this is based on confirmed cases (The number of actual cases is unknown but will be higher).

The next step is to fit the above SIR model to the data we’ve got so far for the UK, to achieve this I’m using the R ‘optim’ function to optimise the parameter values given the current observed data. (‘optim’ is a general-purpose optimisation based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms, here we use the `L-BFGS-B` method of Byrd et. al. 1995. See here for more details). The ‘ode’ function of deSolve is then used to evaluate the ODEs (in Fig. 1) using the optimised parameters.

```# UK
Infected <- c(2,2,3,3,4,8,8,9,13,13,19,23,35,40,51,85,114,160,206,271,321,373,456,590,797,1061,1391,1543,1950,2626,3269)
Day <- 1:(length(Infected))
N <- 66000000 # pupulation of the UK

old <- par(mfrow = c(1, 2))
plot(Day, Infected, type ="b")
plot(Day, Infected, log = "y")
abline(lm(log10(Infected) ~ Day))
title("Confirmed infections COVID-19 in the UK", outer = TRUE, line = -2)

SIR <- function(time, state, parameters) {
par <- as.list(c(state, parameters))
with(par, {
dS <- -beta/N * I * S
dI <- beta/N * I * S - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}

library(deSolve)
init <- c(S = N-Infected, I = Infected, R = 0)
names(parameters) <- c("beta", "gamma")
out <- ode(y = init, times = Day, func = SIR, parms = parameters)
fit <- out[ , 3]
sum((Infected - fit)^2)
}

Opt <- optim(c(0.5, 0.5), RSS, method = "L-BFGS-B", lower = c(0, 0), upper = c(1, 1)) # optimize with some sensible conditions
Opt\$message
##  "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Opt_par <- setNames(Opt\$par, c("beta", "gamma"))
print(Opt_par)
##      beta     gamma
## 0.6746089 0.3253912

t <- 1:90 # time in days
fit <- data.frame(ode(y = init, times = t, func = SIR, parms = Opt_par))
col <- 1:3 # colour
print(fit)
matplot(fit\$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col)
matplot(fit\$time, fit[ , 2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col, log = "y")

points(Day, Infected)
legend("bottomright", c("Susceptibles", "Infecteds", "Recovereds"), lty = 1, lwd = 2, col = col, inset = 0.05)
title("Predicted Cases 2019-nCoV UK (worst case)", outer = TRUE, line = -2)
```

This gives us the optimal parameter values: β = 0.6235803, γ = 0.3764198, R0 = 1.7.

Recall from above the key value R0 (basic reproduction number) is the expected number of new infections from a single infection. Therefore the model predicts that each person will infect nearly 2 others (on average).

The following graph showing the predicted confirmed number of cases in the short term (3-5 days)

And the following data table shows the predicted values for the upcomming short term range:

Of course, the above predictions are based on the epidemic progressing as it currently has been, i.e. it doesn’t account for additional measures put in place to curb the spread of the virus (e.g. School closures, social distancing, lockdowns etc…), and so hopefully the actual numbers will be lower than the above predictions. However if the spread continues as current then the model predicts there will be between 8-9 thousand confirmed cases by Monday (23rd March 2020).

We can extrapolate the date range further to understand the potential impact that the virus could have if it continues to spread as it has been in the longer term – up to 90 days

Please keep in mind that this is a worst case prediction, as you can see we are currently in the very early stages of the epidemic spread (assuming no additional action is taken to prevent the spread) and the total Infected confirmed cases would peak at just over 6 Million (~10% of the UK population) on the 68th day of the outbreak (That would be on Sunday 26th April 2020). (For the full data please see Appendix 1).

Please also note that similar predictions were estimated for China (up to 200 million being Infected – significantly larger due to the larger population), however the actual recorded number of Infected people was significantly less and peaked at a maximum so far of 80,000 where it seems to have become stable – potentially as a result of the thorough social distancing and lockdown applied in China. See here for a more detailed model of the epidemic in China and here for more data on China (where you can see that the epidemic is limiting out at 80k).

A last note would be to highlight these models are simplistic in nature compared to the extremely complex dynamics of an actual epidemic, in particular they don’t take into account measures taken to restrict the spread of the virus.

Further extensions to the model could also consider:

• Using an SEIR model (adding an Exposed compartment for people who are infected but not yet infectious)
• Adding a “Q” layer since a lot of people are being Quarantined or isolated
• considering the “hidden” population that is infected but is denied being tested due to shortage of tests
• Feasibility of a second wave / outbreak of the epidemic later in the year (as seen in previous outbreaks, such as Swine Flu)

Update 27/03/2020

Below is an updated model for the short term (using the latest data over the last week):

Using the latest data over the last week has also shown a reduction in the predicted peak cases which would now peak at just over 2million on the 66th day (That would be on Sunday 24th April 2020). (For the full data please see Appendix 2).

Update 29/03/20

As of Sunday 29th March 2020, the following graph shows a re-plot of the confirmed Infected cases:

As we can see from the log scale graph on the right hand side, the number of cases seems to be falling away from the linear line, showing that the exponential curve may be dropping away slightly – possibly as a result of measures taken to prevent the spread of the virus (such as isolation). Hopefully this will continue to drop away once impacts of isolation become more clear over the coming few weeks (there may be a delayed impact due to the recent changes in UK government advice and the incubation period of the infection).

GitHub Link for the code: https://github.com/tomwilding/covid-19Model-R/blob/master/sir.r

Full Data for models at different dates:

Appendix 1. 20/03/20 full data for the 90 days predicted by the model

Appendix 2. 27/03/20 Below is the full data for the 90 days predicted by the model

This entry was posted in Uncategorized and tagged , , , , . Bookmark the permalink.

### 6 Responses to Epidemic modelling of COVID-19 in the UK using an SIR model

1. Antoine Soetewey says:

I really found it interesting so I have included it in my article covering the top R resources on the virus (https://www.statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/).

Let me know if there is any inconsistency or if you prefer that I don’t mention it.

• thomaswilding says:

Thanks Antoine for linking to your blog – such a great idea to collaborate resources together in this way. I’ve updated the UK model with the latest data from this week.

2. andy says:

how do you include the Recovered case in the SIR model fitting? The RSS in your code only consider the cumulative cases, right?

• thomaswilding says:

Hi Andy, sorry for the delayed reply. If I understand your questions correctly, the number of Recovered is determined from the third differential equation dR/dT = γI. This shows the rate of change of recovered is determined by the number of Infected (I) and the transition rate (rate of recovery γ). For more info in the DEs see this wiki page: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology.
The prediction for Recovered values is calculated in R by passing a function of the 3 ODE equations into the R ‘ode’ solver, which will extrapolate the future Susceptible, Recovered and Infected values.
Thanks

• Farah Allouch says:

I had a similar question. First of all, thank you for supplying this code. I’m trying to apply similar prediction modeling for the outbreak in Lebanon.

From your code, it seems like you’re using the cumulative number of cases instead of the daily cases to predict the outbreak. However, this can’t be right as the cumulative number of cases would not go down because it counts the number of people who were ever infected (recovered + dead). Am I missing something? Shouldn’t we be using daily cases instead of cumulative cases here?

3. Pingback: Modelling COVID-19 in Morocco