### So you want to be a model

[NB: corrected the date]

The results from yet another computer model of pandemic influenza spread have just been published in the journal Nature. They are being variously interpreted and reported by news sources. CBC News says they show there is no magic bullet to control flu, but HealthDay says fast treatment and isolation of the sick and their households is "key to effective control of any flu outbreak." On the glass-is-half-empty side we have an AP story:

> If pandemic influenza hits in the next year or so, the few weapons the United States has to keep it from spreading will do little, a new computer model shows.
>
> A pandemic flu is likely to strike one in three people if nothing is done, according to the results of computer simulation published in Thursday's journal Nature. If the government acts fast enough and has enough antiviral medicine to use as preventive dosings -- which the United States does not -- that could drop to about 28 percent of the population getting sick, the study found. (Seth Borenstein, AP)

What are we to make of these results, especially compared to a slightly different set announced two weeks ago from another group at Los Alamos National Laboratory? Let's talk about models.

It is a truism among modelers that all models are wrong, but some models are useful. The art of mathematical modeling consists in stripping the part of the real world you are interested in down to its bare logical skeleton. If you have included enough of what is important and discarded enough of what isn't, you may be able to see things that are useful when you turn the mathematical crank. (I'm simplifying, but then that's what modelers do. Call this meta-modeling.)

One of the oldest and simplest mathematical models of infectious disease dynamics is called an SIR model: S stands for susceptibles, I stands for infectives, R stands for recovered or removed (possibly by death or immunity). The game consists in modeling what happens when you introduce a certain number of infectives into a susceptible population. Suppose you assume that every susceptible has the same chance of coming in contact with an infective, that a certain fixed proportion of those contacts result in new infections in each time interval, and that the disease lasts a certain length of time, with a fixed proportion of those infected recovering and another fixed proportion dying. Then you can make some statements about what will happen given various starting numbers of infectives and susceptibles, and various contact, transmission, recovery and mortality rates.
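For readers who like to see the skeleton itself, the assumptions above are usually written as three coupled equations. The symbols are standard textbook notation rather than anything taken from the papers discussed here: $N$ is the total population, $\beta$ is the contact rate multiplied by the chance that a contact transmits infection, and $\gamma$ is one over the average duration of infection. In this simplest form, R lumps together those who recover and those who die.

$$
\frac{dS}{dt} = -\beta\,\frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta\,\frac{S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I
$$

Adding the three equations gives zero, so $S + I + R = N$ stays constant: people only move between compartments.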

I teach students to do this using the techniques of ordinary differential equations, but it can also be done by simulating the whole thing with a computer. You start out with the infectives. On day one they come in contact with the designated number of susceptibles, a proportion of those contacts cause disease, etc. That gives you the number of new infectives and remaining susceptibles at the end of day one. The computer moves on to day two. And so on.
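The day-by-day bookkeeping just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not the method of any of the published models: the function name, its parameters and every number in the example run are made up for illustration, and mixing is assumed to be perfectly uniform.

```python
def simulate_sir(population, initial_infectives, contacts_per_day,
                 transmission_prob, infectious_days, days):
    """Day-by-day SIR bookkeeping with uniform mixing (illustrative only)."""
    s = float(population - initial_infectives)  # susceptibles
    i = float(initial_infectives)               # infectives
    r = 0.0                                     # recovered or removed
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        # Each infective meets contacts_per_day people; a fraction s/population
        # of them are still susceptible, and a fraction transmission_prob of
        # those contacts produce a new infection.
        new_infections = contacts_per_day * transmission_prob * i * s / population
        new_infections = min(new_infections, s)  # cannot infect more than remain
        # On average, 1/infectious_days of the infectives are removed each day.
        removals = i / infectious_days
        s -= new_infections
        i += new_infections - removals
        r += removals
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # All values below are invented for the example, not estimates for influenza.
    run = simulate_sir(population=1_000_000, initial_infectives=10,
                       contacts_per_day=10, transmission_prob=0.05,
                       infectious_days=4, days=200)
    peak_day, _, peak_infectives, _ = max(run, key=lambda row: row[2])
    print(f"peak on day {peak_day}: {peak_infectives:,.0f} infectious")
    print(f"ever infected: {1 - run[-1][1] / 1_000_000:.0%}")
```

The one-day time step is the crudest possible way to turn the crank. A real analysis would use a proper ODE solver or a stochastic simulation.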

This is a pretty simple model and some of the assumptions are highly questionable. Mixing isn't random in the population, not everyone is alike in his or her susceptibility or ability to transmit, infection and mortality rates differ with age, etc. With some ingenuity and a couple of supercomputers working in parallel you can incorporate a lot of complexity and do it for very large populations, whole nation-sized populations even.

A lot of the hard work consists in trying to figure out the parameters of the model. The parameters are numbers like contact rates in different age groups and subpopulations, transmission rates, etc. Inevitably, important elements aren't included. In the recent Nature paper, no account was taken of any resistance developing to the drug Tamiflu. You hope that the omitted factors are only important on the margin, not centrally. Anyway, you'll get a grant to stick them in the model next go around.

The Los Alamos modelers and the modelers who just published in Nature know what they are doing. They are among the best infectious disease modelers we have and these papers as well as those from some other groups are tours de force. But of course they are wrong. All models are wrong. The question is whether they are useful. This question needs to be made a bit sharper: useful for what?

If you know a lot about the mechanism and are sure of the parameters, mathematical models can be highly accurate and predictive. We use them to put spacecraft into orbit or figure out the efficiency of engines. That's because we have Newton's mechanics and thermodynamics to help us. We aren't so lucky in modeling infectious disease dynamics. But even if there is a lot of uncertainty about mechanisms and parameters the models can still be useful for some things.

We might be able to get some qualitative idea of disease dynamics, say, without being able to make exact quantitative predictions. For example, does the disease spread, involving more and more of the population with time until all are infected? Or does the number of diseased bounce around, going up and down, perhaps chaotically or regularly? Or does the disease sputter and peter out? Often this kind of qualitative description is extremely useful because the same general behavior occurs over large ranges of parameter values. For example, uncontrolled spread occurs over large ranges of transmission rates, contact rates and initial numbers of infectives. This means you don't have to get all these factors exactly right, but it might also mean that there are some things, like the rate or pattern of spread, that you can't predict. As you make your models more and more "realistic" (meaning you think you have included more and more important things that will allow you to come up with more specific predictions), your model can also become more sensitive to the actual choices of parameters. You can test how sensitive your model is to these guesses to get an idea of how far off it might be.
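A crude version of such a sensitivity test, again on the toy SIR model rather than anything resembling the published simulations, is to rerun the model over a range of guesses for one parameter and watch what changes. Here `beta` (new infections per infective per day in a fully susceptible population) and `gamma` (fraction of infectives removed per day) are hypothetical names and values chosen only for the illustration.

```python
def attack_rate_and_peak(beta, gamma=0.25, population=1_000_000,
                         seed_cases=10, days=1000):
    """Run a daily-step SIR model; return (fraction ever infected, peak fraction ill)."""
    s, i = population - seed_cases, float(seed_cases)
    peak = i
    for _ in range(days):
        new = min(beta * i * s / population, s)  # new infections today
        s, i = s - new, i + new - gamma * i      # update susceptibles, infectives
        peak = max(peak, i)
    return 1 - s / population, peak / population


# Sweep the transmission guess while holding everything else fixed.
for beta in (0.2, 0.3, 0.4, 0.5, 0.6, 0.8):
    attack, peak = attack_rate_and_peak(beta)
    verdict = "spreads" if attack > 0.01 else "peters out"
    print(f"beta={beta:.1f}  ever infected={attack:6.1%}  "
          f"peak ill={peak:6.1%}  -> {verdict}")
```

In this toy, the qualitative verdict is the same for every guess with `beta` above `gamma`, while the attack rate and the peak move around a good deal. That is the sense in which the broad behavior is robust but the specific numbers are not.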

Back to the influenza models. You will find that they differ in how parameters are estimated, which elements of the real world are included (and how finely), how transmission and contact rates are modeled -- and in lots of other ways as well. All the choices are quite defensible, and you hope the ones that were made don't screw things up by leaving out something important or including something unimportant that obscures what you want to see. Modeling is a process of successive approximations. You hope you are getting closer and closer to the real world. Unfortunately, most of the time we can't test the models against it. There is a heavy dose of faith needed (this is not a religious statement!).

So here's where I am with the models. I think they are valuable for suggesting general behaviors in broad outline, such as the finding that closing borders doesn't seem to affect spread and peak case loads much. That doesn't prove that closing borders won't work, but it lends weight to what many experts believe on the basis of inference from experience, and there is always something about saying it came from a computer that is persuasive to policy makers.

But we still have to use these models carefully. Take the quote from the AP article. Use of antivirals and isolation might reduce the proportion of the population getting sick from 33% to 28%, according to the model. What this really says is that even unrealistically widespread use of antiviral therapy and isolation doesn't do much. The "not much" of 5 percentage points suggested by the model shouldn't be taken too seriously, because relatively small alterations in model assumptions could change the numbers somewhat even though the "not much" would still be true. On the other hand, there are cases where abrupt changes in qualitative outcome (for example uncontrolled spread versus containment) are very close to practically attainable parameter values (such as 50% effective antiviral use). We shouldn't believe that 50% works but 45% doesn't, even if the model says so. The models are being made to bear more weight than they can handle in that case.
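To see why a sharp cutoff like "50% works, 45% doesn't" deserves suspicion, consider the simplest possible threshold calculation. It assumes, purely for illustration, that an intervention scales transmission down by its effectiveness, so that an outbreak grows only while the effective reproduction number stays above 1. The function names and the values of `r0` (the average number of new cases per case in a fully susceptible population) are hypothetical and are not taken from the Nature paper.

```python
def effective_r(r0, effectiveness):
    """Reproduction number after transmission is cut by `effectiveness` (0 to 1)."""
    return r0 * (1 - effectiveness)


def critical_effectiveness(r0):
    """Smallest effectiveness that brings the effective reproduction number to 1."""
    return max(0.0, 1 - 1 / r0)


# Three nearby guesses for r0, each about as defensible as the others.
for r0 in (1.8, 2.0, 2.2):
    print(f"r0={r0}: containment needs about "
          f"{critical_effectiveness(r0):.0%} effectiveness")
    for eff in (0.45, 0.50, 0.55):
        outcome = "grows" if effective_r(r0, eff) > 1 else "contained"
        print(f"    effectiveness {eff:.0%}: {outcome}")
```

Under these assumptions the cutoff sits at exactly 50% only when `r0` is exactly 2. Nudge that one guess by ten percent either way and the line between containment and spread moves by several percentage points, which is more than the gap between 45% and 50%.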

To sum up: All models are wrong, but some models are useful. And many models are used wrongly. They are valuable if used wisely and understood properly but can be misleading if used blindly.

Not that that would ever happen.
