August 14, 2012

A simple genetic drift simulation

Here's the tl;dr version of this post: I wrote a genetic drift simulation that you can download and play with if you want.

When I was in high school, I had a really great teacher for biology. That's not just me saying that; she won awards at the state level honoring her as a great teacher. I don't remember all the topics that we covered in that class, but I do remember that she took me on a tour of the brand new (at the time) microbiology facility at Iowa State University. Besides being a really shiny building, it was the first time I had ever seen biology being done with computers.

Last year, when I started teaching at the community college, one of the biology professors said he wanted a genetic drift simulation. He explained what he was looking for, and I nodded politely, only vaguely following along. (That's physics training for you.) After I read a bit about genetic drift, I found an example of a very simple genetic drift simulation activity. I thought something similar would be doable in Python.

Here's the user interface for the simulation:

It is extremely basic. All you do is choose the size of your population and the number of generations that you wish the simulation to run for.

Here is an example of a small population run for 100 generations:

As you can see, this simulation started with the Q (recessive) allele making up 70% of the population.  The P allele is quickly wiped out of the population.

Here is a simulation of a much larger population:

For a large population the simulation will start with a much closer match between the P and Q alleles. It was almost 50/50 in this run. And, after 500 generations neither allele has been eliminated, although the Q allele is starting to have a significantly higher fraction of the population.
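
To see why population size matters so much, here's a minimal sketch of a drift simulation. It is not the code from the download; it assumes a simple Wright-Fisher-style model in which every individual carries two allele copies, the starting pool is filled by coin flips, and each new generation is drawn at random from the previous one. The name `simulate_drift` and its parameters are made up for this illustration.

```python
import random


def simulate_drift(pop_size, generations, seed=None):
    """Return the fraction of P alleles at generations 0..generations.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_alleles = 2 * pop_size  # assumption: two allele copies per individual

    # Random start: each allele copy is P with probability 1/2, otherwise Q.
    p_count = sum(rng.random() < 0.5 for _ in range(n_alleles))
    freqs = [p_count / n_alleles]

    for _ in range(generations):
        p_freq = p_count / n_alleles
        # Each copy in the next generation is drawn at random from the
        # current pool, so it is P with probability p_freq. Once p_freq
        # reaches 0 or 1 it can never change again (the allele is lost).
        p_count = sum(rng.random() < p_freq for _ in range(n_alleles))
        freqs.append(p_count / n_alleles)

    return freqs


small = simulate_drift(pop_size=10, generations=100, seed=1)
large = simulate_drift(pop_size=1000, generations=500, seed=1)
print("small population, final P fraction:", small[-1])
print("large population, final P fraction:", large[-1])
```

In a small pool a single generation's random draw can shift the allele fraction a lot, so one allele tends to vanish quickly. In a large pool each step is tiny, so both alleles can hang around for hundreds of generations.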

Here's a snippet of the code that does the actual simulation part. The mechanics of the simulation are explained in the comments:


I like this because (1) it quickly shows the difference between genetic drift for large and small populations, and (2) the simulations start with random conditions and have random progressions, but over a large set of simulations clear patterns emerge. I could envision a class of students running the simulation multiple times each, then compiling the data into a class set from which more conclusions can be drawn.
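
Here's a rough sketch of what that class exercise could look like in code: run many independent simulations per population size and count how often one allele gets wiped out. It assumes a simple random-resampling model with two allele copies per individual and a coin-flip starting pool, which may not match the downloadable program exactly. The function names are hypothetical.

```python
import random


def generations_to_fixation(pop_size, max_generations, rng):
    """Run one simulation; return the generation at which one allele is
    lost, or None if both alleles survive through max_generations."""
    n_alleles = 2 * pop_size  # assumption: two allele copies per individual
    # Random start: each copy is P with probability 1/2, otherwise Q.
    p_count = sum(rng.random() < 0.5 for _ in range(n_alleles))
    for gen in range(max_generations + 1):
        if p_count in (0, n_alleles):
            return gen  # only one allele is left in the pool
        p_freq = p_count / n_alleles
        # Draw the next generation at random from the current pool.
        p_count = sum(rng.random() < p_freq for _ in range(n_alleles))
    return None


def fraction_fixed(pop_size, max_generations, runs, seed=None):
    """Fraction of runs in which one allele was lost before the cutoff."""
    rng = random.Random(seed)
    results = [
        generations_to_fixation(pop_size, max_generations, rng)
        for _ in range(runs)
    ]
    return sum(r is not None for r in results) / runs


# Each "student" contributes runs; the class pools them per population size.
for pop_size in (10, 50, 250):
    share = fraction_fixed(pop_size, max_generations=100, runs=20, seed=0)
    print(f"population {pop_size}: one allele lost in {share:.0%} of runs")
```

Any single run is unpredictable, but the pooled fractions should make the small-versus-large pattern visible.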

Hopefully my code is expandable.  I'd like to add features, such as setting the initial populations, selective processes, and possibly multiple traits.

In case you want to try this (PLEASE TRY IT and give me feedback), you'll need Python, matplotlib, numpy, and the Qt runtime libs. Maybe you'll need more, I'm not really sure. I needed coffee when I was working on it. YMMV.
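
If you're not sure what you already have installed, a quick check like the one below can help. It's only a sketch: the list of Qt bindings is a guess at what might be in use, since several Python packages provide Qt, and the helper names are made up for illustration.

```python
import importlib.util

REQUIRED = ["numpy", "matplotlib"]
# Assumption: any one of these Qt bindings might be the one the GUI needs.
QT_CANDIDATES = ["PyQt4", "PyQt5", "PyQt6", "PySide", "PySide2", "PySide6"]


def is_installed(module_name):
    """True if Python can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def check_dependencies():
    """Return (missing required packages, Qt bindings that were found)."""
    missing = [name for name in REQUIRED if not is_installed(name)]
    qt_found = [name for name in QT_CANDIDATES if is_installed(name)]
    return missing, qt_found


missing, qt_found = check_dependencies()
print("Missing packages:", missing or "none")
print("Qt bindings found:", qt_found or "none")
```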
