Sunday, September 4, 2011
Sunday Spinelessness - An online fungal foray
One of my goals with these Sunday posts is to do a little to broaden the understanding of just how extraordinarily diverse the biological world is. In almost two years of writing about spineless creatures I've limited myself to animals, which, as amazing as we are, only comprise one twig in the tree of life. You can call that an institutional bias, since I'm from a Zoology Department (there are few that haven't yet been swallowed by Schools of Life Sciences) and maybe it's a necessary step to narrow the field of targets a little bit. But this is my blog, and my series, and today I want to write about fungi.
In fact, fungi are a perfect example of the gap between our everyday experience of the biological world and what's really out there. Most of us only notice fungi when mushrooms start popping up in the autumn or when the fruit we bought, and were definitely going to eat this time, starts turning furry. Yet there might be something like one and a half million species of fungi on earth; there are deep-sea fungi, forest fungi and freshwater fungi; there are fungi that live on tree roots and others that live on human skin; there are even mind-controlling fungi that hijack the nervous system of certain ant species for their own gain.
Some fungi play important roles in ecosystems, and probably the most important of all are a taxonomically diverse group that live in or on the roots of plants. These so-called 'mycorrhizal' fungi greatly increase their host's ability to take up and process nutrients and water from the soil, while the fungi can take advantage of the plant's ability to create sugars from carbon dioxide and sunlight. Between 80 and 90 per cent of plant species can form relationships with mycorrhizal fungi (Wang and Qiu 2006 doi: 10.1007/s00572-005-0033-6) and, as you might imagine, the presence or absence of mycorrhiza can have a big impact on the health of individual plants, crops and forests.
On Wednesday, I heard David Orlovich (@davidorlovich) speak about mycorrhiza in southern beech (Nothofagus) forests in New Zealand. Beech forests form a major part of New Zealand's natural heritage, and some of our most important conservation sites (Westland, Fiordland...) are covered almost exclusively by beech species. Without mycorrhizal fungi there would be no southern beech: seedlings raised in sterile soil simply fail to develop. David went as far as to say that we should see beech trees as giant antennae that fungi use to fix carbon from the atmosphere. I'm not sure I'd follow him quite that far, but his talk on using DNA sequences to quantify the number of fungal species associated with local silver beech forests and the specificity of those fungal species was really interesting.
It also got me thinking about a blog post by Rod Page (@rdmpage) in which he shows how we can take advantage of data that is stored in The Big DNA Database (called GenBank) but goes almost unused. Every record in GenBank has certain information attached to the DNA sequence it describes (the source of the sample, the name of the gene, a scientific paper the sequence is attached to) but the information in a given record is not limited to the required fields - researchers can add any pertinent information they want to. Researchers in biodiversity and related fields often wring their hands about the lack of any infrastructure to hold data collected on various species and taxonomic groups, but, as Rod has pointed out in the past, existing databases (and Wikipedia) already contain considerably more information than we're taking advantage of.
So, when I decided I needed a break from thinking about snails this Friday night, I set out to see if there was enough information about fungal hosts in GenBank for us to start examining the fungal diversity of New Zealand forests*. You could make a start at that project using the pointy-clicky web-interface but I decided to use Biopython (a library for the Python programming language), because writing code for a project is really the best way to document what you're doing, helps make your research reproducible and allows you to pick up a project where you left it. So, the first step was finding records that corresponded to sequences from mycorrhizal fungi in New Zealand. As far as I can tell you can't search within particular submitter-defined features of a file, so here's how I did the search with Biopython's Entrez.esearch() function.
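A minimal sketch of that kind of search follows. The query wording, the `build_term` and `search_ids` helper names and the e-mail address are illustrative assumptions, not necessarily what was used for this post; `Entrez.esearch()` and `Entrez.read()` are the real Biopython calls.

```python
def build_term(organism, phrases):
    """Combine an organism filter with free-text phrases into one Entrez query."""
    parts = [f"{organism}[Organism]"]
    parts += [f'"{phrase}"[All Fields]' for phrase in phrases]
    return " AND ".join(parts)


def search_ids(term, email, retmax=500):
    """Return the list of GenBank identifiers matching `term` (needs network)."""
    # Imported here so build_term() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return result["IdList"]


# Hypothetical query: a free-text search, since host annotations
# cannot be searched as a field of their own.
term = build_term("Fungi", ["mycorrhizal", "New Zealand"])
# ids = search_ids(term, email="you@example.org")
```

Because the free-text fields match anywhere in a record, a search like this will probably pull in some records that merely mention the phrases, so the results still need filtering afterwards.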
The 'ids' object collected from that search is a list of unique identifiers for sequence records that matched our search, so let's get all of those records in a sequence file and use SeqIO from Biopython to deal with them.
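Something along these lines would do the fetching. It is a sketch under assumptions: the `batches` and `fetch_records` names and the batch size are mine, and splitting the request into batches is just a courtesy to the NCBI servers for longer ID lists, not something a few dozen records strictly need.

```python
def batches(ids, size=200):
    """Yield successive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_records(ids, email, size=200):
    """Download GenBank records for `ids` and parse them into SeqRecords."""
    # Imported here so batches() can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    records = []
    for batch in batches(ids, size):
        handle = Entrez.efetch(
            db="nucleotide", id=",".join(batch), rettype="gb", retmode="text"
        )
        # SeqIO.parse reads the GenBank flat-file text one record at a time.
        records.extend(SeqIO.parse(handle, "genbank"))
        handle.close()
    return records


# records = fetch_records(ids, email="you@example.org")  # needs network access
```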
Now the heavy lifting! We need to get the host information from each record, which means looping through a bunch of attributes in the Biopython object representing that record. We want to store the data as a "one to many" relationship, since each host species might have multiple fungal species associated with it. There are a couple of different ways of doing that, but I used Python's very cool "defaultdict" dictionary, which can create a list for each host and add new information to that list when it encounters the host.
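Here is a sketch of that loop. It assumes the host name sits in the "host" qualifier of each record's "source" feature, which is one common place for it but not the only one submitters use; the `hosts_to_fungi` and `fake_record` names and the two example records are made up so the snippet runs without a network connection.

```python
from collections import defaultdict
from types import SimpleNamespace


def hosts_to_fungi(records):
    """Map each host name to the list of fungal organisms recorded on it."""
    by_host = defaultdict(list)
    for record in records:
        for feature in record.features:
            if feature.type != "source":
                continue
            # Biopython stores every qualifier value as a list of strings.
            for host in feature.qualifiers.get("host", []):
                organism = record.annotations.get("organism", "unknown")
                by_host[host].append(organism)
    return by_host


def fake_record(organism, host):
    """Build a stand-in object shaped like a Biopython SeqRecord."""
    source = SimpleNamespace(type="source", qualifiers={"host": [host]})
    return SimpleNamespace(annotations={"organism": organism}, features=[source])


# Made-up example data, not real GenBank records.
records = [
    fake_record("Cortinarius sp.", "Nothofagus menziesii"),
    fake_record("Russula sp.", "Nothofagus menziesii"),
]
by_host = hosts_to_fungi(records)
for host, fungi in by_host.items():
    print(host, len(fungi))
```

The defaultdict saves checking whether a host has been seen before: the first append for a new host quietly creates its empty list.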
And this is where everything went wrong. Well, not quite: the search term I used found 84 records, but they were all for fungi collected from silver beech (Nothofagus menziesii) trees. So much for comparing diversity between hosts! Still, those 84 records give us a chance to estimate the taxonomic diversity of fungi associated with this tree, so let's count up all the unique taxonomic names among these records:
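One way to do the counting, sketched with a Counter. Picking out the family by its "-aceae" ending is a heuristic I'm assuming here (GenBank lineages don't label their ranks), the `family_of` and `family_counts` names are hypothetical, and the lineages below are made-up stand-ins for each record's `annotations["taxonomy"]` list.

```python
from collections import Counter


def family_of(lineage):
    """Return the first name that looks like a family, else 'unassigned'."""
    for name in lineage:
        if name.endswith("aceae"):  # fungal family names end in -aceae
            return name
    return "unassigned"


def family_counts(lineages):
    """Count how many lineages fall in each family."""
    return Counter(family_of(lineage) for lineage in lineages)


# Made-up lineages; with real data you would pass
# (record.annotations["taxonomy"] for record in records).
lineages = [
    ["Eukaryota", "Fungi", "Basidiomycota", "Agaricales", "Cortinariaceae", "Cortinarius"],
    ["Eukaryota", "Fungi", "Basidiomycota", "Russulales", "Russulaceae", "Russula"],
    ["Eukaryota", "Fungi", "Basidiomycota", "Agaricales", "Cortinariaceae", "Cortinarius"],
    ["Eukaryota", "Fungi"],
]
counts = family_counts(lineages)
for family, n in counts.most_common():
    print(family, n)
```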
And (dropping Python for R and ggplot2) a graph:
I'm not finished with this little project: I have some ideas to widen the net for fungal species next time I get really sick of snails, but even this little exercise shows some interesting things. First, silver beech have lots of fungi on their roots, and they come from lots of different groups! It would be fascinating to know if the fungal families represented above were playing different roles in the root-tip, or if each was competing with others. Or how the make-up of the fungal biota attached to a given tree or a given forest affects its health. More importantly, GenBank is potentially a really useful way for researchers to share more than just sequence data. If people working on mycorrhizal fungi decided on a de facto standard for the way they annotated their GenBank submissions, then data from hundreds of published (and unpublished) studies could be almost effortlessly combined to create a big picture of the dynamics of these important fungi. Even as it is now, there is a source of data that is almost never used by researchers or people building the various "encyclopedia of life" projects, and it doesn't take too much tinkering to see how it could be put to use.
*Yes, I'm the sort of person who takes a break from science by doing some other science. On a Friday night. What of it?