February 15, 2012

Big genomes, little genomes, Madonna’s genome?

Genes tend to be surprisingly similar between many other organisms and humans.

Genomes are getting cheaper and cheaper to sequence with every passing day.
In 2010, I spent a day checking out the first SOLiD-3 analysis system installed at the University of Michigan, which was the first in the state of Michigan. Good thing, since I heard there are already three of these systems at the University of Toronto alone, in Ontario, Canada. Why is it a good thing to have this machine in the Cancer Center at U. Mich?

Speed, cost, and keeping up with modern genomics are why. When I was a post-doc at the NIH in 1995, the Human Genome Project was just ramping up its massive sequencing work, using several centers in the US, the UK, and several other countries. High-capacity sequencers that could sequence hundreds of DNA samples, 500-1000 bases each, were arrayed in large factory-like spaces, gradually crunching on the genome. Our haploid human genome is 3 billion base pairs of sequence. Your genome is sort of like a book, a long one, with 23 chapters (your chromosomes) and about 26,000 stories (your genes) collected into those chapters. Only four characters are used in this book, A, C, G, and T: the DNA sequence. So you can imagine the job: with 500-base reads you need about 6 million DNA sequencing reactions. Each reaction needed a “primer” 18-20 bases long, which had to be produced (about $5 each, with large volume discounts I imagine), so you needed about $30 million just to make the sequencing primers!
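For anyone who wants to check that back-of-the-envelope math, here is a minimal Python sketch. The numbers are just the rough figures quoted above (the $5 primer price is my guess, not a real quote), and it ignores repeat sequencing and overlap between reads.

```python
# Rough figures from the paragraph above; all are approximations.
HAPLOID_GENOME_BASES = 3_000_000_000  # about 3 billion base pairs
READ_LENGTH_BASES = 500               # low end of the 500-1000 base reads
PRIMER_COST_USD = 5                   # guessed price per primer

# One sequencing reaction (and one primer) per read, with no overlap or repeats.
reads_needed = HAPLOID_GENOME_BASES // READ_LENGTH_BASES
primer_budget_usd = reads_needed * PRIMER_COST_USD

print(f"Sequencing reactions needed: {reads_needed:,}")   # 6,000,000
print(f"Primer budget: ${primer_budget_usd:,}")           # $30,000,000
```

In practice the real total would be higher, since every region had to be covered more than once for accuracy.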

A hundred sequencers working for a year could handle the sequencing if you could feed them fast enough with new DNA fragments and if you had all the primers ready to go. However, you had to design the next sequencing primers from recently obtained DNA sequence. Add in some sequencing repeats for accuracy and, in the end, the first draft of the human genome was presented to us in 2000/2001. Computational programs that could be fed millions of DNA fragment sequences and align them using overlapping shared sequences were improved, which made it possible to simply fragment genomic DNA and sequence the pieces. Celera used that approach and basically replicated the human genome assembly in about one year, starting later than the Human Genome consortium but finishing around the same time. This fragmentation idea was a great one and set the stage for what was coming next.
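The idea of stitching fragments together through their shared overlapping ends can be shown with a toy example. This is only a sketch of a simple greedy approach that I am using for illustration; it is not the method Celera or the consortium actually used, real assemblers also have to cope with sequencing errors, repeats, and both DNA strands, and the function names and fragments are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three made-up fragments that overlap by four bases each.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)  # ['ATGGCGTACCTGA']
```

The toy version compares every fragment with every other fragment, which would be hopeless for millions of reads; that is exactly why the improved software mattered so much.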

Next-generation (NextGen) sequencing was developed rather quickly from 2004 onward. These machines basically sequence hundreds of thousands to millions of DNA fragments at the same time. Simultaneous sequencing of hundreds of millions of fragments, 100 base pairs each, is now the norm, with the length of sequencing reads increasing each month. SOLiD-3, 454, and Illumina systems are installed in more and more facilities and they get us closer and closer to the $1,000 genome. Yes, get your personal genome sequenced, both copies of your chromosomes, 6 billion base pairs of DNA, in a matter of a few days. Pipe dreams? Not at all: the costs of sequencing the genomes of individuals are dropping now into the tens of thousands of dollars, and maybe averaging $10,000 just in reagent costs for your genome. Someone still has to be paid to process and manage all of this, of course, but the world of personal genomic medicine is arriving now. The plane is not at the gate yet, but it has touched down on the runway and it will be taxiing to our gate very soon.

So why did I like this SOLiD-3 system so much? 500 million is the answer. That is, 500 million DNA fragments are sequenced at the same time, up to 100 bases each, on a glass slide (the same size as a microscope slide), over a run time of 4-5 days. Going into 2011, this machine had two such slides, to deliver 1 billion DNA fragments or 100 billion bases of DNA sequence per week. Remember, your diploid genome is only 6 billion bases long. Get the idea now? The machine and its computer cluster fit in a metal cart on wheels that takes up less space than a kitchen island, and it runs itself automatically for the week. When done, its computer cluster, tucked away in neat rack servers in the lower half of the work cart, hums along for 15-20 hours aligning the fragments to reference genome sequences of human, mouse, rat, and so on. Many genomes have been sequenced since 1995, and you can check them out at the genome browser at UC Santa Cruz (http://genome.ucsc.edu).
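Here is the same throughput arithmetic as a short Python sketch, using only the round numbers quoted above. The "coverage" figure is simply raw bases divided by genome size, assuming every read is the full 100 bases and every read is usable, which is optimistic.

```python
# Round numbers from the paragraph above.
FRAGMENTS_PER_SLIDE = 500_000_000
READ_LENGTH_BASES = 100               # "up to" 100, so this is a best case
SLIDES_PER_RUN = 2
DIPLOID_GENOME_BASES = 6_000_000_000

bases_per_run = FRAGMENTS_PER_SLIDE * READ_LENGTH_BASES * SLIDES_PER_RUN
raw_coverage = bases_per_run / DIPLOID_GENOME_BASES

print(f"Bases per run: {bases_per_run:,}")              # 100,000,000,000
print(f"Raw diploid coverage: {raw_coverage:.1f}x")     # 16.7x
```

So a single week-long run reads the equivalent of your whole diploid genome more than sixteen times over.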

The implications of this seem amazing, but they are true. With this kind of NextGen sequencing, if the Human Genome Project were started today, it could be completed using one machine in a matter of weeks. One interesting advantage of the SOLiD-3 approach is the use of DNA ligase, not polymerase, for sequencing, and a very cool puzzle-like strategy for calling the bases using “color-space” and not “base-space”. The result is that every base is called twice in the analysis and confirmed to be correct. Sequencing errors are flagged automatically by the method used. So not only do you get genomes sequenced in a week, you get sequence that is guaranteed to be at least 99.98% correct. That is very important, since other technologies do not match that degree of accuracy, and accuracy is essential to researchers looking for small changes between species during evolution, or single base pair differences (SNPs) between all of us humans. Even a 0.01% error rate means you can have 600,000 errors in your 6 billion bases. Sequencing many genomes' worth of DNA, and getting tens or hundreds of overlapping reads for any region, helps to confirm the sequence.
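To see why each base ends up being read twice in color-space, here is a toy Python sketch. It assumes the commonly described two-base encoding, in which each color (0-3) is the XOR of two-bit codes for a pair of adjacent bases (A=0, C=1, G=2, T=3). The function names and the example sequences are mine, and the real instrument software is far more involved.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed two-bit code per base
BASES = "ACGT"


def encode(seq):
    """Return the first base plus one color for each adjacent pair of bases."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence by walking the colors from the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


def changed_colors(colors_a, colors_b):
    """Positions at which two color strings of equal length disagree."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]


reference = "ATGGCA"
with_snp = "ATGTCA"                        # one real base change (G to T)

_, ref_colors = encode(reference)
_, snp_colors = encode(with_snp)

print(ref_colors)                              # [3, 1, 0, 3, 1]
print(snp_colors)                              # [3, 1, 1, 2, 1]
print(changed_colors(ref_colors, snp_colors))  # [2, 3]
```

Because every base sits inside two overlapping pairs, a genuine single-base difference changes two neighboring colors. A lone color change does not correspond to any single-base difference, which is how a likely measurement error gets flagged.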

So what is about to happen in our brave new world now that we have this technology? For one thing, researchers are doing projects to compare the genomes of people who are closely related, people from very different genetic backgrounds all over the world, and people with and without diseases that emerge in our aging population, such as AMD, diabetes, and Alzheimer’s. Are there small changes, or SNPs, that combine to increase the risk of these diseases, or that can predict side effects to a drug?

Evolutionary biologists have proposed to start a vertebrate genome project: sequence 10,000 different vertebrate species to learn how our genomes make us different in the most fundamental ways. Wings versus hands, teeth versus beaks, eggs versus live birth, warm-blooded versus cold, talking versus barking. Already we have learned some interesting things about our humanity. For example, the chimpanzee’s genes seem to code for almost exactly the same proteins we have, right down to their amino acid sequences. Essentially, we have learned that the same collection of Lego blocks can be used to make a chimpanzee, and that changing the proportions of the same bricks makes a human that looks different and has the ability to talk, read, and write. The differences in our genomes mostly change the timing and levels of expression of what is otherwise the same set of genes.

So, do you talk because you have a gene that the chimp does not have, or simply because a single transcription factor changes its concentration, causing changes in the expression of several genes, which in turn give us neurons that can make connections in different patterns to permit language? The evolution of something we perceive to be “complicated”, like language, suddenly becomes totally possible on a rather fast time scale. We already know that a small DNA change can cause a transcription factor to disturb hundreds of genes required by photoreceptor cells in our retinas, and thus cause an inherited retinal disease. That is fact, so these kinds of changes can have dramatic consequences, which could be harmful or helpful.

Get ready to learn a lot more about life on this planet as thousands of additional genomes fall to the NextGen sequencers. Get ready for the $1,000 genome and then the $100 genome. Get ready for NextGen sequencers that will be as big as a small microwave and start to appear in college teaching labs, and maybe become a new home hobby to set up in the garage or basement. There are billionaires driving around in million-dollar cars today, and for about $150,000 you can now obtain a new sequencer on the block, called the Ion Plus. This system has millions of reaction wells that simultaneously monitor hydrogen ions (H+) generated during the addition of each DNA base in a sequencing reaction. This combination of biochemistry and silicon-chip technology automatically sequences over a hundred million DNA fragments, and then its attached computer can run programs to assemble a new genome or to align a patient's DNA sequence with a reference "normal" genome. Programs with names like "Bowtie" do this alignment very fast; I can run them here on the little MacBook I am using to write this blog post. The Ion Plus promises to sequence the basic framework of a human genome in a single day, already five times faster than the SOLiD-3 system I mentioned at the start of this posting. Thus the $1,000 genome is arriving now.

Medicine is going to feel its effects very quickly, as sequencing your genome becomes a moderately priced test that can alert your doctor to damaged or disease-causing gene mutations you may have. So, consider an Ion Plus instead of the Rolls Royce, roll it into your kitchen, and have a very cool conversation piece for your next dinner party. Pass around the cheek swabs and offer your guests a full genome sequencing to remember you by. "Madonna, glad you could come. Here, just swab your cheek for my daughter's grade 8 science project. She is looking for gene sequence variations linked to video stardom."

