Chapter 11: Next Generation and Nanopore Sequencing

Introduction

The availability of new reagents, tools, and biocomputing power has made it possible to greatly increase the amount of sequence data that can be generated in a relatively short time.

Next Generation sequencing is a new sequencing approach that does just that. Or it is relatively new: it emerged in 2005, which probably seems more recent to me than it does to you. In this section we’ll cover the advantages of Next Gen sequencing, when it is appropriate to use (and when it is not), and how it works. We will also briefly touch upon nanopore sequencing, a newer approach that is gaining traction in research because it addresses one of the disadvantages of Next Gen sequencing: the short reads.

 


Learning Outcomes

  • Explain how DNA is prepared for Next Gen Sequencing
  • Explain how the linkers are added in the desired arrangement
  • Describe cluster formation through the bridge amplification
  • Explain how the sequencing stage works via the Illumina method outlined here (be able to say what is happening to the DNA during the process)
  • Identify the main advantages of the Next Gen methods
  • Describe how nanopore sequencing works
  • Identify the main advantage of nanopore technology over Illumina

A. The limitations of chain termination sequencing for some types of research:

If you want to sequence single genes from several individuals, or even a few genes, chain termination sequencing will be the best option. You don’t want to use large-scale, expensive approaches to sequence a couple of genes. But when you are working on entire genomes, possibly of multiple organisms, the time and expense are too great with the traditional sequencing approach.

If we want to sequence a lot of genomes in a reasonable amount of time and without too much expense, we need to do massively parallel processing, meaning a lot of sequences at the same time. Entire genomes that took years to sequence via the Sanger method can be done in hours with Next Generation (Next Gen) technology.

 

Next Gen sequencing relies on new methodology, equipment and reagents, but especially on computing power: the ability to deal with very large data sets. A person could not sort through millions of short sequence reads and generate an assembly even in years, while a computer can do it in a few hours.



B. How Next Generation Sequencing works:

Next Generation sequencing can be used in many different situations in which you have a large data set. The example I often use is sequencing the DNA from a soil sample to identify sequences from the myriad microorganisms in the soil. This is called metagenomics. It can also be used to sequence all the cDNAs produced from a certain tissue or stage of an organism, or from different tissues, or in many other situations where a lot of sequence is being generated for analysis.

This method can be divided into three stages. The first is preparing the sequences and attaching adaptors to each piece of DNA. The second is the generation of clusters in a flow cell (this may be the most difficult part to visualize), and the final stage is the actual sequencing reactions, which are based on Sanger sequencing but with some cool modifications. The first and second stages are fairly standard, but the actual sequencing part can be done through a variety of methods. I am currently familiar only with the Illumina™ method, so that is the only one we will cover this time. But be aware that there are other methods, because if you should end up working in the field you may need to familiarize yourself with alternative approaches. I have provided a link below to a reference that I think explains the various approaches quite well, with images that are quite helpful.

 

https://www.atdbio.com/content/58/Next-generation-sequencing

 


B-1. Preparing the sequences

To begin, we isolate the DNA we are going to be sequencing. It needs to be broken into quite small pieces. The limitation of Illumina sequencing is that reads of about 100 bases are usual. So the reads are very short, but there are millions of them with multiple overlaps, so that entire sequences can eventually be assembled. The DNA is usually sonicated (broken with sound waves) because you can adjust the frequency of sonication and the length of time you do it in order to generate pieces of the desired average size. And unlike a restriction digest, sonication is not sequence specific, so we can generate random pieces with a lot of overlap. As described for shotgun sequencing in Chapter 9, the overlap is essential, and the reason why it is needed is exactly the same.
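To see why random breakage gives us overlap and a restriction digest does not, here is a small Python sketch. It is only my own illustration, not part of any sequencing software, and the molecule length, the number of copies and the number of cuts are arbitrary assumptions. It cuts several copies of an imaginary 3000-base molecule either at the same fixed sites in every copy (as a restriction digest would) or at random sites (as sonication does), and then counts how many cut sites are bridged by a fragment from some other copy.

```python
import random

def fragments_from_cuts(length, cuts):
    """Turn a sorted list of cut positions into (start, end) fragments."""
    edges = [0] + list(cuts) + [length]
    return list(zip(edges[:-1], edges[1:]))

def junctions_bridged(all_fragments, cut_sites):
    """Count cut sites that lie strictly inside at least one fragment."""
    return sum(
        1 for site in cut_sites
        if any(start < site < end for start, end in all_fragments)
    )

LENGTH = 3000   # assumed size of the molecule, in bases
COPIES = 5      # assumed number of identical copies in the sample
N_CUTS = 19     # about one cut every 150 bases

# "Restriction digest": every copy is cut at exactly the same sites.
digest_cuts = list(range(150, LENGTH, 150))
digest_fragments = []
for _ in range(COPIES):
    digest_fragments.extend(fragments_from_cuts(LENGTH, digest_cuts))
digest_bridged = junctions_bridged(digest_fragments, digest_cuts)

# "Sonication": every copy is cut at its own random sites.
rng = random.Random(1)
sheared_fragments = []
sheared_sites = set()
for _ in range(COPIES):
    cuts = sorted(rng.sample(range(1, LENGTH), N_CUTS))
    sheared_sites.update(cuts)
    sheared_fragments.extend(fragments_from_cuts(LENGTH, cuts))
sheared_bridged = junctions_bridged(sheared_fragments, sorted(sheared_sites))

print("digest: bridged cut sites =", digest_bridged, "of", len(digest_cuts))
print("sheared: bridged cut sites =", sheared_bridged, "of", len(sheared_sites))
```

With identical cut sites no fragment ever crosses a junction, so there is nothing to tell the computer which fragment comes next. With random cuts, a break in one copy is almost always covered by an intact fragment from another copy, and that overlap is what an assembly relies on.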

We break the DNA into pieces of about 100-200 base pairs. We can run the DNA out on a gel and gel purify the DNA that is in the correct size range. Gel purification is described in Chapter 5 in the Touch Down PCR section. Essentially we cut out a chunk of agarose that contains the desired DNA fragments and then use different methods to specifically extract the DNA from the rest of the material. Generally we buy a kit that contains agarase to dissolve the agarose and then we purify the DNA over a column. All components of the procedure come in the kit. It is very convenient compared to the older methods.

We want the DNA pieces to be blunt ended, but when DNA is broken randomly that isn’t going to be the case for a lot of molecules. For 3′ overhangs, a DNA polymerase with 3′-5′ exonuclease activity, such as Klenow or T4 DNA polymerase, can be used to chew back the 3′ end. For a 5′ overhang, these polymerases can also “fill in” the missing nucleotides with their 5′-3′ DNA polymerase activity. Other nucleases can be used as well: mung bean nuclease, for example, can chew back the 5′ overhangs. Making the DNA blunt is often called “polishing” the DNA.

Once we have the blunt DNA pieces we want to attach adaptors to them. Adaptors are double-stranded pieces of DNA that can be added to DNA sequences by ligation. They are used for different purposes, such as adding needed restriction sites to the ends of your PCR product. They can also have specific overhangs, in which case one end of the adaptor is blunt (that end is ligated to your DNA) and the other is sticky. So adaptors can be made according to our specifications, although for Illumina and other types of Next Gen sequencing they have already been predetermined and are available for us to use. When we get to the section on making the clusters you will see why we have to use adaptors with a specified sequence. The aim is to ensure that all of the different DNA pieces have the same sequences at their ends. The adaptors are blunt ended at one end and have a stretch of non-complementary sequence at the other.

We use T-A base pairing to attach the adaptors to the sequence. If we incubate our blunt-ended molecules of DNA with Taq polymerase, it will add an A nucleotide to the 3′ end of each strand (review T-A cloning from Chapter 6 if you need to). The two ends of the adaptor are different: 1) the blunt end has the universal priming site, and its top and bottom strands base pair with each other; 2) the Y-end has two strands that don’t base pair with each other. The adaptors are manufactured with one T nucleotide added to the strand whose 3′ end is exposed on the blunt side. This allows the “blunt” side to base pair with the single 3′ A nucleotide sticking out on either end of the DNA molecule, so that DNA ligase can join a Y-adaptor to each end of the molecule.

Finally we do PCR for a few cycles to make the pieces complementary throughout. Only one primer is needed for this, as will be explained in the lecture. When we have the pieces prepared this way, we are ready to put the DNA pieces into a flow cell and generate “clusters”.
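The end result of all this preparation can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the two adaptor sequences below are made-up placeholders (not the real Illumina adaptors), and I am assuming an arrangement in which each finished strand carries one adaptor sequence at its 5′ end and the complement of the other at its 3′ end. The point is only that very different inserts all end up with the same known sequences at their ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Hypothetical placeholder sequences, NOT real Illumina adaptors.
ADAPTOR_1 = "GGTTCCAAGG"
ADAPTOR_2 = "TTAACCGGAA"

def library_strand(insert):
    """Top strand (5'->3') of a finished library molecule in this toy model."""
    return ADAPTOR_1 + insert + reverse_complement(ADAPTOR_2)

# Three unrelated "unknown" inserts of different lengths.
inserts = ["ATGCCGTACCGA", "GGGTTTAAACCCGGGTTT", "TTGACAGT"]
library = [library_strand(insert) for insert in inserts]

for top in library:
    bottom = reverse_complement(top)
    # The 3' end of the top strand pairs with an oligo of sequence ADAPTOR_2;
    # the 3' end of the bottom strand pairs with an oligo of sequence ADAPTOR_1.
    print(top.endswith(reverse_complement(ADAPTOR_2)),
          bottom.endswith(reverse_complement(ADAPTOR_1)))
```

Because every strand ends in one of just two known sequences, two kinds of oligo are enough to capture every piece of DNA in the sample, whatever the insert happens to be.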

This is a simple overview of the process of preparing the DNA for sequencing:

 


Note: there is a mention in the recorded lecture of gel purification method being included in Chapter 4’s restriction enzyme section – however, gel purification was in fact briefly talked about in Chapter 5 instead, under Touch Down PCR.


B-2. Generating the clusters

Flow cells look a bit like a microscope slide, but they are a bit thicker and have 8 longitudinal chambers (like tunnels) running through them. Into these chambers, the DNA to be sequenced will be introduced in the appropriate buffer. On the lower surface of each chamber are many oligonucleotides that have been fixed to the surface. These are complementary to the sequences at one or the other end of the DNA molecules. The DNA is denatured before we add it to the chambers so that the sequence at the end of each molecule will base pair with the complementary oligonucleotide that is fixed to the bottom. There are two types of oligo, to match each of the two strands of the sequence. When we add the DNA we have it fairly dilute, so that when pieces of DNA bind to the oligos they are somewhat spaced out. We need a lot of space between the original DNAs so that the clusters we generate won’t overlap with each other. Each cluster will contain hundreds of copies of the sequence, both forward and reverse strands, attached to the surface.

In the first reaction, the oligonucleotides fixed to the surface of the flow cell act as primers. DNA polymerase and dNTPs are introduced under conditions appropriate for DNA synthesis, and a DNA strand complementary to the bound DNA is made, extending from the fixed oligo. The temperature is then increased and the cell is washed with buffer, which releases the original template strand and leaves the newly made strand attached to the surface. We are left with a forest of oligos of two different types, and occasionally one of them has been extended into a DNA fragment.

Next comes the cluster formation. We want to make clusters of many copies of identical sequences. The piece of DNA attached to the oligo has, at its far end, sequence that is complementary to the other oligo. This is the key.


 

The widely spaced DNA molecules in the chamber are surrounded by two types of oligos very thickly distributed on the chamber floor.  Imagine one such DNA molecule that is attached to the ‘red’ oligo from the video.  At the other end of this molecule is DNA that is complementary to the ‘green’ oligo. And the molecule is surrounded by green oligos. So the end of the DNA can base pair with a nearby complementary oligonucleotide. If we then provide Taq polymerase and nucleotides and the appropriate buffer, in a cycle like PCR the green oligo acts like a primer to make a complementary strand to the DNA molecule. When that cycle is finished, the temperature is increased and the molecules separate. Now there is a red oligo with a DNA strand extended from it and nearby is a green oligo with the complementary strand attached to it. We can repeat the cycles multiple times to generate a cluster of DNA sequences in which half are one strand (attached to the red oligos) and the other half are the reverse complement of that strand (attached to the green oligos).
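The doubling described above can be followed with a short Python sketch. It is an idealized model of my own: I assume every strand finds a free partner oligo in every cycle, which a real cluster will not manage indefinitely, and the ten-base seed sequence is invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def bridge_amplify(seed_strand, cycles):
    """Grow one cluster; return its strands as (oligo colour, sequence) pairs."""
    cluster = [("red", seed_strand)]          # one strand on a red oligo to start
    for _ in range(cycles):
        new_strands = []
        for colour, seq in cluster:
            # Each strand bends over to the other kind of oligo, which primes
            # synthesis of the complementary strand.
            partner = "green" if colour == "red" else "red"
            new_strands.append((partner, reverse_complement(seq)))
        cluster.extend(new_strands)           # denature: both strands stay attached
    return cluster

seed = "ATGCCGTAAC"                            # invented example sequence
cluster = bridge_amplify(seed, cycles=8)
red = [seq for colour, seq in cluster if colour == "red"]
green = [seq for colour, seq in cluster if colour == "green"]
print(len(cluster), "strands:", len(red), "red and", len(green), "green")
```

After eight idealized cycles the single starting molecule has become 256 strands, 128 on red oligos and 128 on green oligos. Every red strand has the starting sequence and every green strand is its reverse complement.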

In some procedures an oligo-specific cleavage removes all of one type of strand in each cluster – it could be all the red ones or all the green ones.  However, you can sequence the “green” sequences in all the clusters and then in a second set of reactions you can sequence all the “red” sequences. This allows you to collect the forward and reverse sequence for each cluster. You can do this because the primer used for the sequencing of the “green” sequences is different from the one used for the “red” sequences.

Now that we have the clusters, we can proceed with the actual sequencing. Except for the scale and some details about the fluorescent terminators the procedure is conceptually similar to Sanger sequencing.



B-3. The actual sequencing part

The sequencing of the short DNA segments makes use of terminators just like Sanger sequencing, but in this method every nucleotide incorporated is a terminator! And it is fluorescently labeled as well. A primer specific to, for instance, the forward strand is added to the chamber and binds at the 3′ end of the DNA to be sequenced. The enzyme, buffer and so on are added, along with the fluorescent terminators. Each of the four nucleotides has a distinct fluorescent label. Because every nucleotide is a terminator, only one nucleotide is added to the primer. The chambers are washed to remove unincorporated nucleotides. Then the flow chamber is exposed to a laser, which excites the fluorophores on the nucleotides so that they emit light, and the computer takes images of the chambers, detecting which colour each spot is showing. When that data has been collected, the fluorophores and blocking groups are cleaved from the nucleotides, exposing a 3′-OH to which the next nucleotide is added. This process is repeated about 100 to 200 times.

Over time, mistakes can occur in incorporation of the new nucleotides, or in a few cases the cleavage reaction doesn’t work. Occasionally a nucleotide hasn’t got the blocking group, so two nucleotides are added in a cycle instead of one. Either way, some of the molecules in a cluster fall “out of step” with the others. The colour they emit will be different, which makes it harder for the computer to read the correct colour, and eventually some of the clusters give a very poor signal. That is the limitation of this method. Although it works on a very large scale and we can sequence vast amounts of DNA and process equally large amounts of data in a fairly short time, the sequence reads are fairly short due to this gradual deterioration of the quality of the signal (clear fluorescence that can be easily interpreted by the computer).

So we generally cannot do entire new genomes of complex organisms; the technique is not robust enough for that. But we can sequence genomes if we have a reference genome to work from, i.e., one that has been sequenced through Sanger or other methods, and to which we can compare all our many short reads from the Illumina sequencing. For instance, we can sequence the genomes of individual people and determine what their unique genotypes are. This is one of the approaches that makes personalized medicine a possibility for the near future.
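The way molecules in a cluster drift out of step can be illustrated with a small simulation. This is a rough sketch rather than a model of any real instrument: the two failure rates are numbers I made up, and I simply count a molecule as giving the right colour when it has added exactly as many nucleotides as there have been cycles.

```python
import random

def in_phase_fraction(n_molecules, n_cycles, p_lag, p_lead, rng):
    """Fraction of molecules in one cluster that are still in step after each cycle."""
    incorporated = [0] * n_molecules      # nucleotides added so far, per molecule
    purity = []
    for cycle in range(1, n_cycles + 1):
        for i in range(n_molecules):
            roll = rng.random()
            if roll < p_lag:
                step = 0                  # cleavage failed: nothing added this cycle
            elif roll < p_lag + p_lead:
                step = 2                  # unblocked nucleotide: two added this cycle
            else:
                step = 1                  # normal cycle: exactly one added
            incorporated[i] += step
        in_step = sum(1 for n in incorporated if n == cycle)
        purity.append(in_step / n_molecules)
    return purity

rng = random.Random(0)
# Hypothetical per-cycle failure rates, chosen only to make the trend visible.
purity = in_phase_fraction(n_molecules=1000, n_cycles=150,
                           p_lag=0.005, p_lead=0.002, rng=rng)
for cycle in (1, 50, 100, 150):
    print("cycle", cycle, "fraction in step:", round(purity[cycle - 1], 3))
```

Even with failure rates well under one percent per cycle, the in-step fraction keeps falling as the cycles accumulate, which is why the reads have to stop after a hundred or so bases.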



C. Nanopore sequencing: a very different method:

Nanopore sequencing takes a very different approach to Next Gen. It is a much more original method and though it has taken some time for bugs to be worked out, it is being used in research, including research in our department! It is referred to as third generation sequencing and the people who work with it are no doubt still working to improve it for the fourth generation.

 


C-1. How it works in theory

The nanopore concept is that you can feed a DNA molecule through a pore in an artificial membrane. An electric potential is applied across the membrane, which drives a current of ions through the pore. As each nucleotide passes through the pore it alters that current in a predictable way, and these changes are used to record the sequence. There is no synthesis of anything, no polymerase to decrease in activity and accuracy over time and no reagents to add in cycles. In theory an entire chromosome could be threaded through the pore and its entire sequence read. I don’t believe we are quite there yet, but entire genomes have been sequenced using this method. The genomes were already known, so the experiments were aimed at determining the reliability and accuracy of the method. And it worked.

In this method a double-stranded DNA is prepared and then fed through the manufactured “pore”. At the entry to the pore, the DNA is unwound and just one strand is fed through. The current through the pore is recorded as the DNA moves through. Each nucleotide affects the current slightly differently, and a trace is produced that can be used to read out the sequence of nucleotides that moved through the pore.
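Here is a toy Python version of reading a sequence back from a trace. It is deliberately oversimplified and rests on assumptions of mine: the four signal levels are invented numbers in arbitrary units, every base is given the same number of measurements, and each base is treated as acting alone. In real devices several neighbouring bases sit in the pore together and shape the signal jointly, and the speed of the DNA varies, so real base calling is much harder than this.

```python
import random

# Hypothetical signal levels (arbitrary units), invented for this illustration.
LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}

def simulate_trace(seq, samples_per_base, noise_sd, rng):
    """Produce a noisy signal trace with a fixed number of samples per base."""
    trace = []
    for base in seq:
        for _ in range(samples_per_base):
            trace.append(rng.gauss(LEVELS[base], noise_sd))
    return trace

def decode_trace(trace, samples_per_base):
    """Average each block of samples and call the base with the nearest level."""
    called = []
    for i in range(0, len(trace), samples_per_base):
        window = trace[i:i + samples_per_base]
        mean = sum(window) / len(window)
        called.append(min(LEVELS, key=lambda base: abs(LEVELS[base] - mean)))
    return "".join(called)

rng = random.Random(42)
template = "GATTACAGGCTTAACG"                 # invented example sequence
trace = simulate_trace(template, samples_per_base=20, noise_sd=2.0, rng=rng)
called = decode_trace(trace, samples_per_base=20)
print("template:", template)
print("called:  ", called)
```

With levels this well separated and this little noise, the called sequence should match the template. Narrowing the gaps between the levels or raising `noise_sd` is a quick way to see how miscalls start to appear.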

 


C-2. Advantages over other methods

The main advantage of nanopore sequencing over Illumina is the much longer reads that are possible. Reads of 10 kb are common and reads of up to about 100 kb are possible. Having longer sequences means there will be more overlap with other sequences. This is important in genomes with a lot of repetitive sequences. These can be simple repeats that are scattered around a genome and that vary in length depending on the number of repeat units. Imagine you have a group of very short sequences and some of these contain the edge of a repeat region. If the repeat is found on a number of chromosomes, or even in multiple regions of the same chromosome, it can be impossible to figure out exactly where your sequence actually resides in the genome (unless of course you have a completed reference genome that can be used for comparison). And many of your sequences might contain only the repeats, so you won’t know where these reside either. Longer-range sequencing offers the ability to span repetitive genomic regions. If you have a very long sequence that is several thousand bases long, it is likely to span all the way across a repeat region and will show the unique sequence on each side of the repeat.

This spanning of repeats, and the fact that a longer sequence is more likely to have lots of overlap with other sequences, means that you can assemble full genomes more easily.
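The repeat problem is easy to demonstrate on a miniature scale. In the Python sketch below, the 70-base "genome", its 20-base repeat and both reads are sequences I invented for the example, and the matching is a plain exact-match search rather than a real alignment program.

```python
def find_all(genome, read):
    """Return every start position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Invented toy genome: three unique stretches separated by two copies of a repeat.
repeat = "ACGTTGCAAGGCTTCCGATA"
genome = "ATGGCTAGCT" + repeat + "TTGACCGTAA" + repeat + "GGATCCTTAG"

short_read = repeat[4:16]    # 12 bases lying entirely inside the repeat
long_read = genome[5:35]     # 30 bases: unique flank + whole repeat + unique flank

print("short read matches at", find_all(genome, short_read))
print("long read matches at", find_all(genome, long_read))
```

The short read matches at two positions (14 and 44), so on its own it cannot be placed. The long read matches only at position 5, because it carries the unique sequence on both sides of the repeat.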

 


C-3. Disadvantages of Nanopore Sequencing

With the proviso that I have never worked with this sequencing method personally, I have heard of one disadvantage: the error rates in nanopore sequencing are higher than in Illumina and other methods. This might not be an issue if you generate a lot of sequence, because you might expect that sequencing the same DNA multiple times would overcome the occasional error. However, if the errors are somehow sequence specific, the same mistake would happen every time you sequence the same DNA. Colleagues who use this method say you need to be aware of the types of mistakes and be able to compensate for them. Perhaps if you were aware of certain “trouble spots” in the sequences you were working on, you could use Sanger sequencing or another approach to cover those regions.
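The difference between random and sequence-specific errors can be shown with a simple majority vote. This sketch is my own illustration with invented numbers (25 reads, a 10% error rate, one fixed trouble spot), and it only models wrong bases; real reads can also have missing or extra bases, which this ignores.

```python
from collections import Counter
import random

def consensus(reads):
    """Majority vote at each position across equal-length reads of the same DNA."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def read_with_random_errors(template, error_rate, rng):
    """Copy the template, replacing each base with a wrong one at the given rate."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def read_with_systematic_error(template, position, wrong_base):
    """Copy the template with the same wrong base at the same position every time."""
    return template[:position] + wrong_base + template[position + 1:]

rng = random.Random(3)
template = "ATGGCCTTAGCATCGGATTC"             # invented example sequence

random_reads = [read_with_random_errors(template, 0.10, rng) for _ in range(25)]
systematic_reads = [read_with_systematic_error(template, 7, "C") for _ in range(25)]

print("template:            ", template)
print("random-error vote:   ", consensus(random_reads))
print("systematic-error vote:", consensus(systematic_reads))
```

Random errors land in different places in different reads, so the vote is expected to recover the template. The systematic error is present in every read, so the vote confidently reports the wrong base at position 7, and no amount of extra sequencing fixes it.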


Next we are going to look at two interesting ways to study gene expression, qPCR and RNA-seq.

 

