The Sad Truth about Illumina Data Clustering

This article is a continuation of Illumina Data Clustering, and is a perfect example of why we, as scientists, should resist the hubris of premature expectations.

The standard Illumina protocol for library preparation requires 18 cycles of PCR after adaptor ligation to enrich for fragments with doubly modified ends. Incomplete products from a previous round can snap back during this step (“megaprimer snap-back”), creating artifactual templates which then amplify along with the others. This is a first-order process and should be fairly common whenever the 3′ end of the elongating strand happens to fall on a complementary REP site. A less common artifact could occur by a second-order process involving megaprimer extension and reannealing in trans to a complementary REP site.
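The snap-back condition described above can be illustrated with a small sketch. It asks only whether the last few bases of an incomplete extension product are the reverse complement of a stretch further upstream on the same strand, so that the 3′ end could fold back and prime in cis. The function names and the seed length `k` are assumptions made for illustration; real annealing also depends on loop length, temperature and mismatch tolerance, none of which are modeled here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def snap_back_sites(strand, k=8):
    """Find upstream positions where the 3' terminal k-mer could anneal in cis.

    `strand` is read 5'->3'. A hit at position p means strand[p:p+k] is the
    reverse complement of the last k bases, so the 3' end could fold back
    onto it. `k` is a hypothetical seed length, not a measured value.
    """
    strand = strand.upper()
    if len(strand) < 2 * k:
        return []
    target = reverse_complement(strand[-k:])
    upstream = strand[:-k]  # exclude the 3' tail itself from the search
    hits = []
    start = upstream.find(target)
    while start != -1:
        hits.append(start)
        start = upstream.find(target, start + 1)
    return hits


if __name__ == "__main__":
    # Toy strand: an 8-mer near the 5' end and its reverse complement at the 3' end.
    toy = "TTT" + "AGGATCCG" + "AAAAAAAAAA" + "CGGATCCT"
    print(snap_back_sites(toy, k=8))
```

A non-empty result only flags a candidate; it says nothing about how often the hairpin would actually form during the 18 cycles.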

I found a group of closely spaced NlaIV restriction sites whose digestion would prevent megaprimer formation by the snap-back route. If clusters arose from preexisting TIDs or by the rare megaprimer extension event, NlaIV digestion would have no effect on cluster amplification.
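Locating candidate sites like these is easy to sketch in code. The example below assumes the NlaIV recognition sequence is GGNNCC (as listed in standard enzyme catalogs) and scans one strand only, which is sufficient because the site is palindromic. The function names and the toy sequence are hypothetical and are not the actual template used in the experiment.

```python
import re

# Assumed NlaIV recognition sequence: GGNNCC. The lookahead allows
# overlapping sites to be reported.
NLAIV_SITE = re.compile(r"(?=GG[ACGT]{2}CC)")


def nlaiv_sites(seq):
    """Return 0-based start positions of every GGNNCC site in `seq`."""
    return [m.start() for m in NLAIV_SITE.finditer(seq.upper())]


def region_has_site(seq, region_start, region_end):
    """True if a complete 6-bp site lies within seq[region_start:region_end]."""
    return any(
        region_start <= s and s + 6 <= region_end for s in nlaiv_sites(seq)
    )


if __name__ == "__main__":
    toy = "AAGGATCCTTGGCGCCAA"  # hypothetical sequence with two sites
    print(nlaiv_sites(toy))
    print(region_has_site(toy, 0, 8))
```

Under the reasoning above, a snap-back template whose formation requires an intact region containing such sites should fail to amplify after digestion, whereas a preexisting joint lying outside the sites should be unaffected.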

I did the experiment, and found that cutting the template DNA with NlaIV prevented amplification. I am forced to conclude that the beautiful clustering of REP-mediated TID joints found in our data is strictly a man-made artifact of megaprimer snap-back!

Models for PCR Detection of REP-Mediated Clusters

-- Eric Kofoid


One Response to “The Sad Truth about Illumina Data Clustering”

  1. Kim Bunny says:

    Hi Eric,

    I haven’t really been following this, but took a look today at your blurb. This snap-back stuff sounds just like the type of thing you have to avoid in RACE. I don’t really understand the Illumina stuff so I could be talking out my arse, but can you use any of the suppression and step-out PCR strategies to avoid this? See Appendix C of this user manual for RACE from Clontech:

    As I say I may be talking rubbish, but it made me think of the RACE.
    Hope all is well.
