Posts Tagged ‘Illumina’

Sophie’s Mystery

Wednesday, March 28th, 2012

We’ve been developing methods to identify cells carrying TIDs (“Tandem Inverted Duplications”). Sophie Maisnier-Patin uses P22 transduction to deliver various tools that introduce a drug resistance cassette only if they are able to recombine into such inverted structures.

Sophie asks whether a kanR MudJ element can nucleate TIDs by virtue of its terminal inverted repeat. This repeat is 104 bp long and can form a perfectly paired 48 bp stem with an 8 base loop. The hypothesis is something like the following:

The tool Sophie used in this experiment contained two oppositely facing MudJ elements:

She transduced this into a recipient containing MudJ and a deletion which should prevent recombination with cob and his:

The resulting strain, SMP1666, should have a lac+ camR kanR trp- phenotype. Unfortunately, no TIDs were found. Instead, Illumina and subsequent PCR analyses showed that one end of the tool inserted by a strand annealing mechanism, as the his:cob deletion was not quite large enough. The strain was unstable for chloramphenicol resistance, as expected from the flanking trpB::MudJ direct repeats. Sophie showed by transduction to trp+ that the SMP1666 parent, as expected, contained a MudJ-bisected trpB locus, and that trp+ transductants, whether camR or camS, were invariably lac-, another prediction of the hypothesis. The corrected model became:

The Mystery

A large fraction of the trp+ lac- transductants retained kanamycin resistance! How could this be, when elimination of the last MudJ in the chromosome should, by definition, also remove the remaining kanR and lac loci? Clearly, something was wrong. There had to be another MudJ with an impaired promoter incapable of driving lacZ. We wrestled for weeks with elaborate models. All of them suffered from the need for at least two concerted events, implying a low rearrangement frequency, when the opposite had been observed.

We remembered some interesting results from two other Sophian strains, SMP1750 & TT26263, in which the donor DNA had recombined with the recipient through the MudJ inverted repeat locus itself, even though the element orientations were divergent. Although the inverted-repeat region is short, when folded it is prone to attack by SbcCD. This would provide many 3′ ends which could then anneal and rescue the cell. A corollary is that the MudJ stem-loop structure is recombinationally potent.

In light of this, we have refined our model:

The head-to-head Muds in the middle would have no promoter at all for lac, but the internal constitutive kanR promoter would still be active. The trpB gene would, of course, have been restored exactly as expected in earlier models.
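The stem-loop geometry behind this model (a 104 bp repeat folding into a perfectly paired 48 bp stem with an 8 base loop) can be sanity-checked with a few lines of Python. This is only a minimal sketch: the toy sequence below is made up for illustration and is not the MudJ terminal repeat, and the function names are hypothetical.

```python
# Toy check of hairpin geometry: 2 * stem + loop must equal the repeat length,
# and the right arm must be the reverse complement of the left arm.
# The sequence used in the example is invented, NOT the real MudJ repeat.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def make_hairpin(left_arm, loop_seq):
    """Build a sequence able to fold into a perfect stem with the given loop."""
    return left_arm.upper() + loop_seq.upper() + reverse_complement(left_arm)


def is_perfect_hairpin(seq, stem, loop):
    """True if seq is exactly stem + loop + stem long and the arms pair perfectly."""
    seq = seq.upper()
    if len(seq) != 2 * stem + loop:
        return False
    left_arm = seq[:stem]
    right_arm = seq[stem + loop:]
    return right_arm == reverse_complement(left_arm)


# Hypothetical 48-base arm and 8-base loop, chosen only to match the sizes above.
toy_arm = "ACGTTGCA" * 6
toy_repeat = make_hairpin(toy_arm, "AAAAAAAA")
print(len(toy_repeat), is_perfect_hairpin(toy_repeat, stem=48, loop=8))
```

Swapping in the real repeat sequence would show whether the quoted stem and loop sizes account for all 104 bases.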

-- Eric Kofoid

The Sad Truth about Illumina Data Clustering

Tuesday, September 28th, 2010

This article is a continuation of Illumina Data Clustering, and is a perfect example of why we, as scientists, should resist the hubris of premature expectations.

The standard Illumina protocol for library preparation requires 18 cycles of PCR after adaptor ligation to enrich for fragments with doubly modified ends. Incomplete products from a previous round can snap back during this step (“megaprimer snap-back”), creating artifactual templates which will then amplify along with the others. This is a first order process and should be fairly common when the 3′ end of the elongating strand just happens to fall on a complementary REP site. A less common artifact could occur by a second order process involving megaprimer extension and reannealing in trans to a complementary REP site.

I found a group of closely spaced NlaIV restriction sites which, when digested, would destroy megaprimer formation by the snap-back route. If clusters arose from preexisting TIDs or by the rare megaprimer extension event, NlaIV digestion would have no effect on cluster amplification.
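The first step of such a test, locating NlaIV sites and asking whether a given stretch of template survives digestion, can be sketched as follows. The sketch assumes the NlaIV recognition sequence GGNNCC with a blunt cut in the middle (GGN^NCC); the function names and the toy sequence are hypothetical and do not come from the experiment itself.

```python
import re

# Assumption: NlaIV recognizes GGNNCC and cuts bluntly after the third base.
# A lookahead is used so that overlapping sites are all reported.
NLAIV_SITE = re.compile(r"(?=GG[ACGT]{2}CC)")
CUT_OFFSET = 3


def nlaiv_cut_positions(seq):
    """Return 0-based positions at which the top strand would be cut."""
    return [m.start() + CUT_OFFSET for m in NLAIV_SITE.finditer(seq.upper())]


def digest(seq):
    """Split a linear sequence into the fragments left after complete digestion."""
    bounds = [0] + nlaiv_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def survives_digest(seq, start, end):
    """True if no cut falls strictly inside the region seq[start:end]."""
    return not any(start < cut < end for cut in nlaiv_cut_positions(seq))


# Toy template with a single GGATCC, which matches GGNNCC.
toy_template = "AAAAGGATCCTTTT"
print(nlaiv_cut_positions(toy_template))
print(digest(toy_template))
```

A region that must stay intact for snap-back priming would be passed to `survives_digest`; a result of `False` means digestion should abolish that route.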

I did the experiment, and found that cutting the template DNA with NlaIV prevented amplification. I am forced to conclude that the beautiful clustering of REP-mediated TID joints found in our data is strictly man-made, an artifact of megaprimer snap-back!

Models for PCR Detection of REP-Mediated Clusters

-- Eric Kofoid

Illumina Data Clustering

Thursday, March 25th, 2010


Illumina data provide a wealth of information about DNA rearrangements which typify the genotype of a strain under study. Transient or low-level rearrangements can also be identified and have been discussed in the article Finding Strange Structures in Solexa. In this article, I confine myself to alterations consistent with red and blue read pairs.

Groups of related DNA structures are found, defined by reads which are physically close to each other and define a single type of rearrangement, but which are not so frequent as to have risen to fixation in the genotype. I would like to suggest that members of such groups, or “clusters”, may have originated by a mechanism common to the ensemble as a whole, and that these mechanisms may be of evolutionary importance.
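One way to make this notion of a cluster concrete is to group anomalous reads of the same type whenever neighboring positions lie within some maximum gap. The following is a minimal sketch under stated assumptions, not the procedure actually used on these data: the input format, the `max_gap` and `min_size` thresholds, and the function name are all hypothetical, and "red" and "blue" are treated simply as opaque type labels.

```python
from collections import defaultdict


def cluster_reads(reads, max_gap=500, min_size=3):
    """Group reads of one type whose sorted positions are within max_gap.

    reads    -- iterable of (kind, position) tuples, e.g. ("red", 1200)
    max_gap  -- hypothetical maximum distance between neighbors in a cluster
    min_size -- hypothetical minimum number of reads needed to call a cluster
    Returns a list of (kind, [positions]) tuples.
    """
    by_kind = defaultdict(list)
    for kind, position in reads:
        by_kind[kind].append(position)

    clusters = []
    for kind, positions in by_kind.items():
        positions.sort()
        current = [positions[0]]
        for position in positions[1:]:
            if position - current[-1] <= max_gap:
                current.append(position)
            else:
                if len(current) >= min_size:
                    clusters.append((kind, current))
                current = [position]
        # Flush the final run for this read type.
        if len(current) >= min_size:
            clusters.append((kind, current))
    return clusters


# Invented positions, purely for illustration.
toy_reads = [
    ("red", 100), ("red", 150), ("red", 300), ("red", 5000),
    ("blue", 120), ("blue", 9000), ("blue", 9100), ("blue", 9200),
]
print(cluster_reads(toy_reads))
```

Isolated reads (here the red read at 5000 and the blue read at 120) fall below `min_size` and are not reported, which mirrors the idea that a cluster is a group of related structures rather than a single anomalous pair.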

Some Definitions

The following phrases are from the Roth Encyclopedia. Clicking on them will bring up the definition in a separate window.

Anomalous pairs

Anomalous types


Convergent joint


Divergent joint

Mated pair

Paired-end (PE) reads


Read pair

Reference sequence


Here are examples of blue and red clusters in the F’128(FC40) molecule of LT2 strain TT24815:


Here are data from the central group of the previous illustrations remapped as diff graphs:


From the first set of graphs, I can see that both ends of the DNA fragments have been randomly sheared, although the right ends are less variable than the left. This, combined with the slopes in the diff graphs, allows me to infer that both clusters are probably characterized by several different join points.


Clustering in Illumina data indicates that certain types of rearrangements occur more often in the subject genome than statistically expected. They may be counterselected but have a high enough formation rate that they are in steady state with the dominant genotype, or they may be selectively neutral and undergo a drunkard’s walk in frequency with time. Clearly, they are not under positive selection, as they are present at levels much lower than one per genome equivalent of DNA.
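The steady-state case can be illustrated with a deliberately simple toy model, which is an assumption made here for illustration and not something derived from the data: in each generation a fraction `formation_rate` of normal genomes acquires the rearrangement, and a fraction `loss_rate` of rearranged genomes is lost through collapse or counterselection. The rates used below are invented.

```python
def steady_state_frequency(formation_rate, loss_rate):
    """Equilibrium of f -> f + formation_rate * (1 - f) - loss_rate * f."""
    return formation_rate / (formation_rate + loss_rate)


def simulate_frequency(formation_rate, loss_rate, generations, start=0.0):
    """Iterate the toy recurrence for a number of generations."""
    frequency = start
    for _ in range(generations):
        gained = formation_rate * (1.0 - frequency)  # new rearrangements
        lost = loss_rate * frequency                 # collapse or counterselection
        frequency = frequency + gained - lost
    return frequency


# Hypothetical rates: rare formation, much faster loss.
formation_rate, loss_rate = 1e-4, 1e-2
print(steady_state_frequency(formation_rate, loss_rate))
print(simulate_frequency(formation_rate, loss_rate, generations=10_000))
```

With loss much faster than formation, the equilibrium frequency stays far below one per genome, which is the qualitative pattern described above.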

What is the nature of the rearrangements? They could represent the joints of inverted segments. We tend to disregard this possibility, as inversions require a concerted ballet of simultaneous low-probability events. They could represent transient “one-off” uninheritable creations, such as snap-back extensions which simply die unproductively after forming at a high rate owing to specific features of the DNA neighborhood. Again, we suspect this is not the case, as the structures would be lethal — selection would likely modulate the nucleating features over time to decrease such suicidal tendencies.

We favor the notion that these clusters represent the joints of unselected aTIDs, which we think form relatively often, are only moderately upsetting to the cell, and — if truly deleterious — are easily collapsed. If this idea is correct, then read-pair clusters are the very stuff of evolution, as they represent preexisting sites at which rapid amplification could occur should selection ever come into play.

-- Eric Kofoid