
Analytical Genetics Meeting, August 2009
Solexa, Copy Number and a Process for Amplification
[Questions] We all love duplications,
...but how do they form? (draw)

(red guys are little repeats)
We now know that most are rec independent! (add to drawing)

...even though amplification and collapse are strongly rec dependent. (add to drawing)

This is true even if formed between big identities. (add to drawing)

Something makes duplications more efficiently than RecA. What is it?
[Model] "Snap-and-Switch" Model
Replicate through 2 small palindromes

Nascent chain detaches, snaps back, continues replicating (a lethal strategy)

Only those who strand-switch survive

Stretch out the new strand, and we get:

"sTID" for "symmetrical Tandem Inverted Duplication"
Under selection, host gets three copies of something nice, but also gets sick!
Perfect palindromes are nasty, and we have 2 of them!
Can amplify through terminal direct repeats
Also amplifies perfect palindromes
Costs quickly outweigh the benefits
Relief comes through asymmetric deletions.

Deletion 1 gives partial relief and a fitter host
Complete relief after deletion 2
Result is "aTID" for "asymmetric Tandem Inverted Duplication"
Complete relief also from one large asymmetric deletion

Our model provides an explanation for rec-independent tandem duplication formation!
[Evidence] Nice model -- Where's the evidence?
Many tandem dup joints determined by multiplex PCR -- easy.
Several aTID joints were found by accident -- extremely difficult.
Joints could not be found for 30% of amplifications formed under selection ("recalcitrant strains")
High amplification levels by Southern analysis
Probably TIDs.
[Solexa] New and wonderful stuff
Data generated are tethered reads (draw)

Reads are typically 35 bp, labelled by position in the reference sequence
Tether typically 300 bp
80-fold coverage common
Everything in the DNA pot is sequenced and recorded
We analyzed two strains
Amplified tandem duplication in F'128
Amplified TID in F'128
Read depth (draw)

==> approximate array boundaries
==> accurate amplification levels
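A minimal sketch of how read depth could give array boundaries and amplification levels. It is an illustration, not the pipeline used for these data: the function names, the window size, the 2-fold threshold, and the use of the median window as the single-copy baseline (which assumes the amplified region covers less than half the reference) are all assumptions.

```python
from statistics import median

def depth_profile(read_starts, read_len, genome_len, window):
    """Mean per-base coverage in fixed windows, from read start positions."""
    n_windows = (genome_len + window - 1) // window
    covered = [0] * n_windows
    for start in read_starts:
        # Count every reference base the read covers.
        for pos in range(start, min(start + read_len, genome_len)):
            covered[pos // window] += 1
    # The last window may be shorter than the others.
    sizes = [min(window, genome_len - i * window) for i in range(n_windows)]
    return [c / s for c, s in zip(covered, sizes)]

def amplified_region(depths, window, threshold=2.0):
    """Return (start, end, level) of the span whose depth exceeds the baseline.

    Assumption: the median window represents single-copy DNA.
    Boundaries are only as precise as the window size.
    """
    baseline = median(depths)
    if baseline == 0:
        return None
    ratios = [d / baseline for d in depths]
    hits = [i for i, r in enumerate(ratios) if r >= threshold]
    if not hits:
        return None
    first, last = hits[0], hits[-1]
    level = sum(ratios[first:last + 1]) / (last - first + 1)
    return first * window, (last + 1) * window, level

# Toy data: uniform 1x coverage, plus four extra copies of 300..499.
starts = list(range(0, 1000, 10)) + 4 * list(range(300, 500, 10))
depths = depth_profile(starts, read_len=10, genome_len=1000, window=100)
region = amplified_region(depths, window=100)
```

In this toy case the boundaries fall on window edges; with real data they are approximate, which is why the joint sequences still have to be determined separately.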
What we're really interested in are atypical read-pairs
"The anomalous reads are everything!"
These data predict "Strange structures"
Things not expected from reference sequence
In both data sets, we find
Deletions
Tandem duplications
Ephemera such as palindromes & flowers, some derived from snap-backs
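A hedged sketch of how atypical read-pairs might be sorted into these classes. It assumes the common paired-end convention in which a typical pair maps as '+' on the left and '-' on the right, roughly one tether length apart; the class names, the `ReadPair` type, and the tolerance are hypothetical, not taken from the analysis presented here.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    pos1: int
    strand1: str  # '+' or '-'
    pos2: int
    strand2: str  # '+' or '-'

def classify_pair(pair, expected=300, tolerance=100):
    """Classify one tethered read pair by orientation and spacing."""
    # Order the two reads by their position in the reference.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = right_pos - left_pos
    if left_strand == right_strand:
        # Both reads on the same strand: the pair spans an inverted joint,
        # the signature expected at a TID join.
        return "inversion-like"
    if left_strand == "-" and right_strand == "+":
        # Reads point away from each other: the pair spans a
        # tandem duplication joint.
        return "tandem-duplication-like"
    if span > expected + tolerance:
        # Correct orientation but too far apart: material between the
        # reads is missing from the sequenced DNA.
        return "deletion-like"
    if span < expected - tolerance:
        return "other"
    return "typical"

calls = [
    classify_pair(ReadPair(1000, "+", 1300, "-")),
    classify_pair(ReadPair(1000, "+", 5000, "-")),
    classify_pair(ReadPair(5000, "+", 1000, "-")),
    classify_pair(ReadPair(1000, "+", 4000, "+")),
]
```

Clusters of pairs with the same atypical class and nearby positions would then point to a candidate joint worth assembling.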
Direct determination of join sequences
The usual sequencing recipe: sort -> align -> consensus (draw)

Joint in tandem duplication strain identical to previously determined sequence
Both inverted join points of TID strain
A typical asymmetric divergent join, identical to previously known sequence
A major convergent join, determined de novo by Solexa, confirmed afterwards by traditional PCR sequencing
Complex flower at the cross-over and no deleted material (draw)
The large flower is more tolerable than perfect palindrome.
Confirmation of hypothetical intermediate.



A minor convergent joint with the same complex flower
possibly by strand switching 100 bp upstream of the major join  (draw)
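The sort -> align -> consensus recipe used for the joins above can be sketched roughly as follows. This is a toy version under strong assumptions: each read is taken as already placed at a known offset relative to the joint (the "align" step is not implemented), and the consensus is a simple per-column majority vote. The function name and input layout are hypothetical.

```python
from collections import Counter

def consensus(placed_reads):
    """Majority-vote consensus from (offset, sequence) pairs.

    Assumption: offsets are already known, so alignment reduces to
    stacking reads in columns. Uncovered columns are reported as 'N'.
    """
    columns = {}
    # Sort by offset so reads are stacked left to right.
    for offset, seq in sorted(placed_reads):
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    if not columns:
        return ""
    out = []
    for pos in range(min(columns), max(columns) + 1):
        counts = columns.get(pos)
        out.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(out)

# Four overlapping reads; the last one carries a single miscall (A -> T).
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT"), (2, "GTTCGG")]
joint = consensus(reads)
```

With deep coverage, isolated miscalls are outvoted, which is what makes a de novo join sequence credible before PCR confirmation.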

Sum up
We can easily calculate amplification levels and approximate array boundaries.
We can rapidly detect TIDs and sequence their joins.
We have direct evidence for intermediates predicted by the model.
We can find atypical, low-frequency structures.
See my article on "Strange Structures" at our blog, "The Radio".