TT26511 - Chromosomal Duplication near zja
TT26511 – zja9229 duplication from parent strain TT26415 (rrnH- recA-)
TT26415 sty(LT2) zja9229::tetRA(sw); tet101(kanR) rrnH::specR(sw) recA651::rifR(sw) $pSIM5(camR)
Conclusions
4469030          4469037 (alr)
       \        /
        TGAACGTA
       /        \
3753670          3753677 (yhiH)
8 bp 38% GC SJ between alr and yhiH
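The GC figure can be checked from the junction sequence itself. Three of the eight bases (G, C, G) are G or C, which is 37.5% and rounds to 38%. Below is a minimal bash/awk sketch of that calculation. `gc_percent` is a hypothetical helper and is not part of the original workflow:

```shell
# Hypothetical helper: percent G+C of a DNA string, printed to one decimal place.
gc_percent() {
  local seq=$1 gc
  # Keep only G/C characters (either case) and count them.
  gc=$(printf '%s' "$seq" | tr -cd 'GCgc' | wc -c)
  # awk does the floating-point division; it ignores any padding that wc adds.
  awk -v gc="$gc" -v len="${#seq}" 'BEGIN { printf "%.1f\n", 100 * gc / len }'
}

gc_percent TGAACGTA   # prints 37.5
```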
History
Drug-in-drug trapped duplication
Variables
$ export tt26511=/Users/kofoid/projects/inverted_duplications/Solexa/3_5_12/tt26511
$ export dtt26511=/Users/kofoid/projects/inverted_duplications/Solexa/dData/d3_5_12/dtt26511
Data conversion
All .gz files moved from Genome Center server to $tt26511
$ cd $tt26511
$ gunzip *gz
$ mv TT26511_GAGTGG_L007_R1_001 TT26511_GAGTGG_L007_R1.fastq
$ cat *_R1_* >> TT26511_GAGTGG_L007_R1.fastq
$ rm *_R1_*
$ mv TT26511_GAGTGG_L007_R2_001 TT26511_GAGTGG_L007_R2.fastq
$ cat *_R2_* >> TT26511_GAGTGG_L007_R2.fastq
$ rm *_R2_*
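The merge above depends on two things. The chunk files must expand in numeric order under the shell glob, and the merged file's name must not match `*_R1_*` or `*_R2_*`. The sketch below handles both read directions in one loop. It assumes three-digit, zero-padded chunk suffixes and uncompressed chunks with no extension, as here. `merge_chunks` is a hypothetical helper, and unlike the commands above it does not delete the chunks:

```shell
# Hypothetical helper: concatenate numbered chunk files into one FASTQ per read
# direction. Assumes zero-padded three-digit suffixes (_001, _002, ...), so the
# glob expands in numeric order, and that the chunks have no file extension.
merge_chunks() {
  local prefix=$1 r
  for r in R1 R2; do
    # The output name ends in "_R1.fastq" (or "_R2.fastq"), which the _NNN glob
    # cannot match, so the merged file is never read back into itself.
    cat "${prefix}_${r}"_[0-9][0-9][0-9] > "${prefix}_${r}.fastq"
  done
}
```

Usage would be `merge_chunks TT26511_GAGTGG_L007`, run in $tt26511 after `gunzip`.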
$ maq sol2sanger TT26511_GAGTGG_L007_R1.fastq $dtt26511/foo_r1.fastq
$ maq sol2sanger TT26511_GAGTGG_L007_R2.fastq $dtt26511/foo_r2.fastq
$ cd $dtt26511
Restore the "#0/1" and "#0/2" pair tags on read ID lines (the Genome Center stripped them; see the note under the anomalous-pair analysis below)
$ sed 's/\(@DJB.*\)/\1#0\/1/' foo_r1.fastq > tt26511_r1.fastq
$ rm foo_r1.fastq
$ sed 's/\(@DJB.*\)/\1#0\/2/' foo_r2.fastq > tt26511_r2.fastq
$ rm foo_r2.fastq
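The sed substitution rewrites any line that contains "@DJB", so it is worth confirming afterwards that only the header lines were tagged. The sketch below assumes strict four-line FASTQ records (header, sequence, "+", quality). `check_pair_tags` is a hypothetical helper:

```shell
# Hypothetical helper: report how many FASTQ headers (line 1 of each 4-line record)
# end in the expected pair tag, and how many other lines do (which would mean the
# sed substitution also hit a sequence or quality line).
check_pair_tags() {
  local fq=$1 tag=$2   # tag is "#0/1" or "#0/2"
  awk -v tag="$tag" '
    # True if string s ends with t (literal comparison, no regex).
    function ends_with(s, t) { return substr(s, length(s) - length(t) + 1) == t }
    NR % 4 == 1 { n++; if (ends_with($0, tag)) ok++; next }
    ends_with($0, tag) { stray++ }
    END { printf "%d records, %d tagged headers, %d other tagged lines\n", n, ok, stray }
  ' "$fq"
}
```

For example, `check_pair_tags tt26511_r1.fastq '#0/1'` should report as many tagged headers as records and no other tagged lines.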
In the following awk commands, check that "@DJB775P1" is the read-ID prefix of this run's FASTQ headers before proceeding (NR > 1 skips the empty record before the first separator):
$ awk 'BEGIN { RS="@DJB775P1"; FS="\n" } NR > 1 { print ">DJB775P1"$1"\n"$2 }' tt26511_r1.fastq > tt26511_r1.fasta
$ awk 'BEGIN { RS="@DJB775P1"; FS="\n" } NR > 1 { print ">DJB775P1"$1"\n"$2 }' tt26511_r2.fastq > tt26511_r2.fasta
$ maq fastq2bfq tt26511_r1.fastq tt26511_r1.bfq
$ maq fastq2bfq tt26511_r2.fastq tt26511_r2.bfq
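Instead of splitting records on the read-ID prefix, the FASTQ-to-FASTA conversion can select header and sequence lines by position. This approach needs no run-specific prefix and produces no empty leading record. The sketch below assumes unwrapped four-line FASTQ records, as Illumina produces. `fastq_to_fasta` is a hypothetical helper:

```shell
# Hypothetical helper: FASTQ -> FASTA by line position. Line 1 of each 4-line
# record is the header ("@" replaced by ">"), line 2 is the sequence; the "+"
# and quality lines are dropped.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$1"
}
```

Usage would be `fastq_to_fasta tt26511_r1.fastq > tt26511_r1.fasta`.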
Blast DB
$ cp tt26511_r1.fasta foo.fa
$ cat tt26511_r2.fasta >> foo.fa
$ makeblastdb -in=foo.fa -max_file_sz=80000000 -title=tt26511db -dbtype=nucl -out=tt26511db
$ rm foo.fa
$ export tt26511db=$dtt26511/tt26511db
Chromosome methods
$ mkdir dsty
$ cd dsty
$ sol ../tt26511_r1.bfq ../tt26511_r2.bfq $dtt26511/dsty/sty $rS/sty.bfa 1000
$ maq mapview sty.map > tt26511.view
$ sort tt26511.view > tt26511_sorted.view
$ mv tt26511_sorted.view tt26511.view
Chromosome results
Read-depth profile

Approx. dup. joint = 3756000 x 4466000
Overall read depth:        275.14
Std. dev.:                 95.17
Read depth (unamplified):  238.46
Read depth (amplified):    489.06
Amplification ratio:       2.05
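The amplification ratio is the mean depth inside the duplicated interval divided by the mean depth outside it. Here that is 489.06 / 238.46 ≈ 2.05, close to the 2.0 expected for two copies versus one. The `sol` script that produced these numbers is not shown, so the following is only a sketch of the calculation. It assumes a hypothetical whitespace-separated depth file with two columns (position, depth) and uses the approximate joint coordinates above. `amp_ratio` and `depth.tsv` are illustrative names:

```shell
# Hypothetical helper: mean depth outside and inside [start, end] and their ratio.
# Input format (position<whitespace>depth per line) is an assumption.
amp_ratio() {
  local depth_file=$1 start=$2 end=$3
  awk -v s="$start" -v e="$end" '
    $1 >= s && $1 <= e { dup_sum += $2; dup_n++; next }   # duplicated interval
    { flank_sum += $2; flank_n++ }                         # everything else
    END {
      if (dup_n == 0 || flank_n == 0 || flank_sum == 0) {
        print "cannot compute ratio: empty region or zero flank depth" > "/dev/stderr"
        exit 1
      }
      unamp = flank_sum / flank_n
      amp = dup_sum / dup_n
      printf "unamplified %.2f\namplified %.2f\nratio %.2f\n", unamp, amp, amp / unamp
    }' "$depth_file"
}
```

Usage would be `amp_ratio depth.tsv 3756000 4466000`.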
Chromosomal anomalous read pairs
$ cd $tt26511
$ mkdir sty_anom
$ cd sty_anom
$ find-anamo -L 300 -table true $dtt26511/dsty/sty.map > sty_anom.table
The first time I tried this, it failed with the peculiar error "Fatal error: exception Assert_failure("/Users/kofoid/Desktop/./find_unmapped_pair.ml", 33, 68)". When I repeated the run on a previously successful input file, I got no error and the output was as expected, so the problem lay in this map file. I had similar errors for other strains in this same set.
It turned out that the Genome Center had stripped the "/1" and "/2" tags from the paired-read identifiers, which broke Yong's script. I restored them with the sed commands listed above under "Data conversion". The anomaly table was then split by class:
$ grep '    1    ' sty_anom.table>black.xls
...
$ grep '    2    ' sty_anom.table>green.xls
...
$ grep '    3    ' sty_anom.table>blue.xls
...
$ grep '    4    ' sty_anom.table>red.xls
...
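The four grep commands can be written as one loop. The sketch below assumes sty_anom.table is tab-delimited; the runs of spaces in the patterns above look like tabs lost in transcription, but that is an assumption. It keeps the same class-to-color mapping. `split_anom_table` is a hypothetical helper. Like the original patterns, it matches the class number as any tab-bounded field, so another column holding a bare 1-4 would also match:

```shell
# Hypothetical helper: split an anomaly table into one file per class, using the
# same class-to-color mapping as the grep commands (1 black, 2 green, 3 blue, 4 red).
split_anom_table() {
  local table=$1 tab=$'\t' i=1 color
  for color in black green blue red; do
    # Match the class number as a whole tab-bounded field.
    # grep exits with status 1 when a class has no rows; that is not an error here.
    grep -- "${tab}${i}${tab}" "$table" > "${color}.xls" || true
    i=$((i + 1))
  done
}
```

Usage would be `split_anom_table sty_anom.table`, run in the sty_anom directory.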
Found read pairs in the "green" window: 72 reads. Their IDs were collected and sorted into the file "temp".
Contigs:
$ getfv -f temp tt26511.view > found
$ cap3 found
3 contigs assembled, including:
1: length=120; matches 4468723-4468842
2: length=321; contains joint 4469030-4469037 x 3753670-3753677
Joint:
4469030          4469037
       \        /
        TGAACGTA
       /        \
3753670          3753677
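getfv is a local script whose behavior is not documented here. Because cap3 reads FASTA, getfv presumably writes the sequences of the listed reads in FASTA format. Below is a rough, hypothetical stand-in using awk. It assumes maq mapview's tab-delimited layout, with the read name in column 1 and the read sequence in column 15; verify this against the maq version in use:

```shell
# Hypothetical stand-in for getfv: print, as FASTA, the mapview rows whose read
# name (column 1) appears in an ID file (one ID per line). Column 15 is assumed
# to hold the read sequence. Note: an empty ID file yields no output.
ids_to_fasta() {
  local id_file=$1 view_file=$2
  awk -F'\t' 'NR == FNR { want[$1] = 1; next }
              ($1 in want) { print ">" $1; print $15 }' "$id_file" "$view_file"
}
```

Usage would be `ids_to_fasta temp tt26511.view > found`, followed by `cap3 found`.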
Probe made with  and overlapping per above, length = 101.
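Blasted the probe against $tt26511db: 141 reads align with their left ends in probe positions 1-35 and their right ends in 66-101, so each spans the joint.
This is of the same order as the read depth of the unamplified region (238.46), as expected for a joint present in one copy per chromosome, so the joint is confirmed.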
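The spanning-read count can be reproduced from BLAST+ tabular output. Assume the probe was the query, for example `blastn -query joint_probe.fa -db $tt26511db -outfmt 6 -max_target_seqs 5000 > probe_hits.tsv`. Here joint_probe.fa is a hypothetical file name. The large -max_target_seqs matters because the default limit of 500 target sequences could truncate the list: reads covering only one flank of the probe also hit. A read spans the joint when its alignment starts at or before probe position 35 and ends at or after position 66. In -outfmt 6, column 2 is the read ID and columns 7 and 8 are the start and end on the probe. `count_spanning_reads` is a hypothetical helper:

```shell
# Hypothetical helper: count distinct reads whose alignment to the 101-bp probe
# starts at or before position 35 and ends at or after position 66.
# Input: BLAST+ -outfmt 6 (col 2 = subject/read ID, cols 7-8 = query start/end).
count_spanning_reads() {
  awk -F'\t' -v left=35 -v right=66 '$7 <= left && $8 >= right { print $2 }' "$1" \
    | sort -u | wc -l | tr -d ' '
}
```

Usage would be `count_spanning_reads probe_hits.tsv`. `sort -u` collapses multiple HSPs from the same read into one count.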
Genotype features
Verified rrnE integrity using blast to find primer sequences rrne1 and rrne2
Verified rrnH KO using blast to find primer sequences rrnh1 and rrnh2
Verified tetRA sw using blast to find primer sequences thra-did1 and thra-did2
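Primer checks like these can be summarized from BLAST+ tabular output. Assume the primers were blasted as queries, for example `blastn -task blastn-short -query primers.fa -db $tt26511db -outfmt 6 > primer_hits.tsv`. Here primers.fa is hypothetical, and blastn-short is the task intended for queries shorter than about 50 bases. `primer_hits`, also hypothetical, counts for each primer the distinct reads with a 100%-identity alignment. Note that 100% identity over the aligned region does not by itself guarantee a full-length match:

```shell
# Hypothetical helper: per primer (column 1), count distinct reads (column 2)
# with a 100% identity alignment (column 3) in BLAST+ -outfmt 6 output.
primer_hits() {
  awk -F'\t' '$3 == 100 && !seen[$1 FS $2]++ { n[$1]++ }
              END { for (p in n) print p "\t" n[p] }' "$1" | LC_ALL=C sort
}
```

Usage would be `primer_hits primer_hits.tsv`. A primer that is missing from the output, or that has very few reads, would stand out.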
tetRA
$ mkdir tetra
$ cd tetra
$ sol $dtt26511/tt26511_r1.bfq $dtt26511/tt26511_r2.bfq $tt26511/tetra/tetra $rS/tetra.bfa 10

Overall read depth:    554.45
Std. dev.:             132.17
Amplification ratio:   2.33