454 sequencer
A Roche/454 GS FLX sequencer was installed at the DNA Sequencing Facility, School of Medicine, in early summer 2008. The sequencer was funded by an NIH shared instruments grant (PI Frederic Bushman), and the start-up costs, including the first-year salaries of a full-time technician and a half-time programmer/analyst, were provided jointly by the School of Medicine and the Penn Genome Frontier Institute. The sequencer uses a massively parallel pyrosequencing technique to generate as many as one hundred million bases in a single 7.5 hr run, with a read length of about 250 bases.
Since installation we have set up sample preparation procedures, including library preparation, emulsion PCR and pyrosequencing, for both genomic DNA and amplicons. We have done several sequencing runs and have gradually improved the quality and throughput of sequence reads. With the recently introduced Titanium kit, throughput has risen to ~500 Mb with a read length of 400 bases, as announced by Roche. We have run one plate so far using the Titanium kit.
We are currently accepting samples for processing and pyrosequencing on the 454 GS FLX sequencer using both standard and *Titanium chemistries.
*The Titanium chemistry is currently available only for genomic DNA and long PCR products (greater than 1.5 kb).
Applications
Please note that any new-generation sequencing platform, including Roche/454, is ideally suited for generating millions of sequenced bases from a few samples. These sequencers cannot yet be fully exploited for sequencing only a few hundred bases (e.g. 1 or 2 exons of a gene, to scan for a few known SNPs) from tens or hundreds of samples. The applications of 454 technology include de novo whole genome sequencing supported by paired-end mapping, amplicon resequencing (ultra-deep targeted sequencing), transcriptome analysis, and gene regulation studies. Data analysis is facilitated by a complete software package from Roche for mapping, assembly, and amplicon variation detection. Compared to reads from short-read sequencers, the longer GS FLX reads allow easier assembly of repeat-rich sequences and make a barcoding strategy practical.
Sample Preparation for a 454 Sequencer Run
Sample preparation consists of three steps: library preparation, titration and emulsion PCR (emPCR), and pyrosequencing. Shotgun libraries for genomic DNA and long amplicons are prepared at the facility. Amplicon libraries are to be made by the user laboratory. The remaining steps are carried out at the facility.
1. Library preparation (gDNA or long PCR products greater than 1.5 kb): Genomic DNA or long PCR products submitted by the users are fragmented by nebulization and end-polished. Following double-stranded adaptor ligation, the fragments are immobilized onto streptavidin-coated beads via the biotin moiety of one of the adaptors. A strand-displacing DNA polymerase performs a fill-in reaction to repair the gaps generated by the ligation of non-phosphorylated adaptors to the fragments. Next, a single-stranded library is created by melting off the non-biotinylated strand of the bead-bound fragments.
Library preparation (PCR products up to 700 - 800 b, preferably below 500 b): The PCR products are generated by the user lab using sequence-specific primers that have the 454 primers A and B fused to their 5’ ends. The 454 primer A and B sequences are available on request. Single or pooled PCR products are then submitted to the facility.
Sequence capture: Instead of using the very labor-intensive and expensive method of generating a large number of long or short amplicons to analyze several genes or large genomic regions, targeted regions of the genome can be captured using Nimblegen Sequence Capture Technology or the Agilent Target Enrichment System. A few other companies also offer their own capture technologies.
Nimblegen’s sequence capture arrays provide enrichment of up to 5 Mb of selected genomic regions, either contiguous or non-contiguous. Recently they have introduced the Human Exome Array, which targets all human protein-coding and miRNA exons. The capture technology has been optimized for use in conjunction with 454 sequencing. After capture, the DNA is processed at the facility to attach 454 adaptors. We have done 454 sequencing of Nimblegen-captured cancer genes.
Agilent’s target enrichment method based on hybridization in solution also offers enrichment of a particular segment of the genome. At this point it is not clear if this method has been optimized for 454 sequencing.
Barcoding: In order to reduce the cost of sequencing, multiplexing can be done by attaching a unique tag to each primer before PCR amplification. After sequencing an equimolar mixture of PCR products from a number of samples, the sequences can be assigned to each sample based on the unique barcodes. Roche offers 12 barcoded adaptors to pool 12 samples together (sequences available).
A few publications involving 454 sequencing with barcoded samples are cited below.
1. Hoffmann et al. (2007) DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Res., 35(13), e91. This paper, from Rick Bushman’s group (Microbiology, Penn), describes sequencing of barcoded and pooled amplicons.
2. Meyer et al. (2008) Parallel tagged sequencing on the 454 platform. Nature Protocols, 3(2), 267-278. This paper describes a method for barcoding shotgun DNA libraries as well as PCR products.
3. Hamady et al. (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods, 5, 235-237.
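The read-assignment step described under Barcoding can be sketched as follows. This is only a minimal illustration in Python, not the facility's pipeline: the barcode sequences, sample names, and the `demultiplex` function are hypothetical, and the sketch assumes that the barcode sits at the very start of each read and must match exactly. The error-correcting scheme of Hamady et al. is not implemented here.

```python
from collections import defaultdict

# Hypothetical barcodes for illustration only; the real Roche barcode
# sequences are different and are available from the facility.
BARCODES = {
    "ACGAGT": "sample_1",
    "TCTACG": "sample_2",
    "CGTGTC": "sample_3",
}

def demultiplex(reads, barcodes=BARCODES):
    """Assign each read to a sample by an exact match of its leading barcode.

    Returns a dict mapping sample name -> list of reads with the barcode
    trimmed off. Reads without a recognised barcode go under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched the start of this read.
            bins["unassigned"].append(read)
    return dict(bins)

reads = [
    "ACGAGTTTGACCA",
    "TCTACGGGATCCA",
    "ACGAGTAAACCGT",
    "GGGGGGTTTTTTT",
]
result = demultiplex(reads)
for sample, sample_reads in sorted(result.items()):
    print(sample, sample_reads)
```

Exact matching discards any read with a sequencing error inside the barcode, which is one reason error-correcting barcode designs are attractive when many samples are pooled.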
2. Titration and Emulsion PCR: After preparation, the quality assessment and quantitation of a library are done by fluorometry and analysis on a Bioanalyzer. A functional quantitation (titration) is performed by setting up small-scale emPCRs to determine the optimum number of DNA molecules per bead.
The library of DNA fragments is amplified from a single bead-bound copy to millions of copies per bead in a water-in-oil emulsion. Emulsion PCR ensures functional clonality by physically separating the DNA-carrying beads in the emulsion during amplification. Following amplification on a thermocycler, the emulsion is broken and the beads carrying the amplified library are recovered. The procedure generates a certain number of beads without any amplified DNA. The beads carrying amplified DNA are separated from empty beads by an enrichment process based on the binding of biotinylated amplification primers to streptavidin-coated beads. The sequencing primer is then annealed to the bead-bound amplified fragments.
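Why titration looks for an optimum number of DNA molecules per bead can be illustrated with a simple model. The sketch below assumes that library molecules bind beads independently, so that the number of molecules per bead follows a Poisson distribution. This is an idealization introduced here for illustration, not a description of the facility's titration protocol, and the function name `bead_occupancy` is hypothetical.

```python
import math

def bead_occupancy(molecules_per_bead):
    """Poisson estimate of the fractions of beads that carry zero, exactly
    one, or more than one library molecule.

    Assumes molecules bind beads independently (an idealization).
    """
    lam = molecules_per_bead
    empty = math.exp(-lam)          # P(0 molecules)
    single = lam * math.exp(-lam)   # P(exactly 1 molecule): clonal bead
    mixed = 1.0 - empty - single    # P(2 or more): mixed-template bead
    return empty, single, mixed

for ratio in (0.1, 0.5, 1.0, 2.0):
    empty, single, mixed = bead_occupancy(ratio)
    print(f"{ratio:>4} molecules/bead: "
          f"empty {empty:.1%}, single {single:.1%}, mixed {mixed:.1%}")
```

Under this model a low ratio leaves most beads empty (they are removed at enrichment, but yield is low), while a high ratio increases the share of beads carrying more than one template, which give mixed signals. The small-scale emPCRs find the practical balance empirically.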
3. Pyrosequencing: Beads with bound DNA are loaded into the wells of a picotiter plate (PTP) such that each well contains a single DNA bead. The DNA beads are spiked with control DNA beads of known sequence. Packing beads, which stabilize all immobilized components, and enzyme beads, which carry the enzymes for chemiluminescence, are then deposited sequentially. The loaded PTP is inserted into the FLX instrument and the sequencing reagents are flowed sequentially over the entire plate. The PPi released when DNA polymerase incorporates a nucleotide into the growing DNA strand is detected by ATP sulfurylase and luciferase in a coupled reaction. The resulting light is recorded by a CCD camera from every well on the plate simultaneously, in a massively parallel fashion. Each nucleotide flow is followed by a wash with apyrase to degrade unused nucleotides. After 100 flow cycles the FLX sequencer produces about 400,000 reads of length 200–300 b in a 7.5 hr run. Sequence accuracy is estimated at 99%.
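How a series of light signals becomes a sequence can be sketched in a few lines. The example below is a simplified illustration, not Roche's signal-processing software. It assumes a cyclic nucleotide flow order of T, A, C, G, and that the normalized light signal from each flow is roughly proportional to the number of identical bases incorporated, so that rounding the signal gives the homopolymer length. The function name `call_bases` and the signal values are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed

def call_bases(flow_values, flow_order=FLOW_ORDER):
    """Convert normalized flow signals into a base sequence.

    Each signal is rounded to the nearest whole number, taken as the
    number of times the flowed nucleotide was incorporated (0 = none).
    """
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * count)
    return "".join(bases)

# Two flow cycles (8 nucleotide flows) of made-up signal values.
flows = [1.02, 0.05, 0.98, 1.10, 0.03, 2.04, 0.10, 0.00]
print(call_bases(flows))  # prints "TCGAA"
```

The sketch also hints at why long homopolymers are the main error source in pyrosequencing: distinguishing a signal of 6 from a signal of 7 is much harder than distinguishing 0 from 1.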
Data Analysis
Data processing occurs in two phases. The run-time phase includes three successive steps: GS Sequencer (acquisition of the raw images), Image Processing, and Signal Processing. This is an automated process that occurs as part of a sequencing run. However, Image Processing and Signal Processing can also be carried out on a separate server called DataRig. The end output is a set of Standard Flowgram Format (SFF) files containing the flowgrams for individual reads, the basecalled read sequences, and per-base quality scores. The total number of reads and bases obtained from a run refers to the high-quality reads that have passed all filters included in the GS Run Browser of the 454 GS FLX, namely the mixed and dot filters, the primer filter and the signal intensity filter.
The post-run phase of data processing is the most time-consuming step. Roche offers three software applications to generate the final output in the desired format. De Novo Assembler assembles the reads into contigs to generate a consensus sequence. Reference Mapper maps the reads to a known reference sequence to generate a consensus sequence along with a list of high-confidence mutations. Amplicon Variant Analyzer identifies and quantitates sequence variants by ultra-deep sequencing of amplicons.
A number of companies, including DNASTAR and SoftGenetics, provide off-the-shelf analysis software. In addition, numerous free analysis tools are available on the web.
The facility performs a preliminary analysis that includes trimming the standard primers from the reads, assembling the reads into contigs, and mapping to a reference sequence. Working with us, the investigator will need to do customized downstream analysis, such as annotating known SNPs, generating a consensus sequence and/or a list of mutations and rare sequence variants in numerous genes, and assessing their possible implications for coding or non-coding sequence.
Expected Run Results (Standard Chemistry)
Throughput based on a 250 b read length
PicoTiterPlate (PTP) device | No. of regions per PTP | Reads per region | Throughput per region | Total reads per PTP | Throughput per PTP
70 x 75 | 2 | 200,000 | 50 Mb | 400,000 | 100 Mb
70 x 75 | 4 | 70,000 | 17.5 Mb | 280,000 | 70 Mb
70 x 75 | 8 | 30,000 | 7.5 Mb | 240,000 | 60 Mb
70 x 75 | 16 | 12,000 | 3 Mb | 192,000 | 46 Mb
25 x 75 | 1 | 70,000 | 17.5 Mb | 70,000 | 17.5 Mb
25 x 75 | 4 | 12,000 | 3 Mb | 48,000 | 12 Mb
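The throughput figures for the standard chemistry follow from a simple product: number of reads multiplied by read length. The short sketch below reproduces that arithmetic for the 70 x 75 plate, assuming a uniform 250 b read length as stated for this table. The function name `throughput_mb` is hypothetical, and real runs will vary around these nominal values.

```python
def throughput_mb(reads, read_length_bases=250):
    """Nominal throughput in megabases: reads x read length / 1,000,000."""
    return reads * read_length_bases / 1_000_000

# (regions per PTP, reads per region) for the 70 x 75 plate, standard chemistry
layouts = [(2, 200_000), (4, 70_000), (8, 30_000), (16, 12_000)]

for regions, reads_per_region in layouts:
    per_region = throughput_mb(reads_per_region)
    per_plate = throughput_mb(reads_per_region * regions)
    print(f"{regions:>2} regions: {per_region:g} Mb per region, "
          f"{per_plate:g} Mb per plate")
# For 16 regions the simple product gives 48 Mb per plate, slightly above
# the 46 Mb listed in the table.
```

The calculation also shows the cost of subdividing a plate: going from 2 to 16 regions reduces the nominal total yield from 100 Mb to under 50 Mb, because the area taken up by the region dividers is lost.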
Expected Run Results (Titanium Chemistry)
Throughput based on a ~400 b read length
PicoTiterPlate (PTP) device | No. of regions per PTP | Reads per region | Throughput per region | Total reads per PTP | Throughput per PTP
70 x 75 | 2 | 500,000 | 200 - 250 Mb | 1,000,000 | 400 - 500 Mb
70 x 75 | 4 | 200,000 | 70 - 100 Mb | 800,000 | 280 - 400 Mb
70 x 75 | 8 | 100,000 | 35 - 50 Mb | 800,000 | 280 - 400 Mb
70 x 75 | 16 | 32,000 | 12 - 18 Mb | 512,000 | 200 - 280 Mb
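When choosing a plate fraction, it can help to translate expected throughput into approximate sequencing depth. The sketch below is a rough planning aid only. It assumes that every read falls on the target and that coverage is uniform, neither of which holds in practice (captured samples in particular include off-target reads), so actual depth will be lower and uneven. The function name `mean_depth` is hypothetical.

```python
def mean_depth(throughput_mb, target_size_mb):
    """Idealized mean depth: sequenced megabases / target size in megabases.

    Assumes all reads are on target and evenly distributed.
    """
    return throughput_mb / target_size_mb

# Example: a 5 Mb captured region sequenced on one quarter of a Titanium
# plate, using the 70 - 100 Mb per-region range from the table above.
low = mean_depth(70, 5)
high = mean_depth(100, 5)
print(f"Idealized mean depth: {low:g}x to {high:g}x")
```

For amplicon projects the same arithmetic applies, with the target size taken as the summed length of all amplicons across all pooled samples.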
Turnaround time
Four to six weeks, depending on the workload. Much also depends on the quality of the DNA submitted.
At present we ask users to use only 1/2, 1/4 or 1/8 of a plate for each sample. Using 1/16 of a plate means waiting until 16 samples have accumulated to fill the plate.
454 Sequencing Request Form (word doc., standard chemistry)
454 Sequencing Request Form (word doc., Titanium chemistry)
For further information and to set up a meeting, please contact Tapan Ganguly
Tel: 215-573-7238, e-mail: gangulyt@mail.med.upenn.edu
Updated July 20, 2009

