RNA-seq数据的处理与分析.pdf

资源ID：3434287 资源大小：1.27MB 全文页数：25页
资源格式： PDF 下载积分：10金币

快捷下载

账号登录下载

微信登录下载

三方登录下载：

微信扫一扫登录

下载资源需要10金币

邮箱/手机：
温馨提示：	快捷下载时，用户名和密码都是您填写的邮箱或者手机号，方便查询和重复下载（系统自动生成）。如填写123，账号就是123，密码也是123。
支付方式：
验证码：	换一换

加入VIP,免费下载

账号：
密码：
验证码：	换一换
当日自动登录忘记密码？

友情提示

1、下载资料失败解决办法

2、PDF文件下载后，可能会被浏览器默认打开，此种情况可以点击浏览器菜单，保存网页到桌面，就可以正常下载了。

3、本站不支持迅雷下载，请使用电脑自带的IE浏览器，或者360浏览器、谷歌浏览器下载即可。

4、本站资源下载后的文档和图纸-无水印,预览文档经过压缩，下载后原文更清晰。

5、试题试卷类文档，如果标题没有明确说明有答案则都视为没有答案，请知晓。

RNA-seq数据的处理与分析.pdf

1、RNA-seq Data Handling and Analysis Kevin Childs Statistical genetics/genomics journal club 中国测序论坛Overview Fasta/Fastq file formats NCBI SRA Data preparation Bowtie/Tophat/Cufflinks Velvet/Oases Trinity 中国测序论坛Fasta File Format#FASTA gi|1800214|gb|U56729.1|SBU56729 Sorghum bicolor phytochrome A CGCATC

2、CTTCCGCGCCGGGCATGGGCACCGCGTCGGCGCGCGCCCCTACCCAGTCGTCGACTTGATGCTG CTCACTCGCACTCGTCGCAGCGCCCCACGCCCCGCTATTTATGCGTACTTGCTTGCCGGGAGAGTCGCTG GAGGTGGGCGTCCTCCTCCCGCTCCAGAGCTCGCTGCTTCGCTCCACCCACCCTTAAGCAGGAGTGATAT CTGGTGGTTTTTCAAAAGAAGACAAAAATGTCTTCCTCGAGGCCTGCCCACTCTTCCAGTTCATCCAGTA GGACTCGCCAGAGCTCCCAGGC

3、AAGGATATTAGCACAAACAACCCTTGATGCTGAACTCAATGCAGAGTA TGAAGAATCTGGTGATTCCTTTGATTACTCCAAGTTGGTTGAAGCACAGCGGAGCACTCCATCTGAGCAG CAAGGGCGATCAGGAAAGGTCATAGCCTACTTGCAGCATATTCAAAGAGGAAAGCTAATCCAACCATTTG GTTGCTTGTTGGCCCTTGACGAGAAGAGCTTCAGGGTCATTGCATTCAGTGAGAATGCACCTGAAATGCT CACAACGGTCAGCCATGCTGTGCCAAACGTTGATGATC

4、CCCCAAAGCTAGGAATTGGTACCAATGTGCGC TCCCTTTTCACTGACCCTGGTGCTACAGCACTGCAGAAGGCACTAGGATTTGCTGATGTTTCTTTGCTGA ATCCTATCCTAGTTCAATGCAAGACCTCAGGCAAGCCATTCTATGCCATTGTTCATAGGGCAACTGGTTG TCTGGTGGTTGATTTTGAGCCTGTGAAGCCTACAGAATTTCCTGCCACTGCTGCTGGGGCTTTGCAGTCT 中国测序论坛Fastq File Format Read Name Sequence Quality Qua

5、lity scores are in ASCII characters representing coded Phred scores.ASCII codes start at ASCII 33 or ASCII 64.All SRA codes converted to ASCII 33 These scores provide a likelihood that the base was called incorrectly.10 1 in 10 chance the base call is incorrect 20 1 in 100 chance the base call is in

6、correct 30 1 in 1000 chance the base call is incorrect 中国测序论坛High Throughput Sequencing Platforms Illumina HiSeq 1000 and HiSeq 2000 Illumina Genome Analyzer IIx*Life Sciences/Roche 454 pyrosequencing ABI Solid Sequencing System*Pacific Biosciences*Ion Torrent Cambridge Nannopore(late 2012?)中国测序论坛Hi

7、gh Throughput Sequencing HiSeq 2000 Highly parallel sequencing by synthesis Single and paired-end reads between 50 bp and 100 bp 187 million single end or 374 million paired-end reads per lane High error rate in the 3 end 中国测序论坛NCBI SRA SRA toolkit fastq-dump/opt/sratoolkit/fastq-dump SRR373821.lite

8、.sra/opt/sratoolkit/fastq-dump-split-files SRR329070.lite.sra 中国测序论坛Read Quality with the FASTX-Toolkit http:/hannonlab.cshl.edu/fastx_toolkit/中国测序论坛Read Quality with the FASTX-Toolkit Bad Sequence Good Sequence 中国测序论坛FASTX Toolkit fastx_quality_stats-Q 33 i initial_fastq_file.fastq o stats.txt fast

9、x_quality_boxplot_graph.sh-Q 33 i stats.txt t Title o quality.png fastx_clipper-Q 33 -v -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -i initial_fastq_file.fastq -o fastq_file_clipped.fastq fastx_artifacts_filter -Q 33 -v -i fastq_file_clipped.fastq -o fastq_file_artifact_filtered.fa

10、stq fastq_quality_trimmer -Q 33-v -t 20 -l 30 -i fastq_file_artifact_filtered.fastq-o fastq_file_cleaned.fastq -Q is an undocumented parameter to indicate that quality values use ASCII 33 encoding.中国测序论坛FastQC http:/www.bioinformatics.bbsrc.ac.uk/projects/fastqc/A quality control tool for high throu

11、ghput sequence data.中国测序论坛Samtools Package of programs for manipulating sam and bam files sam sequence alignment map bam binary alignment map compressed form of sam file http:/ 中国测序论坛Tuxedo Suite Bowtie fast and quality aware short read aligner for aligning DNA and RNA sequence reads TopHat fast,spl

12、ice junction mapper for RNA-Seq reads built on the Bowtie aligner Cufflinks assembles transcripts,estimates their abundances,and test for differential expression and regulation using the alignments from Bowtie and TopHat 中国测序论坛Bowtie Aligns short reads to large genomes Forms the basis for TopHat,Cuf

13、flinks,Crossbow,and Myrna Unless you are working with genomic DNA derived short reads,you will not directly use Bowtie With the exception of using bowtie-build to create an genomic sequence index file 中国测序论坛TopHat Built on Bowtie and uses the same genome index Used for alignment of RNA-Seq reads to

14、a genome Optimized for paired-end,Illumina sequence reads 70bp 中国测序论坛TopHat 中国测序论坛 Quantification of gene expression using RNA-seq reads Tests for differential expression Uses output from bowtie/tophat Assembles read alignments into transcripts Uses cufflinks-predicted transcripts or user-supplied g

15、ene models for quantification Estimates transcript abundance balanced across transcript isoforms Cufflinks 中国测序论坛Cufflinks 中国测序论坛Bowtie/Tophat/Cufflinks bowtie-build pseudomolecule.fa pseudomolecule.index tophat-p 6-solexa1.3-quals-i 5 -I 1000 -r 100 -no-novel-juncs -GTF pseudomolecule.gtf -o/output

16、/directory pseudomolecule.index purified_reads.fastq samtools sort tophat_output_pairs.bam tophat_output_pairs_sorted samtools view -o tophat_output_pairs_sorted.sam tophat_output_pairs_sorted.bam cufflinks -q -o/output/directory/-p 4 -G pseudomolecule_corrected.gtf tophat_output_pairs_sorted.sam 中国

17、测序论坛Velvet/Oases Genome/transcriptome assembly package Velveth/velvetg work well for genomes but produce fragmented transcriptomes assemblies.Its modules explicitly assume linearity and uniform coverage distribution.Velvet was designed to assemble genomic reads Oases was designed to assemble transcr

18、iptomes of new species.Oases takes the preliminary assembly produced by velvetg,and clusters the contigs into small groups,called loci.It then exploits the read sequence and pairing information,when available,to produce transcript isoforms.中国测序论坛Velvet/Oases velveth /working/directory/31-fastq short

19、Paired switchgrass_purified_1.fastq switchgrass_purified_2.fastq velvetg /working/directory/-read_trkg yes-ins_length 190 -ins_length_sd 44 oases /working/directory/-ins_length 190 -ins_length_sd 44 -min_trans_lgth 250 中国测序论坛Trinity Inchworm,Chrysalis,Butterfly Transcript assembly package 中国测序论坛Trin

20、ity.pl -seqType fa left ABA_RC_1.fa right ABA_RC_2.fa single se_ABA_RC.fa -paired_fragment_length 400 -run_butterfly -output/output/directory -CPU 2 Trinity 中国测序论坛num_of_transcripts=25069 max_len_trans=2707 min_len_trans=300 N50=6492168.5 Avrg size of contigs=517.943954685069 N50_contig=544 Length(b

21、p)distribution 5000 0 num_of_transcripts=86458 max_len_trans=3597 min_len_trans=251 N50=39127035.5 Avrg size of contigs=905.110816812788 N50_contig=1136 Length(bp)distribution 5000 0 Assembly:used 41385139/46322456 reads Assembled reads=89.3414179075479%pe_ABA_RC,pe_ABA_RB,se_ABA_RC,se_ABA_RC.pe_ABA

22、_RC,se_ABA_RC 5,985,516 sequences TRINITY VELVET 中国测序论坛Memory requested and technical notes qsub_time Thu Jun 23 15:07:59 2011 start_time Thu Jun 23 15:08:08 2011 end_time Fri Jun 24 05:54:35 2011 5,985,516 sequences 10G 55G A basic recommendation is to have 1G of RAM per 1M pairs of Illumina reads.Our experience is that the entire process can require about 1 hour per million pairs of reads in the current implementation.*Memory requirements have improved in more recent updates.中国测序论坛

注意事项

本文（RNA-seq数据的处理与分析.pdf）为本站会员主动上传，冰点文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若此文所含内容侵犯了您的版权或隐私，请立即通知冰点文库（点击联系客服），我们立即给予删除！

温馨提示：如果因为网速或其他原因下载失败请重新下载，重复下载不扣分。