Volume 31 Issue 5
Jul.  2019
Analysis of SSR Loci of Functional Gene Linked to Drought Resistance Based on Transcriptome Sequences in Pinus massoniana under Drought Stress

Abstract: Objective Transcriptome data of Pinus massoniana under drought stress were used to clarify the functional distribution of the sequences and the characteristics and distribution patterns of SSR loci, and to explore key SSR loci linked to drought-resistance genes. Method Needle samples of P. massoniana subjected to sustained drought stress for 10, 15 and 25 days, together with well-watered control samples (CK), were used for total RNA extraction. Illumina sequencing was performed to generate raw reads. After removal of low-quality data, the transcriptome was assembled with Trinity. The unigenes were annotated by BLAST alignment against several public databases, including GO (Gene Ontology), KOG (eukaryotic orthologous groups) and KEGG (Kyoto Encyclopedia of Genes and Genomes). SSR loci were detected with MISA, and SSR primers for PCR amplification were designed with Primer3. GO and KEGG enrichment analyses were performed with GOSeq (1.10.0) and KOBAS, respectively, to determine the main biological processes and metabolic pathways of the differentially expressed unigenes containing SSR loci. Result Of the 194 821 unigenes in the transcriptome, 101 806 were annotated; 64 973 were annotated in the GO database, 35 880 in KOG and 30 882 in KEGG. In addition, 6 728 SSR loci were identified in 6 367 unigenes, an average SSR frequency of 3.45%. Mononucleotide, trinucleotide and dinucleotide repeats were the major types, with frequencies of 35.82%, 33.03% and 25.22%, respectively; A/T, AT/AT, AG/CT, AGC/CTG and AAG/CTT were the most frequent motifs; SSRs 10 to 20 bp long were the most common; and repeat numbers of 5 to 10 predominated. A total of 13 338 SSR primer pairs were designed for marker development in P. massoniana. Furthermore, of the 6 367 unigenes containing SSR loci, 422 were differentially expressed under drought stress relative to the control. KEGG pathway enrichment analysis showed that 11 unigenes containing SSR loci were significantly enriched in three KEGG pathways (photosynthesis, plant hormone signal transduction and carotenoid biosynthesis), which are linked to the plant response to drought stress. Conclusion A total of 101 806 unigenes were annotated from a high-quality P. massoniana transcriptome, 6 728 SSR loci were identified in 6 367 unigenes, and 11 SSR loci among the 422 differentially expressed SSR-containing genes were linked to the plant response to drought stress. These results can be used in subsequent studies of the molecular mechanism of drought resistance and of functional gene localization in P. massoniana.
    [1] Pandey G, Misra G, Kumari K, et al. Genome-wide development and use of microsatellite markers for large-scale genotyping applications in foxtail millet [Setaria italica (L.)][J]. DNA Research, 2013, 20(2):197-207. doi: 10.1093/dnares/dst002
    [2] Bai T D, Xu L, Xu M, et al. Characterization of masson pine (Pinus massoniana Lamb.) microsatellite DNA by 454 genome shotgun sequencing[J]. Tree Genetics & Genomes, 2014, 10(2):429-437.
    [3] Kalia R K, Rai M K, Kalia S, et al. Microsatellite markers: an overview of the recent progress in plants[J]. Euphytica, 2011, 177(3):309-334. doi: 10.1007/s10681-010-0286-9
    [4] Pinosio S, González-Martínez S C, Bagnoli F, et al. First insights into the transcriptome and development of new genomic tools of a widespread circum-Mediterranean tree species, Pinus halepensis Mill[J]. Molecular Ecology Resources, 2014, 14(4):846-856. doi: 10.1111/men.2014.14.issue-4
    [5] Zhang Z, Zhang H G, Mo C, et al. Transcriptome SSR analysis and EST-SSR marker development in Pinus koraiensis[J]. Scientia Silvae Sinicae, 2015, 51(8):114-120. (in Chinese)
    [6] Singh R K, Jena S N, Khan S, et al. Development, cross-species/genera transferability of novel EST-SSR markers and their utility in revealing population structure and genetic diversity in sugarcane[J]. Gene, 2013, 524(2):309-329.
    [7] Choudhary S, Sethy N K, Shokeen B, et al. Development of chickpea EST-SSR markers and analysis of allelic variation across related species[J]. Theoretical and Applied Genetics, 2009, 118(3):591-608. doi: 10.1007/s00122-008-0923-z
    [8] Kantety R V, La Rota M, Matthews D E, et al. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat[J]. Plant Molecular Biology, 2002, 48:501-510. doi: 10.1023/A:1014875206165
    [9] Yang Y, Tian S B, Wang Y Q, et al. EST-SSR molecular markers related to heat tolerance in eggplant[J]. Southwest China Journal of Agricultural Sciences, 2012, 25(5):1798-1804. doi: 10.3969/j.issn.1001-4829.2012.05.054 (in Chinese)
    [10] Qu Y Y, Mu P, Li X Q, et al. Correlation analysis and QTL mapping of leaf water potential and drought resistance in rice under lowland and upland cultivation[J]. Acta Agronomica Sinica, 2008, 34(2):198-206. (in Chinese)
    [11] Liu Y, Gai J Y, Lü H N, et al. Identification of drought-tolerant soybean germplasm and inheritance and QTL mapping of related root traits[J]. Acta Genetica Sinica, 2005, 32(8):856-863. (in Chinese)
    [12] Zhang Y H, Chen X, Yu H F, et al. SSR marker analysis of malondialdehyde content, a drought-resistance-related trait, at the germination stage of sunflower[J]. Journal of Inner Mongolia Agricultural University, 2012, 33(3):25-29. (in Chinese)
    [13] Ding G J, Zhou Z C, Wang Z R, et al. Cultivation and utilization of Pinus massoniana as a pulpwood species[M]. Beijing: China Forestry Publishing House, 2006:1-34. (in Chinese)
    [14] He W L, Zhang F K, Pan T, et al. Construction of a genetic map of Pinus massoniana using megagametophytes[J]. Molecular Plant Breeding, 2014, 12(3):421-431. (in Chinese)
    [15] Du M F, Ding G J. ISSR genetic structure of Pinus massoniana provenances and analysis of influencing factors[J]. Guihaia, 2016, 36(9):1068-1075. (in Chinese)
    [16] Fan F H, Cui B W, Zhang T, et al. LTR-retrotransposon activation, IRAP marker development and its potential in genetic diversity assessment of masson pine (Pinus massoniana)[J]. Tree Genetics & Genomes, 2014, 10(1):213-222.
    [17] Liu G B, Ji K S. SSR primer development for Pinus massoniana based on pine EST sequences[J]. Molecular Plant Breeding, 2009, 7(4):833-838. doi: 10.3969/mpb.007.000833 (in Chinese)
    [18] Mei L N, Fan F H, Cui B W, et al. Development of SSR markers based on the Pinus massoniana transcriptome and germplasm identification[J]. Journal of Agricultural Biotechnology, 2017, 25(6):991-1002. (in Chinese)
    [19] Du M F, Ding G J, Zhao X Z. Responses of different Pinus massoniana families to sustained drought and evaluation of their drought resistance[J]. Scientia Silvae Sinicae, 2017, 53(6):21-29. (in Chinese)
    [20] Wang Y, Ding G J. Physiological responses of mycorrhizal Pinus massoniana seedlings to drought and evaluation of their drought resistance[J]. Chinese Journal of Applied Ecology, 2013, 24(3):639-645. (in Chinese)
    [21] Wang X F, He W L, Cai W J, et al. Transcriptome sequencing and analysis of Pinus massoniana[J]. Molecular Plant Breeding, 2013, 11(3):385-392. (in Chinese)
    [22] Chagné D, Chaumeil P, Ramboer A, et al. Cross-species transferability and mapping of genomic and cDNA SSRs in pines[J]. Theoretical and Applied Genetics, 2004, 109(6):1204-1214. doi: 10.1007/s00122-004-1683-z
    [23] Yan M M, Dai X G, Li S X, et al. Comparative analysis of microsatellites in expressed gene sequences of pine, poplar and eucalyptus[J]. Genomics and Applied Biology, 2011, 30(1):103-109. doi: 10.3969/gab.030.000103 (in Chinese)
    [24] Lu S J, Li X W. Origin, evolution and dispersal of the genus Pinus[J]. Journal of Northwest Forestry College, 1999, 14(3):1-5. doi: 10.3969/j.issn.1001-7461.1999.03.001 (in Chinese)
    [25] Salzer K, Sebastiani F, Gugerli F, et al. Isolation and characterization of polymorphic nuclear microsatellite loci in Pinus cembra L.[J]. Molecular Ecology Resources, 2009, 9(3):858-861. doi: 10.1111/men.2009.9.issue-3
    [26] Liang X, Chen X, Hong Y, et al. Utility of EST-derived SSR in cultivated peanut (Arachis hypogaea L.) and Arachis wild species[J]. BMC Plant Biology, 2009, 9(1):35. doi: 10.1186/1471-2229-9-35
    [27] Garg R, Patel R K, Tyagi A K, et al. De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification[J]. DNA Research, 2011, 18(1):53-63. doi: 10.1093/dnares/dsq028
    [28] Bérubé Y, Zhuang J, Rungis D, et al. Characterization of EST-SSRs in loblolly pine and spruce[J]. Tree Genetics & Genomes, 2007, 3(3):251-259.
    [29] Echt C S, Saha S, Deemer D L, et al. Microsatellite DNA in genomic survey sequences and unigenes of loblolly pine[J]. Tree Genetics & Genomes, 2011, 7(4):773-780.
    [30] La Rota M, Kantety R V, Yu J K, et al. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat and barley[J]. BMC Genomics, 2005, 6(2):23.
    [31] Li X B, Zhang M L, Cui H R. Analysis of SSR information in EST resources of oilseed rape[J]. Chinese Journal of Oil Crop Sciences, 2007, 29(1):20-25. doi: 10.3321/j.issn:1007-9084.2007.01.004 (in Chinese)
    [32] Temnykh S, DeClerck G, Lukashova A, et al. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): Frequency, length variation, transposon associations, and genetic marker potential[J]. Genome Research, 2001, 11(8):1441-1452. doi: 10.1101/gr.184001
    [33] Zhang Y, Fan J, Zhao X Y, et al. Induction of expression of the detoxifying enzyme glutathione S-transferase and the salivary gland gene C002 in Myzus persicae by the plant defence signals JA/SA[J]. Scientia Sinica Vitae, 2016, 46(5):665-672. (in Chinese)
    [34] Millward T A, Zolnierowicz S, Hemmings B A. Regulation of protein kinase cascades by protein phosphatase 2A[J]. Trends in Biochemical Sciences, 1999, 24(5):186-191. doi: 10.1016/S0968-0004(99)01375-4
    [35] Liu L, Hu X, Song J, et al. Over-expression of a Zea mays L. protein phosphatase 2C gene (ZmPP2C) in Arabidopsis thaliana decreases tolerance to salt and drought[J]. Journal of Plant Physiology, 2009, 166(5):531-542. doi: 10.1016/j.jplph.2008.07.008

    Corresponding author: DING Gui-jie, gjding@gzu.edu.cn
  • 1. College of Forestry/Institute for Forest Resources & Environment of Guizhou, Guizhou University, Guiyang 550025, Guizhou, China
  • 2. School of Karst Science, Guizhou Normal University, Guiyang 550001, Guizhou, China


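  • Transcriptomics uses high-throughput sequencing to obtain large numbers of transcripts rapidly and, at the RNA level, resolves the functional gene expression, biological processes and molecular mechanisms of a species in a specific environment. It is a fast and efficient approach in molecular biology and is particularly suited to non-model organisms with large genomes and relatively scarce genetic information[1-2]. Transcriptome-based SSR markers combine the high polymorphism and good reproducibility of SSRs with the economy, efficiency and high information content of transcriptome data[3]. Transcriptome SSR markers have been developed in many forest trees, such as Aleppo pine (Pinus halepensis Mill.)[4] and Korean pine (Pinus koraiensis Sieb. et Zucc.)[5]. Moreover, because transcriptome SSR markers originate from transcribed coding sequences, they directly reflect variation in gene expression[6] and serve as functional markers directly associated with gene expression[7], covering all aspects of gene regulation, including basic metabolism, signal transduction and transcription[8]. Transcriptome analysis can therefore quickly and accurately pinpoint functional SSR markers for target genes and accelerate target-gene mapping. However, very few studies have mined functional SSR markers from transcriptome data; the only report found is that of Yang et al.[9], who identified one candidate SSR locus associated with heat stress from eggplant transcriptome data. To date, SSR markers combined with bulked segregant analysis (BSA) have been used to find markers linked to drought-tolerance genes in crops such as rice (Oryza sativa L.)[10], soybean (Glycine max (Linn.) Merr.)[11] and sunflower (Helianthus annuus L.)[12], laying a foundation for mapping drought-tolerance genes and for drought-tolerance breeding.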

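    Masson pine (Pinus massoniana Lamb.) is a major afforestation species in southern China. It is fast-growing, high-yielding and widely utilized, and plays an important role in the development of China's forest resources and in ecological construction[13]. Molecular genetic research on P. massoniana has advanced rapidly: SSR[14], ISSR[15] and IRAP[16] markers have been widely used in genetic map construction and in studies of genetic structure variation and genetic diversity. SSR markers have been developed from EST sequences of close relatives of P. massoniana[17], from its genome[2] and from its transcriptome[18], but the number of markers developed remains small and cannot meet the needs of marker-assisted breeding. In recent years, frequent seasonal droughts in southern China have seriously threatened the growth of P. massoniana, and the selection and breeding of drought-resistant germplasm has become key to coping with drought. In previous work, our group subjected three superior P. massoniana families with good growth rate and adaptability to drought stress, screened drought-resistant germplasm by comparing morphological, growth and physiological indices[19], and obtained drought-stress transcript data for this germplasm by high-throughput sequencing. In this study, the unigenes of this transcriptome were functionally annotated and classified, and the distribution characteristics of SSR loci were analysed. On this basis, unigene sequences containing SSR loci were subjected to differential expression analysis and to GO and KEGG enrichment analysis in order to mine SSR markers directly associated with functional genes, laying a foundation for subsequent large-scale SSR marker development, the study of drought-resistance mechanisms and functional gene mapping in P. massoniana.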

1.   Materials and Methods
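  • The materials were uniformly growing 2-year-old P. massoniana seedlings (about 65 cm tall) from a drought-resistant family screened previously by our group[19]. The experiment comprised a control group and a drought group: the control group was watered every 3 to 5 d to maintain a normal water supply, whereas the drought group was watered once and then left to dry naturally for up to 30 d. At 10, 15 and 25 d, the relative soil water content was 81.9%, 80.3% and 81.2% in the control group and 57.8%, 46.6% and 30.1% in the drought group, corresponding to humid (75%-80%), mild drought (55%-60%), moderate drought (45%-50%) and severe drought (30%-35%) conditions, respectively[20]. Needles at the same position were collected from seedlings watered for 10 d (CK) and from seedlings under drought for 10, 15 and 25 d, with two replicates each, giving eight samples in total. RNA extraction, quality control and transcriptome sequencing were performed by Novogene Co., Ltd.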

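  • Transcriptome data for the eight needle samples were generated on the HiSeq 2000 high-throughput sequencing platform (NCBI accessions: SRX2327081 to SRX2327084 and SRX2310441 to SRX2310444). Adapter sequences, poly-A tails and low-quality reads were removed with NGSQCToolkit v2.5, the reads were assembled with Trinity, and unigenes longer than 200 bp were retained. Using BLAST alignment, the unigenes were annotated against seven databases (Nr, Nt, Swiss-Prot, PFAM, GO, KOG and KEGG), and successfully annotated unigenes were further classified by function into GO terms, KOG categories and KEGG pathways.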

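  • SSR loci in the unigenes were detected with MISA (MIcroSAtellite identification tool). The minimum number of repeats was set to 10 for mononucleotides and to 6, 5, 5, 5 and 5 for di- to hexanucleotides, respectively, and the flanking sequence of an SSR locus had to be ≥50 bp. Primers were designed in batch for SSR-containing unigenes with Primer3 (version 2.3.5, default parameters) according to general design criteria and were then filtered as follows: fewer than 3 mismatched bases at the 5' end and fewer than 1 at the 3' end; no SSR within the primer; primers matching other unigenes removed, so that only uniquely matching primers were kept; and SSRs verified with SSRFinder by comparing the SSRs found in the product sequence with the MISA results and keeping only primers whose products contained the same SSR. To verify primer validity, 11 randomly selected primer pairs were used for PCR amplification of one DNA sample, and the amplified bands were examined by 1.5% agarose gel electrophoresis.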
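The repeat thresholds above can be illustrated with a minimal Python sketch. This is not MISA itself: it is an assumed regex-based scan for perfect SSRs only, it ignores compound SSRs and the flanking-length requirement, and the names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
# (mono: 10; di: 6; tri to hexa: 5).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" reported as a dinucleotide: it is a mononucleotide run.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical, not from the study).
    toy = "CCGT" + "A" * 10 + "CGTC" + "AG" * 6 + "TCCA" + "AGC" * 5 + "TT"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Start positions are 0-based. A production run would normally rely on MISA with the same thresholds configured, since it also handles compound SSRs.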

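  • Differential expression analysis and GO and pathway enrichment analyses were carried out for SSR-containing unigenes. Unigene expression levels were calculated as RPKM (reads per kilobase per million mapped reads). Expression levels under drought stress and normal watering (CK) were compared with DESeq (1.10.1), and unigenes with a fold change ≥2 and FDR<0.05 were taken as significantly differentially expressed between samples. For the differentially expressed unigenes, GO enrichment analysis was performed with GOSeq (1.10.0) and topGO (2.10.0) and KEGG enrichment analysis with KOBAS (v2.0.12), in both cases with a corrected P-value < 0.05.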
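As a rough illustration of the two quantities used here, the sketch below computes RPKM from a read count and applies the fold-change and FDR thresholds. It does not reproduce DESeq's statistical model; the function names are hypothetical, and treating "fold change ≥2" as applying in both directions (up- and down-regulation) is an assumption.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_differential(fold_change, fdr, min_fold=2.0, max_fdr=0.05):
    """Thresholds from the text: fold change >= 2 and FDR < 0.05.

    Assumption: a ratio <= 1/min_fold counts as a >= 2-fold decrease.
    """
    changed = fold_change >= min_fold or fold_change <= 1.0 / min_fold
    return fdr < max_fdr and changed


if __name__ == "__main__":
    # Hypothetical unigene: 500 reads, 2 000 bp long, 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
    print(is_differential(fold_change=2.5, fdr=0.01))
```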

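  • Four unigenes were randomly selected from the transcriptome for qRT-PCR to verify the reliability of the transcriptome data. The RNA used for transcriptome sequencing was reverse-transcribed into cDNA with the TaKaRa RNA LA PCR Kit, unigene-specific primers were designed with Primer Premier 5.0, UBC was used as the reference gene, and amplification was performed on an ABI 7500 Real Time System. The 20.0 μL reaction contained 10.0 μL SYBR mix, 0.5 μL each of forward and reverse primer, 1.0 μL cDNA template and 8.0 μL ddH2O. The amplification program was 95℃ for 60 s, followed by 40 cycles of 95℃ for 10 s, 61℃ for 30 s and 72℃ for 30 s; the melting-curve program was 95℃ for 15 s, 61℃ for 60 s and 95℃ for 15 s. Differential expression was quantified by the 2^(-ΔΔCT) method.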
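The 2^(-ΔΔCT) calculation can be sketched as follows. The function name and the Ct values are hypothetical; the reference gene in this study was UBC, and the sketch assumes equal amplification efficiency for target and reference, as the method itself does.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method."""
    # Normalize the target gene to the reference gene in each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control sample.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: drought sample versus CK.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

A value above 1 indicates higher expression in the treated sample than in the control, and a value below 1 indicates lower expression.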

2.   Results and Analysis
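  • From the drought-stress transcriptome of P. massoniana, after low-quality raw reads were removed and the high-quality clean reads were assembled with Trinity, 194 821 unigenes longer than 200 bp were obtained. BLAST alignment against the seven databases annotated 101 806 unigenes (52.26%); the remaining 93 015 unigenes were not annotated and may represent novel genes. Of the 66 825 annotations obtained from the NR database, 16 323 unigenes were homologous to sequences from spruce (Picea) and 3 040 to sequences from Pinus, far more matches than for any other species.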

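  • The annotated unigenes were functionally annotated and classified against GO, KOG and KEGG (Figs. 1-3), which reflects the overall functional distribution of the genes expressed in P. massoniana under drought stress. In the GO classification, 64 943 unigenes were annotated, accounting for 33.33% of the total. The 128 326 functional annotations obtained fell into three main categories: 46 779 in biological process (36.45%), 52 470 in molecular function (40.89%) and 29 077 in cellular component (22.66%) (Fig. 1). Biological process comprised 25 subcategories, dominated by metabolic process (34 306, 73.34%), cellular process (34 293, 73.31%) and single-organism process (27 445, 58.67%); those related to stress response included biological regulation (9 993, 21.36%), stress response (6 741, 14.41%) and signaling (3 679, 7.86%). Molecular function comprised 10 subcategories, with protein binding (32 711, 62.34%), catalytic activity (29 984, 57.15%) and transporter activity (4 442, 8.47%) as representative functions; those related to stress response included nucleic acid binding transcription factor activity (1 708, 3.26%), transcription factor activity (602, 1.15%) and antioxidant activity (467, 0.89%). Cellular component comprised 21 subcategories, mainly cell (18 463, 63.50%), cell part (18 441, 63.42%), organelle (12 184, 41.90%), macromolecular complex (11 475, 39.46%) and membrane (9 248, 31.81%). These results indicate that the expressed genes of P. massoniana are involved in a wide range of cellular processes and metabolic activities, covering almost all life activities during drought stress.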

    Figure 1.  GO categorization of non-redundant unigenes in P. massoniana transcriptome.

    Figure 2.  KOG annotation of putative proteins in P. massoniana transcriptome.

    Figure 3.  KEGG annotation of putative proteins in P. massoniana transcriptome.

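    In the KOG classification, 35 880 unigenes matched homologous sequences, accounting for 18.42% of the total; 39 989 annotations were obtained, covering all 26 KOG functional categories (Fig. 2). The largest category was general function prediction (6 179, 17.22%), followed by posttranslational modification, protein turnover and chaperones (4 486, 12.50%); translation, ribosomal structure and biogenesis (3 197, 8.91%); energy production and conversion (2 962, 8.26%); signal transduction mechanisms (2 659, 7.41%); and lipid transport and metabolism (2 129, 5.93%). Extracellular structures and cell motility were the smallest categories, with only 72 and 23 unigenes, respectively.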

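    In the KEGG annotation, 30 882 unigenes received KO annotations, accounting for 15.85% of the total and involving 284 pathways (Fig. 3). The pathway groups with the most annotated unigenes were carbohydrate metabolism (4 040, 13.08%), amino acid metabolism (3 082, 9.98%), translation (2 970, 9.62%), signal transduction (2 817, 9.12%), energy metabolism (2 310, 7.48%) and lipid metabolism (2 118, 6.86%), together with other metabolic and environmental adaptation pathways, indicating that metabolic activities and signal transduction are highly active in P. massoniana under drought stress.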

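  • The MISA search identified 6 728 SSR loci in the 194 821 unigenes, distributed among 6 367 unigenes; 6 031 unigenes contained a single SSR locus and 336 contained two or more. The overall SSR frequency was 3.45% and the mean distance between SSRs was 15.97 kb (Table 1). SSR frequency increased with unigene length, being 1.64%, 2.61%, 4.77%, 8.51% and 15.71% in the successive length classes.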

    Length/bp | Gene number | Number of unigenes containing SSRs | Number of SSR loci | Frequency/% | Mean distance/kb
    ≤300 | 97 486 | 1 569 | 1 597 | 1.64 | 67.30
    301-500 | 45 841 | 1 156 | 1 197 | 2.61 | 89.79
    501-1 000 | 28 051 | 1 266 | 1 337 | 4.77 | 80.38
    1 001-2 000 | 15 081 | 1 188 | 1 283 | 8.51 | 83.77
    ≥2 001 | 8 362 | 1 188 | 1 314 | 15.71 | 81.79
    Total | 194 821 | 6 367 | 6 728 | 3.45 | 15.97

    Table 1.  Number, frequency and mean distance of SSR in different unigene length distribution of P. massoniana
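    The frequency column of Table 1 appears to be the number of SSR loci divided by the number of unigenes in each length class, which is an inference from the reported values rather than a stated definition. The following sketch recomputes it; the variable and function names are hypothetical.

```python
# (length class, number of unigenes, number of SSR loci), copied from Table 1.
TABLE_1 = [
    ("<=300", 97486, 1597),
    ("301-500", 45841, 1197),
    ("501-1000", 28051, 1337),
    ("1001-2000", 15081, 1283),
    (">=2001", 8362, 1314),
]


def ssr_frequency(n_ssr, n_unigenes):
    """SSR loci per 100 unigenes, rounded to two decimals."""
    return round(100.0 * n_ssr / n_unigenes, 2)


frequencies = {label: ssr_frequency(ssr, genes) for label, genes, ssr in TABLE_1}
total_genes = sum(genes for _, genes, _ in TABLE_1)
total_ssr = sum(ssr for _, _, ssr in TABLE_1)
overall = ssr_frequency(total_ssr, total_genes)

if __name__ == "__main__":
    for label, value in frequencies.items():
        print(label, value)
    print("Total", overall)
```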

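  • SSRs of all repeat types from mono- to hexanucleotide were present in the P. massoniana transcriptome (Table 2). Mono-, di- and trinucleotide repeats predominated, totalling 6 329 loci or 94.07% of all SSR loci. Mononucleotides were the most abundant (2 410, 35.82%), followed by trinucleotides (2 222, 33.03%) and dinucleotides (1 697, 25.22%); tetra-, penta- and hexanucleotide repeats were few and sparsely distributed.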

    Repeat type | Number | Percentage/% | Mean length/bp | Mean distance/kb | Frequency/%
    Mononucleotide | 2 410 | 35.82 | 11.31 | 44.59 | 1.24
    Dinucleotide | 1 697 | 25.22 | 14.17 | 63.33 | 0.87
    Trinucleotide | 2 222 | 33.03 | 16.21 | 48.37 | 1.14
    Tetranucleotide | 108 | 1.61 | 21.36 | 995.12 | 0.05
    Pentanucleotide | 17 | 0.25 | 25.00 | 6 321.94 | 0.01
    Hexanucleotide | 21 | 0.31 | 39.16 | 5 117.76 | 0.01
    Compound | 253 | 3.76 | 70.49 | 424.79 | 0.13
    Total | 6 728 | 100.00 | 16.09 | 15.97 | 3.45

    Table 2.  Occurrence of SSRs in the P. massoniana transcriptome

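  • The transcriptome SSRs of P. massoniana comprised 70 repeat motifs: 2, 4, 10, 22, 12 and 20 motifs for mono- to hexanucleotides, respectively. The most frequent motifs were the mononucleotide A/T (2 332, 34.66%), the dinucleotides AT/AT (791, 11.76%), AG/CT (579, 8.61%) and AC/GT (392, 5.83%), and the trinucleotides AGC/CTG (443, 6.58%) and AAG/CTT (317, 4.71%); all other motifs were comparatively rare (Table 3).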

    Motif | Number | Frequency/%
    A/T | 2 332 | 34.66
    C/G | 90 | 1.34
    AC/GT | 392 | 5.83
    AG/CT | 579 | 8.61
    AT/AT | 791 | 11.76
    CG/CG | 22 | 0.33
    AAC/GTT | 260 | 3.86
    AAG/CTT | 317 | 4.71
    AAT/ATT | 239 | 3.55
    ACC/GGT | 186 | 2.76
    ACG/CGT | 110 | 1.63
    ACT/AGT | 32 | 0.48
    AGC/CTG | 443 | 6.58
    AGG/CCT | 244 | 3.63
    ATC/ATG | 205 | 3.05
    CCG/CGG | 231 | 3.43
    Other | 255 | 3.79

    Table 3.  The number and frequency of the motifs in SSR of P. massoniana transcriptome

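  • The number of motif repeats in P. massoniana transcriptome SSRs ranged from 5 to 23, and the number of SSRs declined as the repeat number increased (Table 4). SSR loci with 5 to 10 repeats numbered 5 447 (80.97% of the total), and those with 11 or more repeats numbered 1 180 (17.54%). Among mononucleotides, 10 repeats was the most common (1 284, 19.09%); among dinucleotides, 6 repeats (820, 12.19%); and for tri- to hexanucleotides, 5 repeats. SSR length ranged from 10 to 212 bp; SSRs of 10 to 20 bp were the most common (6 149, 91.39% of all SSR loci), whereas those longer than 20 bp numbered 579 (8.61%) (Table 5).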

    Repeat type | Number of loci by repeat number (non-empty cells, in column order) | Total | Percentage/%
    Repeat-number columns | 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, >15 | |
    Mononucleotide | 1 284, 491, 248, 119, 79, 48, 153 | 2 422 | 36.00
    Dinucleotide | 820, 392, 298, 161, 75, 36, 2 | 1 784 | 26.52
    Trinucleotide | 1 563, 503, 178, 21, 1, 1 | 2 267 | 33.70
    Tetranucleotide | 89, 26, 2, 1, 1 | 119 | 1.77
    Pentanucleotide | 15 | 15 | 0.22
    Hexanucleotide | 10, 3, 2, 3, 1, 1 | 20 | 0.30
    Total | 1 677, 1 352, 574, 323, 162, 1 359, 530, 250, 119, 80, 48, 153 | 6 627 | 98.50
    Percentage/% | 24.93, 20.10, 8.53, 4.80, 2.41, 20.20, 7.88, 3.72, 1.77, 1.19, 0.71, 2.27 | | 98.50

    Table 4.  The number of repeats and distribution of the motifs in SSR of P. massoniana transcriptome

    Motif length/bp | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | >20 | Total
    Number of SSRs | 1 280 | 492 | 1 043 | 119 | 452 | 1 583 | 314 | 23 | 650 | 18 | 175 | 579 | 6 728
    Frequency/% | 19.02 | 7.31 | 15.5 | 1.77 | 6.72 | 23.53 | 4.67 | 0.34 | 9.66 | 0.27 | 2.6 | 8.61 | 100

    Table 5.  The length and frequency of the motifs in SSR of P. massoniana transcriptome

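  • From the 6 367 SSR-containing unigenes, 4 446 unigenes were successfully screened and 13 338 SSR primer pairs were designed. Primer length was 18-27 bp, GC content 40%-55%, annealing temperature (Tm) 57-63℃, the Tm difference between forward and reverse primers below 5℃, and PCR product size 100-280 bp. Of 11 randomly selected SSR primer pairs tested by PCR, 5 amplified PCR products effectively (Fig. 4), a conversion rate of 45.5%, indicating that the primers are feasible to some extent.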

    Figure 4.  The amplification of SSR primers from transcriptome of P. massoniana
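    The primer criteria reported above can be expressed as a simple filter. This is an illustrative sketch only: the function names and example primer sequences are hypothetical, Tm values are taken as given inputs (for example from Primer3 output) rather than computed, and the mismatch and uniqueness checks described in the methods are not modelled.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def passes_filters(forward, reverse, tm_forward, tm_reverse, product_size):
    """Apply the length, GC, Tm and product-size ranges given in the text."""
    for primer in (forward, reverse):
        if not 18 <= len(primer) <= 27:
            return False
        if not 40.0 <= gc_content(primer) <= 55.0:
            return False
    if not (57.0 <= tm_forward <= 63.0 and 57.0 <= tm_reverse <= 63.0):
        return False
    # The Tm difference between the two primers must stay below 5 degrees.
    if abs(tm_forward - tm_reverse) >= 5.0:
        return False
    return 100 <= product_size <= 280


def conversion_rate(n_amplified, n_tested):
    """Percentage of tested primer pairs that amplified, to one decimal."""
    return round(100.0 * n_amplified / n_tested, 1)


if __name__ == "__main__":
    # Hypothetical primer pair, not taken from the study.
    print(passes_filters("ATGC" * 5, "GCTA" * 5, 59.0, 61.0, 180))
    print(conversion_rate(5, 11))
```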

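  • On the basis of the transcriptome functional annotation and SSR mining, differential expression analysis of the 6 367 SSR-containing unigenes identified 422 differentially expressed unigenes (Fig. 5). Compared with normal watering, 325, 147 and 183 unigenes were differentially expressed after 10, 15 and 25 d of drought stress, respectively, of which 196, 66 and 87 were up-regulated and 129, 81 and 96 down-regulated. The numbers of unigenes differentially expressed specifically at each stage were 181, 21 and 60, respectively, and 73 unigenes were differentially expressed at all three stress levels. The number of differentially expressed unigenes was greatest at 10 d, indicating that unigenes in the stress-response pathways of P. massoniana were abundantly expressed at 10 d of drought.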

    Figure 5.  Venn diagram of the differentially expressed genes identified in three comparisons
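    The counts reported for the three comparisons can be cross-checked with basic set arithmetic. The sketch below uses only the numbers given in the text; the number of unigenes shared by exactly two comparisons is not reported, so it is derived here, and all variable names are hypothetical.

```python
# Differentially expressed SSR-containing unigenes per comparison (drought vs CK).
per_comparison = {"10 d": 325, "15 d": 147, "25 d": 183}
up_regulated = {"10 d": 196, "15 d": 66, "25 d": 87}
down_regulated = {"10 d": 129, "15 d": 81, "25 d": 96}

# Unigenes specific to one comparison, shared by all three, and the union.
specific = {"10 d": 181, "15 d": 21, "25 d": 60}
shared_by_all = 73
union_total = 422

# Derived: unigenes shared by exactly two comparisons.
shared_by_two = union_total - sum(specific.values()) - shared_by_all

# Summing the three set sizes counts two-way genes twice and three-way genes thrice.
recomputed_sum = sum(specific.values()) + 2 * shared_by_two + 3 * shared_by_all

if __name__ == "__main__":
    print("shared by exactly two comparisons:", shared_by_two)
    print("consistent:", recomputed_sum == sum(per_comparison.values()))
```

    Under this reading the reported figures are mutually consistent.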

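  • The main biological functions of the differentially expressed unigenes, and the main metabolic and signal transduction pathways in which they participate, were then determined. GO enrichment analysis showed that 261 of the 422 differentially expressed SSR-containing unigenes were involved in the three main GO categories. In biological process, organic cyclic compound biosynthetic process (GO:1901362) had the most enriched unigenes (51), followed by oxidation-reduction process (GO:0055114) (43) and regulation of metabolic process (GO:0019222) (37). In molecular function, oxidoreductase activity (GO:0016491) had the most (42), followed by transporter activity (GO:0005215) (29) and nucleic acid binding transcription factor activity (GO:0001071) (16). In cellular component, extracellular region (GO:0005576) had the most (18), followed by inner mitochondrial membrane protein complex (GO:0098800) and others. These significantly enriched functions may therefore be involved in the drought-stress response of P. massoniana.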

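    KEGG enrichment analysis (Fig. 6) showed that 97 of the 422 differentially expressed SSR-containing unigenes were assigned to 53 pathways, of which three were significantly enriched (P<0.05): photosynthesis (ko00195), carotenoid biosynthesis (ko00906) and plant hormone signal transduction (ko04075), indicating that these three pathways are associated with the drought response of P. massoniana. The photosynthesis pathway contained four SSR-containing unigenes: two ATP synthase unigenes (c94519_g2, c88154_g1), one photosystem Ⅱ unigene (c77320_g1) and one oxidoreductase unigene (c85918_g1), all down-regulated. The carotenoid biosynthesis pathway contained three SSR-containing unigenes: one iron ion binding unigene (c89714_g1) was up-regulated, whereas one oxidoreductase unigene (c78714_g) and one flavin adenine dinucleotide unigene (FAD; c69125_g1) were down-regulated. The plant hormone signal transduction pathway contained four SSR-containing unigenes: one jasmonic acid unigene (JA; c88597_g2) was up-regulated, whereas one protein phosphatase unigene (PP2C; c68631_g1) and two auxin unigenes (IAA; c92989_g1, c77087_g3) were down-regulated. These results suggest that the 11 SSR-containing unigenes may take part in the drought response of P. massoniana. On the one hand, photosynthesis was markedly weakened and physiological activity and growth slowed; on the other hand, P. massoniana activated drought defence mechanisms, delaying drought injury by up-regulating JA and down-regulating PP2C expression. Combined with the transcriptome SSR data, the SSR locus information for these 11 important drought-responsive genes was extracted.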

    Figure 6.  Significantly enriched KEGG pathways of differentially expressed genes containing SSRs
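    Pathway enrichment of this kind is commonly assessed with a hypergeometric (one-sided Fisher) test, followed by multiple-testing correction. The sketch below shows only the uncorrected test and is not KOBAS's implementation. The function name is hypothetical, and the background and pathway sizes in the usage example are invented for illustration because the paper does not report them.

```python
from math import comb


def hypergeom_enrichment_p(background, in_pathway, selected, selected_in_pathway):
    """P(X >= selected_in_pathway) for draws without replacement.

    background: number of genes in the reference set
    in_pathway: reference genes annotated to the pathway
    selected: genes in the list being tested (e.g. differentially expressed)
    selected_in_pathway: tested genes annotated to the pathway
    """
    upper = min(in_pathway, selected)
    total = comb(background, selected)
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, selected - k)
        for k in range(selected_in_pathway, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 6 367 background unigenes, 40 of them in a pathway,
    # 422 differentially expressed unigenes, 4 of which fall in the pathway.
    print(hypergeom_enrichment_p(6367, 40, 422, 4))
```

    A small p-value indicates that the pathway contains more of the tested genes than expected by chance.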

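  • The qRT-PCR results (Fig. 7) showed that, as drought continued, the expression of two unigenes (c71819_g3, c85755_g1) decreased progressively, whereas that of two unigenes (c95186_g2, c93699_g2) first rose and then fell. For three unigenes (c71819_g3, c85755_g1, c95186_g2), the qRT-PCR changes were broadly consistent with the DEG changes at the transcriptome level. For one unigene (c93699_g2), the qRT-PCR fold change at days 10 and 15 was higher than the transcriptome fold change, but the trends agreed. The transcriptome results are therefore valid and reliable.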

    Figure 7.  Real-time PCR Validations of P. massoniana

3.   Discussion
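  • Using high-throughput sequencing, this study obtained 194 821 unigenes from the drought-stress transcriptome of P. massoniana, of which 101 806 were annotated by BLAST alignment, far more than the 33 772 annotated unigenes obtained from normalized sequencing of P. massoniana[21], indicating good assembly and rich annotation. In the NR database, 24.42% of the annotated unigenes (16 323) were assigned to the phylogenetically close genera Picea and Pinus, which had the most matches among all annotated species, indicating good annotation results and a high annotation success rate.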

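    Functional annotation and classification of unigenes gives a preliminary indication of the functions of the proteins they encode and is the prerequisite for in-depth interpretation of transcriptome information. In this study, the KOG classification covered all 26 functional categories, indicating comprehensive annotation covering almost the entire life process of P. massoniana. In the GO classification, most unigenes were involved in biological processes such as primary metabolism, cellular structure, biological regulation, response to stress stimuli and signaling, suggesting that the physiological activities of most unigenes are related to the drought response. In the KEGG classification, most unigenes were involved in secondary metabolism, plant hormone biosynthesis and signal transduction pathways, suggesting that the annotated unigenes may take part in various drought-stress responses. The discovery of these unigenes lays a foundation for subsequent gene function verification and the study of stress-resistance mechanisms.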

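  • Across the 194 821 transcriptome unigenes, SSR loci were detected at a mean distance of 15.97 kb and a frequency of 3.45%, higher than the 3.2% reported for the P. massoniana genome[2] and lower than the 4.08% for EST-SSRs of its close relatives[17]. Compared with other pines, this is higher than the 2.1% of maritime pine (Pinus pinaster Ait.)[22] and lower than the 4.24% of Korean pine[5]. Overall, SSR frequency in conifers is low and varies within a narrow range, indicating similar trends among related plants; the differences are mainly related to genome size, the proportion of genes containing SSRs and the expression abundance of SSR-containing genes at the time of transcription. Compared with broad-leaved trees, the frequency is far lower than the 14.99% of eucalyptus (Eucalyptus robusta Smith)[23], indicating that SSR frequency differs greatly between genetically distant species, possibly in relation to their evolutionary position. The genus Pinus originated in the Triassic period of the Mesozoic, about 250 million years ago[24], far earlier than Eucalyptus, which originated 36.5 to 65 million years ago[23]. Through long-term evolutionary accumulation and natural selection pressure Pinus has tended towards stability, so its genome has evolved more slowly and varied less than that of the later-originating Eucalyptus. The low SSR frequency in P. massoniana is thus consistent with the view that SSRs are sparsely distributed in Pinus[25].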

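    In most species, SSR repeat types are dominated by di- and trinucleotides[26]. Because mutations in trinucleotide repeats rarely lead to mutation of the species, species facing strong mutational pressure tend to retain trinucleotides[27], and the earlier a species originated and the more selection pressure it has accumulated, the more pronounced the enrichment of trinucleotide repeats[28], as in the early-originating loblolly pine (Pinus taeda L.)[29] and Aleppo pine[4]. In this study, trinucleotide repeats were frequent in P. massoniana, accounting for 33.03% of all SSRs, in line with those findings, which suggests a clear convergent tendency in natural selection among Pinaceae. Mononucleotide repeats were also frequent in P. massoniana (35.82% of all SSRs), which differs markedly from the 24.49% reported by Liu[17]. This may be due to the repeat-number threshold, which was 15 for mononucleotides in Liu's study but 10 here, since different thresholds yield different proportions of detected mononucleotides. In summary, SSR repeat types in P. massoniana are dominated by mono- and trinucleotides.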

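    Under ideal conditions of a sufficiently large, unbiased sample, random combination of the four bases yields 4, 10, 33 and 102 repeat motifs for di- to pentanucleotides, respectively[30]. In this study, P. massoniana had 4, 10, 22, 12 and 20 motifs for di- to hexanucleotides, dominated by AT, AG, AC, AGC and AAG, showing clear bias, which may be related to length constraints on higher-order SSR units themselves[31]. Many pines, such as loblolly pine[29] and Aleppo pine[4], show a similar bias, so it may also be related to inherent genetic characteristics of Pinus.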
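    The theoretical motif counts quoted above (4, 10, 33 and 102) can be reproduced by enumerating all k-mers and grouping those that describe the same repeat, that is, circular shifts of a motif and of its reverse complement. This grouping rule is the usual convention for SSR motif classes and is assumed here; the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. ATAT -> AT)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def canonical(motif):
    """Smallest string among all rotations of the motif and its reverse complement."""
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    rotations = [
        s[i:] + s[:i]
        for s in (motif, reverse_complement)
        for i in range(len(motif))
    ]
    return min(rotations)


def count_motif_classes(k):
    """Number of distinct SSR motif classes for motifs of length k."""
    classes = set()
    for bases in product("ACGT", repeat=k):
        motif = "".join(bases)
        if is_primitive(motif):
            classes.add(canonical(motif))
    return len(classes)


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, count_motif_classes(k))
```

    For example, AC, CA, GT and TG all fall into the single class written AC/GT in Table 3.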

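    SSRs arise from base mismatches during DNA replication; short SSRs (12≤L<20 bp) have a low mismatch rate and a far lower mutation rate than long SSRs (L≥20 bp)[32]. In this study, SSRs of 20 bp or less accounted for 91.39% of all SSRs and those longer than 20 bp for only 8.61%, so short SSRs predominated, suggesting that P. massoniana SSRs have a strong capacity for mismatch repair and providing an important guarantee for accurate, large-scale SSR marker development. A total of 13 338 candidate SSR primer pairs were designed from the transcriptome data, and the effective amplification rate of randomly chosen primers was 45.5%, between the 50% of Bai et al.[2] and the 37.78% of Liu et al.[17], indicating that the SSR primers developed here have a degree of validity and generality. The next step will be to screen primers for target genes, which can lay a foundation for molecular-assisted breeding and genetic diversity research in P. massoniana.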

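  • Transcriptome SSRs are anchored in gene coding sequences and represent specific functions; mining their functional information in depth makes it possible to mark target traits directly[6-7]. To search quickly for SSR markers with drought-resistance function in P. massoniana, this study screened 422 differentially expressed SSR-containing unigenes and subjected them to GO and KEGG enrichment analysis to explore the main biological processes, biochemical metabolism and signalling pathways under drought stress. The KEGG analysis showed that three pathways (photosynthesis, carotenoid biosynthesis and plant hormone signal transduction) were significantly enriched, involving 11 differentially expressed SSR-containing unigenes, which may therefore be directly associated with the drought response. Targeting these 11 functionally expressed unigenes and retrieving the corresponding SSR locus information from the transcriptome can lay a foundation for subsequent fingerprint screening of drought-resistance SSR markers and for the selection and breeding of superior drought-resistant germplasm of P. massoniana.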

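    Two unigenes, annotated to JA (c88597_g2) and PP2C (c68631_g1), were significantly enriched in the plant hormone signal transduction pathway. As an important signal that triggers plant defence mechanisms, JA activates defence response pathways through up-regulated expression and induces the transcription or expression of downstream defence genes, ultimately producing defensive substances that delay or resist stress injury[33]. In this study, the JA unigene was likewise up-regulated and took part in plant hormone signal transduction. In addition, the PP2C unigene was induced to be down-regulated and is involved in ABA signal transduction, consistent with the trend in PP2C under stress in alfalfa (Medicago sativa Linn)[34] and maize (Zea mays L.)[35], indicating that PP2C, as an important functional gene, participates directly or indirectly, through negative regulation, in signal transduction in P. massoniana under drought stress. JA and PP2C may therefore be involved in the drought response of P. massoniana. For these two SSR-containing unigenes, F2 populations from drought-resistant and non-resistant lines could be constructed in future, and BSA combined with SSR marker analysis could be used for in-depth functional mapping of drought-resistance genes.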

4.   Conclusions
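  • This study obtained 101 806 annotated unigenes from the drought-stress transcriptome of P. massoniana, enriching the genetic information resources of the species. In 6 367 unigenes, 6 728 SSR loci were found, and 13 338 SSR primer pairs were preliminarily designed, laying a foundation for large-scale SSR marker development and genetic diversity research in P. massoniana. The 422 differentially expressed SSR-containing genes were involved in three pathways associated with the drought response (plant hormone signal transduction, photosynthesis and carotenoid biosynthesis), from which 11 important functional SSR loci were screened, laying a foundation for research on the molecular mechanism of drought resistance in P. massoniana and especially for mapping functional drought-resistance genes.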
