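Introduction to the NCBI UniGene System
UniGene is a part of GenBank dedicated to collecting non-redundant, gene-oriented clusters. Each UniGene cluster contains the sequences that represent a single gene, together with related information such as the tissue types in which the gene is expressed and its map location.
Besides these well-characterized sequences, many thousands of ESTs are also included, so the collection can serve as a resource for gene discovery. Many laboratory researchers already use UniGene for large-scale gene expression mapping. No attempt has been made to assemble these sequences into contigs or consensus sequences. There are several reasons why the sequences belonging to one gene are not assembled into a single contig:
1. All splice variants of a gene are placed in the same cluster.
2. EST sequences from the same cDNA clone usually include both 5' and 3' reads, but these reads do not always overlap.
At present, UniGene includes sequences from human, rat, mouse, cow, and zebrafish. These species were chosen because large amounts of EST data are available for them; sequences from other species will be added over time.
UniGene data can be downloaded by FTP (ftp://ftp.ncbi.nih.gov/repository/UniGene).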
UniGene Build Procedure:
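Clustering is the process of finding subsets of sequences that belong together within a larger set. It works by converting discrete similarity scores into Boolean links between sequences: two sequences are considered related if their similarity exceeds a given threshold. UniGene clustering adds further biological considerations to this kind of linkage analysis. The procedure is roughly as follows: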
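The idea of turning similarity scores into Boolean links can be sketched in a few lines of Python. This is only an illustration of threshold-based single-linkage grouping, not the UniGene implementation: the function name `cluster_by_threshold`, the score dictionary, and the example threshold are all assumptions made for the sketch.

```python
def cluster_by_threshold(scores, threshold):
    """Group sequence IDs into clusters using Boolean links.

    scores: dict mapping (id_a, id_b) pairs to a similarity score
            (hypothetical input format for this sketch).
    A pair counts as linked only when its score exceeds the threshold;
    clusters are the connected components of the resulting link graph.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # registers both IDs, linked or not
        if score > threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for seq_id in list(parent):
        clusters.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative scores only; the IDs and values are made up.
example_scores = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.97,
    ("C", "D"): 0.40,
    ("D", "E"): 0.99,
}
example_clusters = cluster_by_threshold(example_scores, threshold=0.9)
```

Here A and C end up in the same cluster even though they were never compared directly, because both are linked to B. That transitivity is why one sequence bridging two otherwise unrelated groups can be a problem.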
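1. Sequences are screened for contaminants: vector, oligonucleotide, repeat, mitochondrial, and ribosomal sequences. After the contaminating sequence is removed, a sequence must retain at least 100 bp of informative sequence to qualify as a candidate.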
2. Gene links
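Sequences belonging to the same gene (mRNA or genomic sequences with a complete CDS) are compared with one another, and sufficiently similar sequences are grouped to form the initial clusters.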
3. EST to Gene links and EST to EST
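Using megablast, ESTs are compared with the gene-linked sequences, and sufficiently similar ESTs are added to those clusters. A sequence that could be placed in two different clusters, without justifying a merge of the two into one, must be removed. In addition, a cluster is discarded unless it contains either at least two ESTs labeled as 3' ends or a sequence carrying a poly(A) signal.
Clusters that pass these criteria are called anchored clusters, because they contain a 3' end sequence whose position is assumed to be known.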
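4. Cluster boundaries are set using clone information.
This ensures that 5' and 3' ESTs from the same clone are assigned to the same cluster even when no overlapping sequence connects them. If a cluster contains two 3' ESTs, the two 5' ESTs from the same clones can be located and placed in that cluster. This step also provides information for merging clusters.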
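Steps 1 and 4 above can be sketched as follows. This is a simplified illustration under stated assumptions, not NCBI's code: it assumes masked (contaminant) bases have already been replaced by `N`, and the names `informative_length`, `is_candidate`, and `assign_by_clone`, as well as the EST, clone, and cluster identifiers, are hypothetical.

```python
MIN_INFORMATIVE_BP = 100  # minimum informative length stated in step 1


def informative_length(masked_seq):
    """Count unmasked bases; masked positions are assumed to be 'N'."""
    return sum(1 for base in masked_seq.upper() if base in "ACGT")


def is_candidate(masked_seq):
    """Step 1: keep a sequence only if enough informative sequence remains."""
    return informative_length(masked_seq) >= MIN_INFORMATIVE_BP


def assign_by_clone(cluster_of, clone_of):
    """Step 4: place unclustered ESTs with their clone mates.

    cluster_of: dict EST id -> cluster id, for ESTs already clustered.
    clone_of:   dict EST id -> clone id, for all ESTs.
    An unclustered EST joins a cluster only when the clustered ESTs from
    its clone all sit in one cluster; ambiguous clones are left alone.
    """
    clusters_per_clone = {}
    for est, cluster in cluster_of.items():
        clusters_per_clone.setdefault(clone_of[est], set()).add(cluster)

    result = dict(cluster_of)
    for est, clone in clone_of.items():
        if est in result:
            continue
        candidates = clusters_per_clone.get(clone, set())
        if len(candidates) == 1:
            result[est] = next(iter(candidates))
    return result


# Made-up example: a 3' EST is clustered, its 5' mate is not yet.
assigned = assign_by_clone(
    {"est_3p": "cluster_1"},
    {"est_3p": "clone_A", "est_5p": "clone_A", "est_other": "clone_B"},
)
```

In this example `est_5p` is placed in `cluster_1` purely on clone information, with no sequence overlap required, while `est_other` stays unassigned because nothing from `clone_B` has been clustered.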
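Because new sequence data is added and the build is refreshed every week, the resulting clusters in UniGene are also reorganized and updated weekly. Clusters may merge, so using a cluster ID as a stable identifier is unwise; GenBank (GB) accession numbers are a safer reference.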
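One way to act on this advice is to store accession numbers and look up the current cluster at query time. The sketch below assumes each build is available as a simple mapping from accession to cluster ID; that input format, the function names, and the placeholder accessions and cluster IDs are all hypothetical.

```python
def clusters_for_accessions(build, accessions):
    """Return the cluster IDs in `build` that contain any given accession."""
    return {build[acc] for acc in accessions if acc in build}


def follow_cluster(old_build, new_build, old_cluster_id):
    """Find where the members of an old cluster ended up in a newer build."""
    members = [acc for acc, cid in old_build.items() if cid == old_cluster_id]
    return clusters_for_accessions(new_build, members)


# Placeholder data: two clusters in the old build merge in the new one.
old_build = {"ACC0001": "Hs.10", "ACC0002": "Hs.10", "ACC0003": "Hs.20"}
new_build = {"ACC0001": "Hs.99", "ACC0002": "Hs.99", "ACC0003": "Hs.99"}

moved_to = follow_cluster(old_build, new_build, "Hs.10")
```

A result with more than one cluster ID would indicate that the old cluster was split; an empty result would mean none of its accessions survive in the newer build.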
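UniGene currently contains about 48,000 clusters, most of them built from EST sequences. Each cluster represents the transcripts of one human gene; estimates at the time of writing put the human genome at about 80,000 to 100,000 genes. An important purpose of UniGene clusters is to identify new, non-redundant candidates for expression mapping, toward a transcript map that identifies all coding sequences in the genome.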

An article in NCBI News (August 1997) described the clustering algorithm and the UniGene project, and gives useful background on UniGene and the Transcript Map project (see Schuler et al., 1996, below).
Other references:
Schuler (1997). Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75(10), 694-698.
Schuler et al. (1996). A gene map of the human genome. Science 274, 540-546.
Boguski & Schuler (1995). ESTablishing a human transcript map. Nature Genetics 10, 369-371.

UniGene Statistics
Species | UniGene entries | Cutoff date* | Release date*

Chordata
Mammalia
Bos taurus (cow) | 45364 | Jun 19 2012 | Jul 9 2012
Canis lupus familiaris (dog) | 24459 | May 3 2010 | Jul 8 2010
Capra hircus (domestic goat) | 38271 | Nov 29 2011 | Dec 20 2011
Equus caballus (horse) | 6289 | Mar 17 2011 | May 7 2012
Homo sapiens (human) | 130029 | Aug 11 2012 | Sep 4 2012
Macaca fascicularis (crab-eating macaque) | 12546 | Aug 28 2011 | Nov 7 2011
Macaca mulatta (rhesus monkey) | 26849 | Aug 5 2012 | Sep 10 2012
Monodelphis domestica (gray short-tailed opossum) | 662 | Oct 24 2010 | Jan 31 2012
Mus musculus (mouse) | 80383 | Jun 20 2012 | Jul 11 2012
Ornithorhynchus anatinus (platypus) | 749 | Nov 26 2009 | Jul 27 2010
Oryctolagus cuniculus (rabbit) | 7298 | Mar 25 2012 | Apr 3 2012
Ovis aries (sheep) | 17318 | Aug 27 2011 | Nov 21 2011
Pan troglodytes (chimpanzee) | 2412 | Mar 15 2011 | Mar 28 2011
Papio anubis (olive baboon) | 11659 | Mar 8 2011 | Nov 7 2011
Peromyscus maniculatus (deer mouse) | 11045 | Mar 9 2010 | Aug 3 2011
Pongo abelii (Sumatran orangutan) | 7460 | Nov 20 2010 | Dec 27 2010
Rattus norvegicus (Norway rat) | 67262 | Jun 23 2012 | Sep 4 2012
Sus scrofa (pig) | 50106 | Jun 19 2012 | Jul 9 2012
Trichosurus vulpecula (silver-gray brushtail possum) | 11405 | Apr 3 2011 | Aug 8 2011

Actinopterygii
Danio rerio (zebrafish) | 53558 | Jun 19 2012 | Jul 9 2012
Fundulus heteroclitus (killifish) | 5440 | Feb 3 2012 | Mar 5 2012
Gadus morhua (Atlantic cod) | 41275 | Jul 11 2012 | Sep 6 2012
Gasterosteus aculeatus (three spined stickleback) | 16728 | Apr 4 2011 | Dec 30 2011
Ictalurus furcatus (blue catfish) | 17357 | Sep 18 2008 | Nov 3 2011
Ictalurus punctatus (channel catfish) | 30185 | Mar 14 2012 | Apr 16 2012
Oncorhynchus mykiss (rainbow trout) | 117120 | Jul 29 2011 | Oct 11 2011
Oreochromis niloticus (Nile tilapia) | 17426 | Feb 28 2011 | Mar 9 2011
Oryzias latipes (Japanese medaka) | 21803 | Aug 28 2011 | Nov 2 2011
Pimephales promelas (fathead minnow) | 20664 | May 18 2011 | Aug 23 2011
Salmo salar (Atlantic salmon) | 29820 | Jul 28 2011 | Dec 12 2011
Takifugu rubripes (pufferfish) | 3800 | Jul 2 2011 | Aug 10 2011

Amniota
Anolis carolinensis (green anole) | 25135 | Nov 11 2011 | Dec 12 2011

Amphibia
Xenopus laevis (African clawed frog) | 31431 | Aug 19 2012 | Sep 4 2012
Xenopus tropicalis (western clawed frog) | 43712 | Oct 6 2010 | Oct 21 2010

Ascidiacea
Ciona intestinalis | 28121 | Aug 5 2011 | Dec 15 2011
Ciona savignyi | 7350 | Jan 27 2010 | Aug 10 2011
Molgula tectiformis | 8755 | Oct 16 2008 | Aug 11 2011

Aves
Gallus gallus (chicken) | 33850 | Aug 18 2012 | Sep 26 2012
Meleagris gallopavo (turkey) | 1406 | Nov 2 2011 | Jan 30 2012
Taeniopygia guttata (zebra finch) | 13733 | Jun 3 2011 | Aug 12 2011

Cephalochordata
Branchiostoma floridae (Florida lancelet) | 15165 | Apr 3 2011 | Aug 12 2011

Hyperoartia
Petromyzon marinus (sea lamprey) | 10622 | May 18 2011 | Aug 15 2011

Echinodermata
Echinoidea
Paracentrotus lividus (common urchin) | 8313 | Jun 3 2011 | Aug 15 2011
Strongylocentrotus purpuratus (purple sea urchin) | 14718 | Sep 5 2010 | Aug 15 2011

Arthropoda
Arachnida
Ixodes scapularis (black-legged tick) | 19405 | Jul 30 2009 | Jun 23 2010
Tetranychus urticae (two-spotted spider mite) | 7177 | Oct 27 2010 | Jan 6 2011

Branchiopoda
Daphnia pulex (common water flea) | 14177 | Jul 30 2009 | May 26 2010

Insecta
Acyrthosiphon pisum (pea aphid) | 93757 | Nov 23 2010 | Feb 17 2011
Aedes aegypti (yellow fever mosquito) | 16680 | Jun 2 2010 | Aug 15 2011
Anopheles gambiae (African malaria mosquito) | 13066 | Oct 17 2010 | Dec 21 2010
Aphis gossypii (cotton aphid) | 7467 | Oct 28 2010 | Jan 25 2011
Apis mellifera (honey bee) | 24392 | Oct 21 2010 | Nov 24 2010
Bicyclus anynana (squinting bush brown) | 4615 | Nov 5 2010 | Aug 15 2011
Bombyx mori (domestic silkworm) | 13952 | Feb 25 2012 | Mar 5 2012
Culex quinquefasciatus (house mosquito) | 5021 | Apr 29 2010 | Sep 30 2010
Dendroctonus ponderosae (mountain pine beetle) | 6783 | May 24 2010 | Mar 14 2011
Drosophila melanogaster (fruit fly) | 17127 | Jan 13 2011 | Mar 2 2012
Drosophila simulans | 7041 | Dec 22 2009 | Aug 3 2011
Glossina morsitans | 7521 | May 20 2010 | Aug 3 2011
Nasonia vitripennis (jewel wasp) | 15445 | Jul 29 2010 | Aug 16 2010
Tribolium castaneum (red flour beetle) | 6852 | Mar 13 2012 | Apr 3 2012

Malacostraca
Litopenaeus vannamei (Pacific white shrimp) | 7738 | Jul 1 2011 | Dec 27 2011

Maxillopoda
Lepeophtheirus salmonis | 9363 | Nov 2 2010 | Jan 6 2011

Mollusca
Gastropoda
Aplysia californica (California sea hare) | 24709 | Jul 8 2011 | Nov 30 2011
Lottia gigantea | 15623 | Jan 19 2009 | May 26 2010

Annelida
Hirudinida
Helobdella robusta | 6973 | Dec 22 2009 | Feb 5 2010

Polychaeta
Alvinella pompejana | 14191 | Mar 31 2009 | Dec 10 2010

Nematoda
Chromadorea
Ancylostoma caninum (dog hookworm) | 7041 | Oct 2 2011 | Feb 1 2012
Caenorhabditis elegans (nematode) | 22358 | Jan 13 2011 | Aug 17 2011

Platyhelminthes
Trematoda
Schistosoma japonicum | 10483 | Apr 10 2011 | Aug 17 2011
Schistosoma mansoni | 12517 | Jul 28 2011 | Dec 27 2011

Turbellaria
Schmidtea mediterranea | 10265 | Jul 1 2011 | Aug 30 2011

Porifera
Demospongiae
Amphimedon queenslandica | 6211 | Dec 30 2009 | Dec 30 2011

Cnidaria
Anthozoa
Nematostella vectensis (starlet sea anemone) | 14574 | Nov 25 2010 | Aug 18 2011

Hydrozoa
Clytia hemisphaerica | 4637 | Oct 27 2010 | Dec 29 2010
Hydra magnipapillata | 10473 | Apr 4 2011 | Feb 2 2012

Ascomycota
Eurotiomycetes
Coccidioides posadasii | 7075 | Aug 19 2010 | Aug 18 2011

Pezizomycetes
Tuber melanosporum | 7543 | Oct 28 2010 | Jan 6 2011

Sordariomycetes
Gibberella moniliformis | 5316 | Nov 1 2009 | Aug 17 2011
Magnaporthe grisea | 13135 | Apr 28 2009 | Aug 31 2011
Neurospora crassa | 17073 | Mar 9 2011 | Aug 19 2011

Basidiomycota
Heterobasidiomycetes
Filobasidiella neoformans | 5042 | Oct 2 2010 | Jan 5 2011

Codonosigidae
Monosiga
Monosiga ovata | 5265 | Apr 27 2010 | Aug 18 2011

Streptophyta
Bryopsida
Physcomitrella patens | 17573 | Dec 2 2011 | Feb 9 2012

Coniferopsida
Picea glauca (white spruce) | 27848 | Oct 17 2011 | Nov 4 2011
Picea sitchensis (Sitka spruce) | 19944 | Mar 8 2011 | May 12 2011
Pinus taeda (loblolly pine) | 17379 | Dec 31 2010 | Aug 19 2011

Eudicotyledons
Aquilegia formosa x Aquilegia pubescens | 7735 | Dec 9 2005 | Aug 19 2011
Arabidopsis thaliana (thale cress) | 30633 | Jun 14 2012 | Jul 9 2012
Arachis hypogea (peanut) | 52468 | Jul 18 2012 | Sep 5 2012
Artemisia annua (sweet wormwood) | 86708 | Nov 2 2011 | Dec 12 2011
Brassica napus (rape) | 26117 | May 12 2011 | Jan 3 2012
Brassica oleracea | 14286 | Jan 12 2011 | Mar 9 2011
Brassica rapa (field mustard) | 14497 | May 31 2012 | Jun 21 2012
Capsicum annuum | 8731 | May 26 2011 | Aug 22 2011
Carica papaya | 6992 | Apr 10 2011 | Aug 2 2011
Citrus clementina | 8994 | Jun 27 2010 | Aug 22 2011
Citrus sinensis (Valencia orange) | 15936 | Jan 4 2011 | May 5 2011
Coffea canephora (robusta coffee) | 5231 | Jan 1 2011 | Aug 22 2011
Glycine max (soybean) | 35982 | Feb 26 2012 | Apr 9 2012
Gossypium hirsutum (upland cotton) | 20085 | Jul 2 2011 | Aug 22 2011
Gossypium raimondii | 3174 | Jan 27 2010 | Aug 22 2011
Helianthus annuus (sunflower) | 11614 | Jun 7 2012 | Jun 20 2012
Lactuca sativa (garden lettuce) | 45536 | Jun 29 2011 | Oct 11 2011
Lotus japonicus | 20128 | Mar 1 2011 | Aug 22 2011
Malus x domestica (apple) | 22493 | Nov 10 2011 | Jan 3 2012
Manihot esculenta (cassava) | 9667 | Mar 1 2011 | Aug 2 2011
Medicago truncatula (barrel medic) | 18045 | Nov 21 2011 | Jan 31 2012
Mimulus guttatus (spotted monkey flower) | 13108 | May 27 2009 | Sep 21 2010
Nicotiana tabacum (tobacco) | 24432 | Jan 12 2011 | Feb 10 2011
Phaseolus vulgaris | 6686 | Dec 22 2009 | Feb 23 2010
Populus tremula x Populus tremuloides (hybrid aspen) | 9352 | Nov 23 2009 | Nov 30 2011
Populus trichocarpa (western balsam poplar) | 15056 | Nov 10 2011 | Jan 3 2012
Prunus persica (peach) | 7481 | Dec 27 2010 | Dec 27 2011
Quercus robur (truffle oak) | 7170 | Sep 10 2010 | Jan 11 2011
Raphanus raphanistrum (wild radish) | 18729 | Dec 1 2008 | Aug 25 2011
Raphanus sativus (radish) | 17181 | Jul 15 2011 | Sep 16 2011
Solanum lycopersicum (tomato) | 18051 | Aug 2 2012 | Sep 17 2012
Solanum melongena (eggplant) | 9060 | Nov 2 2011 | Jan 3 2012
Solanum tuberosum (potato) | 18189 | Mar 3 2011 | Sep 17 2012
Theobroma cacao | 24263 | May 18 2010 | Jan 3 2012
Vigna unguiculata (cowpea) | 15365 | Jan 16 2011 | Sep 27 2011
Vitis vinifera (wine grape) | 22101 | Mar 25 2012 | Apr 5 2012

Isoetopsida
Selaginella moellendorffii | 10646 | Feb 1 2012 | Jun 21 2012

Liliopsida
Brachypodium distachyon | 10698 | Dec 9 2009 | Sep 6 2011
Festuca pratensis (meadow ryegrass) | 7046 | May 14 2009 | Sep 19 2011
Hordeum vulgare (barley) | 26944 | Jun 17 2011 | Jul 28 2011
Oryza sativa (rice) | 44118 | Jun 5 2012 | Jul 9 2012
Panicum virgatum (switchgrass) | 22961 | Oct 7 2010 | Jan 3 2012
Saccharum officinarum (sugarcane) | 15394 | Apr 4 2011 | Aug 3 2011
Sorghum bicolor (sorghum) | 13733 | Jun 29 2011 | Sep 19 2011
Triticum aestivum (wheat) | 56954 | Jan 5 2012 | Mar 19 2012
Zea mays (maize) | 91964 | Jun 2 2012 | Sep 4 2012

Chlorophyta
Chlorophyceae
Chlamydomonas reinhardtii | 7570 | Jul 10 2011 | Sep 19 2011
Volvox carteri | 5329 | Aug 16 2010 | Sep 20 2011

Apicomplexa
Coccidia
Toxoplasma gondii | 6237 | Apr 19 2012 | Aug 9 2012

Bacillariophyta
Bacillariophyceae
Phaeodactylum tricornutum | 7883 | Apr 3 2011 | Sep 26 2011

Oomycetes
Peronosporales
Phytophthora infestans (potato late blight agent) | 8920 | Jul 29 2010 | Sep 26 2011

Pythiales
Pythium ultimum | 6663 | Jan 26 2011 | Mar 14 2011

Dictyosteliida
Dictyostelium
Dictyostelium discoideum (slime mold) | 6187 | Jul 28 2010 | Sep 26 2011

Ciliophora
Oligohymenophorea
Paramecium tetraurelia | 5230 | Jan 7 2011 | Sep 26 2011
Tetrahymena thermophila | 5974 | Nov 18 2010 | Oct 18 2011

* The cutoff date is the modification date of the most recent sequence included in a released UniGene dataset. In order to keep ancillary information such as gene names and reported protein alignments timely, UniGene releases are generated even when there is no new sequence data for an organism.