October 1, 2002
Copyright © 2002 by Neurogenex Co., Ltd.
This document describes the structure of the Neurogenex Sequence Format (hereafter NGSF) used by EnCyclon.
If you find an error in this document, please contact the document maintainer at <bio@neurogenex.com>.
NGSF is a new sequence format developed by Neurogenex for use in EnCyclon.
While developing EnCyclon we reviewed the existing sequence formats. They carried more content than we needed, were complex and hard to write, and could not express many of the features EnCyclon was meant to implement. We therefore developed a new sequence format.
This section describes the file structure of an NGSF sequence file.
An NGSF sequence file consists of two parts: a sequence header and a sequence.
Every sequence file has both a sequence header and a sequence. Some fields of the sequence header may be omitted, but the sequence itself may not.
A sequence header field starts in column 1. Fields subordinate to a sequence header field must begin within columns 2 to 12.
A field runs from a keyword starting in column 1 up to the point where the next field begins. A field may therefore span several lines, but the NG, NAME, DATE, and ENDS fields each occupy a single line.
For a field of more than one line, each following line is treated as belonging to that keyword until a line appears whose first column holds a character other than white space[1].
A sequence record begins with the NG field, which cannot be omitted. The other header fields follow it, and every header field except NG is optional.
The sequence follows the end of the sequence header, and a "//" starting in the first column marks the end of the sequence.
Example 2-1. NGSF file example
NG1.0 Genbank:J01749 circular dsDNA 4361 bp
DESCRIPTION Cloning vector pBR322, complete genome.
AUTHOR Gilbert,W.
DATE 2000/12/14
FEATURE
    Source 1..1762 F pSC101
    CDS 86..1276 F Tet
    CDS 1915..2106 F ROP
    CDS 3293..4153 R Amp
    Promoter 27..33 R P1
    Promoter 43..49 F P2
    Promoter 4188..4194 R P3
SEQUENCE
1 ttctcatgtt tgacagctta tcatcgataa gctttaatgc ggtagtttat cacagttaaa
61 ttgctaacgc agtcaggcac cgtgtatgaa atctaacaat gcgctcatcg tcatcctcgg
121 caccgtcacc ctggatgctg taggcatagg cttggttatg ccggtactgc cgggcctctt
181 gcgggatatc gtccattccg acagcatcgc cagtcactat ggcgtgctgc tagcgctata
241 tgcgttgatg caatttctat gcgcacccgt tctcggagca ctgtccgacc gctttggccg
301 ccgcccagtc ctgctcgctt cgctacttgg agccactatc gactacgcga tcatggcgac
361 cacacccgtc ctgtggatcc tctacgccgg acgcatcgtg gccggcatca ccggcgccac
421 aggtgcggtt gctggcgcct atatcgccga catcaccgat ggggaagatc gggctcgcca
481 cttcgggctc atgagcgctt gtttcggcgt gggtatggtg gcaggccccg tggccggggg
541 actgttgggc gccatctcct tgcatgcacc attccttgcg gcggcggtgc tcaacggcct
601 caacctacta ctgggctgct tcctaatgca ggagtcgcat aagggagagc gtcgaccgat
661 gcccttgaga gccttcaacc cagtcagctc cttccggtgg gcgcggggca tgactatcgt
721 cgccgcactt atgactgtct tctttatcat gcaactcgta ggacaggtgc cggcagcgct
781 ctgggtcatt ttcggcgagg accgctttcg ctggagcgcg acgatgatcg gcctgtcgct
841 tgcggtattc ggaatcttgc acgccctcgc tcaagccttc gtcactggtc ccgccaccaa
901 acgtttcggc gagaagcagg ccattatcgc cggcatggcg gccgacgcgc tgggctacgt
961 cttgctggcg ttcgcgacgc gaggctggat ggccttcccc attatgattc ttctcgcttc
1021 cggcggcatc gggatgcccg cgttgcaggc catgctgtcc aggcaggtag atgacgacca
1081 tcagggacag cttcaaggat cgctcgcggc tcttaccagc ctaacttcga tcactggacc
1141 gctgatcgtc acggcgattt atgccgcctc ggcgagcaca tggaacgggt tggcatggat
1201 tgtaggcgcc gccctatacc ttgtctgcct ccccgcgttg cgtcgcggtg catggagccg
1261 ggccacctcg acctgaatgg aagccggcgg cacctcgcta acggattcac cactccaaga
1321 attggagcca atcaattctt gcggagaact gtgaatgcgc aaaccaaccc ttggcagaac
1381 atatccatcg cgtccgccat ctccagcagc cgcacgcggc gcatctcggg cagcgttggg
1441 tcctggccac gggtgcgcat gatcgtgctc ctgtcgttga ggacccggct aggctggcgg
1501 ggttgcctta ctggttagca gaatgaatca ccgatacgcg agcgaacgtg aagcgactgc
1561 tgctgcaaaa cgtctgcgac ctgagcaaca acatgaatgg tcttcggttt ccgtgtttcg
1621 taaagtctgg aaacgcggaa gtcagcgccc tgcaccatta tgttccggat ctgcatcgca
1681 ggatgctgct ggctaccctg tggaacacct acatctgtat taacgaagcg ctggcattga
1741 ccctgagtga tttttctctg gtcccgccgc atccataccg ccagttgttt accctcacaa
1801 cgttccagta accgggcatg ttcatcatca gtaacccgta tcgtgagcat cctctctcgt
1861 ttcatcggta tcattacccc catgaacaga aatccccctt acacggaggc atcagtgacc
1921 aaacaggaaa aaaccgccct taacatggcc cgctttatca gaagccagac attaacgctt
1981 ctggagaaac tcaacgagct ggacgcggat gaacaggcag acatctgtga atcgcttcac
2041 gaccacgctg atgagcttta ccgcagctgc ctcgcgcgtt tcggtgatga cggtgaaaac
2101 ctctgacaca tgcagctccc ggagacggtc acagcttgtc tgtaagcgga tgccgggagc
2161 agacaagccc gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc agccatgacc
2221 cagtcacgta gcgatagcgg agtgtatact ggcttaacta tgcggcatca gagcagattg
2281 tactgagagt gcaccatatg cggtgtgaaa taccgcacag atgcgtaagg agaaaatacc
2341 gcatcaggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgc
2401 ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggata
2461 acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg
2521 cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct
2581 caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa
2641 gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc
2701 tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt
2761 aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg
2821 ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg
2881 cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct
2941 tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgc
3001 tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg
3061 ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc
3121 aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt
3181 aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa
3241 aatgaagttt taaatcaatc taaagtatat atgagtaaac ttggtctgac agttaccaat
3301 gcttaatcag tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct
3361 gactccccgt cgtgtagata actacgatac gggagggctt accatctggc cccagtgctg
3421 caatgatacc gcgagaccca cgctcaccgg ctccagattt atcagcaata aaccagccag
3481 ccggaagggc cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta
3541 attgttgccg ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg
3601 ccattgctgc aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg
3661 gttcccaacg atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct
3721 ccttcggtcc tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta
3781 tggcagcact gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg
3841 gtgagtactc aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc
3901 cggcgtcaac acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg
3961 gaaaacgttc ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga
4021 tgtaacccac tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg
4081 ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat
4141 gttgaatact catactcttc ctttttcaat attattgaag catttatcag ggttattgtc
4201 tcatgagcgg atacatattt gaatgtattt agaaaaataa acaaataggg gttccgcgca
4261 catttccccg aaaagtgcca cctgacgtct aagaaaccat tattatcatg acattaacct
4321 ataaaaatag gcgtatcacg aggccctttc gtcttcaaga a
//
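The field rules above (a keyword in column 1 opens a field, lines that begin with white space continue it, and "//" in column 1 ends the record) can be sketched roughly as follows. This is only an illustrative reading of the rules, not EnCyclon's own parser: the function name `split_fields` and the shortened `sample` record are hypothetical, and the sketch assumes that subordinate lines keep their leading space or tab.

```python
def split_fields(text):
    """Split NGSF text into (keyword, lines) pairs, stopping at '//'.

    Hypothetical helper: a line whose first column is not white space
    opens a new field; a line starting with a space or tab is attached
    to the field opened most recently.
    """
    fields = []
    for line in text.splitlines():
        if line.startswith("//"):      # terminator in column 1
            break
        if not line.strip():           # skip blank lines (assumption)
            continue
        if line[0] in " \t":           # continuation of the current field
            if not fields:
                raise ValueError("continuation line before any keyword")
            fields[-1][1].append(line.strip())
        else:                          # keyword starting in column 1
            parts = line.split(None, 1)
            rest = [parts[1].strip()] if len(parts) > 1 else []
            fields.append((parts[0], rest))
    return fields


# Shortened, hand-indented record used only for illustration.
sample = (
    "NG1.0 Genbank:J01749 circular dsDNA 4361 bp\n"
    "DESCRIPTION Cloning vector pBR322, complete genome.\n"
    "FEATURE\n"
    "  CDS 86..1276 F Tet\n"
    "  Promoter 27..33 R P1\n"
    "SEQUENCE\n"
    "           1 ttctcatgtt tgacagctta\n"
    "//\n"
)
fields = split_fields(sample)
for keyword, lines in fields:
    print(keyword, lines)
```

Each field comes back as its keyword plus the list of lines that belong to it, so a later step can interpret NG, FEATURE, or SEQUENCE content separately.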
The NG field holds the basic information about the sequence.
The NG field contains the items ID, topology, molecule, and length, and cannot be omitted. An item may not contain white space[1], and items are separated from one another by white space. The number that follows "NG" is the format version; the current NGSF version is 1.0.
ID is the unique name under which the sequence is stored in the sequence database. When the sequence was imported from an external database, ID gives the name of that database and the unique identifier of the sequence within it. topology states whether the sequence is linear or circular. molecule gives the type and strandedness of the sequence: the type is either DNA or RNA, a single strand is written 'ss', and a double strand is written 'ds' (e.g. dsDNA, ssRNA). length gives the length of the sequence.
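As a rough illustration of the NG items described above, the sketch below splits an NG line on white space and checks each item. The names `NGField` and `parse_ng` are hypothetical, and ignoring any trailing token such as the "bp" unit seen in the example file is an assumption, since the text does not define it.

```python
import re
from typing import NamedTuple


class NGField(NamedTuple):
    """Hypothetical container for the items of an NG line."""
    version: str
    seq_id: str
    topology: str
    strand: str     # 'ss' or 'ds'
    mol_type: str   # 'DNA' or 'RNA'
    length: int


def parse_ng(line):
    """Parse an NG line; tokens after the length are ignored (assumption)."""
    tokens = line.split()
    if len(tokens) < 5 or not tokens[0].startswith("NG"):
        raise ValueError("not an NG line: %r" % line)
    version = tokens[0][2:]            # the number following "NG"
    seq_id, topology, molecule, length = tokens[1:5]
    if topology not in ("linear", "circular"):
        raise ValueError("unknown topology: %r" % topology)
    match = re.fullmatch(r"(ss|ds)(DNA|RNA)", molecule)
    if match is None:
        raise ValueError("unknown molecule: %r" % molecule)
    return NGField(version, seq_id, topology,
                   match.group(1), match.group(2), int(length))


ng = parse_ng("NG1.0 Genbank:J01749 circular dsDNA 4361 bp")
print(ng)
```

Splitting on white space is enough here because the text states that no item may itself contain white space.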
The FEATURE field gives information about regions of the sequence.
FEATURE is made up of the subfields Source, CDS, Promoter, and Other, each written on a single line. A subfield begins within columns 2 to 12, and its contents are separated by white space.
Source gives the origin of the sequence (where a particular region of the current sequence data came from), CDS marks a protein coding sequence, Promoter marks a region that regulates gene expression, and Other carries any other important information not covered by the preceding keys.
A subfield is written in the order feature_key, region, direction, feature_name. feature_key is one of "Source", "CDS", "Promoter", or "Other". The region is written as the start and end positions with ".." between them. The direction is C (a feature with no direction), F (a feature oriented in the same direction as the sequence), or R (a feature oriented opposite to the sequence). feature_name is the name of the feature region. For Source, feature_name gives the origin of the sequence in the form "DBname:id".
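The subfield layout just described (feature_key, region, direction, feature_name) could be read along these lines. This is a minimal sketch under stated assumptions: `Feature` and `parse_feature` are hypothetical names, and treating everything after the direction as the feature name is an assumption the text does not confirm.

```python
from typing import NamedTuple

FEATURE_KEYS = {"Source", "CDS", "Promoter", "Other"}
DIRECTIONS = {"C", "F", "R"}


class Feature(NamedTuple):
    """Hypothetical container for one FEATURE subfield line."""
    key: str
    start: int
    end: int
    direction: str
    name: str


def parse_feature(line):
    """Parse one subfield line: feature_key region direction feature_name."""
    tokens = line.split(None, 3)       # leading white space is skipped
    if len(tokens) != 4:
        raise ValueError("expected 4 items: %r" % line)
    key, region, direction, name = tokens
    if key not in FEATURE_KEYS:
        raise ValueError("unknown feature_key: %r" % key)
    if direction not in DIRECTIONS:
        raise ValueError("unknown direction: %r" % direction)
    start_text, sep, end_text = region.partition("..")
    if not sep:
        raise ValueError("region must look like start..end: %r" % region)
    return Feature(key, int(start_text), int(end_text),
                   direction, name.strip())


feature = parse_feature("    CDS 3293..4153 R Amp")
print(feature)
```

Rejecting unknown keys and directions keeps the sketch strict; a more tolerant reader might instead map unrecognized keys to Other.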
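The ENDS field describes the cohesive ends at both ends of a linear sequence. The sign and length are given relative to each end of the sequence: the sign is + if the 5' end protrudes and - if the 3' end protrudes, and the length is the difference in position between the 3' and 5' ends.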
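The SEQUENCE field holds the nucleotide sequence. Everything from the line after the SEQUENCE keyword up to the terminator keyword ("//") is read as sequence. Columns 2 to 12 hold the sequence length (in the example file, the position of the first base on the line), and the sequence itself appears from column 13 onward. Each line shows 60 bases, separated by a space after every 10 bases.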
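The line layout described above can be illustrated with a small writer and reader. This is a sketch only: `format_sequence` and `read_sequence` are hypothetical names, and right-aligning the position number within columns 2 to 12 is an assumption, as the text says only which columns it occupies.

```python
def format_sequence(seq):
    """Lay out bases 60 per line in groups of 10, bases from column 13."""
    lines = []
    for offset in range(0, len(seq), 60):
        chunk = seq[offset:offset + 60]
        groups = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        # Column 1 stays blank; the position of the first base on the
        # line is right-aligned in columns 2-12 (assumption).
        lines.append(" " + str(offset + 1).rjust(11) + groups)
    return lines


def read_sequence(lines):
    """Recover the bases by dropping columns 1-12 and all white space."""
    return "".join("".join(line[12:].split()) for line in lines)


seq = "acgt" * 20                      # 80 bases of made-up data
lines = format_sequence(seq)
for line in lines:
    print(line)
```

Reading the bases back by column position rather than by skipping the first token keeps the reader aligned with the column rule even when the position number fills all of columns 2 to 12.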
[1] White space: the space and tab characters.