Neurogenex Sequence Format (NGSF) Guide 1.0

Neurogenex Co., Ltd. Bioinformatics Team

1 October 2002

EnCyclon® is a registered trademark of Neurogenex Co., Ltd.


Table of Contents
1. Introduction to NGSF
1.1. About this document
1.2. What is NGSF?
1.3. Features of NGSF
1.4. Changes by version
2. NGSF file structure
2.1. File structure
2.2. Example file
2.3. Sequence header
2.3.1. NG
2.3.2. NAME
2.3.3. DESCRIPTION
2.3.4. AUTHOR
2.3.5. DATE
2.3.6. FEATURE
2.3.7. ENDS
2.4. Sequence
2.4.1. SEQUENCE
List of Examples
2-1. NGSF file example

Chapter 1. Introduction to NGSF

1.1. About this document

This document describes the structure of the Neurogenex Sequence Format (hereafter NGSF) used by EnCyclon.

If you find an error in this document, please contact the document maintainer.


1.2. What is NGSF?

NGSF is a new sequence format developed by Neurogenex for use in EnCyclon.

While developing EnCyclon we reviewed the existing sequence formats. They carried more information than needed, were complex and hard to write, and in many respects could not express the features EnCyclon set out to implement. We therefore developed a new sequence format.


1.3. Features of NGSF

  • Only the information needed for research is shown, in a concise form. Because unnecessary information is left out, the sequence is easy to read.

  • The state of the sequence is represented accurately. See ENDS (2.3.7).


1.4. Changes by version

Revision history
Revision 1.0    26 November 2001
Initial release



Chapter 2. NGSF file structure

This chapter describes the file structure of an NGSF sequence file.


2.1. File structure

An NGSF sequence file consists of two parts: the sequence header and the sequence.

Every sequence file has both a sequence header and a sequence. Some fields of the sequence header may be omitted, but the sequence may not.

A sequence header field starts in column 1. Subfields that belong to a header field must be placed in columns 2 to 12.

A field extends from a keyword that starts in column 1 up to the start of the next field. A field may therefore span several lines, but the NG, NAME, DATE and ENDS fields each occupy a single line.

For a field of more than one line, the following lines are treated as part of that keyword's content until a line appears whose first column is not a white-space character[1].

A sequence record starts with the NG field, which cannot be omitted. The other header fields follow it; every header field except NG is optional.

The sequence follows the end of the sequence header, and "//" starting in column 1 marks the end of the sequence.
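
The layout rules above can be sketched as a small field splitter. This is only an illustration under stated assumptions, not the parser used by EnCyclon: the function name split_fields and its return shape are hypothetical, and a keyword is assumed to be separated from its content by white space.

```python
def split_fields(text):
    """Split NGSF text into header fields and raw sequence lines.

    Returns (fields, sequence_lines), where fields is a list of
    (keyword, content_lines) pairs in file order. Hypothetical helper.
    """
    fields = []
    sequence_lines = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # "//" in column 1 marks the end of the sequence
        if in_sequence:
            sequence_lines.append(line)
        elif line[:1] in ("", " ", "\t"):
            # White space in column 1: continuation of the current field.
            if fields and line.strip():
                fields[-1][1].append(line.strip())
        else:
            # A keyword starting in column 1 opens a new field.
            parts = line.split(None, 1)
            keyword = parts[0]
            rest = parts[1].strip() if len(parts) > 1 else ""
            if keyword == "SEQUENCE":
                in_sequence = True
            else:
                fields.append((keyword, [rest] if rest else []))
    if not fields or not fields[0][0].startswith("NG"):
        raise ValueError("an NGSF record must start with the NG field")
    return fields, sequence_lines
```

Continuation lines are attached to the most recent keyword, which follows the rule that a field lasts until the next line with a non-white-space character in column 1.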


2.2. Example file

Example 2-1. NGSF file example

NG1.0       Genbank:J01749   circular  dsDNA  4361 bp
DESCRIPTION Cloning vector pBR322, complete genome.
AUTHOR      Gilbert,W.
DATE        2000/12/14
FEATURE
   Source   1..1762        F   pSC101   
   CDS      86..1276       F   Tet             
   CDS      1915..2106     F   ROP             
   CDS      3293..4153     R   Amp             
   Promoter 27..33         R   P1              
   Promoter 43..49         F   P2              
   Promoter 4188..4194     R   P3              
SEQUENCE
          1 ttctcatgtt tgacagctta tcatcgataa gctttaatgc ggtagtttat cacagttaaa 
         61 ttgctaacgc agtcaggcac cgtgtatgaa atctaacaat gcgctcatcg tcatcctcgg 
        121 caccgtcacc ctggatgctg taggcatagg cttggttatg ccggtactgc cgggcctctt 
        181 gcgggatatc gtccattccg acagcatcgc cagtcactat ggcgtgctgc tagcgctata 
        241 tgcgttgatg caatttctat gcgcacccgt tctcggagca ctgtccgacc gctttggccg 
        301 ccgcccagtc ctgctcgctt cgctacttgg agccactatc gactacgcga tcatggcgac 
        361 cacacccgtc ctgtggatcc tctacgccgg acgcatcgtg gccggcatca ccggcgccac 
        421 aggtgcggtt gctggcgcct atatcgccga catcaccgat ggggaagatc gggctcgcca 
        481 cttcgggctc atgagcgctt gtttcggcgt gggtatggtg gcaggccccg tggccggggg 
        541 actgttgggc gccatctcct tgcatgcacc attccttgcg gcggcggtgc tcaacggcct 
        601 caacctacta ctgggctgct tcctaatgca ggagtcgcat aagggagagc gtcgaccgat 
        661 gcccttgaga gccttcaacc cagtcagctc cttccggtgg gcgcggggca tgactatcgt 
        721 cgccgcactt atgactgtct tctttatcat gcaactcgta ggacaggtgc cggcagcgct 
        781 ctgggtcatt ttcggcgagg accgctttcg ctggagcgcg acgatgatcg gcctgtcgct 
        841 tgcggtattc ggaatcttgc acgccctcgc tcaagccttc gtcactggtc ccgccaccaa 
        901 acgtttcggc gagaagcagg ccattatcgc cggcatggcg gccgacgcgc tgggctacgt 
        961 cttgctggcg ttcgcgacgc gaggctggat ggccttcccc attatgattc ttctcgcttc 
       1021 cggcggcatc gggatgcccg cgttgcaggc catgctgtcc aggcaggtag atgacgacca 
       1081 tcagggacag cttcaaggat cgctcgcggc tcttaccagc ctaacttcga tcactggacc 
       1141 gctgatcgtc acggcgattt atgccgcctc ggcgagcaca tggaacgggt tggcatggat 
       1201 tgtaggcgcc gccctatacc ttgtctgcct ccccgcgttg cgtcgcggtg catggagccg 
       1261 ggccacctcg acctgaatgg aagccggcgg cacctcgcta acggattcac cactccaaga 
       1321 attggagcca atcaattctt gcggagaact gtgaatgcgc aaaccaaccc ttggcagaac 
       1381 atatccatcg cgtccgccat ctccagcagc cgcacgcggc gcatctcggg cagcgttggg 
       1441 tcctggccac gggtgcgcat gatcgtgctc ctgtcgttga ggacccggct aggctggcgg 
       1501 ggttgcctta ctggttagca gaatgaatca ccgatacgcg agcgaacgtg aagcgactgc 
       1561 tgctgcaaaa cgtctgcgac ctgagcaaca acatgaatgg tcttcggttt ccgtgtttcg 
       1621 taaagtctgg aaacgcggaa gtcagcgccc tgcaccatta tgttccggat ctgcatcgca 
       1681 ggatgctgct ggctaccctg tggaacacct acatctgtat taacgaagcg ctggcattga 
       1741 ccctgagtga tttttctctg gtcccgccgc atccataccg ccagttgttt accctcacaa 
       1801 cgttccagta accgggcatg ttcatcatca gtaacccgta tcgtgagcat cctctctcgt 
       1861 ttcatcggta tcattacccc catgaacaga aatccccctt acacggaggc atcagtgacc 
       1921 aaacaggaaa aaaccgccct taacatggcc cgctttatca gaagccagac attaacgctt 
       1981 ctggagaaac tcaacgagct ggacgcggat gaacaggcag acatctgtga atcgcttcac 
       2041 gaccacgctg atgagcttta ccgcagctgc ctcgcgcgtt tcggtgatga cggtgaaaac 
       2101 ctctgacaca tgcagctccc ggagacggtc acagcttgtc tgtaagcgga tgccgggagc 
       2161 agacaagccc gtcagggcgc gtcagcgggt gttggcgggt gtcggggcgc agccatgacc 
       2221 cagtcacgta gcgatagcgg agtgtatact ggcttaacta tgcggcatca gagcagattg 
       2281 tactgagagt gcaccatatg cggtgtgaaa taccgcacag atgcgtaagg agaaaatacc 
       2341 gcatcaggcg ctcttccgct tcctcgctca ctgactcgct gcgctcggtc gttcggctgc 
       2401 ggcgagcggt atcagctcac tcaaaggcgg taatacggtt atccacagaa tcaggggata 
       2461 acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg 
       2521 cgttgctggc gtttttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct 
       2581 caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa 
       2641 gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc 
       2701 tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt 
       2761 aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg 
       2821 ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg 
       2881 cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct 
       2941 tgaagtggtg gcctaactac ggctacacta gaaggacagt atttggtatc tgcgctctgc 
       3001 tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg 
       3061 ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc 
       3121 aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt 
       3181 aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa 
       3241 aatgaagttt taaatcaatc taaagtatat atgagtaaac ttggtctgac agttaccaat 
       3301 gcttaatcag tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct 
       3361 gactccccgt cgtgtagata actacgatac gggagggctt accatctggc cccagtgctg 
       3421 caatgatacc gcgagaccca cgctcaccgg ctccagattt atcagcaata aaccagccag 
       3481 ccggaagggc cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta 
       3541 attgttgccg ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg 
       3601 ccattgctgc aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg 
       3661 gttcccaacg atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct 
       3721 ccttcggtcc tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta 
       3781 tggcagcact gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg 
       3841 gtgagtactc aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc 
       3901 cggcgtcaac acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg 
       3961 gaaaacgttc ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga 
       4021 tgtaacccac tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg 
       4081 ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat 
       4141 gttgaatact catactcttc ctttttcaat attattgaag catttatcag ggttattgtc 
       4201 tcatgagcgg atacatattt gaatgtattt agaaaaataa acaaataggg gttccgcgca 
       4261 catttccccg aaaagtgcca cctgacgtct aagaaaccat tattatcatg acattaacct 
       4321 ataaaaatag gcgtatcacg aggccctttc gtcttcaaga a  
//

2.3. Sequence header

2.3.1. NG

This field holds the basic information about the sequence.

The NG field has the items ID, topology, molecule and length, and cannot be omitted. An item cannot contain white-space characters[1]; items are separated from one another by white space. The number that follows "NG" gives the format version; the current NGSF version is 1.0.

ID is the unique name under which the sequence is stored in the sequence database. When the sequence was imported from an external database, the ID gives the database name and the sequence's unique identifier in that database. topology indicates whether the sequence is linear or circular. molecule gives the type and strandedness of the sequence: the type is either DNA or RNA, and a single strand is written 'ss' and a double strand 'ds' (for example dsDNA, ssRNA). length gives the length of the sequence.
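
As an illustration of the item layout described above, the following sketch parses an NG line such as the one in Example 2-1. The names NGField and parse_ng are hypothetical. The sketch assumes that a trailing unit token such as "bp", which appears in the example but is not described in the text, can be ignored.

```python
import re
from typing import NamedTuple


class NGField(NamedTuple):
    version: str    # number following "NG", e.g. "1.0"
    seq_id: str     # e.g. "Genbank:J01749"
    topology: str   # "linear" or "circular"
    strand: str     # "ss" or "ds"
    mol_type: str   # "DNA" or "RNA"
    length: int     # sequence length


def parse_ng(line):
    """Parse one NG header line (hypothetical helper)."""
    tokens = line.split()
    if len(tokens) < 5 or not tokens[0].startswith("NG"):
        raise ValueError("not an NG line: %r" % line)
    version = tokens[0][2:]
    seq_id, topology, molecule, length = tokens[1:5]
    if topology not in ("linear", "circular"):
        raise ValueError("unknown topology: %r" % topology)
    match = re.fullmatch(r"(ss|ds)(DNA|RNA)", molecule)
    if match is None:
        raise ValueError("unknown molecule: %r" % molecule)
    # Tokens after the length (assumed to be a unit such as "bp") are ignored.
    return NGField(version, seq_id, topology,
                   match.group(1), match.group(2), int(length))
```

For the NG line of Example 2-1 this yields version "1.0", ID "Genbank:J01749", topology "circular", strand "ds", type "DNA" and length 4361.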


2.3.2. NAME

The name of the sequence, given as a short phrase on a single line.


2.3.3. DESCRIPTION

A brief description of the sequence, on one or more lines.


2.3.4. AUTHOR

The authors of the sequence, separated by ",", on one or more lines.


2.3.5. DATE

The date on which the sequence was created, on a single line in the format YYYY/MM/DD.
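
A DATE value in the YYYY/MM/DD format can be checked with the standard library, as in this minimal sketch. The function name parse_date is hypothetical. Note that strptime also accepts months and days without a leading zero, which the text does not address.

```python
from datetime import date, datetime


def parse_date(value):
    """Convert a DATE field value such as "2000/12/14" to a date object."""
    return datetime.strptime(value.strip(), "%Y/%m/%d").date()
```

An invalid value such as "2000-12-14" raises ValueError, which a reader of NGSF files could report as a malformed DATE field.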


2.3.6. FEATURE

Describes regions of the sequence.

FEATURE consists of the subfields Source, CDS, Promoter and Other, each written on a single line. A subfield starts in columns 2 to 12, and its contents are separated by white space.

Source gives the origin of the sequence (where a particular region of the current sequence data came from), CDS a protein coding sequence, Promoter a region that regulates gene expression, and Other any important information not covered by the above.

A subfield is written in the order feature_key, region, direction, feature_name. feature_key is one of "Source", "CDS", "Promoter" or "Other". The region gives the start and end positions separated by "..". The direction is C (a feature with no direction), F (a feature oriented in the same direction as the sequence) or R (a feature oriented opposite to the sequence). feature_name gives the name of the feature region. For Source, feature_name gives the origin of the sequence in the form "DBname:id".
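
The subfield order described above (feature_key, region, direction, feature_name) can be illustrated with a short parser for one FEATURE line. This is a sketch, not the EnCyclon implementation: the names Feature and parse_feature are hypothetical, and feature_name is assumed to be everything after the direction.

```python
from typing import NamedTuple

FEATURE_KEYS = ("Source", "CDS", "Promoter", "Other")
DIRECTIONS = ("C", "F", "R")  # no direction, forward, reverse


class Feature(NamedTuple):
    key: str
    start: int
    end: int
    direction: str
    name: str


def parse_feature(line):
    """Parse one FEATURE subfield line (hypothetical helper)."""
    parts = line.split(None, 3)  # key, region, direction, rest of line
    if len(parts) != 4:
        raise ValueError("expected 4 items: %r" % line)
    key, region, direction, name = parts
    if key not in FEATURE_KEYS:
        raise ValueError("unknown feature_key: %r" % key)
    if direction not in DIRECTIONS:
        raise ValueError("unknown direction: %r" % direction)
    start_text, sep, end_text = region.partition("..")
    if not sep:
        raise ValueError("region must be start..end: %r" % region)
    return Feature(key, int(start_text), int(end_text),
                   direction, name.strip())
```

Applied to the line "   CDS      86..1276       F   Tet" from Example 2-1, the sketch returns the key "CDS", the region 86 to 1276, the direction "F" and the name "Tet".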


2.3.7. ENDS

Describes the cohesive ends at the two ends of a linear sequence. The sign and length are given relative to each end of the sequence: + if the 5' end protrudes and - if the 3' end protrudes, with the length being the difference between the positions of the 3' and 5' ends.


2.4. Sequence


2.4.1. SEQUENCE

Contains the nucleotide sequence. Everything from the line after the SEQUENCE keyword up to the terminator ("//") is read as sequence. Columns 2 to 12 hold the sequence length (in Example 2-1, the position of the first base on the line), and the sequence itself appears from column 13 onward. Each line holds 60 bases, separated by a space every 10 bases.
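
The line layout described above can be sketched as a reader and a writer for the SEQUENCE block. The function names are hypothetical. The width of the position counter (right-aligned so that it ends in column 11, followed by one space) is an assumption taken from Example 2-1, not a rule stated in the text.

```python
def read_sequence(lines):
    """Join the bases from SEQUENCE lines, dropping position counters."""
    bases = []
    for line in lines:
        for token in line.split():
            if not token.isdigit():  # skip the numeric position counter
                bases.append(token)
    return "".join(bases)


def format_sequence(seq, per_line=60, group=10):
    """Lay out a sequence as 60 bases per line in groups of 10."""
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        # Counter is the 1-based position of the first base on the line;
        # the bases then start in column 13.
        lines.append("%11d " % (start + 1) + " ".join(groups))
    return lines
```

Reading the output of format_sequence back with read_sequence returns the original string, which gives a simple round-trip check for files written in this layout.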

Notes

[1]

White space: space and tab characters.