Code Project

Human DNA question

    Ivor S Sargoytchev
    wrote on last edited by
    #1

    DNA is a sequence over 4 different symbols (a base-4 system), so each one can be packed into 2 bits. Suppose you do that and then ZIP the whole thing: does anyone know how big the ZIP file would be?
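To make the packing idea concrete, here is a minimal sketch in Python. The A/C/G/T-to-number mapping and the `pack_bases` helper are assumptions made up for illustration, not anything from the thread, and `zlib` is used as a stand-in for ZIP because both rely on DEFLATE. The toy input is perfectly repetitive, so its compression ratio says nothing about a real genome.

```python
import zlib

# Hypothetical 2-bit code per base; the assignment is arbitrary.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_bases(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

# Toy input: 4000 bases pack into 1000 bytes.
packed = pack_bases("ACGT" * 1000)
# DEFLATE at the highest compression level, as a rough proxy for ZIP.
compressed = zlib.compress(packed, 9)
print(len(packed), len(compressed))
```

A real decoder would also need the sequence length stored somewhere, since the last byte may be padded.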

      peterchen
      wrote on last edited by
      #2

      Interesting. As I recall from high school biology (that is, as a short Google search yields): the human genome contains about 3 billion base pairs. Uncompressed, this is 3e9 * 2 (per pair) * 2 bits / 8 = 1.5e9 bytes, roughly 1.4 GB. However, this should compress very well. First, I remember the "atomic" units are sequences of 3 (or 4?) bases. Second, there are many repetitions. From http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/DNA.htm:

      Genome is made of three different types of DNA:

      • Single copy DNA consists of genes found in only one or a few places in the genome (75% of genome). It also includes multiple-copy genes, such as those coding for rRNA or histones, which exist as large clusters of multiple copies (50-10,000 copies).
      • Repetitive dispersed DNA fractions are characteristic short sequences (6-10 np) repeated 100,000-1,000,000 times in disparate places throughout the genome (15% of genome).
      • Satellite DNA (10% of genome) is made of highly repetitive sequences, basically confined to the chromosomes' centromeres and telomeres.

      So I guess you can finally put it on a CD.


      Pandoras Gift #44: Hope. The one that keeps you on suffering.
      but.. "Like I said, this crap is therapy"
      boost your code || Fold With Us! || sighist | doxygen

        Ivor S Sargoytchev
        wrote on last edited by
        #3

        I disagree with your calculation: you do not need to multiply by 2 just because they are base pairs. So if it contains 3 billion base pairs, that is 6 billion bits, or 750 million bytes, which is about 715 MB (counting 1 MB as 2^20 bytes). Reference: http://www.schoolscience.co.uk/content/4/biology/abpi/genome/genome3.html
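For anyone who wants to check the arithmetic, a quick sketch in Python. It assumes the round figure of 3 billion base pairs used in this thread and treats "Megs" as 2^20 bytes; the variable names are just for illustration.

```python
BASE_PAIRS = 3_000_000_000   # round figure quoted in the thread
BITS_PER_BASE = 2            # four possible bases fit in 2 bits

# One strand is enough: the other strand is its complement.
total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits // 8
mebibytes = total_bytes / 2**20

# The earlier estimate stored both bases of every pair, doubling the size.
both_strands_gib = (2 * total_bytes) / 2**30

print(f"one strand:   {total_bytes:,} bytes = {mebibytes:.0f} MB")
print(f"both strands: {both_strands_gib:.1f} GB")
```

This reproduces both numbers in the thread: about 715 MB for one strand and about 1.4 GB if both bases of each pair are stored.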

          peterchen
          wrote on last edited by
          #4

          Ahh yes, the pairs are always complementary, right? Then we are down to 715 MB uncompressed.


            El Corazon
            wrote on last edited by
            #5

            Ivor S. Sargoytchev wrote: "it becomes 6 billion bits, which is 715 Megs."

            It still fits on a CD-ROM. Somehow I was hoping I would be a Superbit DVD... oh well...

            _________________________
            Asu no koto o ieba, tenjo de nezumi ga warau.
            Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)

              Nish Nishant
              wrote on last edited by
              #6

              Jeffry J. Brickley wrote: "It still fits on a CDROM. Somehow I was hoping I would be a superbit DVD.... oh well...."

              Copy/paste the DNA stream as XML into MS Word, and save-close-reopen-save a few dozen times. In a few minutes you'll have a huge doc file that'll just about fit a DVD ;-)

                p daddy
                wrote on last edited by
                #7

                Nishant Sivakumar wrote: "Copy/Paste the DNA stream as XML into MS Word, and save-close-reopen-save a few dozen times. In a few minutes you'll have a huge doc file that'll just about fit a DVD"

                In just a few minutes? That's one very powerful machine you must have :)
