Human DNA question
-
DNA is a sequence over 4 different symbols (a base-4, or quaternary, system), so each symbol can be packed into 2 bits. Suppose you do that and then ZIP the whole thing: does anyone know how big the ZIP file would be?
-
Interesting. From what I recall of high school biology, plus a quick Google search: the human genome contains about 3 billion base pairs. Uncompressed, that is 3e9 * 2 bases per pair * 2 bits / 8 bits per byte = 1.5e9 bytes, or roughly 1.4 GB. However, this should compress very well. First, as I remember, the "atomic" unit in protein-coding regions is a sequence of 3 bases (a codon). Second, there are many repetitions. From http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/DNA.htm:
Genome is made of three different types of DNA:
Single copy DNA consists of genes, found in only one or a few places in the genome (75% of genome). It also includes multiple copy genes, such as those coding for rRNA or histones, which exist as large clusters of multiple copies (50-10000 copies).
Repetitive dispersed DNA fractions are characteristic short sequences (6-10 np) repeated 100,000 - 1,000,000 times in disparate places throughout the genome (15% of genome).
Satellite DNA (10% of genome) is made of highly repetitive sequences, basically confined to the chromosomes' centromeres and telomeres.
So I guess you can finally put it on a CD.
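To make the pack-then-ZIP idea concrete, here is a minimal Python sketch. The 2-bit code assignment (A=0, C=1, G=2, T=3), the helper names `pack` and `unpack`, and the toy input are all assumptions made for illustration; nobody in the thread specified them. It uses the standard `zlib` module (the DEFLATE algorithm that ZIP typically uses) rather than an actual ZIP archive, and it says nothing about how well a real genome would compress.

```python
import zlib

# Assumed 2-bit code for the four bases; any fixed assignment would work.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


# Synthetic, highly repetitive toy input (not real genome data).
seq = "ACGTTGCA" * 10_000
packed = pack(seq)
zipped = zlib.compress(packed, 9)
print(len(seq), "bases ->", len(packed), "bytes packed ->", len(zipped), "bytes deflated")
```

The number of bases has to be stored alongside the packed bytes, because the last byte may be only partly used. On real sequence data the gain from the final compression step depends on how repetitive the data is, so the toy result above should not be read as an estimate.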
Pandoras Gift #44: Hope. The one that keeps you on suffering.
but.. "Like I said, this crap is therapy"
boost your code || Fold With Us! || sighist | doxygen
-
I disagree with your calculation: you do not need to multiply by 2 just because they are base pairs. With 3 billion base pairs at 2 bits each, that comes to 6 billion bits, or 750 million bytes, which is about 715 MB (counting 1 MB as 2^20 bytes). Reference: http://www.schoolscience.co.uk/content/4/biology/abpi/genome/genome3.html
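The two size estimates in this thread can be checked with a few lines of Python. The figure of 3 billion base pairs is the approximate number quoted above; the function name `packed_size_bytes` is just an illustrative choice.

```python
BASE_PAIRS = 3_000_000_000  # approximate figure quoted in the thread
BITS_PER_BASE = 2           # four possible bases fit in 2 bits
MIB = 1024 ** 2
GIB = 1024 ** 3


def packed_size_bytes(bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store `bases` symbols, rounded up to a whole byte."""
    return (bases * bits_per_base + 7) // 8


one_strand = packed_size_bytes(BASE_PAIRS)        # one base stored per pair
both_strands = packed_size_bytes(2 * BASE_PAIRS)  # both bases of each pair

print(f"one strand:   {one_strand:,} bytes = {one_strand / MIB:.0f} MiB")
print(f"both strands: {both_strands:,} bytes = {both_strands / GIB:.2f} GiB")
# one strand:   750,000,000 bytes = 715 MiB
# both strands: 1,500,000,000 bytes = 1.40 GiB
```

So the 1.4 GB and 715 MB figures differ by exactly the factor of 2 under discussion, and both use binary units (2^30 and 2^20 bytes).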
-
Ah yes, the two bases in a pair are always complementary, right? Then we are down to 715 MB uncompressed.
-
Ivor S. Sargoytchev wrote: it becomes 6 billion bits, which is 715 Megs.
It still fits on a CD-ROM. Somehow I was hoping I would be a Superbit DVD... oh well...
_________________________
Asu no koto o ieba, tenjo de nezumi ga warau. Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)
-
Jeffry J. Brickley wrote: It still fits on a CDROM. Somehow I was hoping I would be a superbit DVD.... oh well....
Copy/paste the DNA stream as XML into MS Word, and save-close-reopen-save a few dozen times. In a few minutes you'll have a huge doc file that'll just about fit a DVD ;-)
-
Nishant Sivakumar wrote: Copy/Paste the DNA stream as XML into MS Word, and save-close-reopen-save a few dozen times. In a few minutes you'll have a huge doc file that'll just about fit a DVD
In just a few minutes? That's one very powerful machine you must have :)