Genome analysis

 

Genome sequencing – the Human Genome Project

-          idea first proposed in 1985

-          formal 15-year government project started in 1990

o       1998: Celera announces it will privately sequence the genome in 3 years

-          1999: first chromosome (chromosome 22) completely sequenced

o       What does “complete” mean?

§         11 gaps

§         Error rate of about 1 in 50,000 nucleotides

-          2001 “working draft” of the human genome

-          2003: sequencing “completed”; papers on individual chromosomes continue to appear until 2006

-          Two strategies

o       Mapping first, then sequencing (Fig. 10.9)

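§         Contig – series of overlapping cloned inserts covering a contiguous region

·         Minimal tiling path – the smallest set of overlapping clones that still covers the contig

§         Line up the BACs, then subclone ~2kb pieces of each into plasmids and sequence them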

o       Shotgun sequencing

§         Make three libraries: one BAC library and two plasmid libraries (one with 2kb inserts, one with 10kb inserts)

§         Sequence the entire insert of the small (2kb) plasmids and only the ends of the 10kb plasmid and BAC inserts

§         Assemble by computer

-          “fold coverage”

o       How many times is each region sequenced?

§         More times means more accuracy

·         Standard is >99.99% accurate (1 error in 10,000 bases)

§         But more coverage takes more time and money

§         Final coverage is 6-7X
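The "assemble by computer" step of shotgun sequencing can be illustrated with a toy greedy overlap assembler: repeatedly merge the two reads whose end and start overlap the most. This is a minimal sketch assuming error-free reads, all from one strand, and no repeated sequences; it is not the method Celera or the public project actually used (real assemblers must handle sequencing errors, both DNA strands and repeats, and they use the paired ends of the 2kb, 10kb and BAC clones to order contigs). The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap of at least `min_overlap` bases exists."""
    start = 0
    while True:
        # look for the first min_overlap bases of b somewhere in a
        start = a.find(b[:min_overlap], start)
        if start == -1:
            return 0
        # does the rest of a, from that point on, match the start of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the pair of contigs with the longest overlap until
    no overlaps remain. Returns the final contigs (more than one if gaps remain)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # drop reads that lie entirely inside another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # hypothetical reads sampled from the made-up sequence ACGTTGCATGGCCTAA
    reads = ["ACGTTGCA", "TGCATGGC", "TGGCCTAA", "GCATGGCC"]
    print(greedy_assemble(reads))
```

Taking the longest overlap first is simple but can misjoin reads that fall in repeated sequence, which is one reason repeats such as transposons make genome assembly hard.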
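Fold coverage is simple arithmetic: with N reads of length L on a genome of size G, each base is sequenced on average c = N·L/G times. If reads start at random positions, the expected fraction of bases never sequenced is about e^(-c) (the Lander–Waterman/Poisson model), which is why several-fold coverage is needed before few gaps remain. The sketch below assumes a ~3 billion base genome and a 500-base read length purely for illustration, and ignores regions that cannot be cloned at all; the function names are hypothetical.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads of a given length needed to reach a target fold coverage."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases covered by zero reads, assuming reads start
    at uniformly random positions (Poisson / Lander-Waterman model): e^(-c)."""
    return math.exp(-coverage)


def expected_errors(accuracy: float, num_bases: int) -> float:
    """Expected number of wrong bases at a given per-base accuracy."""
    return (1.0 - accuracy) * num_bases


if __name__ == "__main__":
    GENOME = 3_000_000_000  # approximate human genome size, in bases
    READ_LEN = 500          # assumed read length, for illustration only
    for c in (1, 2, 4, 6.5):
        print(f"{c}X: {reads_needed(c, READ_LEN, GENOME):,} reads, "
              f"~{fraction_unsequenced(c):.2%} of bases never sequenced")
    # 99.99% accuracy = 1 error per 10,000 bases
    print(f"errors at 99.99% accuracy: ~{expected_errors(0.9999, GENOME):,.0f}")
```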

 

Finding genes

1) use cDNA or Expressed Sequence Tag (EST) sequences to find exons in genomic sequence

2) use computer programs to predict exons

-          Look for transcription start and termination regions

-          Look for exon/intron junctions

-          Look at codon usage

3) compare genomes to each other

-          Mouse vs. human – the idea is that the coding sequences of genes will be highly conserved (i.e., similar in DNA sequence), while introns and the regions between genes will be less conserved
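The simplest version of "use computer programs to predict exons" is scanning for open reading frames (ORFs): stretches that start with ATG and run in the same reading frame to a stop codon. This is a much-simplified stand-in, not a real gene finder: it assumes one strand, the standard stop codons, and no introns, and it ignores the promoter, splice-junction and codon-usage signals listed above. The `min_codons` threshold is an arbitrary, hypothetical cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ORF on the given strand.
    `end` is exclusive and includes the stop codon; ORF length is counted
    in codons, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # step through complete codons in this reading frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    # hypothetical sequence: ATG, three AAA codons, then a TAA stop
    print(find_orfs("CCATGAAAAAAAAATAAGG", min_codons=2))
```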

 

How does having human genome make dealing with other genomes easier?

-          makes it easier to assemble other organisms' genome sequences

-          makes finding genes in other organisms easier

-          makes it easier to find orthologs

o       orthologs are genes in different species that encode the same protein and descend from one gene in their common ancestor

§         mouse and human beta-globin genes are orthologs

-          can tell us about the evolution of proteins and protein domains (Fig. 10.3)
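One common computational shortcut for finding candidate orthologs (not necessarily the one used in this course) is the reciprocal best hit: gene a in species 1 and gene b in species 2 are paired if each is the other's highest-scoring match. The sketch below assumes similarity scores have already been computed by some alignment tool and uses one score per gene pair for both directions; the gene names and scores are made up.

```python
def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """`scores` maps (gene in species 1, gene in species 2) -> similarity score.
    Returns the pairs that are each other's best-scoring match."""
    best_for_1: dict[str, tuple[str, float]] = {}  # species-1 gene -> (best species-2 gene, score)
    best_for_2: dict[str, tuple[str, float]] = {}  # species-2 gene -> (best species-1 gene, score)
    for (g1, g2), s in scores.items():
        if g1 not in best_for_1 or s > best_for_1[g1][1]:
            best_for_1[g1] = (g2, s)
        if g2 not in best_for_2 or s > best_for_2[g2][1]:
            best_for_2[g2] = (g1, s)
    # keep a pair only if the best hit is the same in both directions
    return sorted(
        (g1, g2) for g1, (g2, _) in best_for_1.items() if best_for_2[g2][0] == g1
    )


if __name__ == "__main__":
    # hypothetical human (hs*) vs. mouse (mm*) similarity scores
    scores = {
        ("hsA", "mmA"): 95, ("hsA", "mmB"): 60,
        ("hsB", "mmA"): 70, ("hsB", "mmB"): 88,
        ("hsC", "mmB"): 50,
    }
    print(reciprocal_best_hits(scores))
```

Here hsC's best match is mmB, but mmB matches hsB better, so hsC gets no ortholog call; a gene like that may belong to a family of related genes within one genome.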

 

Analyzing the human genome

-          Solitary genes – 15% of genome, although only 0.8% is exons

-          Gene families – 15% of genome, although only 0.8% is exons

o       odorant receptor family (Fig. 10.16 and Table 10.2)

§         proteins expressed in nasal epithelium that bind odorant molecules

§         over 1000 members, all related to each other but slightly different

§         these genes are paralogs (related genes within the same genome)

·         probably arose from one or a few ancestral genes by repeated duplication

-          Tandemly repeated genes – 0.3% of genome

o       tRNA and rRNA genes, among others

 

TOTAL OF ABOVE – about 31% of genome

 

Transposons

-          mobile DNA elements that can move from place to place in the genome

-          about 50% of the genome!

 

Unclassified – not any of the above categories – 20% of genome

 

Number of human genes

-          estimated at about 25,000

 

The proteome

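-          the proteome is the complete set of proteins produced by an organism

-          so do we make only 25,000 proteins?

o       Alternative splicing increases that number: one gene can give rise to several different proteins

§         Humans seem to do this more than other organisms

o       Post-translational modification further increases the number of distinct protein forms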
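To see how alternative splicing lets one gene yield several proteins, the sketch below enumerates the mRNAs of a hypothetical gene whose constitutive exons are always kept and whose "cassette" exons may each be included or skipped. It assumes every combination is allowed, so k cassette exons give 2^k isoforms; real splicing is regulated, and many combinations are never made. The exon names and the `splice_isoforms` function are illustrative only.

```python
from itertools import product


def splice_isoforms(exons: list[tuple[str, bool]]) -> list[str]:
    """`exons` lists (exon name, is_cassette) in gene order.
    Constitutive exons are always kept; each cassette exon is either
    included or skipped. Returns every isoform as exon names joined by '-'."""
    # one tuple of options per exon: keep it, or (cassette exons only) drop it
    options = [((name,), ()) if cassette else ((name,),) for name, cassette in exons]
    return ["-".join(n for choice in combo for n in choice) for combo in product(*options)]


if __name__ == "__main__":
    # hypothetical gene: E2 and E4 are cassette exons
    gene = [("E1", False), ("E2", True), ("E3", False), ("E4", True), ("E5", False)]
    for isoform in splice_isoforms(gene):
        print(isoform)
```

With two cassette exons this prints four isoforms, from E1-E2-E3-E4-E5 (all exons kept) down to E1-E3-E5 (both cassette exons skipped).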