Lecture 4: Recombinant Technology: Fundamentals of Biotechnology

So, we are now able to understand how genes on chromosomes are inherited; how, using the recombination frequencies of "linked" genes, we are able to construct crude linkage maps of their relative locations on their respective chromosomes; and how meiosis, more so than mitosis, leads to genetic variation rather than genetic constancy.

Moreover, having understood the nature and structure of DNA, and how it can be replicated, we now begin to be aware of some of the variety of enzymes that are involved in its replication and in the maintenance of genomic DNA as a viable template for inheritance.


What are these genes, or genetic units?

In trying to unravel the makeup of genes and how they encode functional gene products within their respective chromosomes, geneticists have been able to isolate these units from one system and have them expressed in "model systems" for well over 75 years now.


Newsweek article: "transgenic" monkeys, humans? This is old news in prokaryotes and lower, single-celled eukaryotes, where DNA is actively taken up from various sources and used as a cellular resource.




While it's great for geneticists, it does cause a problem for bacterial hosts: the acceptance or rejection of external sources of genetic elements boils down to a genetic definition of "self".

There are three basic fates of DNA as it enters the bacterial cell:

1) It is degraded rapidly due to host defense mechanisms.
2) It is integrated into host chromosome by recombination (homologous or non-homologous).
3) The DNA is able to circularize and replicate independently (autonomously) from the host chromosome (i.e. it contains an origin of replication that is recognized by host replicating enzymes).

R-factors are autonomously replicating, non-essential, circular pieces of DNA of varying sizes that minimally encode some form of antibiotic resistance gene. These R-factors provide just such an independent origin of replication, and also provide a means of selection that maintains their presence within the host bacterial cell in which they reside. They therefore serve a generic role as vectors, or genetic vehicles, for the "stable" cloning of genes into a bacterial host.


(a) For basic research (scientists)?
(b) For industrial purposes?
(c) For medicinal purposes?

Plasmids: extrachromosomal, autonomously self-replicating, covalently closed, circular pieces of dsDNA. They can sometimes be integrated into the host chromosome, and if so they are often called episomes.

Plasmids of 3,000 - 5,000 bp often have a high copy number (15 - 100 copies per cell).
Plasmids of 4,000 - 300,000 bp (300 kbp) are as common in nature, but are present at lower copy numbers (one or two per cell) and, because of these factors, are less easily manipulated.

Conjugative plasmids invariably contain tra and mob genes, which are necessary to promote cell-to-cell interaction and also to promote movement of the DNA through the "conjugative bridge".

eg. the F-factor in E. coli, which is ~94.5 kbp in size and is present at approximately one copy per cell.


As a rule of thumb, therefore, these conjugative plasmids are "large", as they have to encode a number of genes that undertake the tra and mob functions (even though some of these functions can be chromosomally encoded... merodiploid).

Are all plasmid vectors or R-factors created equally? No!

Variables: size, high or low copy number, and strict or relaxed control of plasmid replication. These differences give rise to the incompatibility of different plasmids.

Consequently, plasmids are grouped not by size, or even by DNA homology, but by their "incompatibility".

eg. IncP plasmids have a broad host range, as do the IncQ (also called IncP4) group of plasmids.


Incompatibility can arise in a number of ways, but it normally affects the initiation of plasmid replication within the host bacterial cell. It can also arise from differing controls over the attachment of plasmids to the bacterial membrane (which, for some plasmids, is required for efficient segregation of low copy number plasmids into the two daughter cells).

So, what is necessary for the genetic manipulation of DNA in bacteria?

(a) Suitable vector
(b) Restriction enzymes.
(c) DNA Ligases.

Restriction Endonucleases: restriction endonucleases provide an additional tool to facilitate the creation of physical maps of DNA.


Type I restriction-modification enzymes (first identified by Werner Arber and Dussoix in the 1960s using lambda phage infection of E. coli) initially distinguished two different strains of E. coli, E. coli B and E. coli K12. These two strains encode slight but specific variants of their HSD (Host Specificity Determinant) system, encoded by the hsdR, hsdM and hsdS genes. These enzymes are expressed together and generally require interaction with cofactors such as S-adenosyl methionine (AdoMet), adenosine triphosphate (ATP, which is hydrolyzed), and magnesium (Mg2+) ions.


The enzymes are bifunctional, multimeric (multi-subunit) enzymes capable of two competing functions: either methylating or restricting DNA (i.e. the EcoB system will restrict EcoK-modified DNA, and vice versa). Essentially, the enzymes are very similar, but distinct in their ability to recognize different DNA sequences.

The enzyme system comprises three distinct subunits: R (135 kDa), M (62 kDa) and S (55 kDa). Restriction and methylation are mutually exclusive.



eg. EcoB recognizes 5'-TGA(N8)TGCT-3', and EcoK12 recognizes 5'-AAC(N6)GTGC-3'.
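Because a Type I recognition site is bipartite (two specific half-sites separated by a fixed-length, non-specific spacer), a site search has to accept any base in the gap. The following is a minimal Python sketch, not a published tool: it assumes uppercase A/C/G/T input, uses the two sites quoted above, and the function names (`find_bipartite_sites`, `to_regex`, `revcomp`) are hypothetical. Because neither site is palindromic, both strands are scanned.

```python
import re

# Recognition sequences quoted in the text; N stands for any base in the spacer.
SITES = {
    "EcoB": "TGA" + "N" * 8 + "TGCT",
    "EcoK": "AAC" + "N" * 6 + "GTGC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(site: str) -> re.Pattern:
    # Each N matches any single base; the lookahead lets overlapping hits be reported.
    body = site.replace("N", "[ACGT]")
    return re.compile(f"(?=({body}))")


def find_bipartite_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based start in top-strand coordinates, strand) for every match."""
    seq = seq.upper()
    pattern = to_regex(site)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Bottom-strand sites are found in the reverse complement and mapped back.
    rc = revcomp(seq)
    n = len(site)
    hits += [(len(seq) - m.start() - n, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


print(find_bipartite_sites("GGAACTTTTTTGTGCAA", SITES["EcoK"]))
```

The spacer bases never influence a match, which mirrors why the S subunit can be swapped to give a new specificity without changing the spacing rule.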



For a review..."Type I Restriction Systems: Sophisticated Molecular Machines (a Legacy of Bertani and Weigle)".

Type II restriction enzymes (the most commonly used in biotechnology) are only able to restrict DNA; any methylase activity (if present) resides on a separate protein. Type II enzymes are usually dimeric proteins and produce a variety of digest patterns.


Type II restriction endonucleases (figure modified from "youtube.com/watch?v=6U8bGOG9OAI")


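Restriction characteristics: blunt ends, and 5' or 3' "sticky" (overhanging) ends. Enzymes also differ in their sensitivity to methylation, e.g. DpnI (which cuts GATC only when it is methylated) or DpnII (which is blocked by this methylation).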





Usually the site of restriction is found WITHIN the palindromic recognition region, but not always: Type IIS enzymes cut at a defined distance outside their recognition site.


eg. FokI
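FokI is the classic Type IIS example: it recognizes the asymmetric site 5'-GGATG-3' and cuts 9 nt downstream on that strand and 13 nt downstream on the complementary strand, leaving a 4-base 5' overhang. A minimal Python sketch, assuming cut positions are reported as boundaries in top-strand coordinates (a cut at c falls between bases c-1 and c) and that sites whose cuts would fall off the end of the sequence are skipped; `foki_cuts` is a hypothetical helper name.

```python
SITE = "GGATG"      # FokI recognition sequence as read on the top strand
SITE_RC = "CATCC"   # the same site when it lies on the bottom strand
NEAR, FAR = 9, 13   # cut distance on the site's own strand / on the complementary strand


def foki_cuts(seq: str) -> list[tuple[int, int]]:
    """(top-strand cut, bottom-strand cut) boundaries for every FokI site in seq."""
    seq = seq.upper()
    cuts = []
    for i in range(len(seq) - len(SITE) + 1):
        word = seq[i:i + len(SITE)]
        if word == SITE:
            # Site reads left to right: both cuts lie downstream, to the right.
            end = i + len(SITE)
            top, bottom = end + NEAR, end + FAR
        elif word == SITE_RC:
            # Site reads right to left on the bottom strand: both cuts lie to the left.
            top, bottom = i - FAR, i - NEAR
        else:
            continue
        if min(top, bottom) >= 0 and max(top, bottom) <= len(seq):
            cuts.append((top, bottom))
    return cuts


print(foki_cuts("GGATG" + "A" * 20))
```

Because the cut lies outside the recognition sequence, the overhang bases can be anything, which is what makes Type IIS enzymes useful for joining fragments with custom ends.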


Type III restriction enzymes are similar to Type I enzymes in that they also require ATP; they differ mainly in that their M and S subunits are combined into one ~75 kDa subunit, with the additional R subunit being ~108 kDa. Again, these are bifunctional enzymes, normally heterodimers, which can methylate and/or restrict simultaneously, although the methylase subunit can often work on its own. Moreover, methylation occurs on only one strand.

Usually the site of restriction is removed from the recognition site, with the enzyme often cutting some 24-28 bases downstream of the recognition site, eg. EcoP1 and EcoP15, and Hinf in Haemophilus influenzae.


5' AGACC - (N)23 - NNN | N 3'
3' TCTGG - (N)23 | NNNNN 5'


In using these enzymes to clone fragments of DNA into cloning vectors there are a number of variables that need to be considered.

Size of the restriction recognition site, which will affect the frequency of sites found within any given DNA sequence.

G/C content of restriction site vs. G/C content of DNA to be restricted.

Digestion time and restriction "efficiency"

Compatibility of ends
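The first two considerations can be made concrete. If DNA is treated as a random sequence in which each base occurs independently, with G and C each making up half of the G/C content, the expected distance between sites is the reciprocal of the probability of the site. This Python sketch rests on those simplifying assumptions (real genomes are not random), the function names are hypothetical, and NotI's site GCGGCCGC is used as an example of a G/C-rich 8-base site.

```python
def base_probabilities(gc: float) -> dict[str, float]:
    """Per-base probabilities for random DNA with the given G/C fraction."""
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must lie strictly between 0 and 1")
    at = (1.0 - gc) / 2.0
    return {"G": gc / 2.0, "C": gc / 2.0, "A": at, "T": at}


def expected_spacing(site: str, gc: float = 0.5) -> float:
    """Average distance (bp) between occurrences of site in random DNA."""
    probs = base_probabilities(gc)
    p = 1.0
    for base in site.upper():
        p *= probs[base]  # bases are assumed to occur independently
    return 1.0 / p


for site in ("GATC", "GAATTC", "GCGGCCGC"):
    print(site, round(expected_spacing(site)), round(expected_spacing(site, gc=0.4)))
```

At 40% G/C, for instance, the EcoRI site becomes more frequent (about one per 3,090 bp) and the NotI site much rarer (about one per 390,000 bp) than at 50% G/C.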


Fundamentals of Gene Cloning ("youtube.com/watch?v=Ik_Pxht1LM0")


Desirable attributes of "ideal" cloning vectors:

The use of E. coli as the preferred host for genetic manipulations has definitely biased the choice of vectors and the choice of gene transfer methods.

Ideal cloning vectors do not exist in nature and, while most of the ones used are derived from bacteria in the wild, they have themselves been genetically engineered to accommodate man's purpose.

Replicates autonomously in bacterial host of choice, usually E. coli, and is not too large
Encodes for multiple drug resistances.
Encodes for various and numerous "single" restriction sites
Has a relatively high copy number.

pBR322 used to foot the bill.
Maintained at ~40-50 copies/cell
Encodes 2 distinct drug resistances (ampicillin and tetracycline)
Has a number of single sites.

By convention, the EcoRI site defines position "0".

Origin of replication is on the opposite side of plasmid from the two drug resistance genes.
Single restriction sites within the drug resistance genes allow insertional inactivation of these genes: cloning a fragment into such a site destroys that resistance, giving the ability to screen for the loss of one of the antibiotic resistances.
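The logic of insertional inactivation can be sketched for pBR322, assuming (as is common practice) that the fragment is cloned into a single site inside the tetracycline-resistance gene, e.g. the BamHI site, so that transformants are first selected on ampicillin and then replica-plated on to tetracycline to screen for the loss of resistance. The `Colony` record and `classify` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Colony:
    name: str
    grows_on_amp: bool  # growth on the ampicillin master plate
    grows_on_tet: bool  # growth on the tetracycline replica plate


def classify(colony: Colony) -> str:
    """Interpret pBR322 transformants when inserts go into the tet gene."""
    # Selection: only cells carrying a plasmid with an intact bla gene survive ampicillin.
    if not colony.grows_on_amp:
        return "no plasmid"
    # Screening: an insert in the tet gene destroys tetracycline resistance,
    # so recombinants are AmpR TetS, while re-ligated vector stays AmpR TetR.
    return "parental vector" if colony.grows_on_tet else "recombinant"


for c in (Colony("c1", True, True), Colony("c2", True, False)):
    print(c.name, classify(c))
```

Ampicillin is the selection (non-carriers die); tetracycline is only a screen, because recombinants are identified by their failure to grow and must be recovered from the master plate.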




The importance of a good vector, and the use of both selection and screening techniques.


To Review a little...

Selection vs. screening: 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal) is a synthetic substrate analogue that is basically "a dye" bound to a galactose sugar moiety through a β-galactosidic bond.



If this bond is cleaved by β-galactosidase, the released "X" (indolyl) compound dimerizes and oxidizes to an insoluble BLUE pigment, turning the colony blue.

The EFFECTIVE use of the lacZ gene (3,072 bp) on a plasmid requires a lacZ- host. So the first thing that a researcher has to do to make appropriate use of this screening technology is to delete the original lacZ gene from the chromosome.

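Then, in the case of E. coli, the researcher would transfer the major part of the lacZ gene (a version lacking the short α region, lacZΔM15) on to an F plasmid, while the vector, such as pUC18, carries only the short lacZ' (α) fragment. Neither part is active alone, but together they restore β-galactosidase activity: this is the now almost ubiquitous ALPHA-COMPLEMENTATION technique.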
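The resulting blue/white screen reduces to simple logic. This sketch assumes that the plates contain X-Gal (and, where the host carries lacIq, the inducer IPTG), that the host supplies the large lacZΔM15 fragment, and that the vector supplies lacZ' containing the MCS, so that an insert in the MCS prevents a functional α-peptide from being made. The function name and its arguments are hypothetical.

```python
def colony_colour(host_has_lacz_delta_m15: bool,
                  vector_has_lacz_alpha: bool,
                  insert_in_mcs: bool) -> str:
    """Predicted colony colour on X-Gal plates under alpha-complementation."""
    # An insert in the MCS interrupts lacZ', so no functional alpha-peptide is made.
    alpha_made = vector_has_lacz_alpha and not insert_in_mcs
    # Beta-galactosidase activity needs BOTH fragments to come together.
    active = host_has_lacz_delta_m15 and alpha_made
    # Active enzyme cleaves X-Gal and the colony turns blue; otherwise it stays white.
    return "blue" if active else "white"


print(colony_colour(True, True, insert_in_mcs=False))
print(colony_colour(True, True, insert_in_mcs=True))
```

White colonies are therefore the candidate recombinants, while blue colonies most likely carry vector without an insert.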


So, we now have the ability to clone various DNA fragments into specific, "designer" vectors and utilize them easily in bacterial strains.


What else can we do?





Formation of gene banks, or gene libraries, for any given organism using "partial digests" of relatively "rare" restriction enzymes allows the geneticist to create incredibly long 2D jigsaw puzzles and, in so doing, to "walk the chromosome" and form (as they prefer to say nowadays) "contigs" of chromosomal fragments.
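Assembling overlapping fragments into a contig is, at heart, repeated overlap-and-merge. The sketch below is a toy greedy merge on short sequence strings, assuming error-free fragments taken from one strand and a hypothetical minimum overlap of 3 bases; the function names are hypothetical. Real clone contigs were built from restriction fingerprints, markers and far longer overlaps, and repetitive DNA can make greedy merging go wrong.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the fragments form separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_contigs(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG"]))
```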

Having defined these fragments, the next step would be to determine the sequence of the cloned DNA, which historically was done manually by one of two techniques, but is now normally undertaken through automation:


Maxam/Gilbert chemical cleavage sequencing: G alone (DMS, piperidine); A + G (formic acid, piperidine); C + T (hydrazine, piperidine); and C alone (hydrazine in high salt, piperidine).
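Reading a Maxam/Gilbert gel means comparing the four lanes at each band position: a band in both G and A+G is a G, in A+G only is an A, in both C and C+T is a C, and in C+T only is a T. A minimal Python sketch, assuming the bands have already been scored into a hypothetical mapping from fragment length to the set of lanes showing a band, read from the shortest fragment (nearest the labelled end) upward:

```python
def call_base(lanes: set[str]) -> str:
    """Infer the cleaved base from the lanes that show a band at one position."""
    g, ag = "G" in lanes, "A+G" in lanes
    ct, c = "C+T" in lanes, "C" in lanes
    if g and ag:
        return "G"
    if ag:
        return "A"
    if c and ct:
        return "C"
    if ct:
        return "T"
    raise ValueError(f"uninterpretable band pattern: {sorted(lanes)}")


def read_gel(bands: dict[int, set[str]]) -> str:
    """Read positions from the shortest fragment (nearest the label) upward."""
    return "".join(call_base(bands[length]) for length in sorted(bands))


gel = {1: {"G", "A+G"}, 2: {"A+G"}, 3: {"C+T", "C"}, 4: {"C+T"}}
print(read_gel(gel))
```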


Dideoxy (chain-termination) sequencing, courtesy of Sanger (circa 1977), which in essence "replicates" the DNA sequence (with a few modifications) and requires the use of a synthetic DNA primer to initiate the activity of the DNA polymerase.
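The chain-termination idea can be sketched in a few lines. Each ddNTP reaction yields chains that stop wherever that base is incorporated, so the four "ladders" of lengths, read together from shortest to longest, spell out the newly synthesized strand. This Python sketch assumes a perfect reaction, counts lengths from the end of the primer, writes the template 3'→5' so that it pairs base-by-base with the new strand, and uses hypothetical function names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def new_strand(template_3_to_5: str) -> str:
    """Strand synthesised from the primer (5'->3'), pairing with a template written 3'->5'."""
    return "".join(COMPLEMENT[b] for b in template_3_to_5.upper())


def dideoxy_ladders(synthesised: str) -> dict[str, list[int]]:
    """Lengths (nt added beyond the primer) of chains terminated in each ddNTP reaction."""
    ladders = {b: [] for b in "ACGT"}
    for length, base in enumerate(synthesised, start=1):
        # In the ddX reaction some chains incorporate ddX here and stop growing.
        ladders[base].append(length)
    return ladders


def read_ladders(ladders: dict[str, list[int]]) -> str:
    """Read the gel from the shortest fragment upward."""
    by_length = {n: base for base, lengths in ladders.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


strand = new_strand("CTAAG")
print(dideoxy_ladders(strand))  # {'A': [2], 'C': [5], 'G': [1], 'T': [3, 4]}
print(read_ladders(dideoxy_ladders(strand)))
```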



Automated sequence analysis I


Automated sequence analysis, now the standard sequencing technology, exploits the uniform use of "common" recipient vectors (or at least common MCSs within vectors). This allows the reuse of specific, universal primers (hybridizing, for example, within the lacZ' sequence in pUC18) to generate the DNA sequence of the outermost portions of any cloned fragment. Thereafter, the internal fragment can either be sub-cloned, or the known sequences within the DNA itself can be used to design more synthetic primers, which can subsequently be used either to "tag" the sequence or to extend the known sequence from the periphery inward. At least, these are the approaches of a research lab that is designed to analyze a relatively small region of a given chromosome.


So, we now have the ability to clone various DNA fragments into specific vectors and utilize them easily in some larger animals... other than humans.




But what happens if the goals are larger: to clone and to sequence whole genomes?


Potential problems:

Genome sizes vary among the different 'karyotes', causing logistical problems, as do genome complexity and gene structure.

Size of fragments that different vectors can handle.

Paucity of unique restriction sites within large sequences of DNA: e.g. a 6-base cutter will cut on average once in every 4^6 = 4,096 bp, and even an 8-base cutter will cut once in every 4^8 = 65,536 bp, so a large genome contains many copies of any given site.
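The arithmetic behind this can be checked directly. In the sketch below, assuming a random sequence with equal base frequencies, the expected number of sites for an n-base recognition sequence is the genome length divided by 4^n. The 4.6 Mbp genome size used in the example (roughly that of E. coli) is an illustrative assumption, and the helper names are hypothetical.

```python
def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a site on one strand."""
    seq, site = seq.upper(), site.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i))


def expected_sites(genome_bp: int, site_len: int) -> float:
    """Expected number of sites in random DNA with equal base frequencies."""
    return genome_bp / 4 ** site_len


genome_bp = 4_600_000  # illustrative, roughly an E. coli-sized genome
for n in (4, 6, 8):
    print(f"{n}-base cutter: ~{expected_sites(genome_bp, n):,.0f} sites")
```

For a 4.6 Mbp genome this gives roughly 1,100 expected sites for a 6-base cutter and about 70 for an 8-base cutter, so even "rare" cutters rarely leave a unique site in a whole genome.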

Additional Karyote variations complicate the genetic landscape.


1 (A). SIZE PROBLEM -mapping

Genome sizes vary among the different 'karyotes', causing major logistical problems, as do genome complexity and gene structure.


Formation of gene banks, or gene libraries, for any given organism using "partial digests" of relatively "rare" restriction enzymes allows the geneticist to create incredibly long 2D jigsaw puzzles and effectively "walk the chromosome", or form (as they prefer to term it) "contigs".

Just as importantly, what of multiple chromosomes? How do you assort and map fragments from multiple chromosomes?


Somatic Cell hybrids.       

Somatic cell hybridization is a fusion of two types of cells (eg. human and rodent). It has been used effectively in human genome mapping, and can in principle be used in many different animal systems. The procedure uses cells growing in culture along with Sendai virus or PEG (polyethylene glycol), both of which promote cell fusion. Each Sendai virus particle has several points of attachment to the cells it infects, so it can potentially attach to two different cells at the same time. If the two cells happen to be close together, their membranes may fuse and the two cells become one.


Radiation (cell/DNA) hybrids: these provide a "fine-tuned" mapping technique, in which the human cells are irradiated with >3,000 rads of X-rays to fragment the chromosomes. Cells bearing these fragments are then fused (as before) with rodent cells to form a panel of different hybrids. In this case the hybrids carry an assortment of fragments of human chromosomes.


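The different human chromosomes, having been "tagged" with a series of markers, can then be scored across the panel. Markers that lie far apart should be co-retained at frequencies approximating the product of their individual retention frequencies, whereas markers that lie close together are less often separated by X-ray breakage and so tend to be retained together. This yields a calculated, breakage-based mapping unit (the centiray, cR), which can be calibrated to "approximate" 0.1 cM (m.u.).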
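A simplified two-point calculation shows how such a distance can be computed. Assuming every chromosome fragment is retained independently with the same probability R, two markers are discordant only if a break separated them (probability θ) and exactly one of them was then retained (probability 2R(1 − R)); θ can therefore be estimated from the observed discordance and converted to centirays with D = −100 ln(1 − θ). The marker typing data below are invented for illustration, the function name is hypothetical, and the calibration to cM mentioned above is not part of the calculation.

```python
import math


def rh_distance(marker_a: list[int], marker_b: list[int]) -> tuple[float, float]:
    """Return (theta, distance in cR) for two markers typed 1/0 across the same hybrid panel."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("markers must be typed on the same non-empty panel")
    n = len(marker_a)
    # Pooled retention frequency R of the two markers.
    r = (sum(marker_a) + sum(marker_b)) / (2 * n)
    if r in (0.0, 1.0):
        raise ValueError("retention frequency must lie strictly between 0 and 1")
    discordant = sum(a != b for a, b in zip(marker_a, marker_b))
    # Under the independent-retention model, P(discordant) = theta * 2R(1 - R).
    theta = (discordant / n) / (2 * r * (1 - r))
    if theta >= 1:
        return theta, math.inf  # markers behave as unlinked in this panel
    return theta, -100 * math.log(1 - theta)


a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
theta, cr = rh_distance(a, b)
print(f"theta = {theta:.3f}, distance = {cr:.1f} cR")
```

For the invented typing data shown, θ ≈ 0.417 and the distance is about 54 cR; real analyses use multipoint likelihood methods and much larger panels.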



...such labelling gives rise to Sequence-Tagged Sites (STSs), which can also be used for constructing whole chromosome maps, as we shall see.


1 (B). SIZE PROBLEM -cloning

Generally, multicopy plasmids (such as pUC's) that are routinely used in molecular biology laboratories can accommodate up to ~10 - 25 kbp of insert DNA before the replication of the vector begins to select for deletions within the cloned sequence.

Potential answer: use of modified bacteriophage-based vectors, which can increase the size of cloned DNA up to ~45 kbp.


Use of other vehicles, such as YACs (Yeast Artificial Chromosomes), can enhance this capacity even further, to approximately 300 kbp to 1 Mbp of unknown sequence.




In this way "chromosome walking" among and between different fragments becomes chromosome "striding".

Regardless, it is the same "jigsaw puzzle" type of analysis.

Unfortunately, the process has inherent flaws:

• Large YACs are more likely to carry more than one chromosomal fragment (i.e. to be chimeric), with the fragments artificially joined by ligase! Therefore, the researcher needs a backup structural and/or genetic map to compare and contrast the different "contigs". Appreciation of Morgan’s mapping techniques was beneficial after all!

• More insidiously, yeast are unable to tolerate some DNA sequences, and preferentially remove these sequences, without the researcher being aware of the deletion.

How might a researcher overcome these concerns?

Even now, in these days of sequenced genomes and the precise knowledge of whole genomes, is all this a historical aside, or is it still relevant to our current genetic analysis?

The answer remains a resounding.....Yes??!!!

One of the additional features that has complicated the physical mapping and eventual sequencing of eukaryotic genomes, from yeast up to humans, has been the very nature of the genes themselves and their genetic context within the neighbouring DNA in which they reside.



Paucity of unique restriction sites within large sequences of DNA:

Use only partially restricted DNA fragments to make clones.

Use specific methylase modifying enzymes to "mask" internal sites before the linkers are added!

Alternatively, use a technique called the "Achilles' heel" technique, whereby the researcher hides a known site from modification enzymes using modified restriction conditions, or an oligonucleotide or DNA-binding protein that binds to that sequence. The remaining, unprotected sites are then modified; the "block" is removed, and the DNA is cut with the restriction enzyme, which can now cut once, and only once, within the previously "blocked" sequence, thereby allowing the fragment to be cloned into vectors.
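The logic of the Achilles' heel technique can be simulated. This sketch assumes that the blocker (protein or oligonucleotide) covers known intervals of the sequence, that the methylase modifies every site it can reach, and that the restriction enzyme then cuts only unmethylated sites; EcoRI's GAATTC is used as the example site, and the function names are hypothetical.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of the site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def achilles_heel_cuts(seq: str, site: str, blocked: list[tuple[int, int]]) -> list[int]:
    """Sites still cleavable after protect -> methylate -> unprotect -> digest.

    blocked: 0-based, end-exclusive intervals covered by the bound blocker.
    """
    seq, site = seq.upper(), site.upper()
    sites = find_sites(seq, site)

    def is_blocked(pos: int) -> bool:
        return any(start <= pos and pos + len(site) <= end for start, end in blocked)

    # Methylation step: the blocker hides its site(s); every other copy is modified.
    methylated = {p for p in sites if not is_blocked(p)}
    # Blocker removed: the enzyme cannot cut methylated copies, only the hidden one(s).
    return [p for p in sites if p not in methylated]


seq = "GAATTC" + "A" * 10 + "GAATTC" + "T" * 10 + "GAATTC"
print(achilles_heel_cuts(seq, "GAATTC", blocked=[(14, 24)]))  # [16]
```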



Unfortunately for our egos, humans do not have the largest genome, hence the "C-value paradox": the C-value (the amount of DNA in a haploid genome) does not correlate well with the apparent complexity of different organisms, largely because genomes differ in the amount of repetitious sequence they contain.

Mostly, the genomes of eukaryotes are interspersions of moderately repetitive and unique sequences, but some of the highly repetitive and moderately repetitive DNA sequences found on different chromosomes have resulted from the presence of viral or transposon-like elements within the genome, e.g. Alu sequences.

There are effectively three major distinct classes of "complex" DNA in Eukaryotes:

Highly repetitive DNA (~10%): repeat units often < 100 bp in length, forming long tracts (of up to 100 Mb in length), e.g. satellite DNA around centromeres, spacer DNA.

Moderately repetitive DNA (~30%): larger sequences of DNA, with ~10 - 100,000 copies per genome of sequences 300 - 5,000 bp long, e.g. rRNA genes, ribosomal protein and histone genes, and mini- and microsatellite DNA.

Unique (or low copy number) DNA sequences (~60%): having a complexity of ~300,000,000 or 3.0 × 10^8 bp, e.g. single genes, spacer DNA.

The proportion of the genome occupied by non-repetitive DNA varies widely from species to species.

Moreover, these repetitive sequences are not always evenly distributed throughout the different chromosomes. They can reside within "short-period" interspersions (repeats of 100-200 bp interspersed throughout 1,000 - 2,000 bp lengths of DNA), and then there are the "long-period" interspersions (repeats of ~5,000 bp interspersed throughout regions of 35,000 bp and up).

How do we determine all this, and then how do we characterize "complexity"?

...Tm profiles of DNA denaturation, and re-hybridization (reassociation, or Cot) patterns.
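Re-hybridization (reassociation) data are usually interpreted with second-order kinetics: the fraction of a sequence component still single-stranded is C/C0 = 1/(1 + k·C0t), so half of it has reassociated at C0t½ = 1/k, and C0t½ rises in proportion to the complexity of that component. A genome with several classes of DNA gives a weighted sum of such terms. The sketch below uses the ~10/30/60% proportions given above, but its rate constants are invented for illustration, comparison with a reference DNA of known complexity is assumed, and the function names are hypothetical.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Second-order reassociation of one component: C/C0 = 1 / (1 + k*C0t)."""
    return 1.0 / (1.0 + k * c0t)


def genome_curve(c0t: float, components: list[tuple[float, float]]) -> float:
    """components: (fraction of genome, rate constant k) for each sequence class."""
    return sum(f * fraction_single_stranded(c0t, k) for f, k in components)


def complexity_bp(cot_half: float, ref_cot_half: float, ref_complexity_bp: float) -> float:
    """Complexity scales with C0t1/2 relative to a reference DNA of known complexity."""
    return ref_complexity_bp * cot_half / ref_cot_half


# Hypothetical rate constants: the repetitive classes reassociate far faster.
CLASSES = [(0.10, 1e3), (0.30, 1.0), (0.60, 1e-3)]
for c0t in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    print(f"C0t = {c0t:g}: {genome_curve(c0t, CLASSES):.2f} single-stranded")
```

Plotted against log C0t, such a sum gives the characteristic stepped curve, with the highly repetitive fraction renaturing first and the unique fraction last.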


Transposed sequences, viral insertions and retroposons.


So much so that it was reasoned that the large amount of repetitive DNA would make it impossible to align sequences from genome fragments accurately by “end” homology. It was decided that a scaffold would first be constructed from mapped markers to provide a context for accurate clone and sequence alignment [public consortium, Human Genome Project].

Satellite DNA: Surrounds centromeres and can be up to hundreds of kilobases in length, comprised of simple repetitious DNA. eg. the sequence 5’AATAACATAGAATAACATAGAATAACATAG3’ surrounds all centromeres in Drosophila melanogaster.

VNTRs (variable number tandem repeats): tandem repeats of units 15 - 100 nucleotides in length that can extend over 1 - 5 kbp in humans.

And then of course there are the genes themselves...........

Dispersed gene families, such as the histone genes (100 - 1,000 copies per genome) and the actin genes (5 - 30 copies), etc.


4. KARYOTE VARIATION IN GENE SIZE, GENE COMPLEXITY: Gene size, and degree of linearity

The division of genes into introns and exons, which must be processed to produce the expressed mRNA, adds an additional layer of genetic complexity.



...giving rise to the necessity for us to appreciate two distinct types of physical genome within any given eukaryote. Be aware of the differences between a cDNA gene bank and a total DNA gene bank, and of the roles of reverse transcriptase, poly A tails and RNaseH in preparing cDNA clones!

The presence of these "split" genes necessitates RNA processing: as the message is transcribed, it has to be "whittled" down to a functional series of connected exons. This processing takes many forms, the major one being the excision, or splicing out, of non-expressed introns.
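Splicing itself can be pictured as cutting out intron intervals and joining the flanking exons in their original order. This sketch assumes the intron coordinates are already known (0-based, end-exclusive) and only checks the common GU...AG rule at intron boundaries; it ignores the spliceosome's actual recognition signals, and the function name is hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Join exons by removing introns given as 0-based, end-exclusive intervals."""
    exons, prev = [], 0
    for start, end in sorted(introns):
        if start < prev or end > len(pre_mrna) or start >= end:
            raise ValueError(f"bad intron interval {(start, end)}")
        intron = pre_mrna[start:end]
        # Most introns begin with GU and end with AG; flag anything else.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {(start, end)} does not follow the GU...AG rule")
        exons.append(pre_mrna[prev:start])
        prev = end
    exons.append(pre_mrna[prev:])
    return "".join(exons)


# exon 1 = AUG, intron = GUAAGUCCAG, exon 2 = GCUAA
print(splice("AUGGUAAGUCCAGGCUAA", [(3, 13)]))
```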

The only really good news is that, in related genes from different organisms, the organization of exons within the gene structure does not change overly much between DNA and RNA.


Very often the organization of interrupted genes is so conserved that the interruptions occur at homologous positions (relative to the coding sequence) in similar genes from different species.

For example, all known active globin genes, including those of mammals, birds, and frogs, are interrupted by two introns at homologous positions. The first intron is always fairly short, and the second is usually longer, but the actual lengths can vary. Most of the variation in overall length between different globin genes results from variation in the second intron. In the mouse, the second intron in the α-globin gene is only 150 bp long, so the overall length of the gene is 850 bp, compared with the major β-globin gene, where a second intron of 585 bp gives the gene a total length of 1,382 bp. The variation in the lengths of the genes is much greater than the range of lengths of the mRNAs (α-globin mRNA = 585 bases; β-globin mRNA = 620 bases).

Size of genes, however, DOES vary as the "complexity" of the DNA increases from prokaryotes and lower eukaryotes to higher eukaryotes (as discussed previously).


But the size variation of genes is also evident in varying gene structure and gene complexity: the number and size of introns and exons vary among the different eukaryotes that have multiple introns, with gene sizes ranging from 200 bp to 50 or 60 kbp.


Splicing of genes presents a whole new problem... hence the use of different gene banks.

Finally, be aware of the differences between a cDNA gene bank and a total DNA gene bank, and of the roles of reverse transcriptase, poly A tails and RNaseH in deriving cDNA clones.

There are a considerable number of benefits and variables that result from "exon shuffling", such as developmental changes (e.g. Sxl gene expression in Drosophila melanogaster; full details will be given in a later lecture) and, ultimately, perhaps the potentiation of rapid evolutionary change; but all reflect a rearrangement of the available exons, in the general order in which they occur on the DNA.

So, to get back to the human genome....

Automated sequence analysis II



...giving rise to Sequence-Tagged Site (STS) mapping


April 14 2003...




Other tools that are now readily available to the molecular geneticist.

PCR (Polymerase Chain Reaction) amplification is a powerful tool that dramatically facilitates the specific cloning of a given DNA sequence, through the discrete but precise amplification of a selected segment of DNA.

The opposing orientation of the primers is critical as, through a series of denaturation, primer annealing and replication cycles, the region of DNA between the two primers can be specifically amplified.
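Why the opposing orientation matters can be shown on a template string. In this sketch (hypothetical helper names; primers written 5'→3' as usual), the forward primer matches the top strand directly, while the reverse primer anneals to the top strand as its reverse complement, so the product runs from the 5' end of one primer to the 5' end of the other. The copy-number function assumes an idealized, constant per-cycle efficiency; real reactions eventually plateau.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product defined by two opposing primers (both written 5'->3'), or None."""
    template = template.upper()
    start = template.find(fwd.upper())  # forward primer matches the top strand
    if start < 0:
        return None
    rev_site = revcomp(rev)  # where the reverse primer anneals, read on the top strand
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


def ideal_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after n cycles if each cycle multiplies the target by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles


print(amplicon("TTTTACGTACGGGGCCCCTTAGCAAAAA", "ACGTAC", "TGCTAA"))
print(f"{ideal_copies(1, 30):,.0f} copies after 30 perfect cycles")
```

If the two primers pointed the same way, no `rev_site` would lie downstream of the forward primer and no discrete product would form, which is the point the text makes about orientation.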

The theoretical possibilities of this technique were converted into reality by the appropriation of heat-stable DNA polymerases such as Taq polymerase. This enzyme is derived from Thermus aquaticus, a bacterium originally isolated from the volcanic hot springs in Yellowstone National Park!



The primers "in opposition" can be designed to accommodate specific restriction sites within their sequence, and can thus be used to facilitate the subsequent cloning of the amplified DNA fragment.
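Adding the sites is a matter of extending each primer's 5' end, which does not need to pair with the template in the first cycles and is copied into every later product. The sketch below assumes EcoRI (GAATTC) on the forward primer and HindIII (AAGCTT) on the reverse primer, which would allow directional cloning, plus a short hypothetical "clamp" of extra bases because many enzymes cut poorly right at the end of a DNA molecule; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tailed_primers(fwd: str, rev: str, fwd_site: str, rev_site: str,
                   clamp: str = "GC") -> tuple[str, str]:
    """Return (forward, reverse) primers, 5'->3', with clamp + site added to each 5' end."""
    return (clamp + fwd_site + fwd).upper(), (clamp + rev_site + rev).upper()


def tailed_product(amplicon: str, fwd_site: str, rev_site: str, clamp: str = "GC") -> str:
    """Top strand of the product; the reverse primer's tail appears reverse-complemented."""
    return (clamp + fwd_site + amplicon).upper() + revcomp(clamp + rev_site)


f, r = tailed_primers("ATGAAA", "GGGTTT", "GAATTC", "AAGCTT")
print(f, r)
print(tailed_product("ATGAAACCC", "GAATTC", "AAGCTT"))
```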

PCR combined with automated sequence analysis also makes it possible to distinguish between similar sequences from divergent organisms or, even more specifically (as we shall see), between alleles within a genome.

Automated sequence analysis III

The ability to perform PCR has opened up whole new methods of sequencing, such as whole-genome sequencing of "amplicon fragments", which are the premise of "deep sequencing technologies".







Copyright © Department of Biology, Georgia State University