The nuclear OXPHOS genes in insecta: a common evolutionary origin, a common cis-regulatory motif, a common destiny for gene duplicates
© Porcelli et al; licensee BioMed Central Ltd. 2007
Received: 19 April 2007
Accepted: 08 November 2007
Published: 08 November 2007
When orthologous sequences from species distributed throughout an optimal range of divergence times are available, comparative genomics is a powerful tool to address problems such as the identification of the forces that shape gene structure during evolution, although the functional constraints involved may vary in different genes and lineages.
We identified and annotated in the MitoComp2 dataset the orthologs of 68 nuclear genes controlling oxidative phosphorylation in 11 Drosophilidae species and in five non-Drosophilidae insects, and compared them with each other and with their counterparts in three vertebrates (Fugu rubripes, Danio rerio and Homo sapiens) and in the cnidarian Nematostella vectensis, taking into account conservation of gene structure and regulatory motifs, and preservation of gene paralogs in the genome. Comparative analysis indicates that the ancestral insect OXPHOS genes were intron rich and that extensive intron loss and lineage-specific intron gain occurred during evolution. Comparison with vertebrates and cnidarians also shows that many OXPHOS gene introns predate the cnidarian/Bilateria evolutionary split. The nuclear respiratory gene element (NRG) has played a key role in the evolution of the insect OXPHOS genes; it is constantly conserved in the OXPHOS orthologs of all the insect species examined, while their duplicates either completely lack the element or possess only relics of the motif.
Our observations reinforce the notion that the common ancestor of most animal phyla had intron-rich genes, and suggest that changes in the pattern of expression of the gene facilitate the fixation of duplications in the genome and the development of novel genetic functions.
As the number of sequenced eukaryotic genomes steadily increases, systematic comparison of closely related species ("phylogenetic shadowing") allows characterization of recent evolutionary events before they are obscured through accumulation of random mutations, while comparison over larger evolutionary distances highlights lineage-specific changes. Besides providing a better understanding of many aspects of genome evolution, recent studies based on comparative genomics have led to significant progress in clarifying the molecular mechanisms that control gene evolution and the origin of the differences in gene structure between eukaryotic species, showing that changes in exon-intron structure are largely independent of protein sequence evolution. Comparative analysis is also a powerful tool to identify conserved noncoding sequences essential for regulating gene expression [4–7]. However, investigation of the forces that shape eukaryotic genomes is still significantly hampered by the absence of comprehensive and readily available data on a sufficient number of informative lineages. Studies involving as many different species and gene subsets as possible are needed, because selective pressures may differ significantly not only between different evolutionary lineages but also between particular types of genes.
Oxidative phosphorylation (OXPHOS), the primary energy-producing biological process in all aerobic organisms, generates ATP using the products of both nuclear and mitochondrial genes (OXPHOS genes). Because they encode products organized in the respiratory complexes spanning the inner mitochondrial membrane, OXPHOS genes are subject to specific evolutionary constraints: for example, coordinated evolution is required to maintain the stoichiometric balance between components of multisubunit complexes.
We previously reported the identification of the D. pseudoobscura and A. gambiae orthologs of a set of D. melanogaster genes which are the putative counterparts of human OXPHOS genes. To extend our analysis, we recently identified and annotated the OXPHOS gene orthologs in nine more Drosophilidae genomes, in another Culicidae species (the yellow fever mosquito, Aedes aegypti), and in three non-Dipteran insect species, i.e. Bombyx mori (silkworm), Apis mellifera (honeybee) and Tribolium castaneum (red flour beetle), and we have compiled the MitoComp2 dataset, which provides an integrated view of the data obtained. Duplicates of the OXPHOS genes, when present in a genome, were also included in the dataset, which at present contains more than 1300 annotated genes. Drawing on this information, we present here a detailed comparative analysis of 68 gene clusters each comprising the putative orthologs of an OXPHOS gene in 11 Drosophilidae species and in five non-Drosophilidae insects.
This study focuses on three aspects of the evolutionary history of the insect OXPHOS genes, i.e. i) conservation of the exon-intron structure during evolution; ii) identification and conservation of a putative regulatory motif specific to genes involved in energy production in insects; iii) origin, fixation in the genome and functional significance of OXPHOS gene duplicates. We also compared the exon-intron structure of insect OXPHOS genes with their counterparts in three vertebrate species (Fugu rubripes (pufferfish), Danio rerio (zebrafish) and human), and in the cnidarian Nematostella vectensis (starlet sea anemone). The last comparison was considered informative because of the pivotal position that cnidarians occupy in metazoan phylogeny.
On the whole, our findings further validate the use of interspecific multialignments of orthologous sequences as a powerful tool to identify crucial features that constrain genome evolution. We identified several such features within the transcriptional units of OXPHOS genes, including known and novel regulatory motifs, splicing sites and previously unidentified genes within genes.
Results and discussion
MitoComp2: a web resource for the comparative analysis of insect OXPHOS genes
Table 1. The 68 nuclear OXPHOS genes studied in this work
Gene products are listed by respiratory complex, with the D. melanogaster gene in parentheses.
NADH ubiquinone oxidoreductase: 13 kDa A (CG8680), 13 kDa B (CG6463), 15 kDa (CG11455), 18 kDa (CG12203), 19 kDa (CG3683), 20 kDa (CG9172), 23 kDa (CG3944), 24 kDa (CG5703), 30 kDa (CG12079), 39 kDa (CG6020), 42 kDa (CG6343), 49 kDa (CG1970), 51 kDa (CG9140), 75 kDa (CG2286), B8 (CG15434), B12 (CG10320), B14 (CG7712), B14.5A (CG3621), B14.5B (CG12400), B14.7 (CG9350), B15 (CG12859), B16.6 (CG3446), B17 (CG13240), B17.2 (CG3214), B18 (CG5548), B22 (CG9306), ACP (CG9160), ASHI (CG3192), MLRQ (CG32230), MNLL (CG18624), PDSW (CG8844), SGDH (CG9762), AGGG (CG40002), MWFE (CG17054).
Succinate dehydrogenase (complex II): Flavoprotein (CG17246), Iron-sulfur (CG3283), Cytochrome B560 (CG6666), Cytochrome B small subunit (CG10219).
Ubiquinol-cytochrome c oxidoreductase: 6.4 kDa (CG14482), 7.2 kDa (CG8764), 11 kDa (Ucrh), 14 kDa (CG3560), Iron-sulfur (CG7361), Cytochrome C1 (CG4769), Core protein 1 (CG3731), Core protein 2 (CG4169), Ubiquinone-binding protein QP-C (CG7580).
Cytochrome c oxidase: IV (CG10664), Va (CG14724), Vb (CG11015), VIa (CG17280), VIb (CG14235), VIc (CG14028), VIIa (CG9603), VIIc (CG2249).
F0/F1 ATP synthase: Alpha (CG3612), Beta (CG11154), Gamma (CG7610), Delta (CG2968), Epsilon (CG9032), B (CG8189), D (CG6030), E (CG3321), F (CG4692), G (CG6105), Coupling factor 6 (CG4412), Lipid-binding protein (CG1746), OSCP (CG4307).
To identify the putative orthologs of the D. melanogaster OXPHOS genes in other insect species we performed whole genome BLAST searches using the CDSs and amino-acid sequences of the D. melanogaster genes as queries. Orthology/paralogy relationships were inferred from (1) similarity of gene products, (2) conservation of exon/intron structure, (3) conservation of microsyntenic gene order and (4) evidence from phylogenetic trees. Sequences giving the reciprocal best hits in each genome were considered members of the orthologous gene cluster provided that the BLAST E-value was less than 10^-30 and that they could be aligned with the D. melanogaster gene over at least 60% of the gene length. According to this criterion, all 68 OXPHOS genes investigated were found to have a counterpart in each insect species studied, except for five genes that were not identified in A. mellifera and one in T. castaneum, possibly because the relevant genomic sequences were incomplete or did not give significant BLAST E-values due to a high level of divergence from the query sequences.
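The reciprocal-best-hit criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used to build MitoComp2: the `Hit` record, the function names and the toy gene identifiers (`Agam_A`, `Agam_B`, `Agam_C`) are hypothetical, and the input is assumed to be already-parsed BLAST hits that carry an E-value and the fraction of the D. melanogaster gene covered by the alignment. The other lines of evidence mentioned in the text (exon/intron structure, microsynteny, phylogenetic trees) are not modeled.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-30   # threshold quoted in the text
MIN_COVERAGE = 0.60      # alignment must span at least 60% of the gene length


@dataclass(frozen=True)
class Hit:
    query: str       # gene used as the BLAST query
    subject: str     # gene hit in the other genome
    evalue: float
    coverage: float  # fraction of the D. melanogaster gene that is aligned


def best_hits(hits):
    """Map each query to its lowest-E-value hit."""
    best = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best


def reciprocal_best_hits(forward, backward):
    """Return {dmel_gene: ortholog} for mutual best hits that also pass
    the E-value and coverage filters."""
    fwd = best_hits(forward)
    bwd = best_hits(backward)
    orthologs = {}
    for gene, h in fwd.items():
        back = bwd.get(h.subject)
        if back is None or back.subject != gene:
            continue  # not a reciprocal best hit
        if h.evalue < E_VALUE_CUTOFF and h.coverage >= MIN_COVERAGE:
            orthologs[gene] = h.subject
    return orthologs


# Toy, invented hits: D. melanogaster queries against a second genome and back.
forward = [
    Hit("CG3612", "Agam_A", 1e-120, 0.95),
    Hit("CG3612", "Agam_B", 1e-40, 0.70),
    Hit("CG9032", "Agam_C", 1e-12, 0.90),   # fails the E-value cutoff
]
backward = [
    Hit("Agam_A", "CG3612", 1e-118, 0.95),
    Hit("Agam_B", "CG3612", 1e-40, 0.70),
    Hit("Agam_C", "CG9032", 1e-12, 0.90),
]
example = reciprocal_best_hits(forward, backward)
print(example)
```

In this toy run only the `CG3612`/`Agam_A` pair is retained; `Agam_B` is a candidate paralog rather than the ortholog, and the `CG9032` hit is rejected by the E-value filter, which mirrors the way a highly divergent gene could be missed.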
Table 2. Overview of the comparison of the OXPHOS genes. Columns: No. of genes; one-exon genes; No. of exons§; No. of introns§; No. of introns§ per gene; average intron size (bp). [Table values not recovered.]
MitoComp2 also contains additional information on the comparison of the non-Drosophilidae OXPHOS genes with their D. melanogaster orthologs (taken as representative of the Drosophilidae), providing a link to the multialignment of the coding sequences of the genes compared where the position of the introns is highlighted; a link to the multialignment of the deduced amino acid sequences of the gene products, including their human counterpart; and finally a schematic drawing comparing the exon-intron structure of members of the orthologous gene cluster.
OXPHOS gene structure evolution in insects
Table 3. Pair-wise conservation of intron position in the OXPHOS genes. Species compared, with the total number of intron positions in parentheses: Dros.* (122), Agam (114), Aaeg (119), Bmor (187), Amel (171), Tcas (133), Human (301). [Pair-wise values not recovered.]
Comparison of Drosophilidae and Culicidae indicates descent from a common Dipteran ancestor for members of almost all orthologous gene clusters studied; massive intron loss appears to have occurred independently in the two lineages (Figure 3).
Comparison of Dipterans with B. mori, T. castaneum and A. mellifera also indicates descent of non-Dipteran OXPHOS genes from ancestral intron-rich genes that existed before the divergence of insect lineages. 32% of the OXPHOS genes maintain an identical exon-intron structure in all insect genomes studied, while in 19% a discordant intron position was observed in at least one species. In the remaining instances, multiple changes in gene structure were found. In accord with studies suggesting that recombination between genomic sequences and a product of reverse transcription of a processed mRNA is the main mechanism of intron loss in mammals, in most cases only a single intron was lost, while the neighboring introns are conserved; in no case was a gap observed in the alignment, and the sequences flanking the lost intron are always strongly conserved.
The amount of both intron loss and intron gain differs strikingly between insect lineages. Taking the presence of an intron at a given position in a single species only as the criterion for inferring intron gain after lineage divergence, we observed one lineage-specific gain event in the 11 Drosophilidae species studied, one in A. gambiae, two in A. aegypti, 36 in B. mori, 27 in A. mellifera and 15 in T. castaneum. It is of course possible that some of the concordant intron positions are due to independent insertions in different lineages [21, 22]; however, this is unlikely to explain all, or even most, observed intron-position correspondences.
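The gain criterion used above (an intron position present in one species only) is simple enough to sketch in code. The following is a minimal illustration, assuming that intron positions have already been mapped onto a common alignment and are represented as (codon index, phase) pairs; the function name, the species keys and the positions are all hypothetical and do not come from the MitoComp2 data. As the text notes, a position unique to one species is only a candidate gain, since loss in every other lineage cannot be excluded by this test alone.

```python
from collections import Counter


def lineage_specific_positions(introns_by_species):
    """Return {species: sorted intron positions found in that species only}."""
    # Count in how many species each aligned intron position occurs.
    counts = Counter(
        pos
        for positions in introns_by_species.values()
        for pos in set(positions)
    )
    return {
        species: sorted(p for p in set(positions) if counts[p] == 1)
        for species, positions in introns_by_species.items()
    }


# Invented positions for one orthologous cluster: (codon index, intron phase).
cluster = {
    "Dmel": {(45, 0)},
    "Agam": {(45, 0)},
    "Bmor": {(45, 0), (112, 1), (200, 2)},
    "Amel": {(45, 0), (112, 1), (310, 0)},
    "Tcas": {(45, 0)},
}
candidate_gains = lineage_specific_positions(cluster)
print(candidate_gains)
```

Here position (45, 0) is shared by all species and (112, 1) by two, so neither counts as a gain; only (200, 2) in the toy B. mori entry and (310, 0) in the toy A. mellifera entry are flagged.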
The number of discordant intron positions observed in A. gambiae and A. aegypti, compared with the data obtained in Drosophilidae, suggests that the number of intron gain/loss events is not directly related to divergence time. Comparing A. gambiae and A. aegypti, out of 121 total intron positions, 12 discordant positions were observed in 11 genes (Figure 3 and Additional file 2); in nine cases, comparison with other species strongly suggests intron loss; in three, the change in gene structure was most likely due to intron gain.
Overall, our data support a scenario in which extensive intron loss from intron-rich ancestral genes occurred during evolution of most insect OXPHOS genes. Intron gain also appears to have occurred quite often, although much less frequently in Dipterans than in other insect lineages (Figure 4).
Long term evolution of OXPHOS genes
The tentative scenario inferred from the comparative analysis of OXPHOS gene evolution in insects prompted us to extend our analysis to other, distantly related evolutionary lineages. First, we asked whether insect OXPHOS genes share a substantial fraction of the intron positions with their orthologs in vertebrates, which would indicate an ancient origin predating the insect-vertebrate evolutionary split. To address this question we compared the exon-intron structure of the OXPHOS genes in Fugu, zebrafish and humans with each other and with their insect orthologs; the human/fish comparison is highly informative because among sequenced vertebrate genomes fish genomes are the most distantly related available for comparison with humans (the last common ancestor of fish and humans dates back 400–450 Myr, a divergence time substantially longer than the time of divergence of the insect species studied in this work).
Conservation of intron position, exon phase and exon length between insect OXPHOS genes and their counterparts in Fugu, zebrafish and human strongly suggests descent from a common intron-rich ancestor in 55 out of 68 orthologous gene clusters.
In accord with recent work showing that the exon-intron structure of the gene is highly conserved throughout vertebrates, very few changes in the organization of the OXPHOS genes were observed between Fugu and zebrafish, and, strikingly, between fish and humans. We found only two gene structure changes in vertebrates: the sequence of a coding exon of the NDUFS8 gene, encoding the 23 kDa subunit of Complex I, is interrupted in Fugu, but not in human or zebrafish, while an exon of the ATP5A1 gene, encoding the alpha chain of Complex V, is interrupted in human, but not in fish (Figure 2).
The last column of Table 3 reports the number of intron positions shared by human OXPHOS genes with their orthologs in the insect species studied. In accord with recent findings suggesting that the honeybee genome is the slowest evolving of the insect genomes so far sequenced, among the insect species examined A. mellifera shares the largest number of OXPHOS gene intron positions with vertebrates: out of 171 total introns identified in the OXPHOS genes of A. mellifera, 60% are conserved at the same position in their human counterparts, while 34% of the 301 intron positions in human OXPHOS genes are shared with A. mellifera.
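The two percentages quoted above describe the same set of shared positions seen from either side, so they should imply roughly the same absolute count. The short check below makes that arithmetic explicit. It is a sketch under one assumption that is not stated in the text, namely that both percentages were rounded to the nearest whole percent; the function names are illustrative, and the exact shared count (reported in Table 3 of the original) is not reproduced here.

```python
import math


def implied_range(percent, total):
    """Range of counts compatible with a percentage rounded to the nearest 1%."""
    low = (percent - 0.5) / 100 * total
    high = (percent + 0.5) / 100 * total
    return low, high


def consistent_counts(*ranges):
    """Integer counts that fall inside every supplied range."""
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    return list(range(math.ceil(low), math.floor(high) + 1))


amel = implied_range(60, 171)    # 60% of the 171 A. mellifera introns
human = implied_range(34, 301)   # 34% of the 301 human intron positions
shared = consistent_counts(amel, human)
print(shared)
```

Under the rounding assumption, both statements are compatible with about 102 to 103 shared intron positions, so the two figures are mutually consistent.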
Finally, to extend the evolutionary history of the OXPHOS genes beyond the recent report that a significant fraction of human introns are shared with the genome of annelids and therefore must predate the bilaterian radiation, we thought it of interest to compare the exon-intron structure of 15 human OXPHOS genes with their orthologs in the cnidarian N. vectensis (as shown in Figure 1C, separation of the cnidarian lineage from Bilateria predates the Urbilaterian ancestor). This comparison revealed that N. vectensis shares 80% of its OXPHOS gene intron positions with vertebrates, while 84% of the human intron positions are shared by Nematostella (Figure 2 and MitoComp2 web site). Thus, these introns appear to have been present in very ancient ancestral genes predating even the cnidarian/Bilateria divergence, and the remarkable genetic complexity of Nematostella, together with the high level of conservation of intron positions between cnidarians and vertebrates, calls into question a correlation between morphological complexity and gene structure.
Specific constraints on conservation of individual introns
Specific constraints may act on individual introns, favoring their conservation during evolution. Unsurprisingly, and in agreement with our previous report concerning a more limited sample of OXPHOS genes, the exon-intron structure and the alternative splice forms of the orthologous genes encoding the NADH-ubiquinone oxidoreductase acyl carrier protein (D. melanogaster mtacp1, CG9160) and the ATP synthase epsilon chain (sun, CG9032) are strictly conserved in all insects studied, as shown by genomic structure comparison, alignment of splice variants and EST mapping (see the gene records in the MitoComp2 dataset). As suggested by Fedorova and Fedorov, conservation of the first intron at the 5' end of many genes also appears to be under stringent functional constraints. Because the first exon of most OXPHOS genes encodes a mitochondrial import signal and is usually much less conserved than other coding exons [29, 30], its alignment with orthologous regions is often problematic, and conservation of the first 5' intron position can only be inferred from phase conservation and exon length. With this caveat, the conservation at this position is striking: no loss of this intron was observed in 56 out of 68 orthologous gene clusters. Our analysis necessarily underestimates conservation at this position, since it can only address introns that interrupt the CDS: conservation of introns in the 5' UTR, which is usually hard to align with orthologous sequences, cannot be unambiguously shown.
As discussed in the next section, conservation of the first 5' intron in OXPHOS genes may have important functional implications depending on the position of the nuclear respiratory gene (NRG) motif: only 13 intronless genes (about 4.6%) were found out of a total of 283 genes belonging to 48 orthologous gene clusters in which the NRG motif is usually located in the first intron of the coding sequence; on the other hand, when the NRG motif is located upstream of the CDS (as it is in 18 OXPHOS gene clusters), the frequency of one-coding-exon genes is about 25% (27 out of 107).
Conservation of regulatory elements
The regulation of eukaryotic gene expression is a process involving many different control mechanisms, including chromatin structure and cis-regulatory DNA sequences that bind specific proteins; recent observations emphasize the importance of intergenic and intronic sequences in regulating transcription. Cross-species DNA sequence comparison is an excellent tool for identifying these biologically important elements, because the level of evolutionary conservation is correlated to the extent of functional constraints.
We used multialignment footprinting and DNA pattern discovery programs [35, 36] to identify conserved motifs in noncoding sequences of the OXPHOS genes in 11 Drosophilidae species, two Culicidae (A. gambiae and A. aegypti), and three non-dipteran insects (B. mori, A. mellifera and T. castaneum).
We focused our attention primarily on the genomic regions that in D. melanogaster contain the nuclear respiratory gene element (NRG), a palindromic 10-bp motif (RTTAYRTAAY) shared by all nuclear OXPHOS genes listed in Table 1 and by many other nuclear genes involved in the biogenesis and function of the mitochondrion. In D. melanogaster, most NRG elements are located 160–280 bp downstream of the transcription start site, most often within an intron.
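The NRG consensus is written in IUPAC ambiguity codes (R = A or G, Y = C or T), so it translates directly into a regular expression. The sketch below is an illustration only, not the footprinting or pattern-discovery procedure used in this work: the function names are hypothetical and the test sequence is invented. It also checks the palindrome property, meaning that the motif equals its own reverse complement, so that scanning a single strand is sufficient.

```python
import re

# IUPAC codes needed for the NRG consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}
NRG = "RTTAYRTAAY"


def reverse_complement(motif):
    """Reverse complement of a motif written with A, C, G, T, R and Y."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def find_motif(sequence, motif=NRG):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    # A lookahead group lets the regex engine report overlapping matches.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


is_palindromic = reverse_complement(NRG) == NRG

# Invented sequence containing two instances of the consensus.
toy_sequence = "GGCATTATGTAACCGTTACATAATGG"
matches = find_motif(toy_sequence)
print(is_palindromic, matches)
```

In the toy sequence the two instances start at 0-based offsets 3 and 14. On real data, the start coordinates could be compared with annotated exon and intron boundaries to decide whether an element lies in the first intron or in the 5' UTR.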
Multiple NRG motifs are present in several genes: in 13 orthologous gene clusters, genes containing at least two NRG elements were found. The number of copies of the element per gene is variable, but is generally conserved between orthologs. An extreme case is the D. melanogaster CG1746 gene, encoding the lipid-binding protein P1 of ATP synthase: in the introns of this gene, and in its orthologs in other Drosophilidae species, seven copies of the element were identified.
The NRG element was always found downstream of the transcription start site, and in 53 out of 68 orthologous gene clusters it lies within the first intron of the gene. In all other cases, with a single exception, it is located either in the second intron or in the putative 5' UTR sequence. Only in the CG6666 gene cluster, encoding the cytochrome b560 subunit of complex II, is the NRG element located in a coding exon.
To study the conservation of the NRG element over long evolutionary times, we also searched noncoding regions of homologous OXPHOS genes of A. gambiae and A. aegypti, and of B. mori, A. mellifera and T. castaneum. Notwithstanding a divergence time between A. gambiae and A. aegypti of approximately 180 Myr, pairwise alignment and comparison with orthologous noncoding Drosophilidae sequences identified conserved NRG elements in most OXPHOS genes of these species, often with conservation of the subgenic localization (see Figure 5B for an example, and the MitoComp2 web site), and it was possible to define a Culicidae NRG consensus for almost all OXPHOS genes.
Although NRG elements represent the most conserved noncoding sequences in orthologous OXPHOS genes, multispecies alignments also identified other significantly conserved elements with a tendency to maintain the same subgenic position, at least in Drosophilidae. Some of these sequences are strictly conserved in several OXPHOS gene clusters; others are specific to a single gene cluster (see MitoComp2 dataset). As these sequences are almost certainly subject to strong functional constraints, our findings further validate the use of interspecific phylogenetic comparison to identify novel candidate regulatory elements, although there is increasing evidence of sequences with a cis-regulatory function that exhibit little if any primary sequence conservation, and cannot therefore be identified by multispecies DNA alignment.
We did not attempt a systematic study of the functional significance of the non-NRG conserved sequences in the noncoding regions of insect OXPHOS genes. However, we would like to report one intriguing example of the insights into the mechanisms shaping genome evolution that mining the information available in MitoComp2 will hopefully provide in the future.
The CG1746 gene, encoding the lipid-binding protein P1 of ATP synthase, contains, besides seven highly conserved copies of the NRG element, at least 15 conserved DNA blocks of various lengths scattered throughout its noncoding sequences. Eight such DNA stretches, strictly conserved in all the Drosophilidae species studied, are located within the intron at the 3' end of the gene (see the CG1746 gene entry in the MitoComp2 dataset). A FlyBase search revealed that in D. melanogaster this intron encompasses the untranslated Ribonuclease P RNA gene (RNaseP:RNA), and the sequences highly conserved in this intron are known to be crucial for the ribozyme function. Intriguingly, a search for homologous sequences in A. gambiae, A. aegypti, A. mellifera, B. mori and T. castaneum indicated that in these insects the orthologous intron does not contain the Ribonuclease P RNA gene, which is instead found within an intron of other, unrelated genes, different in each species (data not shown).
OXPHOS gene duplications
After previously identifying several duplicates of OXPHOS genes in D. melanogaster, D. pseudoobscura and A. gambiae, we have now asked whether annotation of OXPHOS gene duplicates in a greater number of insect species at various levels of divergence could provide further information on the forces that have shaped the evolutionary history of this functionally essential set of genes. We found one or more paralogs of 22 different OXPHOS genes in the Drosophilidae species studied. All the identified duplicates of OXPHOS genes appear to be true functional genes, since all present intact ORFs. Assuming that duplicates found in different microsyntenic contexts originated from independent duplication events, 34 independent duplication events would be sufficient to explain all the observed duplicates.
Not surprisingly, since retroposition is probably the most important mechanism of gene duplication and eventual evolution of novel genetic functions, 26 of the 34 duplication events inferred were almost certainly retropositional events giving rise to duplicates that are intronless or possess only a very few introns, presumably acquired subsequently. In contrast, four of the events gave rise to duplicates that maintain the intron/exon structure of the parental gene, and so were most probably segmental duplication events. There is insufficient evidence to assume either mechanism for the remaining four duplication events.
Interestingly, seven of the independent events gave rise to duplicates within introns of other genes, in support of the suggestion that retrocopies can become functional genes by exploiting the regulatory elements and the open chromatin state of neighboring transcriptional units. Presence of the duplication in all the species studied, conservation of microsyntenic gene order and evidence from phylogenetic trees suggest that 14 of the duplication events occurred before, and 20 after, the Drosophilidae speciation (Figure 3).
We also found duplications of the OXPHOS genes in non-Drosophilidae. Eight independent duplication events were observed in A. gambiae, four in A. aegypti (assuming for convenience that in this species the 80 or more copies of the Aaeg/CG4692 gene, encoding the f chain of ATP synthase, originated from a single amplification event), seven in B. mori, two in A. mellifera and three in T. castaneum. Seven of these duplications involve genes also duplicated in Drosophilidae, so, although pair-wise orthology cannot be reliably assigned between duplicates in Drosophila and in other insects, they were probably present in a common ancestor of all the insect species addressed in this work.
It should be noted that the number of OXPHOS gene duplication events inferred to have occurred in Drosophilidae is significantly higher than in any of the other insect lineages examined. This fact could indicate the existence in Drosophilidae of special mechanisms favoring the fixation of OXPHOS gene paralogs in the genome, or, together with the high number of intron gain/loss events in Drosophilidae reported in a previous section, indicate an especially high level of retrotranscriptional activity in this lineage.
In D. melanogaster and in A. gambiae, OXPHOS gene duplicates are expressed at a much lower level than their parent genes, as inferred from the abundance in the public databases of ESTs derived from their transcripts; moreover, in D. melanogaster they exhibit a strongly testis-biased pattern of expression. Based on these data, we suggested that acquiring a new pattern of expression could be required to maintain a duplicate copy of certain genes in the genome. In support of this hypothesis, in an EST library from D. yakuba testes (WashU Drosophila Yakuba EST Project) the expression of OXPHOS gene duplicates is also strongly testis-biased (not shown). That similar mechanisms could favor fixation of gene duplications not only in insects, but also in other organisms, is suggested by the independently formulated "out of the testes" hypothesis, proposing that in primates functional retrogenes are initially expressed in testes and only later evolve different expression patterns and potentially novel genetic functions.
The study of the conservation of the NRG putative regulatory element in insect OXPHOS genes presented in this paper (see above) provides intriguing evidence suggesting a possible mechanism to maintain OXPHOS gene duplications in the genome. In total, we have identified 215 OXPHOS gene duplicates in the insect species studied. In 214 out of the 215 cases we found one or more NRG elements only in one of the identified paralogs. Conservation of microsynteny (where possible to ascertain, i.e., in Drosophilidae) and of the structural organization of the gene strongly suggest that the genes maintaining the NRG element are the direct phylogenetic derivatives (true functional orthologs) of the ancestral insect genes responsible for the basic housekeeping function of energy production. On the other hand, the NRG motif is absent (or, as shown in the examples of Additional file 3, sharply diverges from the consensus) in almost all OXPHOS gene duplicates that have achieved long-term fixation in the genome. While the absence of the NRG motif is of course expected in duplicates that originated by retroposition of genes in which the element is located in an intron, it suggests preferential loss due to selective constraints in duplicates that originated by segmental duplication, or by retroposition from the 26 genes in which it is in the 5' UTR. The single exception showing conservation of a standard NRG element in the 5' UTR of two paralogs of the same gene concerns a duplicate of the mtacp1 gene, encoding the acyl carrier protein of complex I, which is found in D. persimilis but not in its sister species D. pseudoobscura.
The 100% identity shared not only in the CDS but also in the UTR regions with one of the mRNAs generated by alternative splicing of the mtacp1 gene transcript, the lack of introns and of the promoter region, and a target-site duplication of eight bp flanking the duplicate suggest that it derives from one of the more recent retroposition events documented to date (see the mtacp1 gene entry in the MitoComp2 dataset).
Conservation of the NRG element, and probably of its localization in the transcriptional unit, are evolutionary constraints expected to act in a specific manner on OXPHOS genes; however, we would like to suggest that loss of regulatory elements and consequent changes in the pattern of expression of the gene could be a general mechanism that facilitates the fixation of duplications in the genome when, as in the case of genes encoding products that are part of multiprotein complexes, the presence of multiple gene copies with the original pattern of expression would be deleterious. In turn, this could allow the development of novel genetic functions that is usually assumed to be the main evolutionary advantage of gene duplication.
Conclusion
We have cataloged the orthologs, as identified by sequence homology, conservation of microsynteny and structural organization, of 68 nuclear genes that control oxidative phosphorylation in 11 Drosophilidae species whose genomic sequencing has been recently completed, and in five non-Drosophilidae insect species, and compiled a web-based dataset, MitoComp2, containing all data on which this paper is based and available online. Our results indicate that a common ancestor of the insect lineages examined possessed intron-rich OXPHOS genes and that extensive intron loss occurred during evolution; lineage-specific intron gain also occurred, least frequently in Dipterans. Furthermore, comparison with the very distantly related vertebrate and cnidarian lineages shows that many of the OXPHOS gene introns already existed in ancestral genes predating the cnidarian/Bilateria evolutionary split.
Comparative analysis also suggests that conservation of the nuclear respiratory gene element, constantly conserved in the OXPHOS gene orthologs of all the insect species examined, has played a key role in the evolution of the insect OXPHOS genes. Furthermore, we found one or more paralogs of 22 different OXPHOS genes in the Drosophilidae species studied, and showed that only the functional orthologs of the ancestral insect genes responsible for the basic housekeeping function of energy production maintain the NRG element, while their paralogs, whether originated by retrotranscription or by segmental duplication, either completely lack the NRG element or possess only presumably non-functional relics of the motif. Based on these data, we suggest that changes in the pattern of expression of the gene (such as testis-specific expression) could facilitate the fixation of duplications in the genome and in turn the development of novel genetic functions.
Methods
BlastN and TBlastN searches of contigs, scaffolds and ESTs from FlyBase were performed using D. melanogaster OXPHOS CDSs and peptides listed in the MitoDrome database as queries to identify orthologous OXPHOS genes and their duplications in Drosophilidae and other insect genomes.
Sequence sources: D. erecta, D. ananassae, D. mojavensis, D. virilis and D. grimshawi were sequenced by Agencourt; D. simulans and D. yakuba were sequenced at Washington University; D. persimilis and A. aegypti were sequenced by the Broad Institute; D. willistoni was sequenced by TIGR; D. melanogaster was sequenced by the Berkeley Drosophila Genome Project and Celera; D. pseudoobscura, T. castaneum and A. mellifera were sequenced at Baylor; A. gambiae was sequenced by the Anopheles Genome Consortium; B. mori was sequenced at the Southwest Agricultural University and by the Silkworm Genome Research Program.
Contigs, scaffolds and ESTs from the National Center for Biotechnology Information (NCBI)  and StellaBase  were searched using the human OXPHOS peptides from Swiss-Prot  to identify orthologous OXPHOS genes and their duplications in F. rubripes, zebrafish (D. rerio) and N. vectensis. Additionally, single-trace sequences were screened at the TraceSite of NCBI  using MEGABlast and were assembled manually. Human genomic and mRNA sequences were retrieved from Ensembl .
Duplicate gene pairs within a genome were identified as best reciprocal hits with an E-value of less than 10^-20 in both directions in a TBLASTN search using the default parameters. For convenience, each newly identified insect gene is indicated in this paper by a term comprising the abbreviation of the species followed by the CG number of its counterpart in D. melanogaster.
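The reciprocal-best-hit criterion can be illustrated with a minimal Python sketch. It assumes the search results have already been parsed into (query, subject, E-value) records; the function names and gene identifiers are hypothetical, and the sketch is not the procedure actually used to build MitoComp2.

```python
E_VALUE_CUTOFF = 1e-20  # threshold stated in the text


def best_hits(hits):
    """Map each query to its lowest-E-value subject below the cutoff.

    Self-hits are ignored, since a gene trivially matches itself.
    """
    best = {}
    for query, subject, evalue in hits:
        if query == subject or evalue >= E_VALUE_CUTOFF:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def reciprocal_best_hits(hits):
    """Return sorted gene pairs in which each member is the other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for query, (subject, _) in best.items():
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Hypothetical records for illustration only, not real search output.
example_hits = [
    ("geneA", "geneA_dup", 1e-45),
    ("geneA_dup", "geneA", 1e-40),
    ("geneB", "geneC", 1e-10),  # fails the E-value cutoff
    ("geneC", "geneB", 1e-30),  # passes in one direction only
    ("geneD", "geneA", 1e-25),  # geneA's best hit is geneA_dup, not geneD
]

print(reciprocal_best_hits(example_hits))
```

Only the geneA/geneA_dup pair is reported here, because the other candidates satisfy the threshold in one direction at most.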
The genomic sequence of each gene identified was searched manually for exon-intron boundaries and the predicted transcribed sequence was reconstructed in silico. All insect genomic, mRNA/CDS and amino acid sequences utilized for this study are archived at the MitoComp2 web site; the vertebrate and Nematostella genomic sequences recovered are available on request from C.C.
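As an illustration of the in silico reconstruction step, the following Python sketch joins annotated exons into a predicted transcript. It assumes 1-based inclusive exon coordinates on the forward strand of the genomic sequence; the function names and the toy sequence are hypothetical and do not come from the dataset.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def splice_exons(genomic, exons, strand="+"):
    """Concatenate exons given as (start, end) pairs, 1-based and inclusive.

    Coordinates refer to the forward strand; for a minus-strand gene the
    joined sequence is reverse-complemented.
    """
    parts = [genomic[start - 1:end] for start, end in sorted(exons)]
    transcript = "".join(parts)
    return reverse_complement(transcript) if strand == "-" else transcript


# Toy gene: exon 1, a GT...AG intron, exon 2 (illustrative sequence only).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
exons = [(1, 6), (19, 24)]

print(splice_exons(genomic, exons))
```

A manual annotation would additionally check each boundary against the canonical GT/AG splice sites and against the reading frame of the D. melanogaster counterpart.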
To identify NRG motifs and other conserved elements in noncoding sequences of Drosophilidae OXPHOS genes, we aligned members of each orthologous gene cluster, manually defined exon-intron boundaries, and searched for DNA stretches maintaining high consensus in all the Drosophilidae species studied. Pairwise alignment and comparison with orthologous Drosophilidae sequences were used to identify conserved NRG elements in noncoding sequences of A. gambiae and A. aegypti OXPHOS genes.
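The search for stretches that maintain high consensus across an alignment can be sketched as follows. This simplified Python example requires perfect identity in every aligned sequence, which is stricter than "high consensus", and it uses a hypothetical minimum block length and toy sequences that are not real OXPHOS gene sequences.

```python
def conserved_blocks(aligned, min_len=6):
    """Find runs of gap-free columns that are identical in every sequence.

    Returns (start, end, sequence) tuples with 0-based, end-exclusive
    alignment coordinates.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    blocks = []
    run_start = None
    # Iterate one position past the end so that a trailing run is closed.
    for i in range(length + 1):
        column = {seq[i] for seq in aligned} if i < length else set()
        identical = len(column) == 1 and "-" not in column
        if identical and run_start is None:
            run_start = i
        elif not identical and run_start is not None:
            if i - run_start >= min_len:
                blocks.append((run_start, i, aligned[0][run_start:i]))
            run_start = None
    return blocks


# Toy alignment of three hypothetical orthologous noncoding regions.
alignment = [
    "ACGTGGCATTCAGGCA",
    "TCGAGGCATTCACGCA",
    "AC-TGGCATTCAGGTA",
]

print(conserved_blocks(alignment))
```

Relaxing the identity test to a majority threshold per column would bring the sketch closer to a consensus-based search.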
To identify NRG motifs in B. mori, A. mellifera and T. castaneum OXPHOS genes we used the Weeder pattern discovery program and the Regulatory Sequence Analysis Tools from the RSAT server. Position weight matrices (PWM) were created with the Consensus software from the RSAT server. The graphical representations of NRG motifs as sequence logos were generated using WebLogo.
We thank Agencourt, Inc. (D. erecta, D. ananassae, D. mojavensis, D. virilis and D. grimshawi), the Washington University Genome Center (D. simulans and D. yakuba), TIGR (D. willistoni) and the Broad Institute (D. sechellia and D. persimilis) for prepublication access to genome data. This work was supported by grants from Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) to C.C. and Fondo Italiano Ricerca di Base (FIRB) project "Laboratorio Italiano di Bioinformatica" to G.P.
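To make the notion of a position weight matrix concrete, the sketch below builds a log-odds PWM from a few aligned motif instances and uses it to locate the best-scoring site in a sequence. It is a generic illustration and not the algorithm implemented by the Consensus software; the pseudocount, the uniform background, the toy sites and the function names are all assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length, uppercase motif instances."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pwm


def best_match(pwm, sequence):
    """Return (start, site, score) for the highest-scoring window."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, sequence[start:start + width], score


# Hypothetical motif instances; these are not the NRG element.
toy_sites = ["GGCATT", "GGCATT", "GGCTTT", "GACATT"]
pwm = build_pwm(toy_sites)

start, site, score = best_match(pwm, "TTTAGGCATTCC")
print(start, site, round(score, 2))
```

The per-position base frequencies underlying such a matrix are also what a sequence logo displays, scaled by information content.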
- Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L, Rubin EM: Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science. 2003, 299: 1391-1394. 10.1126/science.1081331.
- Roy SW, Gilbert W: The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 2006, 7: 211-221.
- Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, Hartzell G, Lewis S, Rubin GM: Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol. 2006, 2: e15. 10.1371/journal.pcbi.0020015.
- Wolfe A, Goodson M, Goode D, Snell P, McEwen G, Vavouri T, Smith S, North P, Callaway H, Kelly K: Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005, 3: e7. 10.1371/journal.pbio.0030007.
- Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD: In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006, 444: 499-502. 10.1038/nature05295.
- Glazov EA, Pheasant M, McGraw EA, Bejerano G, Mattick JS: Ultraconserved elements in insect genomes: A highly conserved intronic sequence implicated in the control of homothorax mRNA splicing. Genome Res. 2005, 15: 800-808. 10.1101/gr.3545105.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005, 15: 1034-1050. 10.1101/gr.3715005.
- Jeffares DC, Mourier T, Penny D: The biology of intron gain and loss. Trends Genet. 2006, 22: 16-22. 10.1016/j.tig.2005.10.006.
- Saraste M: Oxidative phosphorylation at the fin de siècle. Science. 1999, 283: 1488-1493. 10.1126/science.283.5407.1488.
- Lemos BM, Meiklejohn CD, Hartl DL: Regulatory evolution across the protein interaction network. Nat Genet. 2004, 36: 1059-1060. 10.1038/ng1427.
- Tripoli G, D'Elia D, Barsanti P, Caggese C: Comparison of the oxidative phosphorylation (OXPHOS) nuclear genes in the genomes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae. Genome Biol. 2005, 6: R11. 10.1186/gb-2005-6-2-r11.
- Sardiello M, Licciulli F, Catalano D, Attimonelli M, Caggese C: MitoDrome: a database of Drosophila melanogaster nuclear genes encoding proteins targeted to the mitochondrion. Nucleic Acids Res. 2003, 31: 322-324. 10.1093/nar/gkg123.
- MITOCOMP2. [http://www.mitocomp.uniba.it]
- Technau U, Rudd S, Maxwell P, Gordon PMK, Saina M, Grasso LC, Hayward DC, Sensen CW, Saint R, Holstein TW: Maintenance of ancestral complexity and non-metazoan genes in two basal cnidarians. Trends Genet. 2005, 21: 633-639. 10.1016/j.tig.2005.09.007.
- Russo CA, Takezaki N, Nei M: Molecular phylogeny and divergence times of drosophilid species. Mol Biol Evol. 1995, 12: 391-404.
- Kwiatowski J, Krawczyk M, Jaworski M, Skarecky D, Ayala FJ: Erratic evolution of glycerol-3-phosphate dehydrogenase in Drosophila, Chymomyza, and Ceratitis. J Mol Evol. 1997, 44: 9-22. 10.1007/PL00006126.
- Besansky NJ, Fahey GT: Utility of the white gene in estimating phylogenetic relationships among mosquitoes (Diptera: Culicidae). Mol Biol Evol. 1997, 14: 442-454.
- Zdobnov EM, Bork P: Quantification of insect genome divergence. Trends Genet. 2007, 23: 16-20. 10.1016/j.tig.2006.10.004.
- Savard J, Tautz D, Richards S, Weinstock GM, Gibbs RA, Werren JH, Tettelin H, Lercher MJ: Phylogenomic analysis reveals bees and wasps (Hymenoptera) at the base of the radiation of Holometabolous insects. Genome Res. 2006, 16: 1334-1338. 10.1101/gr.5204306.
- Roy SW, Fedorov AV, Gilbert W: Complex early genes. Proc Natl Acad Sci USA. 2005, 102: 1986-1991. 10.1073/pnas.0408355101.
- Tarrio R, Rodriguez-Trelles F, Ayala FJ: A new Drosophila spliceosomal intron position is common in plants. Proc Natl Acad Sci USA. 2003, 100: 6580-6583. 10.1073/pnas.0731952100.
- Stoltzfus A: Molecular evolution: introns fall into place. Curr Biol. 2004, 14: R351-352. 10.1016/j.cub.2004.04.024.
- Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV: Conservation versus parallel gains in intron evolution. Nucleic Acids Res. 2005, 33: 1741-1748. 10.1093/nar/gki316.
- Roy SW, Fedorov A, Gilbert W: Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA. 2003, 100: 7158-7162. 10.1073/pnas.1232297100.
- The Honeybee Genome Sequencing Consortium: Insights into social insects from the genome of the honeybee Apis mellifera. Nature. 2006, 443: 931-949. 10.1038/nature05260.
- Raible F, Tessmar-Raible K, Osoegawa K, Wincker P, Jubin C, Balavoine G, Ferrier D, Benes V, de Jong P, Weissenbach J: Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science. 2005, 310: 1325-1326. 10.1126/science.1119089.
- Ragone G, Caizzi R, Moschetti R, Barsanti P, De Pinto V, Caggese C: The Drosophila melanogaster gene for the NADH:ubiquinone oxidoreductase acyl carrier protein: developmental expression analysis and evidence for alternatively spliced forms. Mol Gen Genet. 1999, 261: 690-697. 10.1007/s004380050012.
- Fedorova L, Fedorov A: Introns in gene evolution. Genetica. 2003, 118: 123-131. 10.1023/A:1024145407467.
- Schatz G, Dobberstein B: Common principles of protein translocation across membranes. Science. 1996, 271: 1519-1526. 10.1126/science.271.5255.1519.
- Voos W, Martin H, Krimmer T, Pfanner N: Mechanisms of protein translocation into mitochondria. Biochim Biophys Acta. 1999, 1422: 235-254.
- Sardiello M, Tripoli G, Romito A, Minervini C, Viggiano L, Caggese C, Pesole G: Energy biogenesis: one key for coordinating two genomes. Trends Genet. 2005, 21: 12-16. 10.1016/j.tig.2004.11.009.
- Smale ST, Kadonaga JT: The RNA polymerase II core promoter. Annu Rev Biochem. 2003, 72: 449-479. 10.1146/annurev.biochem.72.121801.161520.
- Bergman CM, Kreitman M: Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 2001, 11: 1335-1345. 10.1101/gr.178701.
- Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988, 16: 10881-10890. 10.1093/nar/16.22.10881.
- Pavesi G, Mereghetti P, Mauri G, Pesole G: Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 2004, 32: W199-W203. 10.1093/nar/gkh465.
- van Helden J: Regulatory sequence analysis tools. Nucleic Acids Res. 2003, 31: 3593-3596. 10.1093/nar/gkg567.
- Pang KC, Frith MC, Mattick JS: Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006, 22: 1-5. 10.1016/j.tig.2005.10.003.
- Drysdale RA, Crosby MA: FlyBase: genes and gene models. Nucleic Acids Res. 2005, 33: D390-D395. 10.1093/nar/gki046.
- Piccinelli P, Rosenblad MA, Samuelsson T: Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 2005, 33: 4485-4495. 10.1093/nar/gki756.
- Esnault C, Maestre J, Heidmann T: Human LINE retrotransposons generate processed pseudogenes. Nat Genet. 2000, 24: 363-367. 10.1038/74184.
- Vinckenbosch N, Dupanloup I, Kaessmann H: Evolutionary fate of retroposed gene copies in the human genome. Proc Natl Acad Sci USA. 2006, 103: 3220-3225. 10.1073/pnas.0511307103.
- Papp B, Pàl C, Hurst LD: Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003, 424: 194-197. 10.1038/nature01771.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290: 1151-1155. 10.1126/science.290.5494.1151.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF: The genome sequence of Drosophila melanogaster. Science. 2000, 287: 2185-2195. 10.1126/science.287.5461.2185.
- Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, Thornton K, Hubisz MJ, Chen R, Meisel RP: Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Res. 2005, 15: 1-18. 10.1101/gr.3059305.
- Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R: The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002, 298: 129-149. 10.1126/science.1076181.
- Biology Analysis Group and Genome Analysis Group: A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science. 2004, 306: 1937-1940. 10.1126/science.1102210.
- Mita K, Kasahara M, Sasaki S, Nagayasu Y, Yamada T, Kanamori H, Namiki N, Kitagawa M, Yamashita H, Yasukochi Y: The genome sequence of silkworm, Bombyx mori. DNA Res. 2004, 11: 27-35. 10.1093/dnares/11.1.27.
- NCBI blast server. [http://www.ncbi.nlm.nih.gov/BLAST/]
- Sullivan JC, Ryan JF, Watson JA, Webb J, Mullikin JC, Rokhsar D, Finnerty JR: StellaBase: the Nematostella vectensis genomics database. Nucleic Acids Res. 2006, 34: D495-499. 10.1093/nar/gkj020.
- ExPASy – Swiss-Prot and TrEMBL. [http://us.expasy.org/sprot]
- Ensembl human genome server. [http://www.ensembl.org/Homo_sapiens/index.html]
- MultAlin. [http://prodes.toulouse.inra.fr/multalin/multalin.html]
- RSAT server. [http://rsat.ulb.ac.be/rsat/]
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990, 18: 6097-6100. 10.1093/nar/18.20.6097.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.