1. Introduction
Transposable elements (TEs) are present in almost all species and, in many organisms, they make up a considerable portion of the genome. Nevertheless, the biological importance of TEs is not yet adequately understood. Hypotheses about the roles played by TEs range from genomic parasites to symbiotic or mutualistic components (Kidwell & Lisch, 2001; Brookfield, 2005). The TE ‘life cycle’ in host genomes has likewise been a matter of discussion, and the way that TEs invade, are maintained, are controlled, are domesticated, or are even lost in genomes is not fully understood (Le Rouzic & Capy, 2005). TEs are extremely heterogeneous in composition, molecular features and transpositional mechanisms. Class I elements transpose through an RNA intermediate, and class II elements use DNA as the intermediate for transposition. In both classes, there are autonomous elements that produce the enzymes necessary for transposition and non-autonomous elements that use the enzymes produced by autonomous elements (Capy et al., 1998).
The hobo element is a class II TE, and belongs to the hAT superfamily, which is widely distributed in plants, animals and fungi (Calvi et al., 1991). While hobo itself is restricted to the melanogaster group of Drosophila (Daniels et al., 1990), some hobo-like elements have been found in several Diptera species, such as Musca domestica (Atkinson et al., 1993), in some Lepidoptera species (DeVault & Narang, 1994; Borsatti et al., 2003) and in different tephritids (Handler & Gomez, 1996; Torti et al., 2005).
In Drosophila, hobo is found in three forms. The first form is the complete element, or canonical hobo, about 3 kb long, with 12 bp terminal inverted repeats (TIRs) and a gene with the potential to encode a transposase enzyme. In Drosophila melanogaster, the complete hobo element is known to be active and capable of producing the hybrid dysgenesis syndrome (Blackman et al., 1989; Yannopoulos et al., 1987). The second form corresponds to defective elements. Their sequences are very similar to those of the canonical hobo, but they carry deletions of variable length in the internal portion of the element. Complete hobo elements and their deleted derivatives are present only in D. melanogaster and its sibling species, Drosophila simulans and Drosophila mauritiana (Anxolabehere et al., 1988). In D. melanogaster and D. simulans, these sequences are present in some strains (called H) and absent in others (called E strains, for ‘Empty’). The canonical hobo and its deleted derivatives are thought to be recent acquisitions of the D. melanogaster genome (Anxolabehere et al., 1988; Boussy & Daniels, 1991; Simmons, 1992). Finally, the third form is described as a hobo relic or hobo-related sequence (hRS). The characterized sequences have around 80% similarity to the canonical hobo, with multiple rearrangements, and they are not able to code for a functional transposase. The relics are present in all strains of the melanogaster subgroup species and the montium subgroup species (Daniels et al., 1990). The earliest analyses suggested that these sequences correspond to an ancient hobo element present in the ancestor of the melanogaster group. The sequences are thought to be inactive (Lim, 1988; Daniels et al., 1990; Galindo et al., 2001).
Recently, we described a mobilizable hobo relic in D. simulans, isolated in a de novo mutation that occurred in a hypermutable strain (Torres et al., 2006). This hRS element, called hobo va, is 1·2 kb long and defective, with roughly 82% similarity at the DNA level to the canonical hobo. However, it has an extremely conserved 200 bp stretch in each subterminal region, which is highly similar to the canonical hobo. The inner region of this element is almost completely composed of A and T arranged as imperfect microsatellites. It has also been suggested that this relic hobo could be mobilized by the canonical element. Furthermore, the presence of sequences similar to hobo va in Drosophila sechellia suggested that these relic hobo elements could have remained mobilizable since the divergence of D. simulans and D. sechellia, 0·4 million years ago (MYA).
In the present paper, we describe the presence of sequences homologous to hobo va (hobo vahs) in various species of the melanogaster group and we discuss possible explanations for the origin and maintenance of these non-autonomous elements. Moreover, we show ‘shrinking’ events in some hobo vahs sequences that could be the origin of some related miniature inverted-repeat TEs (MITEs).
2. Material and methods
2.1. Fly stocks
The PCR search for sequences homologous to hobo va in genomic DNA was carried out in the following species: D. sechellia (Seychelles islands, 1985; coll. J. David), D. mauritiana (Mauritius island, 1988; coll. J. David), Drosophila santomea (São Tome, Parque Obo; coll. D. Lachaise), D. melanogaster, Drosophila teissieri (STO384.3 Uganda, Kibale Forest; coll. D. Lachaise), Drosophila ananassae (Florianópolis, Brazil, 2005; coll. M. Gottschakk), Drosophila malerkotliana (Florianópolis, Brazil, 2005; coll. M. Gottschakk), Drosophila kikkawai (Florianópolis, Brazil, 2005; coll. M. Gottschakk) and D. simulans (dpp strain, Eldorado, RS, Brazil, 1989). The source and collection date of the stocks are given in parentheses.
2.2. Genome search
Initially, the search for sequences homologous to hobo va (Torres et al., 2006) was carried out in the genomes of the following species: D. ananassae, Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, Drosophila grimshawi, D. simulans, Drosophila yakuba, D. sechellia, D. melanogaster and Drosophila erecta, recently made available and analysed by Clark et al. (2007). The search was performed using the BLAT tool (Kent, 2002) available in the UCSC Genome Browser Database (Karolchik et al., 2003), with the assistance of the UCSC Table Browser data retrieval tool (Karolchik et al., 2004). All hits were analysed, and 1 kb of flanking sequence on each side of the hit was retrieved for subsequent alignments and analyses. Searches were also performed using the FlyBase BLAST Service (http://flybase.bio.indiana.edu/blast/) and the NCBI Trace Archives using the Mega BLAST tool (http://www.ncbi.nlm.nih.gov/blast/mmtrace.shtml) (Altschul et al., 1997) with the default parameters.
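The retrieval of 1 kb of flanking sequence around each hit can be sketched as follows. This is only an illustration, not the pipeline used in this work: the function name `flank_window`, the 0-based half-open coordinate convention and the clipping of the window at contig ends are all assumptions made for the example.

```python
def flank_window(hit_start, hit_end, contig_length, flank=1000):
    """Return (start, end) of a hit extended by `flank` bp on each side.

    Assumptions (not from the paper): coordinates are 0-based and
    half-open, and the window is clipped to the contig boundaries.
    """
    if not 0 <= hit_start < hit_end <= contig_length:
        raise ValueError("hit must lie within the contig")
    start = max(0, hit_start - flank)
    end = min(contig_length, hit_end + flank)
    return start, end


# Hypothetical hit in the middle of a 20 kb contig
window = flank_window(5000, 6200, 20000)
```

A hit close to a contig end simply yields a shorter flank on that side, which is one reason why some retrieved sequences may lack a terminal region.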
The initial sequences used as query were the D. melanogaster canonical hobo (M69216) and the D. simulans hobo va (AY764286). Subsequently, all retrieved sequences were also used as query until no additional new sequences were obtained. The retrieved sequences were classified using the following criteria: (i) putatively mobilizable sequences (PMS) – in these sequences, TIRs and sometimes target sequence duplications (TSDs) were present; (ii) incomplete sequences – without one or both TIRs; and (iii) degenerate sequences, with similarity <80%. The degenerate sequences were not analysed but can be made available on request.
The structural features that allow several hobo vahs to be classified as PMS are the extremely conserved hobo TIRs (identical to those of the canonical hobo) and a well-conserved 200 bp component in each subterminal region of the element. These characteristics do not guarantee that an element is mobilizable; this property can only be demonstrated experimentally for a specific sequence. Furthermore, some alterations in the TIRs and subterminal sequences can occur while the element remains mobilizable. From this perspective, our estimates are conservative and include only the elements whose characteristics suggest that they can be mobilized.
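The three classification criteria described above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions: the `Hit` record, its field names and the order in which the criteria are tested (similarity first, then TIRs) are hypothetical and are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    has_5p_tir: bool    # 5' terminal inverted repeat present
    has_3p_tir: bool    # 3' terminal inverted repeat present
    similarity: float   # percent similarity to the query sequence


def classify(hit, min_similarity=80.0):
    """Assign a hit to one of the three categories used in the text.

    Assumption: the similarity threshold is checked before the TIRs;
    the paper does not state the order of the tests.
    """
    if hit.similarity < min_similarity:
        return "degenerate"
    if not (hit.has_5p_tir and hit.has_3p_tir):
        return "incomplete"
    return "PMS"  # putatively mobilizable sequence


# Hypothetical hits, for illustration only
examples = [
    Hit("copy_1", True, True, 95.0),
    Hit("copy_2", True, False, 91.0),
    Hit("copy_3", True, True, 72.0),
]
labels = [classify(h) for h in examples]
```

Target sequence duplications are deliberately not required for the PMS label here, since the text says they were present only "sometimes".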
The genome assemblies used correspond to the final versions released (Clark et al., 2007). The contig and assembly names, the sequence coordinates and the lengths of the sequences used are given in Tables 1S and 3S (supplementary material). The alignment of the complete dataset is also available in the supplementary material.
2.3. PCR amplification and sequencing
The primers used to specifically amplify the hobo vahs were: hva1s (forward), 5′-cataacggaagggtagagaag-3′; hva2as (reverse), 5′-cgtccacccgataaacactc-3′; Vanew1 (forward), 5′-caattttgwgtgcgggtgcy-3′; Vayak (reverse), 5′-gaactgcagcaagccaccgg-3′. These primers were designed using the sequences obtained in the genome search. The first three anneal, respectively, at positions 200–219, 1169–1188 and 50–70 of hobo va, used as a reference sequence, and Vayak anneals at nucleotide positions 1540–1560 of the sequence 6yak VA, used as a reference. This last sequence corresponds to the one obtained in the genomic search in the D. yakuba genome. Both reference sequences can be obtained in the supplementary material, in an alignment file (hobova_alignament.aln). The amplicons obtained correspond to a single band of roughly 1 kb, although short elements were observed among the cloned sequences (see below). These primer sets, in different combinations, anneal to all the sequences retrieved from the genome search. PCR reactions were performed in 25 μl volumes using approximately 20 ng of template DNA, 20 pmol of each primer, 1·5 mM MgCl2, 50 μM of each nucleotide and 1 unit of Taq DNA Polymerase (Invitrogen). After an initial denaturation step of 4 min at 95°C, 35 cycles consisting of 40 s denaturation at 95°C, 40 s annealing at 55°C and 1 min extension at 72°C were carried out. An additional 5 min extension step at 72°C was performed after the last cycle. The PCR products were cloned into pCR-TOPO plasmid (Invitrogen). DNA sequencing was performed directly from the purified plasmids in a MegaBACE 500 automatic sequencer. The dideoxy chain-termination reaction was performed using the DYEnamic ET kit (GE Healthcare). The sequences were then submitted to a ‘confidence consensus’ analysis using the Staden Package Gap 4 program (Staden, 1996). D. santomea sequences have been deposited in GenBank under the accession numbers DQ840031–DQ840035 and DQ823386, and D. mauritiana sequences under accession numbers DQ840036–DQ840038.
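The Vanew1 primer contains two IUPAC ambiguity codes (w = a or t, y = c or t), so it represents a small pool of oligonucleotides rather than a single sequence. The sketch below expands such a degenerate primer into its concrete variants; the function names are hypothetical and the code is only an illustration of the notation, not part of the protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (lower case, as in the primer list)
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

COMPLEMENT = str.maketrans("acgt", "tgca")


def expand_degenerate(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.lower()]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq):
    """Reverse complement of an unambiguous lower-case DNA sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


vanew1_variants = expand_degenerate("caattttgwgtgcgggtgcy")
```

For Vanew1 this yields four variants (2 × 2), whereas the non-degenerate primers expand to themselves.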
2.4. Southern blot analyses
Genomic DNA was obtained as described by Sassi et al. (2005). Approximately 7 μg of DNA samples were digested with EcoRI (Invitrogen), separated by electrophoresis on 1% agarose gels and transferred to nylon membranes (HybondN+, Amersham Biosciences). The membranes were hybridized with probes corresponding to PCR fragments of D. simulans hobo va or D. santomea hobo vahs, amplified from plasmids used in the sequencing analyses described below. The divergence between these sequences is 24%. To label and detect nucleic acids, an AlkPhos Direct Labeling and Detection System (Amersham Biosciences) kit was used according to the kit protocol.
2.5. Sequence analyses
The following software was used in the sequence analyses: GENEDOC version 2.6.001 (Nicholas & Nicholas, 1997) for sequence editing and visualization; Einverted from the EMBOSS suite (http://emboss.sourceforge.net/) for TIR identification; Clustal W (Thompson et al., 1994) for sequence alignment; and MEGA version 3.1 (Kumar et al., 2001) for phylogenetic analysis. In the Maximum Parsimony analysis, the best tree was searched using close-neighbour interchange, with parameter values and random addition of sequences (ten replications) to produce the initial trees. In the Neighbour-Joining (NJ) method, the Kimura two-parameter model of nucleotide substitution (Kimura, 1980) was used to construct the distance matrices. In both analyses, bootstrap tests with 1000 replications were performed to assess the support value for each internal branch of the trees. The phylogenetic analysis was carried out on the concatenation of nucleotides 1–200 of the 5′ subterminal region and nucleotides 1152–1220 of the 3′ subterminal region (using the hobo va sequence as a reference) because these are the most conserved regions and produce a more consistent alignment. The total length of the alignment is 290 bp, and gaps were included in the analysis.
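For readers unfamiliar with the Kimura two-parameter distance used for the NJ matrices, the sketch below computes it for one pair of aligned sequences as d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. It is a minimal illustration, not the MEGA implementation; in particular, skipping columns that contain a gap or ambiguous base is a simplifying assumption of this example and differs from the gap treatment reported above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Assumption: columns with a gap or ambiguous base in either sequence
    are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# One transition (A<->G) in ten compared sites
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```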
3. Results
3.1. Search for sequences homologous to hobo va by PCR and Southern blot
PCR analyses detected sequences homologous to hobo va, as described by Torres et al. (2006), only in species of the melanogaster subgroup. As can be seen in Table 1, amplicons of hobo vahs were obtained from D. sechellia, D. mauritiana, D. simulans, D. melanogaster, D. santomea and D. teissieri, which belong to the melanogaster subgroup, but no amplification was obtained from species of other subgroups of the D. melanogaster species group (D. ananassae, D. malerkotliana and D. kikkawai) (Clark et al., 2007). Southern blot analyses confirmed the PCR results. As can be seen in Fig. 1 A, in which hobo va of D. simulans was used as a probe, numerous hybridization bands were observed in D. sechellia, D. mauritiana, D. simulans and D. melanogaster. A weak signal was seen in D. teissieri and no hybridization signal was observed in species outside the melanogaster subgroup (D. ananassae, D. malerkotliana and D. kikkawai). When hobo vahs of D. santomea was used as a probe (Fig. 1 B), hybridization signals were seen in D. teissieri and D. santomea, while faint bands occurred in D. melanogaster and D. simulans.
a, D. mauritiana; b, D. simulans; c, D. sechellia; d, D. melanogaster; e, D. santomea; f, D. teissieri; g, D. ananassae; h, D. malerkotliana; i, D. kikkawai.
Together, the PCR and Southern blot analyses show that the hobo vahs are restricted to the melanogaster subgroup.
3.2. Cloning and sequencing of hobo vahs
We cloned and sequenced some elements from those species that carry hobo vahs but for which genome sequences are not available. Three sequenced clones of D. mauritiana hobo vahs were around 1·1 kb long and exhibited 90% overall similarity to hobo va of D. simulans. One clone contained a short hobo vahs sequence of 251 bp.
The sequenced D. santomea clones, eight in total, deserve special attention due to their very short length (391 bp) and because they are almost identical in sequence. In the 5′ subterminal region of these elements, a 180 bp region exhibited 70% similarity to the D. simulans hobo vahs, and in the 3′ end, the last 70 bp had 82% similarity. As in hobo va, the middle region is AT-rich.
4. Genomic search
A search for homologous sequences in the 12 available Drosophila genomes, which represent diverse Drosophila groups, demonstrated the presence of hobo vahs only in species of the melanogaster subgroup (D. melanogaster, D. simulans, D. yakuba, D. sechellia and D. erecta).
The copy number of hobo vahs varied widely among species. As shown in Table 2, 12 copies were found in the D. melanogaster genome. All of these copies were PMS, but only five (42%) showed TSDs. The hobo vahs copies described here do not correspond to the hobo elements previously annotated in the D. melanogaster genome (Kaminker et al., 2002; Quesneville et al., 2005). In D. simulans, a much higher copy number was found (147 copies), of which 55 were incomplete and 92 were PMS. TSDs were found in 72 (78%) of the PMS copies. In D. yakuba, 70 copies were found, of which 28 were incomplete sequences and 42 were PMS. Among the PMS detected in D. yakuba, TSDs were observed in 37 (88%). In D. sechellia, 60 copies were found, of which 53 were PMS, and 73% of these possessed TSDs. In D. erecta, only one copy was found, and it was a PMS with TSDs. For the genomes for which chromosome assemblies are currently available, we were able to analyse the distribution of hobo vahs copies among the chromosomes. As can be seen in Table 2, no preferential insertions were observed in the chromosome arms of D. melanogaster, D. simulans or D. yakuba.
PMS=copy number of putatively mobilizable sequences.
(TSD)=copy number of hobo vahs that showed target sequence duplication.
2L, 2R, 3L, 3R and X=chromosome arms; U and Random=chromosome position not identified.
a In the current genome assembly for these species, the chromosome assemblies are not available.
The presence of 8 bp direct duplications of the insertion site (TSDs) typically characterizes hobo mobilization (McGinnis et al., 1983). The identification of TSDs in a substantial proportion of copies (42–88%), together with the high similarity between some copies, is suggestive of recent mobilization.
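The percentages quoted in this section follow directly from the copy numbers given in the text, as the short check below shows. The counts are taken from the text above; the variable names are arbitrary. No TSD count is entered for D. sechellia because only its percentage (73%) is reported.

```python
# (PMS copies, PMS copies with TSDs), as reported in the text
pms_and_tsd = {
    "D. melanogaster": (12, 5),
    "D. simulans": (92, 72),
    "D. yakuba": (42, 37),
    "D. erecta": (1, 1),
}
# D. sechellia: 53 PMS; the TSD count itself is not given in the text
sechellia_pms = 53


def tsd_percent(pms, with_tsd):
    """Percentage of PMS copies flanked by TSDs, rounded to an integer."""
    return round(100 * with_tsd / pms)


percentages = {sp: tsd_percent(*c) for sp, c in pms_and_tsd.items()}
total_pms = sum(pms for pms, _ in pms_and_tsd.values()) + sechellia_pms
```

The resulting total of PMS copies across the five genomes is the number of genome-derived sequences entering the phylogenetic analysis.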
We have analysed the integration specificity of hobo vahs elements through nucleotide frequency estimation in the TSDs. The TSDs observed in the different species are very similar. Nucleotides in positions 2 and 7 were the most information-rich. Thymidine was the most common nucleotide in position 2 and adenine the most abundant nucleotide in the seventh position. The consensus sequences observed were: D. simulans (GTNCGNAC), D. sechellia (GTNCNNAC), D. yakuba (GTNCNNAT) and D. melanogaster (GTNCNNAC) (Table 4 in the supplementary material).
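A consensus of the kind listed above can be derived from per-position nucleotide frequencies. The sketch below shows one way to do this, together with a simple information measure (2 bits minus the Shannon entropy, without small-sample correction). The 75% threshold, the function names and the four example TSDs are assumptions made purely for illustration; the example sequences are toy data, not TSDs from this study.

```python
import math
from collections import Counter


def position_counts(tsds):
    """Count the nucleotides observed at each position of equal-length TSDs."""
    length = len(tsds[0])
    if any(len(t) != length for t in tsds):
        raise ValueError("all TSDs must have the same length")
    return [Counter(t[i] for t in tsds) for i in range(length)]


def consensus(tsds, min_fraction=0.75):
    """Most frequent base per position, or N below `min_fraction` (assumed)."""
    out = []
    for counts in position_counts(tsds):
        base, n = counts.most_common(1)[0]
        out.append(base if n / len(tsds) >= min_fraction else "N")
    return "".join(out)


def information_bits(counts):
    """Information content of one position: 2 bits minus Shannon entropy."""
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


# Toy 8 bp TSDs, invented for illustration only
toy_tsds = ["GTACGAAC", "GTTCCTAC", "GTGCGGAC", "GTCCTCAC"]
toy_consensus = consensus(toy_tsds)
toy_bits = [information_bits(c) for c in position_counts(toy_tsds)]
```

Fully conserved positions reach 2 bits, whereas positions where all four bases are equally frequent carry no information and appear as N in the consensus.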
4.1. Phylogenetic analysis
For phylogenetic analysis, we used the PMS obtained in the genome search (200 sequences). Also, we used three partial sequences from D. mauritiana and eight from D. santomea (sequenced in this work).
The phylogenetic analysis showed the presence of two hobo vahs clusters. As seen in Fig. 2, the cluster called ‘A’, which was statistically well supported, was formed only by sequences from D. simulans and D. sechellia and by two D. melanogaster sequences found in a polytomy. The divergence observed between the subclusters formed by D. simulans and D. sechellia sequences ranged from 0·0 to 18·7% (3·7% on average). When the D. melanogaster sequences were included, the divergences varied from 0·0 to 31·0% (4·0% on average). As can also be seen in Fig. 2, several D. simulans and D. sechellia sequences carried XhoI restriction sites at one or both ends. Since these sequences were normally 1·1 kb long, the distance between the XhoI sites was around 0·7 kb, and these sequences correspond to the ‘deleted hobo sequences’ described in Southern blot analyses as defective canonical hobo, according to Boussy & Daniels (1991), Periquet et al. (1994) and Loreto et al. (1998). Cluster B showed a higher internal divergence, varying from 0·0 to 31·6% (16·6% on average). This cluster is represented mainly by sequences from D. yakuba, D. santomea and D. erecta. However, sequences from D. mauritiana and D. melanogaster are also present. The overall divergence observed among the hobo vahs sequences from clusters A and B varied from 0·0 to 37·4%, with an average of 14·7%.
5. Discussion
5.1. hobo vahs are disseminated in the D. melanogaster subgroup
hRSs or hobo relics were thought to be vestigial and inactive sequences left by previous invasions of the Drosophila genome by hobo elements (Lim, 1988; Daniels et al., 1990). Nevertheless, Torres et al. (2006) have shown that one hRS, hobo va, is mobilizable and has probably been kept transpositionally active for 0·4 million years (MY), which corresponds to the divergence time between D. simulans and D. sechellia. This was suggested because a similar sequence was also observed in the latter species. Our results reinforce this supposition, since several different hobo vahs sequences (cluster A) are shared by D. simulans and D. sechellia, indicating that these sequences were present in the ancestor of these species and, given their structural characteristics, have remained active since then (Fig. 3).
The presence of sequences with the same hobo va characteristics in every species of the melanogaster subgroup could be explained in two different ways: (i) hobo vahs elements arose in the melanogaster subgroup ancestor, around 13–15 MYA, were vertically transmitted and have been kept mobilizable since then; (ii) different hobo vahs elements originated independently, in different species, starting from diverse hobo elements. In the latter case, it would be interesting to understand why the same structural characteristics have arisen independently, at different times, in these elements. These possibilities are not mutually exclusive.
The fact that a substantial portion of the hobo vahs described in this work shows high nucleotide similarity, along with the observation that part of them preserve intact TIRs and conserved TSDs, constitutes suggestive evidence that these sequences have been kept mobilizable. Currently we are not able to determine for how long these sequences have remained mobilizable. One possibility is 13–15 MY, if the element arose in the melanogaster subgroup ancestor. In any case, the presence of very similar sequences in D. simulans and D. sechellia strongly suggests that these non-autonomous sequences have been kept mobilizable for at least 0·4 MY.
The continued presence, over a prolonged evolutionary time, of mobilizable non-autonomous elements ‘parasitizing’ their TE master copies has rarely been reported and is intriguing. Analyses of the D. melanogaster genome have shown remarkable sequence homogeneity among copies of TEs (Bowen & McDonald, 2001; Kaminker et al., 2002; Lerat et al., 2003; Sanchez-Gracia et al., 2005). Lerat et al. (2003) have proposed that this minute divergence may have resulted from a rapid turnover that eliminated TE copies as soon as they became inactive. The high similarity observed among the hobo vahs copies, together with their scattered distribution over all chromosome arms, is consistent with the Lerat et al. (2003) hypothesis of high TE turnover. From this perspective, the hobo vahs relic sequences could have been retained in the genomes of these Drosophila species precisely because they remain mobilizable and thus avoid loss in the turnover process.
It is notable that the number of PMS found in the analysed genomes is higher than the number of non-mobilizable copies. As our analyses were performed on the final released versions of the genome assemblies, the hobo sequences described here probably reflect well the hRSs present in the euchromatic regions of these genomes. However, it is possible that degenerate copies of hRSs and putatively non-mobilizable (PNM) hobo vahs copies are more abundant in the heterochromatic regions, which are under-represented in the available versions of the genome assemblies (Clark et al., 2007).
To be kept mobilizable for such a long time, a non-autonomous element necessarily requires a transposase source. For hobo vahs, the canonical hobo is the most likely supplier. The consensus sequences of TSDs observed for hobo vahs in different species correspond to what has been described for the D. melanogaster hobo element (Saville et al., 1999). However, similar consensus sequences were also observed for other elements of the hAT superfamily (Guimond et al., 2003). Furthermore, the hobo element has been cross-mobilized by other transposases, such as that of the Hermes element (Sundararajan et al., 1999), or by unidentified transposases from different tephritid species (Handler & Gomez, 1996). Thus, even though other sources of transposase available to hobo vahs cannot be ruled out at this moment, we suggest that the most probable source is indeed the hobo element. Still, the canonical hobo is thought to be a recent acquisition by the D. melanogaster and D. simulans genomes through horizontal transfer (Daniels et al., 1990; Periquet et al., 1990, 1994; Simmons, 1992) and, for this reason, the canonical hobo could not have been the transposase source available throughout the whole evolution of hobo vahs.
5.2. hobo vahs and a hobo-related MITE origin
Some hobo vahs sequences were remarkably short, for example, 83 bp in D. melanogaster, 324 bp in D. yakuba, 193 bp in D. sechellia, 391 bp in D. santomea and 251 bp in D. mauritiana, while the shortest sequences observed in D. simulans and D. erecta were about 800 bp. The short sequences exhibit characteristics that are typical of MITEs. The distinctive marks of this TE group are: (i) short length, typically ranging from 80 to 500 bp (but sometimes reaching up to 1·6 kb); (ii) the presence of TIRs; (iii) high copy number; and (iv) an internal AT-rich region (Feschotte et al., 2002). While it is noteworthy that some MITEs can be found in an extraordinarily high copy number (30 000–40 000 copies), this is not an invariable characteristic and many MITEs occur at a lower number (as low as 20 copies) (Feschotte et al., 2002).
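The structural MITE criteria listed above (length, TIRs and an AT-rich interior) can be expressed as a simple screening function. This is a rough sketch under stated assumptions: the 12 bp TIR length is borrowed from the canonical hobo, while the requirement of perfectly matching TIRs, the 60% AT threshold and the function names are hypothetical choices made for the example. Copy number, the remaining criterion, cannot be judged from a single sequence and is not tested.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_tirs(seq, tir_len=12, max_mismatches=0):
    """True if the two ends of `seq` form terminal inverted repeats."""
    left = seq[:tir_len]
    right = reverse_complement(seq[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right))
    return mismatches <= max_mismatches


def at_fraction(seq):
    """Fraction of A and T bases in `seq`."""
    return (seq.count("A") + seq.count("T")) / len(seq)


def looks_like_mite(seq, tir_len=12, min_len=80, max_len=500, min_at=0.6):
    """Screen one sequence against the structural MITE criteria.

    Assumptions: perfect TIRs of `tir_len` bp and an internal AT
    fraction of at least `min_at`; neither threshold is from the paper.
    """
    seq = seq.upper()
    if not min_len <= len(seq) <= max_len:
        return False
    if not has_tirs(seq, tir_len):
        return False
    return at_fraction(seq[tir_len:-tir_len]) >= min_at


# Toy element: arbitrary 12 bp TIRs around an AT-only interior
toy_tir = "GATTACAGGCTA"
toy_element = toy_tir + "AT" * 40 + reverse_complement(toy_tir)
is_candidate = looks_like_mite(toy_element)
```

Raising `max_len` to 1600 would accommodate the longer MITEs mentioned in the text.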
The origin of MITEs is not fully understood. Solo TIRs, which by recombination became close to each other, could be the origin of some MITEs. However, Feschotte et al. (2002) have proposed a model in which (i) autonomous transposons underwent internal deletions and became non-autonomous, and (ii) some copies of non-autonomous transposons underwent a ‘shrink’ and a rapid amplification of copy number. Several studies support this model. Jiang et al. (2004) illustrated it with TEs of the rice genome, showing different cases in which the origin of some MITEs is related to their ‘cousin’ autonomous elements. For example, the MITE mPing is 430 bp long with subterminal sequences (252 bp at the 5′ end and 178 bp at the 3′ end) and with TIRs identical to the autonomous transposon Ping. Also, Saito et al. (2005) have shown that the wheat MITE Hikkoshi exhibits subterminal regions and identified TIRs of Hikkoshi-like transposons in rice.
Quesneville et al. (2006) described the origin of MITEs related to P elements (PMITE). Ten different PMITE families were found in the Anopheles gambiae genome. These MITEs present conserved ~100 bp fragments in the 5′ and 3′ subterminal regions that permit identification of the P element family that gave rise to each MITE family. A. gambiae has nine different P families and six of them have given rise to MITEs. As in PMITEs described by Quesneville et al. (2006), the shorter hobo vahs described here maintain 5′ and 3′ subterminal regions conserved in relation to the hobo element.
By examining the hobo vahs sequences, representative candidates for each phase of MITE origin, according to the model proposed by Feschotte et al. (2002), can be identified. As shown in schematic form in Fig. 4, examples of each phase of MITE origin can be found in D. melanogaster, D. sechellia, D. simulans and D. yakuba. In the process of MITE origin suggested here, the starting point could be complete and autonomous elements, such as the canonical hobo or hobo-like elements from previous genomic invasions. In the next step, some of the autonomous elements are converted into non-autonomous elements, which maintain the conserved 5′ and 3′ subterminal regions but undergo divergence in the inner region, which becomes AT-rich. These relic elements vary in length from 0·7 to 1·5 kb (the typical hobo vahs described here). Finally, in D. melanogaster, D. sechellia and D. yakuba, there are very short elements (83–700 bp) showing conserved extremities and TIRs with a typical MITE structure. For these reasons, we propose that the short hobo vahs could be classified as MITEs and that they offer a well-documented example of the origin of a ‘hobo-related’ MITE.
We thank the Drosophila genome projects for making sequences freely available to the scientific community; two anonymous referees for constructive comments; Dr Jean David; Daniel Lachaise (CNRS; France) and Marco Gottschakk (UFRGS, Brazil) for the Drosophila strains; Lenira Sepel and Nina Roth Mota for help in preparing the manuscript. This work was supported by grants from CNPq, FAPERGS (number 0411965) and Probic-Fapergs.