Chimera (molecular biology)
In molecular biology, and particularly in high-throughput DNA sequencing, a chimera is a single DNA sequence that originates when multiple transcripts or DNA sequences are joined. Chimeras can arise in various contexts. Chimeric reads are generally considered artifacts in sequencing applications (such as amplicon sequencing) and are filtered out of the data during processing[1] to prevent spurious inferences of biological variation.[2] In a different context, deliberately created artificial chimeras can also be a useful tool in molecular biology. For example, in protein engineering, "chimeragenesis (forming chimeras between proteins that are encoded by homologous cDNAs)"[3] is one of the "two major techniques used to manipulate cDNA sequences".[3] For gene fusions that occur through natural processes, see chimeric genes and fusion genes.
Description
Transcript chimera
A chimera can occur as a single cDNA sequence originating from two transcripts. Such chimeras are usually considered contaminants in transcript and expressed sequence tag (EST) databases, where they are known as EST chimeras.[4] An estimated 1% of all transcripts in the National Center for Biotechnology Information's UniGene database contain a "chimeric sequence".[5]
PCR chimera
A chimera can also be an artifact of PCR amplification. It occurs when the extension of an amplicon is aborted, and the aborted product functions as a primer in the next PCR cycle. The aborted product anneals to the wrong template and continues to extend, thereby synthesizing a single sequence sourced from two different templates.[6]
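The two-cycle mechanism described above can be sketched as a toy string model. This is a minimal illustration under simplifying assumptions that do not come from the source: the two templates are pre-aligned and of equal length, the aborted strand re-anneals to the second template at the same position, and strand complementarity is ignored (only the sense sequence is tracked). The function name `simulate_pcr_chimera` and its `abort_at` parameter are hypothetical.

```python
def simulate_pcr_chimera(template_a: str, template_b: str, abort_at: int) -> str:
    """Toy model of one chimera-forming event (illustrative assumptions only).

    Assumes both templates are aligned and equal in length, so position i in one
    corresponds to position i in the other, and that the aborted strand anneals
    to template_b at the position where its extension stopped.
    """
    if len(template_a) != len(template_b):
        raise ValueError("templates must be aligned to the same length")
    if not 0 < abort_at < len(template_a):
        raise ValueError("abort_at must fall strictly inside the template")
    # Cycle n: extension along template A stops early, leaving a partial strand.
    aborted_product = template_a[:abort_at]
    # Cycle n+1: the partial strand acts as a primer on template B and is
    # extended to the end, so the product carries sequence from both templates.
    return aborted_product + template_b[abort_at:]


print(simulate_pcr_chimera("AAAAACCCCC", "GGGGGTTTTT", 5))  # AAAAATTTTT
```

The resulting sequence matches neither template over its full length, which is what later makes it look like a novel variant.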
PCR chimeras are an important issue to take into account during metabarcoding, where DNA sequences from environmental samples are used to determine biodiversity. A chimera is a novel sequence that will most likely not match any known organism. It may therefore be interpreted as a new species, inflating the apparent diversity.
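A toy illustration of the inflation effect, assuming (purely for illustration) that every distinct sequence is counted as a separate taxon. Real metabarcoding pipelines cluster or denoise reads first, but a chimera that differs from all of its parents can still add a spurious unit. The sequences and the helper `naive_richness` are hypothetical.

```python
def naive_richness(reads: list[str]) -> int:
    """Count distinct sequences: a deliberately naive stand-in for 'number of taxa'."""
    return len(set(reads))


# Hypothetical toy data: two real organisms, each sequenced three times.
organism_1 = "AAAAACCCCC"
organism_2 = "GGGGGTTTTT"
reads = [organism_1] * 3 + [organism_2] * 3

# A PCR chimera: the first half of organism_1 joined to the second half of organism_2.
chimera = organism_1[:5] + organism_2[5:]

print(naive_richness(reads))              # 2
print(naive_richness(reads + [chimera]))  # 3
```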
Chimeric read
A chimeric read is a digital DNA sequence (i.e. a string of letters in a file that can be read as a DNA sequence) that either originates from an actual chimera (i.e. a physical DNA molecule in a sample) or is produced by misreading the sample. The latter is known to occur when sequences are read from electrophoresis gels.[7] Chimeric reads are common in amplicon sequencing applications such as 16S rRNA gene sequencing, because closely related sequences are amplified together. The most common mechanism is incomplete extension during PCR, which leaves partial strands that can act as primers in subsequent PCR cycles on similar but non-identical templates. Extension of such hybrid priming events produces chimeric sequences.[1]
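Because a chimeric read of this kind consists of a prefix from one parent joined to a suffix from another, that signature can be tested directly. The sketch below is a simplified, exact-match illustration and is not the algorithm of any particular tool. It assumes sequences are pre-aligned and equal in length (no indels), tolerates no sequencing errors, and uses a hypothetical `min_fold` threshold to encode the common de novo assumption that parents are more abundant than the chimeras they produce. All function names are hypothetical.

```python
from collections import Counter


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing positions at which a and b agree."""
    return common_prefix_len(a[::-1], b[::-1])


def is_exact_bimera(query: str, parents: list[str]) -> bool:
    """True if query equals a prefix of one parent plus a suffix of another.

    Assumes aligned, equal-length sequences and exact matching.
    """
    length = len(query)
    usable = [p for p in parents if len(p) == length and p != query]
    for left in usable:
        p = common_prefix_len(query, left)
        for right in usable:
            if right == left:
                continue
            s = common_suffix_len(query, right)
            # A breakpoint k with query[:k] == left[:k] and query[k:] == right[k:]
            # exists exactly when the matching prefix and suffix cover the query.
            if p + s >= length:
                return True
    return False


def flag_chimeras_de_novo(counts: Counter, min_fold: float = 2.0) -> set[str]:
    """Flag sequences explainable as bimeras of sufficiently more abundant sequences."""
    flagged = set()
    for seq, n in counts.items():
        parents = [p for p, m in counts.items() if m >= min_fold * n]
        if is_exact_bimera(seq, parents):
            flagged.add(seq)
    return flagged


# Hypothetical abundance table for one sample.
counts = Counter({"AAAAACCCCC": 50, "GGGGGTTTTT": 40, "AAAAATTTTT": 3, "ACGTACGTAC": 20})
print(flag_chimeras_de_novo(counts))  # {'AAAAATTTTT'}
```

Published tools relax these assumptions considerably, for example by aligning sequences, allowing mismatches, or comparing against a reference database instead of relying on abundance.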
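Some computational methods have been devised to detect and remove chimeras, including:
- the chimera check of the Ribosomal Database Project (RDP)[8]
- chimera checking in QIIME[9]
- UCHIME[10]
- removeBimeraDenovo, part of the R package DADA2[11]
- Bellerophon[12]
- CATCh[13]
- DECIPHER[14]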
Examples
- "The first mRNA transcript isolated for..." the human gene C2orf3 "...was part of an artificial chimera..."
- CYP2C17 was thought to be a human gene, but "...is now considered an artefact based on a chimera of CYP2C18 and CYP2C19."[15]
- Researchers have created receptor chimeras in their studies of Oncostatin M.
References
- "Chimeras". www.drive5.com. Retrieved 2022-10-27.
- Edgar RC (September 2016). "UCHIME2: improved chimera prediction for amplicon sequencing". bioRxiv: 074252. doi:10.1101/074252.
- Lajtha A, Reith ME (2007). Handbook of Neurochemistry and Molecular Neurobiology: Neural Membranes and Transport. Boston, MA: Springer Science+Business Media, LLC. p. 485. ISBN 978-0-387-30347-5. p. 424
- Unneberg P, Claverie JM (February 2007). Hoheisel J (ed.). "Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data". PLOS ONE. 2 (2): e254. Bibcode:2007PLoSO...2..254U. doi:10.1371/journal.pone.0000254. PMC 1804257. PMID 17330142.
- Nelson C. "EST Assembly for the Creation of Oligonucleotide Probe Targets" (PDF). Agilent Technologies. Archived from the original (PDF) on 23 February 2012. Retrieved May 12, 2009.
- Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. (March 2011). "Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons". Genome Research. 21 (3): 494–504. doi:10.1101/gr.112730.110. PMC 3044863. PMID 21212162.
- Porter S (1 February 2007). "Sequencing a Genome, part VI: Chimeras are not just funny-looking animals". ScienceBlogs. Retrieved 2019-01-10.
- Maidak BL, Olsen GJ, Larsen N, Overbeek R, McCaughey MJ, Woese CR (January 1996). "The Ribosomal Database Project (RDP)". Nucleic Acids Research. 24 (1): 82–85. doi:10.1093/nar/24.1.82. PMC 145599. PMID 8594608.
- "Chimera checking sequences with QIIME". Quantitative Insights Into Microbial Ecology (QIIME). Retrieved 2019-01-10.
- Edgar R. "UCHIME algorithm". drive5.com. Retrieved 2019-01-10.
- "removeBimeraDenovo function". R Documentation. www.rdocumentation.org. Retrieved 2019-01-10.
- Huber T, Faulkner G, Hugenholtz P (September 2004). "Bellerophon: a program to detect chimeric sequences in multiple sequence alignments". Bioinformatics. 20 (14): 2317–2319. doi:10.1093/bioinformatics/bth226. PMID 15073015.
- Mysara M, Saeys Y, Leys N, Raes J, Monsieurs P (March 2015). Wommack KE (ed.). "CATCh, an ensemble classifier for chimera detection in 16S rRNA sequencing studies". Applied and Environmental Microbiology. 81 (5): 1573–1584. Bibcode:2015ApEnM..81.1573M. doi:10.1128/AEM.02896-14. PMC 4325141. PMID 25527546.
- Wright ES, Yilmaz LS, Noguera DR (February 2012). "DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences". Applied and Environmental Microbiology. 78 (3): 717–725. Bibcode:2012ApEnM..78..717W. doi:10.1128/AEM.06516-11. PMC 3264099. PMID 22101057.
- "Entrez Gene: CYP2C18 cytochrome P450, family 2, subfamily C, polypeptide 18". National Center for Biotechnology Information. Retrieved May 12, 2009.