References

Veli Mäkinen; Djamal Belazzougui; Fabio Cunial; Alexandru I. Tomescu

doi:10.1017/9781009341257.024

References

Published online by Cambridge University Press: 28 September 2023

Fabio Cunial and

Veli Mäkinen: Affiliation:
University of Helsinki
Djamal Belazzougui: Affiliation:
Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers
Fabio Cunial: Affiliation:
Broad Institute, Massachusetts
Alexandru I. Tomescu: Affiliation:
University of Helsinki

Book contents

Get access

Summary

A summary is not available for this content so a preview has been provided. Please use the Get access link above for information on how to access this content.

Image of the first page of this content. For PDF version, please use the ‘Save PDF’ preceeding this image.'

Type: Chapter
Information: Genome-Scale Algorithm Design
Bioinformatics in the Era of High-Throughput Sequencing
, pp. 414 - 438

DOI: https://doi.org/10.1017/9781009341257.024 [Opens in a new window]

Publisher: Cambridge University Press

Print publication year: 2023

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Abouelhoda, Mohamed Ibrahim. A chaining algorithm for mapping cDNA sequences to multiple genomic sequences. In 14th International Symposium on String Processing and Information Retrieval, volume 4726 of Lecture Notes in Computer Science, pages 1–13. Springer, 2007.Google Scholar

Adelson Velskii, Georgy and Landis, Evgenii M.. An algorithm for the organization of information. In Proceedings of the USSR Academy of Sciences, volume 146, pages 263–266, 1962. (Russian) English translation by Myron J. Ricci in Soviet Mathematics – Doklady, 3:1259–1263, 1962.Google Scholar

Ahmed, Omar, Rossi, Massimiliano, Kovaka, Sam, et al. Pan-genomic matching statistics for targeted nanopore sequencing. iScience, 24(6):102696, 2021.CrossRef Google Scholar PubMed

Aho, Alfred V. and Corasick, Margaret J.. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.CrossRef Google Scholar

Ahuja, Ravindra K., Goldberg, Andrew V., Orlin, James B., and Tarjan, Robert E.. Finding minimum-cost flows by double scaling. Mathematical Programming, 53:243–266, 1992.CrossRef Google Scholar

Ahuja, Ravindra K., Magnanti, Thomas L., and Orlin, James B.. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.Google Scholar

Alanko, Jarno, Cunial, Fabio, Belazzougui, Djamal, and Mäkinen, Veli. A framework for space-efficient read clustering in metagenomic samples. BMC Bioinformatics, 18(3): 49–60, 2017.CrossRef Google Scholar PubMed

Alanko, Jarno, D’Agostino, Giovanna, Policriti, Alberto, and Prezza, Nicola. Wheeler languages. Information and Computation, 281:104820, 2021.CrossRef Google Scholar

Alipanahi, Bahar, Kuhnle, Alan, and Boucher, Christina. Recoloring the colored de Bruijn graph. In International Symposium on String Processing and Information Retrieval, pages 1–11. Springer, 2018.Google Scholar

Almodaresi, Fatemeh, Pandey, Prashant, Ferdman, Michael, Johnson, Rob, and Patro, Rob. An efficient, scalable, and exact representation of high-dimensional color information enabled using de Bruijn graph search. Journal of Computational Biology, 27(4):485–499, 2020.CrossRef Google Scholar

Alneberg, Johannes, Bjarnason, Brynjar Smári, de Bruijn, Ino, et al. Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11):1144–1146, 2014.CrossRef Google Scholar PubMed

Alstrup, Stephen, Husfeldt, Thore, and Rauhe, Theis. Marked ancestor problems. In Foundations of Computer Science, 1998. Proceedings. 39th Annual Symposium on, pages 534–543. IEEE, 1998.Google Scholar

Apostolico, Alberto. Maximal words in sequence comparisons based on subword composition. In Algorithms and Applications, pages 34–44. Springer, 2010.CrossRef Google Scholar

Apostolico, Alberto and Bejerano, Gill. Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space. Journal of Computational Biology, 7(3–4): 381–393, 2000.CrossRef Google Scholar PubMed

Apostolico, Alberto and Denas, Olgert. Fast algorithms for computing sequence distances by exhaustive substring composition. Algorithms for Molecular Biology, 3:13, 2008.CrossRef Google Scholar PubMed

Arge, Lars, Fischer, Johannes, Sanders, Peter, and Sitchinava, Nodari. On (dynamic) range minimum queries in external memory. In Algorithms and Data Structures, pages 37–48. Springer, 2013.CrossRef Google Scholar

Arlazarov, Vladimir, Dinic, E., Kronrod, M., and Faradzev, Igor. On economic construction of the transitive closure of a directed graph. Soviet Mathematics Doklady, 11:1209–1210, 1975.Google Scholar

Baaijens, Jasmijn A., Bonizzoni, Paola, Boucher, Christina, et al. Computational graph pangenomics: A tutorial on data structures and their applications. Natural Computing: An International Journal, 21(1):81108, 2022.CrossRef Google Scholar PubMed

Babenko, Maxim A., Gawrychowski, Pawel, Kociumaka, Tomasz, and Starikovskaya, Tatiana. Wavelet trees meet suffix trees. In Indyk, Piotr, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4–6, 2015, pages 572–591. SIAM, 2015.Google Scholar

Backurs, Arturs and Indyk, Piotr. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing, 47(3):1087–1097, 2018.CrossRef Google Scholar

Baeza-Yates, Ricardo A. and Ribeiro-Neto, Berthier A.. Modern Information Retrieval – The Concepts and Technology behind Search, 2nd edition. Pearson Education Ltd., 2011.Google Scholar

Baier, Uwe, Beller, Timo, and Ohlebusch, Enno. Graphical pan-genome analysis with compressed suffix trees and the Burrows–Wheeler transform. Bioinformatics, 32(4):497–504, 2016.CrossRef Google Scholar PubMed

Baker, Brenda S. On finding duplication in strings and software. Technical report, AT&T Bell Laboratories, New Jersey, 1993.Google Scholar

Bang-Jensen, Jørgen and Gutin, Gregory Z.. Digraphs: Theory, Algorithms and Applications, 2nd edition. Springer Monographs in Mathematics. Springer, 2008.Google Scholar

Bankevich, Anton, Bzikadze, Andrey V., Kolmogorov, Mikhail, Antipov, Dmitry, and Pevzner, Pavel A.. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nature Biotechnology, 40(7):1075–1081, 2022.CrossRef Google Scholar

Bao, Ergude, Jiang, Tao, and Girke, Thomas. BRANCH: Boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinformatics, 29(10):1250–1259, 2013.CrossRef Google Scholar PubMed

Baran, Yael and Halperin, Eran. Joint analysis of multiple metagenomic samples. PLoS Computational Biology, 8(2):e1002373, 2012.CrossRef Google Scholar PubMed

Baum, Leonard E. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Inequalities III: Proceedings of the Third Symposium on Inequalities, pages 1–8. Academic Press, 1972.Google Scholar

Bayer, Rudolf. Symmetric binary b-trees: Data structure and maintenance algorithms. Acta Informatica, 1:290–306, 1972.CrossRef Google Scholar

Beerenwinkel, Niko, Beretta, Stefano, Bonizzoni, Paola, Dondi, Riccardo, and Pirola, Yuri. Covering pairs in directed acyclic graphs. In Language and Automata Theory and Applications, volume 8370 of Lecture Notes in Computer Science, pages 126–137, Springer, 2014.Google Scholar

Behnam, Ehsan, Waterman, Michael S., and Smith, Andrew D.. A geometric interpretation for local alignment-free sequence comparison. Journal of Computational Biology, 20(7):471–485, 2013.CrossRef Google Scholar PubMed

Bejerano, Gill, Pheasant, Michael, Makunin, Igor, et al. Ultraconserved elements in the human genome. Science, 304(5675):1321–1325, 2004.CrossRef Google Scholar PubMed

Belazzougui, Djamal. Linear time construction of compressed text indices in compact space. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 03, 2014, pages 148–193, 2014.Google Scholar

Belazzougui, Djamal and Cunial, Fabio. A framework for space-efficient string kernels. Algorithmica, 79(3):857–883, 2017.CrossRef Google Scholar

Belazzougui, Djamal and Puglisi, Simon J.. Range predecessor and Lempel–Ziv parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2053–2071. SIAM, 2016.CrossRef Google Scholar

Belazzougui, Djamal and Venturini, Rossano. Compressed static functions with applications. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 229–240. SIAM, 2013.Google Scholar

Belazzougui, Djamal and Zhang, Qin. Edit distance: Sketching, streaming, and document exchange. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 51–60, 2016.CrossRef Google Scholar

Belazzougui, Djamal, Boldi, Paolo, Pagh, Rasmus, and Vigna, Sebastiano. Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 785–794. SIAM, 2009.Google Scholar

Belazzougui, Djamal, Cunial, Fabio, Kärkkäinen, Juha, and Mäkinen, Veli. Versatile succinct representations of the bidirectional Burrows–Wheeler transform. In 21st Annual European Symposium on Algorithms (ESA 2013), volume 8125 of Lecture Notes in Computer Science, pages 133–144. Springer, 2013.Google Scholar

Belazzougui, Djamal, Cunial, Fabio, and Denas, Olgert. Fast matching statistics in small space. In Proceedings of the 17th International Symposium on Experimental Algorithms (SEA 2018), LIPIcs 103, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 17:1–17:14, 2018.Google Scholar

Belazzougui, Djamal, Cunial, Fabio, Kärkkäinen, Juha, and Mäkinen, Veli. Linear-time string indexing and analysis in small space. ACM Transactions on Algorithms, 16(2):17:1–17:54, 2020.CrossRef Google Scholar

Beller, Timo and Ohlebusch, Enno. A representation of a compressed de Bruijn graph for pan-genome analysis that enables search. Algorithms for Molecular Biology, 11(1):1–17, 2016.Google Scholar

Beller, Timo, Berger, Katharina, and Ohlebusch, Enno. Space-efficient computation of maximal and supermaximal repeats in genome sequences. In 19th International Symposium on String Processing and Information Retrieval (SPIRE 2012), volume 7608 of Lecture Notes in Computer Science, pages 99–110. Springer, 2012.Google Scholar

Beller, Timo, Gog, Simon, Ohlebusch, Enno, and Schnattinger, Thomas. Computing the longest common prefix array based on the Burrows–Wheeler transform. Journal of Discrete Algorithms, 18:22–31, 2013.CrossRef Google Scholar

Bellman, Richard. On a routing problem. Quarterly of Applied Mathematics 16:87–90, 1958.CrossRef Google Scholar

Bender, Michael A. and Farach-Colton, Martin. The LCA problem revisited. In 4th Latin American Symposium on Theoretical Informatics (LATIN 2000), volume 1776 of Lecture Notes in Computer Science, pages 88–94. Springer, 2000.Google Scholar

Bermond, Jean-Claude, Homobono, Nathalie, and Peyrat, Claudine. Connectivity of Kautz networks. Discrete Mathematics, 114(1–3):51–62, 1993.CrossRef Google Scholar

Bernard, Elsa, Jacob, Laurent, Mairal, Julien, and Vert, Jean-Philippe. Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinformatics, 30(17):2447–2455, 2014.CrossRef Google Scholar PubMed

Bernstein, Aaron, Nanongkai, Danupon, and Wulff-Nilsen, Christian. Negative-weight single-source shortest paths in near-linear time. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31–November 3, 2022, pages 600–611. IEEE, 2022.Google Scholar

Blazewicz, Jacek and Kasprzak, Marta. Complexity of DNA sequencing by hybridization. Theoretical Computer Science, 290(3):1459–1473, 2003.CrossRef Google Scholar

Bloom, Burton H. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422426, 1970.CrossRef Google Scholar

Blumer, Anselm, Blumer, Janet, Haussler, David, Ehrenfeucht, Andrzej, Chen, Mu-Tian, and Seiferas, Joel. The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40:31–55, 1985.CrossRef Google Scholar

Blumer, Anselm, Blumer, Janet, Haussler, David, McConnell, Ross, and Ehrenfeucht, Andrzej. Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34(3): 578–595, 1987.CrossRef Google Scholar

Boucher, Christina, Gagie, Travis, Kuhnle, Alan, Langmead, Ben, Manzini, Giovanni, and Mun, Taher. Prefix-free parsing for building big BWTs. Algorithms for Molecular Biology, 14(1): 13:1–13:15, 2019.CrossRef Google Scholar PubMed

Bowe, Alexander, Onodera, Taku, Sadakane, Kunihiko, and Shibuya, Tetsuo. Succinct de Bruijn graphs. In Algorithms in Bioinformatics, pages 225–235. Springer, 2012.CrossRef Google Scholar

Brockhurst, Michael A., Harrison, Ellie, Hall, James P.J., Richards, Thomas, McNally, Alan, and MacLean, Craig. The ecology and evolution of pangenomes. Current Biology, 29(20): R1094–R1103, 2019.CrossRef Google Scholar PubMed

Brodal, Gerth Stølting, Davoodi, Pooya, and Rao, S. Srinivasa. Path minima queries in dynamic weighted trees. In Algorithms and Data Structures, pages 290–301. Springer, 2011.CrossRef Google Scholar

Broder, Andrei Z. On the resemblance and containment of documents. In Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171), pages 21–29, 1997.Google Scholar

Büchler, Thomas and Ohlebusch, Enno. An improved encoding of genetic variation in a Burrows–Wheeler transform. Bioinformatics, 36(5):1413–1419, 2020.CrossRef Google Scholar

Burrows, Michael and Wheeler, David. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.Google Scholar

Cáceres, Manuel, Cairo, Massimo, Mumey, Brendan, Rizzi, Romeo, and Tomescu, Alexandru I.. Sparsifying, shrinking and splicing for minimum path cover in parameterized linear time. In Naor, Joseph (Seffi) and Buchbinder, Niv, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9–12, 2022, pages 359–376. SIAM, 2022.Google Scholar

Cameron, Daniel L., Schröder, Jan, Penington, Jocelyn Sietsma, et al. GRIDSS: Sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Research, 27(12):2050–2060, 2017.CrossRef Google Scholar PubMed

Cancedda, Nicola, Gaussier, Eric, Goutte, Cyril, and Renders, Jean Michel. Word sequence kernels. The Journal of Machine Learning Research, 3:1059–1082, 2003.Google Scholar

Carter, J. Lawrence and Wegman, Mark N.. Universal classes of hash functions. In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, pages 106–112, 1977.CrossRef Google Scholar

Chairungsee, Supaporn and Crochemore, Maxime. Using minimal absent words to build phylogeny. Theoretical Computer Science, 450:109–116, 2012.CrossRef Google Scholar

Chaisson, Mark, Pevzner, Pavel A., and Tang, Haixu. Fragment assembly with short reads. Bioinformatics, 20(13):2067–2074, 2004.CrossRef Google Scholar PubMed

Chan, Ho-Leung, Lam, Tak-Wah, Sung, Wing-Kin, Tam, Siu-Lung, and Wong, Swee-Seong. A linear size index for approximate pattern matching. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching, LNCS 4009, pages 49–59. Springer-Verlag, 2006.CrossRef Google Scholar

Chan, Raymond H., Chan, Tony H., Yeung, Hau Man, and Wang, Roger Wei. Composition vector method based on maximum entropy principle for sequence comparison. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 9(1):79–87, 2012.CrossRef Google Scholar PubMed

Charikar, Moses, Lehman, Eric, Liu, Ding, et al. Approximating the smallest grammar: Kolmogorov complexity in natural models. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, pages 792–801. ACM, 2002.CrossRef Google Scholar

Chazelle, B. A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427–462, 1988.CrossRef Google Scholar

Chen, Li, Kyng, Rasmus, Liu, Yang P., Peng, Richard, Gutenberg, Maximilian Probst, and Sachdeva, Sushant. Maximum flow and minimum-cost flow in almost-linear time. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31–November 3, 2022, pages 612–623. IEEE, 2022.Google Scholar

Chen, Sai, Krusche, Peter, Dolzhenko, Egor, et al. Paragraph: A graph-based structural variant genotyper for short-read sequence data. Genome Biology, 20(1):1–13, 2019.CrossRef Google Scholar PubMed

Chikhi, Rayan and Rizk, Guillaume. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology, 8:22, 2013.CrossRef Google Scholar

Chikhi, Rayan, Limasset, Antoine, Jackman, Shaun, Simpson, Jared T., and Medvedev, Paul. On the representation of de Bruijn graphs. Journal of Computational Biology, 22(5):336–352, 2015.CrossRef Google Scholar PubMed

Chikhi, Rayan, Limasset, Antoine, and Medvedev, Paul. Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics, 32(12):i201–i208, 2016.CrossRef Google Scholar

Cilibrasi, Rudi and Vitányi, Paul M. B.. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523–1545, 2005.CrossRef Google Scholar

Cilibrasi, Rudi, Iersel, Leo, Kelk, Steven, and Tromp, John. On the complexity of several haplotyping problems. In Casadio, Rita and Myers, Gene, editors, Algorithms in Bioinformatics, volume 3692 of Lecture Notes in Computer Science, pages 128–139. Springer, 2005.Google Scholar

Clark, D. Compact PAT trees. PhD thesis, University of Waterloo, Canada, 1996.Google Scholar

Claude, Francisco and Navarro, Gonzalo. The wavelet matrix. In International Symposium on String Processing and Information Retrieval, pages 167–179. Springer, 2012.CrossRef Google Scholar

Claude, Francisco, Navarro, Gonzalo, and Ordónez, Alberto. The wavelet matrix: An efficient wavelet tree for large alphabets. Information Systems, 47:15–32, 2015.Google Scholar

Cole, Richard, Gottlieb, Lee-Ad, and Lewenstein, Moshe. Dictionary matching and indexing with errors. In Proceedings of the Symposium on Theory of Computing, pages 91–100, 2004.CrossRef Google Scholar

Coleman, J. Robert, Papamichail, Dimitris, Skiena, Steven, Futcher, Bruce, Wimmer, Eckard, and Mueller, Steffen. Virus attenuation by genome-scale changes in codon pair bias. Science, 320 (5884):1784–1787, 2008.CrossRef Google Scholar PubMed

The Computational Pan-Genomics Consortium. Computational pan-genomics: Status, promises and challenges. Briefings in Bioinformatics, 19(1):118–135, 2016.Google Scholar

Coombe, Lauren, Nikolić, Vladimir, Chu, Justin, Birol, Inanc, and Warren, René L.. ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. Bioinformatics, 36(12):3885–3887, 2020.Google Scholar

Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., and Stein, Clifford. Introduction to Algorithms, 4th edition. MIT Press, 2022.Google Scholar

Cotumaccio, Nicola and Prezza, Nicola. On indexing and compressing finite automata. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2585–2599. SIAM, 2021.CrossRef Google Scholar

Cristianini, Nello and Shawe-Taylor, John. An Introduction to Support Vector Machines. Cambridge University Press, 2000.Google Scholar

Crochemore, Maxime and Vérin, Renaud. Direct construction of compact directed acyclic word graphs. In Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS v. 1264, pages 116–129, 1997a.CrossRef Google Scholar

Crochemore, Maxime and Vérin, Renaud. On compact directed acyclic word graphs. In Structures in Logic and Computer Science, pages 192–211. Springer, 1997b.CrossRef Google Scholar

Crochemore, Maxime and Rytter, Wojciech. Jewels of Stringology. World Scientific, 2002.CrossRef Google Scholar

Crochemore, Maxime, Mignosi, Filippo, and Restivo, Antonio. Automata and forbidden words. Information Processing Letters, 67(3):111–117, 1998.CrossRef Google Scholar

Crochemore, Maxime, Mignosi, Filippo, Restivo, Antonio, and Salemi, Sergio. Data compression using antidictionaries. Proceedings of the IEEE, 88(11):1756–1768, 2000.Google Scholar

Crochemore, Maxime, Landau, Gad, and Ziv-Ukelson, Michal. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms (SODA’2002), pages 679–688, 2002.Google Scholar

Cunial, Fabio, Alanko, Jarno, and Belazzougui, Djamal. A framework for space-efficient variable-order Markov models. Bioinformatics, 35(22): 4607–4616, 2019.CrossRef Google Scholar PubMed

Cunial, Fabio, Denas, Olgert, and Belazzougui, Djamal. Fast and compact matching statistics analytics. Bioinformatics, 38(7):1838–1845, 2022.CrossRef Google Scholar PubMed

Davoodi, Pooya. Data structures: Range queries and space efficiency. PhD thesis, Department of Computer Science, Aarhus University, 2011.Google Scholar

Dayarian, Adel, Michael, Todd P., and Sengupta, Anirvan M.. Sopra: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics, 11:345, 2010.CrossRef Google Scholar PubMed

de Berg, Mark, van Kreveld, Marc, Overmars, Mark, and Schwarzkopf, Otfried. Computational Geometry – Algorithms and Applications. Springer-Verlag, 2000.CrossRef Google Scholar

Delcher, Arthur L., Kasif, Simon, Fleischmann, Robert D., Peterson, Jeremy, White, Owen, and Salzberg, Steven L.. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369–2376, 1999.CrossRef Google Scholar PubMed

Delorme, Charles and Tillich, Jean-Pierre. The spectrum of de Bruijn and Kautz graphs. European Journal of Combinatorics, 19(3):307–319, 1998.CrossRef Google Scholar

Deorowicz, Sebastian and Grabowski, Szymon. Robust relative compression of genomes with random access. Bioinformatics, 27(21):2979–2986, 2011.CrossRef Google Scholar PubMed

Dias, Fernando H. C., Williams, Lucia, Mumey, Brendan, and Tomescu, Alexandru I.. Fast, flexible, and exact minimum flow decompositions via ILP. In Pe’er, Itsik, editor, Research in Computational Molecular Biology – 26th Annual International Conference, RECOMB 2022, San Diego, CA, USA, May 22–25, 2022, Proceedings, volume 13278 of Lecture Notes in Computer Science, pages 230–245. Springer, 2022.Google Scholar

Díaz-Domínguez, Diego. An index for sequencing reads based on the colored de Bruijn graph. In International Symposium on String Processing and Information Retrieval, pages 304–321. Springer, 2019.CrossRef Google Scholar

Dietz, Paul F. Optimal algorithms for list indexing and subset rank. In Algorithms and Data Structures (WADS 89), pages 39–46. Springer, 1989.CrossRef Google Scholar

Dilthey, Alexander, Cox, Charles, Iqbal, Zamin, Nelson, Matthew R., and McVean, Gil. Improved genome inference in the MHC using a population reference graph. Nature Genetics, 47(6): 682–688, 2015.CrossRef Google Scholar PubMed

Dilworth, R. P. A decomposition theorem for partially ordered sets. The Annals of Mathematics, 51(1), 1950.CrossRef Google Scholar

Do, Huy Hoang, Jansson, Jesper, Sadakane, Kunihiko, and Sung, Wing-Kin. Fast relative Lempel–Ziv self-index for similar sequences. In Joint International Conference on Frontiers in Algorithmics and Algorithmic Aspects in Information and Management (FAW-AAIM), volume 7285 of Lecture Notes in Computer Science, pages 291–302. Springer, 2012.Google Scholar

Donmez, Nilgun and Brudno, Michael. Scarpa: Scaffolding reads with practical algorithms. Bioinformatics, 29(4):428–434, 2013.Google Scholar

Drineas, Petros, Mahoney, Michael W., Muthukrishnan, S., and Sarlós, Tamás. Faster least squares approximation. Numerische Mathematik, 117(2):219–249, 2011.CrossRef Google Scholar

Durbin, Richard. Efficient haplotype matching and storage using the positional Burrows– Wheeler transform (PBWT). Bioinformatics, 30(9):1266–1272, 2014.CrossRef Google Scholar PubMed

Durbin, Richard, Eddy, Sean R., Krogh, Anders, and Mitchison, Graeme. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.Google Scholar

Ebler, Jana, Ebert, Peter, Clarke, Wayne E., et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nature Genetics, 54(4):518–525, 2022.Google Scholar

Eddy, Sean R. Accelerated profile HMM searches. PLoS Computational Biology, 7(10), 2011.CrossRef Google Scholar PubMed

Edmonds, Jack and Karp, Richard M.. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the Association for Computing Machinery, 19:248–264, 1972.CrossRef Google Scholar

Edwards, Robert A., Olson, Robert, Disz, Terry, et al. Real time metagenomics: Using k-mers to annotate metagenomes. Bioinformatics, 28(24):3316–3317, 2012.CrossRef Google Scholar PubMed

Eggertsson, Hannes P., Kristmundsdottir, Snaedis, Beyter, Doruk. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nature Communications, 10(1):5402, 2019.CrossRef Google Scholar PubMed

Eizenga, Jordan M., Novak, Adam M., Sibbesen, Jonas A., et al. Pangenome graphs. Annual Review of Genomics and Human Genetics, 21(1):139–162, 2020.CrossRef Google Scholar PubMed

Ekim, Barış, Berger, Bonnie, and Chikhi, Rayan. Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer. Cell Systems, 12(10): 958–968, 2021.CrossRef Google Scholar

Elias, Peter. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194–203, 1975.CrossRef Google Scholar

Ellinghaus, David, Kurtz, Stefan, and Willhoeft, Ute. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9:18, 02 2008.CrossRef Google Scholar

Eppstein, David, Galil, Zvi, Giancarlo, Raffaele, and Italiano, Giuseppe F.. Sparse dynamic programming i: Linear cost functions. Journal of the ACM, 39(3):519–545, 1992a.CrossRef Google Scholar

Eppstein, David, Galil, Zvi, Giancarlo, Raffaele, and Italiano, Giuseppe F.. Sparse dynamic programming ii: Convex and concave cost functions. Journal of the ACM, 39(3):546–567, 1992b.CrossRef Google Scholar

Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., and Mäkinen, Veli. Algorithms and complexity on indexing founder graphs. Algorithmica, 2022. doi: 10.1007/s00453-022-01007-w.CrossRef Google Scholar

Farach, Martin. Optimal suffix tree construction with large alphabets. In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science (FOCS), pages 137–143, 1997.Google Scholar

Farach, Martin, Noordewier, Michiel, Savari, Serap, Shepp, Larry, Wyner, Abraham, and Ziv, Jacob. On the entropy of DNA: Algorithms and measurements based on memory and rapid convergence. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 48–57. Society for Industrial and Applied Mathematics, 1995.Google Scholar

Feng, Jianxing, Li, Wei, and Jiang, Tao. Inference of isoforms from short sequence reads. Journal of Computational Biology, 18(3):305–321, 2011.Google Scholar

Ferrada, Héctor, Gagie, Travis, Hirvola, Tommi, and Puglisi, Simon J.. Hybrid indexes for repetitive datasets. Philosophical Transactions of the Royal Society A, 2014. In press.CrossRef Google Scholar

Ferragina, Paolo, Pearls of Algorithm Engineering. Cambridge University Press, 2023.CrossRef Google Scholar

Ferragina, Paolo and Manzini, Giovanni. Opportunistic data structures with applications. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science (FOCS), pages 390–398, 2000.Google Scholar

Ferragina, Paolo and Manzini, Giovanni. Indexing compressed texts. Journal of the ACM, 52(4): 552–581, 2005.CrossRef Google Scholar

Ferragina, Paolo, Luccio, Fabrizio, Manzini, Giovanni, and Muthukrishnan, S.. Compressing and indexing labeled trees, with applications. Journal of the ACM, 57(1), 2009.CrossRef Google Scholar

Ferragina, Paolo, Nitto, Igor, and Venturini, Rossano. On the bit-complexity of Lempel-Ziv compression. SIAM Journal on Computing (SICOMP), 42:1521–1541, 2013.CrossRef Google Scholar

Fischer, Johannes and Heun, Volker. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing (SICOMP), 40(2):465–492, 2011.CrossRef Google Scholar

Fischer, Johannes, Köppl, Dominik, Sadakane, Kunihiko, et al. Lempel–Ziv factorization powered by space efficient suffix trees. Algorithmica, 80(7):2048–2081, 2018.CrossRef Google Scholar

Fischer, Johannes, Mäkinen, Veli, and Välimäki, Niki. Space-efficient string mining under frequency constraints. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM 2008), pages 193–202. IEEE Computer Society, 2008.CrossRef Google Scholar

Ford, Lester R. Network flow theory, Technical Report Paper P-923, The RAND Corporation, 1956.Google Scholar

Ford, Lester R. and Fulkerson, Delbert R.. Maximal flow through a network. Canadian Journal of Mathematics, 8:399–404, 1956.CrossRef Google Scholar

Fredman, Michael L. On computing the length of longest increasing subsequences. Discrete Mathematics, 11(1):29–35, 1975.CrossRef Google Scholar

Fredman, Michael L. and Willard, Dan E.. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. Journal of Computer and System Sciences, 48(3):533–551, 1994.CrossRef Google Scholar

Fredman, Michael L., Komlós, János, and Szemerédi, Endre. Storing a sparse table with 0 (1) worst case access time. Journal of the ACM (JACM), 31(3):538–544, 1984.CrossRef Google Scholar

Fulkerson, Delbert R. Note on Dilworth’s decomposition theorem for partially ordered sets. Proceedings of the American Mathematical Society, 7(4):701–702, 1956.Google Scholar

Gabow, Harold N. Data structures for weighted matching and nearest common ancestors with linking. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’90, pages 434–443, Philadelphia, PA, USA, 1990. Society for Industrial and Applied Mathematics.Google Scholar

Gabow, Harold N. and Tarjan, Robert Endre. Faster scaling algorithms for network problems. SIAM Journal on Computing, 18(5):1013–1036, 1989.CrossRef Google Scholar

Gabow, Harold N., Bentley, Jon Louis, and Tarjan, Robert Endre. Scaling and related techniques for geometry problems. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC 1984), pages 135–143. ACM, 1984.Google Scholar

Gagie, Travis, Gawrychowski, Pawel, Kärkkäinen, Juha, Nekrich, Yakov, and Puglisi, Simon J.. A faster grammar-based self-index. In 6th International Conference on Language and Automata Theory and Applications (LATA 2012), volume 7183 of Lecture Notes in Computer Science, pages 240–251. Springer, 2012.Google Scholar

Gagie, Travis, Manzini, Giovanni, and Sirén, Jouni. Wheeler graphs: A framework for BWT-based data structures. Theoretical Computer Science, 698:67–78, 2017.CrossRef Google Scholar PubMed

Gagie, Travis, Navarro, Gonzalo, and Prezza, Nicola. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM, 67(1):2:1–2:54, 2020.CrossRef Google Scholar

Gallé, Matthias. Searching for compact hierarchical structures in DNA by means of the Smallest Grammar Problem. PhD thesis, Université Rennes 1, 2011.Google Scholar

Ganesh, Arun and Sy, Aaron. Near-linear time edit distance for indel channels. In Kingsford, Carl and Pisanti, Nadia, editors, 20th International Workshop on Algorithms in Bioinformatics, WABI 2020, September 7–9, 2020, Pisa, Italy (Virtual Conference), volume 172 of LIPIcs, pages 17:1–17:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020.Google Scholar

Gao, Song, Sung, Wing-Kin, and Nagarajan, Niranjan. Opera: Reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, 18(11):1681–1691, 2011.CrossRef Google Scholar PubMed

Garcia, Sara P. and Pinho, Armando J.. Minimal absent words in four human genome assemblies. PLoS One, 6(12):e29344, 2011.CrossRef Google Scholar PubMed

Garcia, Sara P., Pinho, Armando J., Rodrigues, Joao M.O.S., Bastos, Carlos A.C., and Ferreira, Paulo J.S.G.. Minimal absent words in prokaryotic and eukaryotic genomes. PLoS One, 6(1):e16065, 2011.Google Scholar

Garey, Michael R. and Johnson, David S.. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.Google Scholar

Garrison, Erik, Sirn, Jouni, Novak, Adam, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology, 36, 2018. doi: 10.1038/nbt.4227.CrossRef Google Scholar PubMed

Ghaffaari, Ali and Marschall, Tobias. Fully-sensitive seed finding in sequence graphs using a hybrid index. Bioinformatics, 35(14):i81–i89, 2019.CrossRef Google Scholar PubMed

Giancarlo, Raffaele, Rombo, Simona E., and Utro, Filippo. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies. Briefings in Bioinformatics, 15(3):390–406, 2014.CrossRef Google Scholar PubMed

Gibney, Daniel and Thankachan, Sharma V.. On the hardness and inapproximability of recognizing Wheeler graphs. In Bender, Michael A., Svensson, Ola, and Herman, Grzegorz, editors, Proceedings of the ESA, volume 144 of LIPIcs, pages 51:1–51:16. Schloss Dagstuhl, 2019.Google Scholar

Gibney, Daniel, Thankachan, Sharma V., and Aluru, Srinivas. Feasibility of flow decomposition with subpath constraints in linear time. In Boucher, Christina and Rahmann, Sven, editors, 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022, September 5–7, 2022, Potsdam, Germany, volume 242 of LIPIcs, pages 17:1–17:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022.Google Scholar

Göke, Jonathan, Schulz, Marcel H., Lasserre, Julia, and Vingron, Martin. Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts. Bioinformatics, 28(5):656–663, 2012.Google Scholar

Goldberg, Andrew V. and Tarjan, Robert Endre. Solving minimum-cost flow problems by successive approximation. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, USA, pages 7–18. ACM, 1987.Google Scholar

Goldberg, Andrew V. and Tarjan, Robert Endre. Finding minimum-cost circulations by canceling negative cycles. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2–4, 1988, Chicago, Illinois, USA, pages 388–397. ACM, 1988.Google Scholar

Goldberg, Andrew V. and Tarjan, Robert Endre. Finding minimum-cost circulations by canceling negative cycles. Journal of the ACM, 36(4):873–886, 1989.CrossRef Google Scholar

Goldberg, Andrew V. and Tarjan, Robert Endre. Finding minimum-cost circulations by successive approximation. Mathematics of Operations Research, 15:430–466, 1990.CrossRef Google Scholar

Gonnet, Gaston, Baeza-Yates, Ricardo, and Snider, Tim. Information Retrieval: Data Structures and Algorithms, chapter 3: New indices for text: Pat trees and Pat arrays, pages 66–82. Prentice Hall, 1992.Google Scholar

Gori, Fabio, Folino, Gianluigi, Jetten, Mike S. M., and Marchiori, Elena. MTR: Taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks. Bioinformatics, 27(2):196–203, 2011.CrossRef Google Scholar PubMed

Gotoh, Osamu. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982.CrossRef Google Scholar PubMed

Grice, J. Alicia, Hughey, Richard, and Speck, Don. Reduced space sequence alignment. Bioinformatics, 13(1):45–53, 1997.CrossRef Google Scholar PubMed

Grigorescu, Elena, Azer, Erfan Sadeqi, and Zhou, Samson. Longest alignment with edits in data streams. In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 405–412, 2017.CrossRef Google Scholar

Grossi, Roberto and Vitter, Jeffrey. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proceedings of the 32nd ACM Symposium on Theory of Computing (STOC), pages 397–406, 2000.CrossRef Google Scholar

Grossi, Roberto and Vitter, Jeffrey. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2):378–407, 2006.CrossRef Google Scholar

Grossi, Roberto, Gupta, Ankur, and Vitter, Jeffrey. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841–850, 2003.Google Scholar

Grytten, Ivar, Rand, Knut Dagestad, and Sandve, Geir Kjetil. KAGE: Fast alignment-free graph-based genotyping of SNPs and short indels. Genome Biology, 23(1):1–15, 2022.CrossRef Google Scholar PubMed

Guisewite, G. M. and Pardalos, Panos M.. Minimum concave-cost network flow problems: Applications, complexity, and algorithms. Annals of Operations Research, 25(1):75–99, 1990.CrossRef Google Scholar

Guo, Hongzhe, Fu, Yilei, Gao, Yan, Li, Junyi, Wang, Yadong, and Liu, Bo. deGSM: Memory scalable construction of large scale de Bruijn graph. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18(6):2157–2166, 2021.CrossRef Google Scholar

Gusfield, Dan. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.CrossRef Google Scholar

Gusfield, Dan. Integer Linear Programming in Computational and Systems Biology: an Entry-Level Text and Course. Cambridge University Press, 2019.CrossRef Google Scholar

Gusfield, Dan, Landau, Gad M., and Schieber, Baruch. An efficient algorithm for the all pairs suffix-prefix problem. Information Processing Letters, 41(4):181–185, 1992.Google Scholar

Hampikian, Gregory and Andersen, Tim. Absent sequences: Nullomers and primes. In Pacific Symposium on Biocomputing, volume 12, pages 355–366, 2007.Google Scholar

Haque, M. Monzoorul, Ghosh, Tarini Shankar, Komanduri, Dinakar, and Mande, Sharmila S.. SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics, 25(14):1722–1730, 2009.CrossRef Google Scholar

Harel, Dov and Tarjan, Robert Endre. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing (SICOMP), 13(2):338–355, 1984.CrossRef Google Scholar

Hartman, Tzvika, Hassidim, Avinatan, Kaplan, Haim, Raz, Danny, and Segalov, Michal. How to split a flow? In Greenberg, Albert G. and Sohraby, Kazem, editors, INFOCOM, pages 828–836. IEEE, 2012.Google Scholar

Haubold, Bernhard, Pierstorff, Nora, Möller, Friedrich, and Wiehe, Thomas. Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics, 6(1):123, 2005.CrossRef Google Scholar PubMed

Haussler, David. Convolution kernels on discrete structures. Technical report, UC Santa Cruz, 1999.Google Scholar

Herold, Julia, Kurtz, Stefan, and Giegerich, Robert. Efficient computation of absent words in genomic sequences. BMC Bioinformatics, 9(1):167, 2008.CrossRef Google Scholar PubMed

Hickey, Glenn, Heller, David, Monlong, Jean, et al. Genotyping structural variants in pangenome graphs using the vg toolkit. Genome Biology, 21(1):1–17, 2020.CrossRef Google Scholar PubMed

Hirschberg, Daniel S. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.CrossRef Google Scholar

Hoare, C. A. R. Quicksort. The Computer Journal, 5(1):10–15, 1962.CrossRef Google Scholar

Holley, Guillaume and Melsted, Páll. Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs. Genome Biology, 21(1):249, 2020.CrossRef Google Scholar PubMed

Hon, Wing-Kai and Sadakane, Kunihiko. Space-economical algorithms for finding maximal unique matches. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM’02), volume 2373 of Lecture Notes in Computer Science, pages 144–152. Springer-Verlag, 2002.Google Scholar

Hon, Wing-Kai, Sadakane, Kunihiko, and Sung, Wing-Kin. Breaking a time-and-space barrier in constructing full-text indices. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS), pages 251–260, 2003.Google Scholar

Hoobin, Christopher, Puglisi, Simon J., and Zobel, Justin. Relative Lempel–Ziv factorization for efficient storage and retrieval of web collections. Proceedings of the VLDB Endowment, 5(3):265–273, 2011.CrossRef Google Scholar

Hopcroft, John E. and Karp, Richard M.. An n^5/2 algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing (SICOMP), 2(4):225–231, 1973.CrossRef Google Scholar

Hormozdiari, Fereydoun, Alkan, Can, Eichler, Evan E., and Sahinalp, S. Cenk. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Research, 19(7):1270–1278, 2009.CrossRef Google Scholar PubMed

Hu, T. C. Minimum-cost flows in convex-cost networks. Naval Research Logistics Quarterly, 13(1):1–9, 1966.CrossRef Google Scholar

Huang, Katherine, Brady, Arthur, Mahurkar, Anup, White, Owen, Gevers, Dirk, Huttenhower, Curtis, and Segata, Nicola. Metaref: A pan-genomic database for comparative and community microbial genomics. Nucleic Acids Research, page gkt1078, 2013a.Google Scholar

Huang, Lin, Popic, Victoria, and Batzoglou, Serafim. Short read alignment with populations of genomes. Bioinformatics, 29(13):361–370, 2013b.CrossRef Google Scholar PubMed

Hui, Lucas Chi Kwong. Color set size problem with application to string matching. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching (CPM), volume 644 of LNCS, pages 230–243. Springer, 1992.Google Scholar

Hunt, James W., and Szymanski, Thomas G.. A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350–353, 1977.Google Scholar

Hunt, Martin, Letcher, Brice, Malone, Kerri M., et al. Minos: Variant adjudication and joint genotyping of cohorts of bacterial genomes. Genome Biology, 23(1):1–23, 2022.CrossRef Google Scholar PubMed

Huson, Daniel H., Reinert, Knut, and Myers, Eugene W.. The greedy path-merging algorithm for sequence assembly. In RECOMB, pages 157–163, 2001.CrossRef Google Scholar

Huson, Daniel H., Mitra, Suparna, Ruscheweyh, Hans-Joachim, Weber, Nico, and Schuster, Stephan C.. Integrative analysis of environmental sequences using megan4. Genome Research, 21(9):1552–1560, 2011.CrossRef Google Scholar PubMed

Hyyrö, Heikki. Explaining and extending the bit-parallel approximate string matching algorithm of Myers. Technical report, Department of Computer and Information Sciences, University of Tampere, Finland, 2001. Technical report A2001-10.Google Scholar

Ikeno, Nobuichi. A limit of crosspoint number. IRE Transactions on Circuit Theory, 6(5):187–196, 1959.CrossRef Google Scholar

Iqbal, Zamin, Caccamo, Mario, Turner, Isaac, Flicek, Paul, and McVean, Gil. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics, 44(2):226–232, 2012.CrossRef Google Scholar PubMed

Isenbarger, Thomas Anthony, Carr, Christopher E., Johnson, Sarah Stewart, et al. The most conserved genome segments for life detection on earth and other planets. Origins of Life and Evolution of Biospheres, 38:517–533, 2008.CrossRef Google Scholar PubMed

Ivanov, Pesho, Bichsel, Benjamin, and Vechev, Martin. Fast and optimal sequence-to-graph alignment guided by seeds. In Pe’er, Itsik, editor, Research in Computational Molecular Biology, pages 306–325. Springer, 2022.CrossRef Google Scholar

Jacobson, Guy. Space-efficient static trees and graphs. In Proceedings of the 30th IEEE Symposium on Foundations of Computer Science (FOCS), pages 549–554, 1989.CrossRef Google Scholar

Jain, Chirag, Tavakoli, Neda, and Aluru, Srinivas. A variant selection framework for genome graphs. Bioinformatics, 37(Supplement 1):i460–i467, 2021.CrossRef Google Scholar PubMed

Jain, Chirag, Gibney, Daniel, and Thankachan, Sharma V.. Colinear chaining with overlaps and gap costs. In Pe’er, Itsik, editor, Research in Computational Molecular Biology, pages 246–262. Springer, 2022.Google Scholar

Jänicke, Stefan, Büchler, Marco, and Scheuermann, Gerik. Improving the layout for text variant graphs. In VisLR Workshop at LREC Conference, pages 41–48, 2014.Google Scholar

Jenkins, Paul, Lyngsø, Rune B., and Hein, Jotun. How many transcripts does it take to reconstruct the splice graph? In Algorithms in Bioinformatics, 6th International Workshop, WABI 2006, Zurich, Switzerland, September 11–13, 2006, Proceedings, pages 103–114, 2006.CrossRef Google Scholar

Jiang, Bai, Song, Kai, Ren, Jie, Deng, Minghua, Sun, Fengzhu, and Zhang, Xuegong. Comparison of metagenomic samples using sequence signatures. BMC Genomics, 13(1):730, 2012.Google Scholar

Kärkkäinen, Juha and Na, Joong Chae. Faster filters for approximate string matching. In Proceedings of the 9th Workshop on Algorithm Engineering and Experiments (ALENEX07), pages 84–90. SIAM, 2007.Google Scholar

Kärkkäinen, Juha, Sanders, Peter, and Burkhardt, Stefan. Linear work suffix array construction. Journal of the ACM, 53(6):918–936, 2006.Google Scholar

Karp, Richard M. and Rabin, Michael O.. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.CrossRef Google Scholar

Kasai, Toru, Lee, Gunho, Arimura, Hiroki, Arikawa, Setsuo, and Park, Kunsoo. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching (CPM), volume 2089 of LNCS, pages 181–192. Springer, 2001.Google Scholar

Kautz, W. Bounds on directed (d,k) graphs. Theory of Cellular Logic Networks and Machines, Stanford Research Institute, pages 20–28, 1968.Google Scholar

Kececioglu, John. Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, The University of Arizona, USA, 1991.Google Scholar

Kececioglu, John D. and Myers, Eugene W.. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13:7–51, 1993.CrossRef Google Scholar

Khan, Jamshed, Kokot, Marek, Deorowicz, Sebastian, and Patro, Rob. Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2. Genome Biology, 23(1):1–32, 2022.CrossRef Google Scholar

Kim, Dong Kyue, Sim, Jeong Seop, Park, Heejin, and Park, Kunsoo. Constructing suffix arrays in linear time. Journal of Discrete Algorithms, 3(2–4):126–142, 2005.CrossRef Google Scholar

Kim, Daehwan, Paggi, Joseph M., Park, Chanhee, Bennett, Christopher, and Salzberg, Steven L.. Graph-based genome alignment and genotyping with hisat2 and hisat-genotype. Nature Biotechnology, 37:907–915, 2019.CrossRef Google Scholar PubMed

Kloster, Kyle, Kuinke, Philipp, O’Brien, Michael P., et al. A practical FPT algorithm for flow decomposition and transcript assembly. In Pagh, Rasmus and Venkatasubramanian, Suresh, editors, Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments, ALENEX 2018, New Orleans, LA, USA, January 7–8, 2018, pages 75–86. SIAM, 2018.Google Scholar

Knuth, Donald E., Morris, James H., and Pratt, Vaughan R.. Fast pattern matching in strings. SIAM Journal of Computing, 6(2):323–350, 1977.CrossRef Google Scholar

Ko, Pang and Aluru, Srinivas. Space efficient linear time construction of suffix arrays. Journal of Discrete Algorithms, 3(2–4):143–156, 2005.CrossRef Google Scholar

Koch, Philipp, Platzer, Matthias, and Downie, Bryan R.. RepARK–de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Research, 42(9):e80–e80, 03 2014.CrossRef Google Scholar PubMed

Kovaka, Sam, Fan, Yunfan, Ni, Bohan, Timp, Winston, and Schatz, Michael C.. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nature Biotechnology, 39(4):431–441, 2021.CrossRef Google Scholar PubMed

Krogh, Anders, Brown, Michael, Mian, I. Saira, Sjölander, Kimmen, and Haussler, David. Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235(5):1501–1531, 1994.CrossRef Google Scholar PubMed

Kuksa, Pavel P. and Pavlovic, Vladimir. Efficient evaluation of large sequence kernels. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 759–767. ACM, 2012.Google Scholar

Kuksa, Pavel P., Huang, Pai-Hsi, and Pavlovic, Vladimir. Scalable algorithms for string kernels with inexact matching. In NIPS, pages 881–888, 2008.Google Scholar

Kulekci, M. Oguzhan, Vitter, Jeffrey Scott, and Xu, Bojian. Efficient maximal repeat finding using the Burrows–Wheeler transform and wavelet tree. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(2):421–429, 2012.Google Scholar

Kurtz, Stefan, Choudhuri, Jomuna V., Ohlebusch, Enno, Schleiermacher, Chris, Stoye, Jens, and Giegerich, Robert. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Research, 29(22):4633–4642, 11 2001.CrossRef Google Scholar PubMed

Kuruppu, Shanika, Puglisi, Simon J., and Zobel, Justin. Optimized relative Lempel–Ziv compression of genomes. In Proceedings of the Thirty-Fourth Australasian Computer Science Conference-Volume 113, pages 91–98. Australian Computer Society, Inc., 2011a.Google Scholar

Kuruppu, Shanika, Puglisi, Simon J., and Zobel, Justin. Reference sequence construction for relative compression of genomes. In Grossi, Roberto, Sebastiani, Fabrizio, and Silvestri, Fabrizio, editors, String Processing and Information Retrieval, pages 420–425. Springer, 2011b.CrossRef Google Scholar

Lam, Tak Wah, Sung, Wing-Kin, Tam, Siu-Lung, Wong, Chi-Kwong, and Yiu, Siu-Ming. Compressed indexing and local alignment of DNA. Bioinformatics, 24(6):791–797, 2008.Google Scholar

Landau, Gad M. and Vishkin, Uzi. Fast string matching with k differences. Journal of Computer and System Sciences, 37:63–78, 1988.CrossRef Google Scholar

Landau, Gad M. and Vishkin, Uzi. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157–169, 1989.CrossRef Google Scholar

Langmead, Ben, Trapnell, Cole, Pop, Maihai, and Salzberg, Steven L.. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3): R25, 2009.Google Scholar

Lecompte, Lolita, Peterlongo, Pierre, Lavenier, Dominique, and Lemaitre, Claire. SVJedi: Genotyping structural variations with long reads. Bioinformatics, 36(17):4568–4575, 2020.CrossRef Google Scholar PubMed

Lee, Christopher, Grasso, Catherine, and Sharlow, Mark F.. Multiple sequence alignment using partial order graphs. Bioinformatics, 18(3):452–64, 2002.CrossRef Google Scholar PubMed

Lehman, Eric and Shelat, Abhi. Approximation algorithms for grammar-based compression. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 205–212. Society for Industrial and Applied Mathematics, 2002.Google Scholar

Leslie, Christina and Kuang, Rui. Fast kernels for inexact string matching. In Learning Theory and Kernel Machines, pages 114–128. Springer, 2003.CrossRef Google Scholar

Letcher, Brice, Hunt, Martin, and Iqbal, Zamin. Gramtools enables multiscale variation analysis with genome graphs. Genome Biology, 22(1):1–27, 2021.CrossRef Google Scholar PubMed

Levenshtein, Vladimir I. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707, 1966.Google Scholar

Li, Gaoyang, Jiang, Tao, Li, Junyi, and Wang, Yadong. PanSVR: Pan-genome augmented short read realignment for sensitive detection of structural variations. Frontiers in Genetics, Volume 12, 2021.Google Scholar PubMed

Li, Heng. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34(18): 3094–3100, 2018.CrossRef Google Scholar PubMed

Li, Heng and Durbin, Richard. Fast and accurate short read alignment with BurrowsWheeler transform. Bioinformatics, 25(14):1754–1760, 2009.CrossRef Google Scholar PubMed

Li, Heng, Feng, Xiaowen, and Chu, Chong. The design and construction of reference pangenome graphs with minigraph. Genome Biology, 21(1):1–19, 2020.Google Scholar

Li, Jingyi Jessica, Jiang, Ci-Ren, Brown, James B., Huang, Haiyan, and Bickel, Peter J.. Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proceedings National Academy of Sciences, 108(50): 19867–19872, 2011.CrossRef Google Scholar PubMed

Li, Ming and Vitányi, Paul M. B.. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 2009.Google Scholar

Li, Ming, Chen, Xin, Li, Xin, Ma, Bin, and Vitányi, Paul M. B.. The similarity metric. IEEE Transactions on Information Theory, 50(12):3250–3264, 2004.Google Scholar

Li, Ruiqiang, Yu, Chang, Li, Yingrui, et al. Soap2. Bioinformatics, 25(15):1966–1967, 2009.CrossRef Google Scholar PubMed

Li, Wei, Feng, Jianxing, and Jiang, Tao. IsoLasso: A LASSO regression approach to RNA-Seq based transcriptome assembly. Journal of Computational Biology, 18(11):1693–707, 2011.CrossRef Google Scholar PubMed

Lifshits, Yury, Mozes, Shay, Weimann, Oren, and Ziv-Ukelson, Michal. Speeding up HMM decoding and training by exploiting sequence repetitions. Algorithmica, 54(3):379–399, 2009.CrossRef Google Scholar

Lin, Jie, Adjeroh, Donald, and Jiang, Bing-Hua. Probabilistic suffix array: Efficient modeling and prediction of protein families. Bioinformatics, 28(10):1314–1323, 04 2012a.CrossRef Google Scholar PubMed

Lin, Yen-Yi, Dao, Phuong, Hach, Faraz, et al. CLIIQ: Accurate comparative detection and quantification of expressed isoforms in a population. In Raphael, Benjamin J. and Tang, Jijun, editors, WABI, volume 7534 of Lecture Notes in Computer Science, pages 178–189. Springer, 2012b.Google Scholar

Lindsay, James, Salooti, Hamed, Zelikovsky, Alex, and MȈandoiu, Ion. Scalable genome scaffolding using integer linear programming. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB ’12, New York, NY, USA, pages 377–383. ACM, 2012.CrossRef Google Scholar

Liu, Bo, Gibbons, Theodore, Ghodsi, Mohammad, Treangen, Todd, and Pop, Mihai. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics, 12(Suppl 2):S4, 2011.CrossRef Google Scholar PubMed

Lo, Christine, Kim, Sangwoo, Zakov, Shay, and Bafna, Vineet. Evaluating genome architecture of a complex region via generalized bipartite matching. BMC Bioinformatics, 14(Suppl 5): S13, 2013.Google Scholar

Lodhi, Huma, Saunders, Craig, Shawe-Taylor, John, Cristianini, Nello, and Watkins, Chris. Text classification using string kernels. The Journal of Machine Learning Research, 2:419–444, 2002.Google Scholar

Löytynoja, Ari, Vilella, Albert J., and Goldman, Nick. Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm. Bioinformatics, 28(13):1684–1691, 2012.CrossRef Google Scholar PubMed

Maciuca, Sorina, del Ojo Elias, Carlos, McVean, Gil, and Iqbal, Zamin. A natural encoding of genetic variation in a Burrows–Wheeler transform to enable mapping and genome inference. In International Workshop on Algorithms in Bioinformatics, pages 222–233. Springer, 2016.CrossRef Google Scholar

Maier, David. The complexity of some problems on subsequences and supersequences. Journal of the ACM, 25(2):322–336, 1978.Google Scholar

Maillet, Nicolas, Lemaitre, Claire, Chikhi, Rayan, Lavenier, Dominique, and Peterlongo, Pierre. Compareads: comparing huge metagenomic experiments. BMC Bioinformatics, 13(Suppl 19):S10, 2012.CrossRef Google Scholar PubMed

Mäkinen, Veli and Navarro, Gonzalo. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(1):40–66, 2005.Google Scholar

Mäkinen, Veli and Navarro, Gonzalo. Rank and select revisited and extended. Theoretical Computer Science, 387(3):332–347, 2007.CrossRef Google Scholar

Mäkinen, Veli and Norri, Tuukka. Applying the positional Burrows–Wheeler transform to all-pairs Hamming distance. Information Processing Letters, 146:17–19, 2019.CrossRef Google Scholar

Mäkinen, Veli and Sahlin, Kristoffer. Chaining with overlaps revisited. In Gørtz, Inge Li and Weimann, Oren, editors, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, June 17-19, 2020, Copenhagen, Denmark, volume 161 of LIPIcs, pages 25:1–25:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020.Google Scholar

Mäkinen, Veli, Navarro, Gonzalo, and Ukkonen, Esko. Algorithms for transposition invariant string matching. In Proceedings of the 20th International Symposium on Theoretical Aspects of Computer Science (STACS’03), volume 2607 of LNCS, pages 191–202. Springer-Verlag, 2003.Google Scholar

Mäkinen, Veli, Navarro, Gonzalo, Síren, Jouni, and Välimäki, Niko. Storage and retrieval of individual genomes. In Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2009), volume 5541 of LNBI, pages 121–137. Springer-Verlag, 2009.Google Scholar

Mäkinen, Veli, Navarro, Gonzalo, Sirén, Jouni, and Välimäki, Niko. Storage and retrieval of highly repetitive sequence collections. Journal of Computational Biology, 17(3):281–308, 2010.CrossRef Google Scholar PubMed

Mäkinen, Veli, Salmela, Leena, and Ylinen, Johannes. Normalized N50 assembly metric using gap-restricted co-linear chaining. BMC Bioinformatics, 13:255, 2012.CrossRef Google Scholar PubMed

Mäkinen, Veli, Tomescu, Alexandru I., Kuosmanen, Anna, Paavilainen, Topi, Gagie, Travis, and Chikhi, Rayan. Sparse dynamic programming on DAGs with small width. ACM Transactions on Algorithms, 15(2):29:1–29:21, 2019.CrossRef Google Scholar

Mäklin, Tommi, Kallonen, Teemu, Alanko, Jarno, et al. Bacterial genomic epidemiology with mixed samples. Microbial Genomics, 7(11), 2021.CrossRef Google Scholar PubMed

Manber, Udi and Myers, Gene. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing, 22(5):935–948, 1993.CrossRef Google Scholar

Marco-Sola, Santiago, Moure, Juan Carlos, Moreto, Miquel, and Espinosa, Antonio. Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, 37(4):456–463, 2020.CrossRef Google Scholar

Marcus, Shoshana, Lee, Hayan, and Schatz, Michael C.. SplitMEM: A graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics, 30(24):3476–3483, 2014.CrossRef Google Scholar PubMed

Marschall, Tobias, Costa, Ivan G., Canzar, Stefan, et al. Clever: Clique-enumerating variant finder. Bioinformatics, 28(22):2875–2882, 2012.CrossRef Google Scholar PubMed

Masek, William and Paterson, Michael. A faster algorithm for computing string edit distances. Journal of Computer and System Sciences, 20(1):18–31, 1980.CrossRef Google Scholar

Mathé, Catherine, Sagot, Marie-France, Schiex, Thomas, and Rouzé, Pierre. Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Research, 30(19):4103–4117, 2002.CrossRef Google Scholar PubMed

McCreight, Edward M. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262–272, 1976.CrossRef Google Scholar

Medvedev, Paul, Georgiou, Konstantinos, Myers, Gene, and Brudno, Michael. Computability of models for sequence assembly. In Workshop on Algorithms in Bioinformatics (WABI 2007), pages 289–301, 2007.Google Scholar

Medvedev, Paul, Fiume, Marc, Dzamba, Misko, Smith, Tim, and Brudno, Michael. Detecting copy number variation with mated short reads. Genome Research, 20(11):1613–1622, 2010.Google Scholar

Minkin, Ilia, Pham, Son, and Medvedev, Paul. TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes. Bioinformatics, 33(24):4024–4032, 2017.CrossRef Google Scholar

Minoux, Michel. Solving integer minimum cost flows with separable convex cost objective polynomially. In Gallo, Giorgio and Sandi, Claudio, editors, Netflow at Pisa, volume 26 of Mathematical Programming Studies, pages 237–239. Springer, 1986.Google Scholar

Mitzenmacher, Michael and Upfal, Eli. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2017.Google Scholar

Moore, Edward. The shortest path through a maze, in Aiken, Howard, ed., Proceedings of an International Symposium on the Theory of Switching, 2–5 April 1957, Part II, pages 285–292. Harvard University Press, 1959.Google Scholar

Morris, James H. and Pratt, Vaughan R.. A linear pattern-matching algorithm. Technical Report 40, University of California, Berkeley, 1970.Google Scholar

Muggli, Martin D., Bowe, Alexander, Noyes, Noelle R., et al. Succinct colored de Bruijn graphs. Bioinformatics, 33(20):3181–3187, 2017.CrossRef Google Scholar PubMed

Munro, I. Tables. In Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), LNCS v. 1180, pages 37–42, 1996.CrossRef Google Scholar

Myers, Eugene W. The fragment assembly string graph. Bioinformatics, 21(suppl 2):ii79–ii85, 2005.Google Scholar

Myers, Gene. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395–415, 1999.Google Scholar

Na, Joong Chae, Kim, Hyunjoon, Park, Heejin, et al. FM-index of alignment: A compressed index for similar strings. Theoretical Computer Science, 638:159–170, 2016.CrossRef Google Scholar

Na, Joong Chae, Kim, Hyunjoon, Min, Seunghwan, et al. Fm-index of alignment with gaps. Theoretical Computer Science, 710:148–157, 2018.CrossRef Google Scholar

Nagarajan, Niranjan and Pop, Mihai. Parametric complexity of sequence assembly: Theory and applications to next generation sequencing. Journal of Computational Biology, 16(7):897–908, 2009.CrossRef Google Scholar

Navarro, Gonzalo. Compact Data Structures: A Practical Approach. Cambridge University Press, 2016.CrossRef Google Scholar

Navarro, Gonzalo. Improved approximate pattern matching on hypertext. Theoretical Computer Science, 237(1-2):455–463, 2000.Google Scholar

Navarro, Gonzalo. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.Google Scholar

Navarro, Gonzalo. Wavelet trees for all. In Kärkkäinen, Juha and Stoye, Jens, editors, Combinatorial Pattern Matching, volume 7354 of Lecture Notes in Computer Science, pages 2–26. Springer, 2012.Google Scholar

Navarro, Gonzalo and Mäkinen, Veli. Compressed full-text indexes. ACM Computing Surveys, 39(1):Article 2, 2007.CrossRef Google Scholar

Navarro, Gonzalo and Raffinot, Mathieu. Flexible Pattern Matching in Strings – Practical On-Line Search Algorithms for Texts and Biological Sequences. Cambridge University Press, 2002.CrossRef Google Scholar

Needleman, Saul B. and Wunsch, Christian D.. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.Google Scholar

Nevill-Manning, Craig G. Inferring sequential structure. PhD thesis, Citeseer, 1996.Google Scholar

Nishimoto, Takaaki, Kanda, Shunsuke, and Tabei, Yasuo. An optimal-time RLBWT construction in BWT-runs bounded space. In Bojanczyk, Mikolaj, Merelli, Emanuela, and Woodruff, David P., editors, 49th International Colloquium on Automata, Languages, and Programming, ICALP 2022, July 4–8, 2022, Paris, France, volume 229 of LIPIcs, pages 99:1–99:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022.Google Scholar

Norri, Tuukka, Cazaux, Bastien, Kosolobov, Dmitry, and Mäkinen, Veli. Linear time minimum segmentation enables scalable founder reconstruction. Algorithms for Molecular Biology, 14(1):12:1–12:15, 2019.CrossRef Google Scholar PubMed

Norri, Tuukka, Cazaux, Bastien, Dönges, Saska, Valenzuela, Daniel, and Mäkinen, Veli. Founder reconstruction enables scalable and seamless pangenomic analysis. Bioinformatics, 37(24): 4611–4619, 2021.CrossRef Google Scholar PubMed

Ntafos, Simeon C. and Hakimi, S. Louis. On path cover problems in digraphs and applications to program testing. IEEE Transactions on Software Engineering, 5(5):520–529, 1979.CrossRef Google Scholar

Nykänen, Matti and Ukkonen, Esko. The exact path length problem. Journal of Algorithms, 42(1):41–52, 2002.Google Scholar

Ohlebusch, Enno, Gog, Simon, and Kügel, Adrian. Computing matching statistics and maximal exact matches on compressed full-text indexes. In SPIRE, pages 347–358, 2010.CrossRef Google Scholar

Okanohara, Daisuke and Tsujii, Jun’ichi. Text categorization with all substring features. In Proceedings of SDM 2009, pages 838–846, 04 2009.Google Scholar

Onodera, Taku, Sadakane, Kunihiko, and Shibuya, Tetsuo. Detecting superbubbles in assembly graphs. In WABI, pages 338–348, 2013.CrossRef Google Scholar

Orlin, James B. A faster strongly polynominal minimum cost flow algorithm. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2–4, 1988, Chicago, Illinois, USA, pages 377–387, 1988.Google Scholar

Orlin, James B. A faster strongly polynomial minimum cost flow algorithm. Operations Research, 41:338–350, 1993.CrossRef Google Scholar

Pabinger, Stephan, Dander, Andreas, Fischer, Maria, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics, 2013.CrossRef Google Scholar

Pagh, Anna, Pagh, Rasmus, and Rao, S. Srinivasa. An optimal Bloom filter replacement. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 823–829, 2005.Google Scholar

Pandey, Prashant, Gao, Yinjie, and Kingsford, Carl. VariantStore: An index for large-scale genomic variant search. Genome Biology, 22(1):1–25, 2021.CrossRef Google Scholar PubMed

Parida, Laxmi. Pattern Discovery in Bioinformatics: Theory and Algorithms. Chapman & Hall/CRC, 2007.CrossRef Google Scholar

Paten, Benedict, Novak, Adam M., Eizenga, Jordan M., and Garrison, Erik. Genome graphs and the evolution of genome inference. Genome Research, 27(5):665–676, 2017.Google Scholar

Patterson, Murray, Marschall, Tobias, Pisanti, Nadia, et al. WhatsHap: Haplotype assembly for future-generation sequencing reads. In 18th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2014), volume 8394 of Lecture Notes in Computer Science, pages 237–249. Springer, 2014.Google Scholar

Payne, Alexander, Holmes, Nadine, Clarke, Thomas, et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nature Biotechnology, 39(4):442–450, 2021.Google Scholar

Pell, Jason, Hintze, Arend, Canino-Koning, Rosangela, Howe, Adina, Tiedje, James M., and Brown, C. Titus. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proceedings of the National Academy of Sciences, 109(33):13272–13277, 2012.CrossRef Google Scholar PubMed

Pellicer, Jaume, Fay, Michael F., and Leitch, Ilia J.. The largest eukaryotic genome of them all? Botanical Journal of the Linnean Society, 164(1):10–15, 2010.CrossRef Google Scholar

Pevzner, Pavel A. ℓ-tuple DNA sequencing: Computer analysis. Journal of Biomolecular Structure and Dynamics, 7(1):63–73, 1989.CrossRef Google Scholar

Pevzner, Pavel A., Tang, Haixu, and Waterman, Michael S.. A new approach to fragment assembly in DNA sequencing. In RECOMB, pages 256–267, 2001.Google Scholar

Pijls, Wim and Potharst, Rob. Another note on Dilworth’s decomposition theorem. Journal of Discrete Mathematics, 2013:692645, 2013.CrossRef Google Scholar

Policriti, Alberto and Prezza, Nicola. Hashing and Indexing: Succinct data structures and smoothed analysis. In ISAAC 2014 – 25th International Symposium on Algorithms and Computation, 2014.CrossRef Google Scholar

Policriti, Alberto and Prezza, Nicola. Fast randomized approximate string matching with succinct hash data structures. BMC Bioinformatics, 16 Suppl 9:S4, 2015.CrossRef Google Scholar PubMed

Policriti, Alberto, Tomescu, Alexandru I., and Vezzi, Francesco. A randomized numerical aligner (rNA). Journal of Computer and System Sciences, 78(6):1868–1882, 2012.Google Scholar

Prezza, Nicola, Pisanti, Nadia, Sciortino, Marinella, and Rosone, Giovanna. Detecting mutations by eBWT. In Parida, Laxmi and Ukkonen, Esko, editors, 18th International Workshop on Algorithms in Bioinformatics, WABI 2018, August 20-22, 2018, Helsinki, Finland, volume 113 of LIPIcs, pages 3:1–3:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.Google Scholar

Pritt, Jacob, Chen, Nae-Chyun, and Langmead, Ben. FORGe: Prioritizing variants for graph genomes. Genome Biology, 19(1):1–16, 2018.CrossRef Google Scholar PubMed

Puglisi, Simon J., Smyth, William F., and Turpin, Andrew. A taxonomy of suffix array construction algorithms. ACM Computing Surveys, 39(2), 2007.CrossRef Google Scholar

Qi, Ji, Wang, Bin, and Hao, Bai-Iin. Whole proteome prokaryote phylogeny without sequence alignment: A K-string composition approach. Journal of Molecular Evolution, 58(1):1–11, 2004.Google Scholar

Qin, Zhaohui S., Yu, Jianjun, Shen, Jincheng, et al. HPeak: An HMM-based algorithm for defining read-enriched regions in ChIP-Seq data. BMC Bioinformatics, 11:369, 2010.CrossRef Google Scholar PubMed

Qiu, Yutong and Kingsford, Carl. Constructing small genome graphs via string compression. Bioinformatics, 37(Supplement 1):i205–i213, 2021.Google Scholar

Rabin, Michael O. and Scott, Dana. Finite automata and their decision problems. IBM Journal of Research and Development, 3(2):114–125, 1959.Google Scholar

Raffinot, Mathieu. On maximal repeats in strings. Information Processing Letters, 80(3):165–169, 2001.CrossRef Google Scholar

Räihä, Kari-Jouko and Ukkonen, Esko. The shortest common supersequence problem over binary alphabet is NP-complete. Theoretical Computer Science, 16:187–198, 1981.CrossRef Google Scholar

Rakocevic, Goran, Semenyuk, Vladimir, Lee, Wan-Ping, et al. Fast and accurate genomic analyses using genome graphs. Nature Genetics, 51(2):354–362, 2019.Google Scholar

Rautiainen, Mikko and Marschall, Tobias. Aligning sequences to general graphs in O(V + mE) time. bioRxiv, 2017. doi: 10.1101/216127.Google Scholar

Rautiainen, Mikko and Marschall, Tobias. GraphAligner: Rapid and versatile sequence-to-graph alignment. Genome Biology, 21(1):1–28, 2020.CrossRef Google Scholar PubMed

Rautiainen, Mikko and Marschall, Tobias. MBG: Minimizer-based sparse de Bruijn graph construction. Bioinformatics, 37(16):2476–2478, 2021.CrossRef Google Scholar PubMed

Reinert, Gesine, Chew, David, Sun, Fengzhu, and Waterman, Michael S.. Alignment-free sequence comparison (i): Statistics and power. Journal of Computational Biology, 16(12): 1615–1634, 2009.Google Scholar

Ren, Jie, Song, Kai, Sun, Fengzhu, Deng, Minghua, and Reinert, Gesine. Multiple alignment-free sequence comparison. Bioinformatics, 29(21):2690–2698, 2013.CrossRef Google Scholar PubMed

Rho, Mina, Choi, Jeong-Hyeon, Kim, Sun, Lynch, Michael, and Tang, Haixu. De novo identiffication of LTR retrotransposons in eukaryotic genomes. BMC Genomics, 8:90, 2007.CrossRef Google Scholar

Rieck, Konrad and Laskov, Pavel. Linear-time computation of similarity measures for sequential data. Journal of Machine Learning Research, 9:2348, 2008.Google Scholar

Rieck, Konrad, Laskov, Pavel, and Sonnenburg, Sören. Computation of similarity measures for sequential data using generalized suffix trees. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference. The MIT Press, 2007.Google Scholar

Rizzi, Romeo, Tomescu, Alexandru I., and Mäkinen, Veli. On the complexity of minimum path cover with subpath constraints for multi-assembly. BMC Bioinformatics, 15(9):S5, 2014.CrossRef Google Scholar PubMed

Roberts, Michael, Hayes, Wayne, Hunt, Brian R., Mount, Stephen M., and Yorke, James A.. Reducing storage requirements for biological sequence comparison. Bioinformatics, 20(18): 3363–3369, 2004.Google Scholar

Rødland, Einar Andreas. Compact representation of k-mer de Bruijn graphs for genome read assembly. BMC Bioinformatics, 14(1):313, 2013.Google Scholar

Ronen, Roy, Boucher, Christina, Chitsaz, Hamidreza, and Pevzner, Pavel. SEQuel: Improving the accuracy of genome assemblies. Bioinformatics, 28(12):i188–i196, 2012.CrossRef Google Scholar PubMed

Rousu, Juho, Shawe-Taylor, John, and Jaakkola, Tommi. Efficient computation of gapped substring kernels on large alphabets. Journal of Machine Learning Research, 6(9), 2005.Google Scholar

Ruan, Jue and Li, Heng. Fast and accurate long-read assembly with wtdbg2. Nature Methods, 17(2):155–158, 2020.CrossRef Google Scholar PubMed

Rytter, Wojciech. Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science, 302(1):211–222, 2003.CrossRef Google Scholar

Sacomoto, Gustavo. Efficient algorithms for de novo assembly of alternative splicing events from RNA-seq data. PhD thesis, The University Claude Bernard Lyon 1, Lyon, France, 2014.Google Scholar

Sadakane, Kunihiko. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proceedings of the 11th International Symposium on Algorithms and Computation (ISAAC), LNCS v. 1969, pages 410–421, 2000.Google Scholar

Sahlin, Kristoffer, Street, Nathaniel, Lundeberg, Joakim, and Arvestad, Lars. Improved gap size estimation for scaffolding algorithms. Bioinformatics, 28(17):2215–2222, 2012.CrossRef Google Scholar PubMed

Sakharkar, Meena K., Chow, Vincent T., and Kangueane, Pandjassarame. Distributions of exons and introns in the human genome. In Silico Biol, 4(4):387–393, 2004.Google Scholar

Salikhov, Kamil, Sacomoto, Gustavo, and Kucherov, Gregory. Using cascading Bloom filters to improve the memory usage for de Brujin graphs. In Algorithms in Bioinformatics, pages 364–376. Springer, 2013.CrossRef Google Scholar

Salmela, Leena and Schröder, Jan. Correcting errors in short reads by multiple alignments. Bioinformatics, 27(11):1455–1461, 2011.CrossRef Google Scholar PubMed

Salmela, Leena, Mäkinen, Veli, Välimäki, Niko, Ylinen, Johannes, and Ukkonen, Esko. Fast scaffolding with small independent mixed integer programs. Bioinformatics, 27(23):3259–3265, 2011.Google Scholar

Salmela, Leena, Sahlin, Kristoffer, Mäkinen, Veli, and Tomescu, Alexandru I.. Gap filling as exact path length problem. Journal of Computational Biology, 23(5):347–361, 2016.CrossRef Google Scholar PubMed

Sankoff, David. Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics, 28:35–42, 1975.CrossRef Google Scholar

Schleimer, Saul, Wilkerson, Daniel S., and Aiken, Alex. Winnowing: Local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD ’03, page 7685, New York, NY, USA, 2003. Association for Computing Machinery.Google Scholar

Schnattinger, Thomas, Ohlebusch, Enno, and Gog, Simon. Bidirectional search in a string with wavelet trees. In 21st Annual Symposium on Combinatorial Pattern Matching (CPM 2010), volume 6129 of Lecture Notes in Computer Science, pages 40–50. Springer, 2010.Google Scholar

Schneeberger, Korbinian, Hagmann, Jörg, Ossowski, Stephan, et al. Simultaneous alignment of short reads against multiple genomes. Genome Biology, 10:R98, 2009.CrossRef Google Scholar PubMed

Schröder, Jan, Schrder, Heiko, Puglisi, Simon J., Sinha, Ranjan, and Schmidt, Bertil. SHREC: A short-read error correction method. Bioinformatics, 25(17):2157–2163, 2009.Google Scholar

Schrijver, Alexander. Combinatorial Optimization. Polyhedra and Efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, 2003. Paths, flows, matchings, Chapters 1–38.Google Scholar

Schulz, Marcel H., Weese, David, Rausch, Tobias, Döring, Andreas, Reinert, Knut, and Vingron, Martin. Fast and adaptive variable order Markov chain construction. In Crandall, Keith A. and Lagergren, Jens, editors, Algorithms in Bioinformatics, pages 306–317. Springer, 2008.CrossRef Google Scholar

Schulz, Tizian, Wittler, Roland, and Stoye, Jens. Sequence-based pangenomic core detection. iScience, 25(6):104413, 2022.CrossRef Google Scholar PubMed

Segata, Nicola, Waldron, Levi, Ballarini, Annalisa, Narasimhan, Vagheesh, Jousson, Olivier, and Huttenhower, Curtis. Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods, 9(8):811–814, 2012.CrossRef Google Scholar PubMed

Seth, Sohan, Välimäki, Niko, Kaski, Samuel, and Honkela, Antti. Exploration and retrieval of whole-metagenome sequencing samples. Bioinformatics, 30(17):2471–2479, 2014.Google Scholar

Shafin, Kishwar, Pesout, Trevor, Lorig-Roach, Ryan, et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nature Biotechnology, 38(9):1044–1053, 2020.CrossRef Google Scholar PubMed

Sharma, Vineet K., Kumar, Naveen, Prakash, Tulika, and Taylor, Todd D.. Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin. PLoS One, 7(4):e34030, 2012.CrossRef Google Scholar PubMed

Shawe-Taylor, John and Cristianini, Nello. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.CrossRef Google Scholar

Sherman, Rachel M. and Salzberg, Steven L.. Pan-genomics in the human genome era. Nature Reviews Genetics, 21(4):243–254, 2020.CrossRef Google Scholar PubMed

Shimbel, A. Structure in communication nets. In Proceedings of the Symposium on Information Networks (New York, 1954), pages 199–203. Polytechnic Press of the Polytechnic Institute of Brooklyn, 1955.Google Scholar

Sibbesen, Jonas Andreas, Maretty, Lasse, and Krogh, Anders. Accurate genotyping across variant classes and lengths using variant graphs. Nature Genetics, 50(7):1054–1059, 2018.Google Scholar

Simpson, Jared T. and Durbin, Richard. Efficient construction of an assembly string graph using the FM-index. Bioinformatics, 26(12):367–373, 2010.Google Scholar

Sims, Gregory E., Jun, Se-Ran, Wu, Guohong A., and Kim, Sung-Hou. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proceedings of the National Academy of Sciences, 106(8):2677–2682, 2009.CrossRef Google Scholar PubMed

Sirén, Jouni, Välimäki, Niko, and Mäkinen, Veli. Indexing finite language representation of population genotypes. In 11th International Workshop on Algorithms in Bioinformatics (WABI 2011), volume 6833 of Lecture Notes in Computer Science, pages 270–281. Springer, 2011.Google Scholar

Sirén, Jouni, Välimäki, Niko, and Mäkinen, Veli. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(2):375–388, 2014.CrossRef Google Scholar PubMed

Sirén, Jouni, Garrison, Erik, Novak, Adam M., Paten, Benedict, and Durbin, Richard. Haplotype-aware graph indexes. Bioinformatics, 36(2):400–407, 2020.Google Scholar

Sirén, Jouni, Monlong, Jean, Chang, Xian, et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science (New York, N.Y.), 374:abg8871, 2021.CrossRef Google Scholar PubMed

Smith, T. F. and Waterman, M. S.. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981.CrossRef Google Scholar PubMed

Smola, Alex J. and Vishwanathan, S. V. N.. Fast kernels for string and tree matching. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15, pages 585–592. MIT Press, 2003.Google Scholar

Sobih, Ahmed, Tomescu, Alexandru I., and Mäkinen, Veli. Metaflow: Metagenomic profiling based on whole-genome coverage analysis with min-cost flows. In Singh, Mona, editor, Research in Computational Molecular Biology – 20th Annual Conference, RECOMB 2016, Santa Monica, CA, USA, April 17–21, 2016, Proceedings, volume 9649 of Lecture Notes in Computer Science, pages 111–121. Springer, 2016.Google Scholar

Song, Kai, Ren, Jie, Reinert, Gesine, Deng, Minghua, Waterman, Michael S., and Sun, Fengzhu. New developments of alignment-free sequence comparison: Measures, statistics and next-generation sequencing. Briefings in Bioinformatics, page bbt067, 2013a.Google Scholar

Song, Kai, Ren, Jie, Zhai, Zhiyuan, Liu, Xuemei, Deng, Minghua, and Sun, Fengzhu. Alignment-free sequence comparison based on next-generation sequencing reads. Journal of Computational Biology, 20(2):64–79, 2013b.CrossRef Google Scholar PubMed

Song, Li and Florea, Liliana. CLASS: Constrained transcript assembly of RNA-seq reads. BMC Bioinformatics, 14(S-5):S14, 2013. Proceedings of RECOMB-seq: Third Annual Recomb Satellite Workshop on Massively Parallel Sequencing Beijing, China. 11–12 April 2013.Google Scholar

Spang, Rainer, Rehmsmeier, Marc, and Stoye, Jens. A novel approach to remote homology detection: Jumping alignments. Journal of Computational Biology, 9(5):747–760, 2002.CrossRef Google Scholar PubMed

Su, Chien-Hao, Wang, Tse-Yi, Hsu, Ming-Tsung, et al. The impact of normalization and phylogenetic information on estimating the distance for metagenomes. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(2):619–628, 2012.CrossRef Google Scholar PubMed

Su, Xiaoquan, Wang, Xuetao, Xu, Jian, and Ning, Kang. GPU-Meta-Storms: Computing the similarities among massive microbial communities using GPU. In Systems Biology (ISB), 2013 7th International Conference on, pages 69–74. IEEE, 2013.Google Scholar

Tanaseichuk, Olga, Borneman, James, and Jiang, Tao. Separating metagenomic short reads into genomes via clustering. In WABI, pages 298–313, 2011.Google Scholar

Tardos, É. A strongly polynomial algorithm to solve combinatorial linear programs. Combinatorica, 5:247–255, 1985.CrossRef Google Scholar

Tavakoli, Neda, Gibney, Daniel, and Aluru, Srinivas. Haplotype-aware variant selection for genome graphs. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 1–9, 2022.Google Scholar

Tempel, Sebastien, Rousseau, Christine, Tahi, Fariza, and Nicolas, Jacques. ModuleOrganizer: Detecting modules in families of transposable elements. BMC Bioinformatics, 11:474, 2010.CrossRef Google Scholar PubMed

Teo, Choon Hui and Vishwanathan, SVN. Fast and space efficient string kernels using suffix arrays. In Proceedings of the 23rd International Conference on Machine Learning, pages 929–936. ACM, 2006.CrossRef Google Scholar

Tettelin, Hervé, Masignani, Vega, Cieslewicz, Michael J., et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. Proceedings of the National Academy of Sciences, 102(39):13950–13955, 2005.CrossRef Google Scholar PubMed

Tomescu, Alexandru I., Kuosmanen, Anna, Rizzi, Romeo, and Mäkinen, Veli. A novel combinatorial method for estimating transcript expression with RNA-Seq: Bounding the number of paths. In Darling, Aaron E. and Stoye, Jens, editors, Proceedings of the 13th International Workshop on Algorithms in Bioinformatics (WABI), LNCS 8126, pages 85–98, Springer, 2013a.CrossRef Google Scholar

Tomescu, Alexandru I., Kuosmanen, Anna, Rizzi, Romeo, and Mäkinen, Veli. A novel min-cost flow method for estimating transcript expression with RNA-Seq. BMC Bioinformatics, 14 (S-5):S15, 2013b. Proceedings of RECOMB-seq 2013.CrossRef Google Scholar PubMed

Trapnell, Cole, Pachter, Lior, and Salzberg, Steven L.. TopHat: Discovering splice junctions with RNA-seq. Bioinformatics, 25(9):1105–1111, 2009.CrossRef Google Scholar PubMed

Trapnell, Cole, Williams, Brian A., Pertea, Geo, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28(5):511–515, 2010. doi: 10.1038/nbt.1621.Google Scholar

Turner, Isaac, Garimella, Kiran V., Iqbal, Zamin, and McVean, Gil. Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics, 34(15):2556–2565, 2018.CrossRef Google Scholar PubMed

Tutte, William T. A short proof of the factor theorem for finite graphs. Canadian Journal of Mathematics, 6(1954):347–352, 1954.CrossRef Google Scholar

Ukkonen, Esko. Algorithms for approximate string matching. Information and Control, 64(1–3):100–118, 1985.CrossRef Google Scholar

Ukkonen, Esko. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.Google Scholar

Ulitsky, Igor, Burstein, David, Tuller, Tamir, and Chor, Benny. The average common substring approach to phylogenomic reconstruction. Journal of Computational Biology, 13(2):336–350, 2006.Google Scholar

Uricaru, Raluca, Rizk, Guillaume, Lacroix, Vincent, et al. Reference-free detection of isolated SNPs. Nucleic Acids Research, 43(2):e11–e11, 2014.Google Scholar

Valenzuela, Daniel, Norri, Tuukka, Välimäki, Niko, Pitkänen, Esa, and Mäkinen, Veli. Towards pan-genome read alignment to improve variation calling. BMC Genomics, 19: Article number 87, 2018.Google Scholar

Välimäki, Niko. Least random suffix/prefix matches in output-sensitive time. In 23rd Annual Symposium on Combinatorial Pattern Matching (CPM 2012), volume 7354 of Lecture Notes in Computer Science, pages 269–279. Springer, 2012.Google Scholar

Välimäki, Niko and Puglisi, Simon J.. Distributed string mining for high-throughput sequencing data. In 12th International Workshop on Algorithms in Bioinformatics (WABI 2012), volume 7534 of Lecture Notes in Computer Science, pages 441–452. Springer, 2012.Google Scholar

Välimäki, Niko and Rivals, Eric. Scalable and versatile k-mer indexing for high-throughput sequencing data. In Bioinformatics Research and Applications, pages 237–248. Springer, 2013.Google Scholar

Välimäki, Niko, Ladra, Susana, and Mäkinen, Veli. Approximate all-pairs suffix/prefix overlaps. Information and Computation, 213:49–58, 2012.Google Scholar

van Emde Boas, Peter. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80–82, 1977.Google Scholar

Vatinlen, Benedicte, Chauvet, Fabrice, Chrétienne, Philippe, and Mahey, Philippe. Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research, 185(3):1390–1401, 2008.Google Scholar

Vazirani, Vijay V. Approximation Algorithms. Springer-Verlag. 2001.Google Scholar

Vinga, Susana and Almeida, Jonas. Alignment-free sequence comparison – A review. Bioinformatics, 19(4):513–523, 2003.CrossRef Google Scholar PubMed

Viterbi, Andrew. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269, 1967.Google Scholar

Volfovsky, Natalia, Haas, Brian, and Salzberg, Steven. A clustering method for repeat analysis in DNA sequences. Genome Biology, 2:RESEARCH0027, 02 2001.CrossRef Google Scholar PubMed

Wagner, Robert A., and Fischer, Michael J.. The string-to-string correction problem. Journal of the Association for Computing Machinery, 21:168–173, 1974.CrossRef Google Scholar

Wan, Lin, Reinert, Gesine, Sun, Fengzhu, and Waterman, Michael S.. Alignment-free sequence comparison (ii): Theoretical power of comparison statistics. Journal of Computational Biology, 17(11):1467–1490, 2010.CrossRef Google Scholar PubMed

Wandelt, Sebastian, Starlinger, Johannes, Bux, Marc, and Leser, Ulf. RCSI: Scalable similarity search in thousand(s) of genomes. PVLDB, 6(13):1534–1545, 2013.Google Scholar

Wang, Yi, Leung, Henry C. M., Yiu, Siu-Ming, and Chin, Francis Y. L.. MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species. Journal of Computational Biology, 19(2):241–249, 2012a.CrossRef Google Scholar PubMed

Wang, Yi, Leung, Henry C. M., Yiu, Siu-Ming, and Chin, Francis Y. L.. MetaCluster 5.0: A two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics, 28(18):i356–i362, 2012b.Google Scholar

Wang, Yi, Leung, Henry C. M., Yiu, Siu M., and Chin, Francis Y. L.. MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning. BMC Genomics, 15 (Suppl 1):S12, 2014a.Google Scholar

Wang, Ying, Liu, Lin, Chen, Lina, Chen, Ting, and Sun, Fengzhu. Comparison of metatranscriptomic samples based on k-tuple frequencies. PLoS One, 9(1):e84348, 2014b.CrossRef Google Scholar PubMed

Weiner, P. Linear pattern matching algorithm. In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pages 1–11, 1973.Google Scholar

Weintraub, Andres. A primal algorithm to solve network flow problems with convex costs. Management Science, 21(1):87–97, 1974.CrossRef Google Scholar

West, Douglas B. Introduction to Graph Theory, 2nd edition. Prentice Hall, 2000.Google Scholar

Williams, Lucia, Tomescu, Alexandru Ioan, and Mumey, Brendan. Flow decomposition with subpath constraints. IEEE/ACM Transactions on Computational Biology and Bioinformatics, pages 1–1, 2022.CrossRef Google Scholar

Willner, Dana, Thurber, Rebecca Vega, and Rohwer, Forest. Metagenomic signatures of 86 microbial and viral metagenomes. Environmental Microbiology, 11(7):1752–1766, 2009.CrossRef Google Scholar PubMed

WolframAlpha. www.wolframalpha.com, 2022. Last accessed 3 May 2022.Google Scholar

Yang, Xiao, Chockalingam, Sriram P., and Aluru, Srinivas. A survey of error-correction methods for next-generation sequencing. Briefings in Bioinformatics, 14(1):56–66, 2012.Google Scholar

Zerbino, Daniel and Birney, Ewan. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18:821, 2008.CrossRef Google Scholar

Ziv, Jacob. On finite memory universal data compression and classification of individual sequences. IEEE Transactions on Information Theory, 54(4):1626–1636, 2008.Google Scholar

Ziv, Jacob and Lempel, Abraham. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.CrossRef Google Scholar

Book contents

References

Summary

Access options

References

Save book to Kindle

Save book to Dropbox

Save book to Google Drive