Introduction

Welcome to the documentation for the kallisto | bustools suite — a fast, lightweight, and flexible toolkit for RNA-seq quantification, designed to support both bulk and single-cell workflows.

The toolkit comprises three main components:

  • kallisto — performs pseudoalignment of sequencing reads to a reference transcriptome or target set. Instead of full alignment, pseudoalignment rapidly determines which transcripts are compatible with each read. This retains all information needed for accurate quantification while dramatically reducing compute time and resource requirements.

    For complex or custom assays, kallisto also supports seqspec, a compact, machine-readable format that describes barcode and UMI positions so kallisto can parse nonstandard read layouts.

    Input: sequencing reads, an index built from your reference targets, and (for single-cell data) an optional technology string or assay specification.

    Output: abundances, or, optionally, a BUS file (for barcode/UMI workflows).

  • bustools — processes BUS-format files generated by kallisto. It supports a modular workflow for single-cell data: extracting and correcting barcodes, deduplicating UMIs, performing error correction, and producing transcript compatibility count (TCC) or gene count matrices. For bulk data, it can be used for compatible workflows as needed.

  • kb-python — a wrapper that bundles kallisto and bustools, simplifying common tasks. kb-python automates reference index building, manages file organization, handles metadata, and streamlines workflows. It is especially helpful for new users, multi-sample projects, or pipelines that need reproducible and portable execution.
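To make the idea of pseudoalignment concrete, here is a minimal sketch in Python. It is a toy model, not kallisto's implementation: it stores k-mers in a plain dictionary instead of the graph-based index kallisto uses, and the transcript names, sequences, and function names (build_index, pseudoalign) are hypothetical. The point it illustrates is that a read is never aligned base by base; the sketch only intersects the sets of transcripts that contain each of the read's k-mers.

```python
from collections import defaultdict


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        # Intersect the running set with the transcripts that have this k-mer.
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript can explain the read
    return compatible or set()


# Hypothetical toy transcriptome: the two transcripts share a prefix.
transcripts = {
    "tx1": "ACGTACGTTAGC",
    "tx2": "ACGTACGTCCAT",
}
K = 5
index = build_index(transcripts, K)

shared = pseudoalign("ACGTACG", index, K)  # read from the shared prefix
unique = pseudoalign("CGTTAGC", index, K)  # read from the tx1-specific end
print(shared, unique)
```

A read drawn from the shared prefix stays compatible with both transcripts, while a read from the tx1-specific end is compatible with tx1 only. Compatibility sets of this kind are the information that quantification works from.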

kallisto, bustools, and kb-python support a broad range of RNA-seq applications, including standard bulk RNA-seq, droplet-based single-cell RNA-seq, single-nucleus RNA-seq, and niche or customized assays via flexible technology definitions.

Background

The kallisto project began in August 2013 when Nicolas Bray (then a post-doc in the Pachter Lab) realized that for transcript quantification, full read alignment is not required — only read compatibility with transcripts matters. This insight led to the development of pseudoalignment, which laid the foundation for the first kallisto implementation. The method was described in:

  • Nicolas L. Bray, Harold Pimentel, Páll Melsted, and Lior Pachter. "Near-optimal probabilistic RNA-seq quantification." Nature Biotechnology, 34, 525–527 (2016). doi:10.1038/nbt.3519

That work demonstrated that pseudoalignment is orders of magnitude faster than traditional alignment while maintaining comparable accuracy.

As single-cell RNA-seq gained popularity, kallisto was extended through the introduction of the BUS (Barcode, UMI, Set) format and the bustools software. These developments were described in:

  • P. Melsted, V. Ntranos, and L. Pachter. "The Barcode, UMI, Set format and BUStools." Bioinformatics, btz279 (2019).

  • P. Melsted, M. Booeshaghi, F. Gao, J. Beltrame, H. Lu, K. Hjorleifsson, V. Gehring, and L. Pachter. "Modular and efficient pre-processing of single-cell RNA-seq." Nature Biotechnology, 39, 813–818 (2021).

These papers show how kallisto can generate BUS files from diverse single-cell protocols, and how bustools uses them for barcode error correction, UMI deduplication, and efficient generation of transcript compatibility or gene count matrices.
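The processing steps named above can be sketched on a simplified record type. This is an illustration under stated assumptions, not the bustools implementation: each record here is a plain (barcode, UMI, gene) tuple, whereas a real BUS record refers to an equivalence class of transcripts and carries further fields. The whitelist, the correction rule (a unique whitelist barcode at Hamming distance 1), and the function names are assumptions chosen for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(barcode, whitelist):
    """Return the barcode if whitelisted, else the single whitelist entry
    at Hamming distance 1, else None (uncorrectable or ambiguous)."""
    if barcode in whitelist:
        return barcode
    near = [w for w in whitelist
            if len(w) == len(barcode) and hamming(w, barcode) == 1]
    return near[0] if len(near) == 1 else None


def count_matrix(records, whitelist):
    """Correct barcodes, collapse duplicate UMIs, and count per (cell, gene)."""
    seen = set()
    for barcode, umi, gene in records:
        fixed = correct_barcode(barcode, whitelist)
        if fixed is not None:
            seen.add((fixed, umi, gene))  # the set collapses PCR duplicates
    counts = defaultdict(int)
    for barcode, _umi, gene in seen:
        counts[(barcode, gene)] += 1  # one count per distinct UMI
    return dict(counts)


# Hypothetical toy data.
whitelist = {"AAAA", "CCCC"}
records = [
    ("AAAA", "GT", "geneA"),
    ("AAAA", "GT", "geneA"),  # duplicate of the record above
    ("AAAT", "CA", "geneA"),  # one mismatch from AAAA, gets corrected
    ("CCCC", "GT", "geneB"),
    ("GGTT", "AC", "geneB"),  # no whitelist barcode within distance 1
]
counts = count_matrix(records, whitelist)
print(counts)
```

In this toy input, cell AAAA ends up with two molecules of geneA (two distinct UMIs after correction and deduplication), cell CCCC with one molecule of geneB, and the uncorrectable barcode is dropped.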

To support the growing diversity of single-cell genomics assays, the seqspec standard was introduced in 2024:

  • Booeshaghi, A. S., Chen, X., Pachter, L. "A machine-readable specification for genomics assays." Bioinformatics, 40, 4 (2024).

seqspec allows precise, reproducible description of read layouts, barcodes, UMIs, and adapters. kallisto supports seqspec whether it is run through kb-python or directly from the command line, so many custom and emerging single-cell protocols can be processed in a unified framework.
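The value of describing a read layout as data can be shown with a scheme much simpler than seqspec itself. The sketch below is not the seqspec format. It uses a compact string of three "file,start,stop" triplets (barcode, UMI, sequence), modeled on the form of kallisto's custom technology strings. The exact semantics used here, including the convention that a stop of 0 means "to the end of the read", should be treated as assumptions and checked against the kallisto manual. The function names and example reads are hypothetical.

```python
def parse_layout(spec):
    """Parse 'bc:umi:seq', where each field is a 'file,start,stop' triplet."""
    layout = {}
    for name, triplet in zip(("barcode", "umi", "sequence"), spec.split(":")):
        file_idx, start, stop = (int(x) for x in triplet.split(","))
        layout[name] = (file_idx, start, stop)
    return layout


def extract(reads, layout):
    """Slice each element out of the read file it lives in.

    Assumption for this sketch: stop == 0 means 'to the end of the read'.
    """
    parts = {}
    for name, (file_idx, start, stop) in layout.items():
        read = reads[file_idx]
        parts[name] = read[start:] if stop == 0 else read[start:stop]
    return parts


# Hypothetical layout: a 4-base barcode and a 2-base UMI at the start of
# read 0, with the biological sequence taking up all of read 1.
layout = parse_layout("0,0,4:0,4,6:1,0,0")
parts = extract(["AAAAGTTT", "ACGTACGT"], layout)
print(parts)
```

Because the layout is plain data, supporting a new assay in this sketch means changing one string and no code. seqspec applies the same principle with a much richer description.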

More recently, significant updates to kallisto and bustools have improved performance, data structures, and support for new workflows (e.g., translated pseudoalignment). These updates were described in:

  • Hjörleifsson, K. E., Sullivan, D. K., Swarna, N. P., Holley, G., Melsted, P., & Pachter, L. "Accurate quantification of single-cell and single-nucleus RNA-seq transcripts using distinguishing flanking k-mers." bioRxiv, 2022.

  • Sullivan, D. K., Min, K. H. (Joseph), Hjörleifsson, K. E., Luebbert, L., Holley, G., Moses, L., Gustafsson, J., Bray, N. L., Pimentel, H., Booeshaghi, A. S., Melsted, P., & Pachter, L. "kallisto, bustools and kb-python for quantifying bulk, single-cell and single-nucleus RNA-seq." Nature Protocols (2024).

These enhancements help keep kallisto and bustools among the fastest, most flexible RNA-seq quantification tools available.

Thank you for using kallisto | bustools. We hope you find this documentation helpful, and welcome feedback or contributions via our GitHub repository.