Viral world organized

International research team develops high-speed analysis tool
Vclust enables the fast and reliable analysis and clustering of millions of viral genomes.
Graphic: Juliane Seeber (Gemini generated)

By: Juliane Seeber

Researchers from the Cluster of Excellence “Balance of the Microverse” at the University of Jena, in collaboration with international partners, have developed a new tool that significantly simplifies and accelerates the genetic analysis of viruses. The method, called Vclust, can analyze millions of viral genomes within just a few hours and group them accurately based on similarity, a level of performance not previously achieved in viromics research.

Modern microbiology is facing a data explosion. Through environmental sampling (metagenomics), millions of viral genomes and genome fragments are discovered every year. These data consist of long sequences of A, C, G, and T nucleotides, and it can be challenging to know whether a sequence has been seen before or is entirely new. Until now, powerful tools to reliably compare and classify these sequences have been lacking.

“Without fast and accurate comparison methods, it's tempting to assume that every sequence you find represents a completely new virus,” explains Prof. Bas Dutilh, research group leader and Microverse Professor. “With Vclust, we are finally giving researchers a tool that allows them to find out whether their viruses have been seen before.”

Vclust integrates three tightly coordinated components to efficiently handle large-scale data analysis. The Kmer-db 2 module quickly detects related genomes based on short DNA fragments. In the next step, a newly developed algorithm called LZ-ANI precisely determines the genetic similarity between genomes—even if they are incomplete. Finally, the clustering tool Clusty automatically sorts the genomes into meaningful groups based on internationally recognized virus classification standards.
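For readers who want a feel for how such a three-stage workflow fits together, the following is a toy Python sketch. It is not Vclust's code and does not reproduce its algorithms: the function names (`prefilter`, `similarity`, `cluster`, `run_pipeline`), the example sequences, and all parameter values are hypothetical. A simple k-mer Jaccard score stands in for the LZ-ANI similarity measure, and single-linkage grouping stands in for the clustering done by Clusty.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of all substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(genomes, k=4, min_shared=3):
    """Stage 1 (toy prefilter): keep only genome pairs that share
    at least `min_shared` k-mers, so later stages skip unrelated pairs."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(genomes), 2)
        if len(sets[a] & sets[b]) >= min_shared
    ]


def similarity(seq_a, seq_b, k=4):
    """Stage 2 (stand-in, NOT LZ-ANI): Jaccard index of the k-mer sets."""
    ka, kb = kmers(seq_a, k), kmers(seq_b, k)
    return len(ka & kb) / len(ka | kb)


def cluster(names, scored_pairs, threshold):
    """Stage 3 (stand-in): single-linkage clustering with union-find.
    Pairs scoring at or above `threshold` end up in the same group."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def run_pipeline(genomes, threshold=0.7):
    """Chain the three toy stages: prefilter, score, cluster."""
    pairs = prefilter(genomes)
    scored = [(a, b, similarity(genomes[a], genomes[b])) for a, b in pairs]
    return cluster(list(genomes), scored, threshold)


# Made-up example sequences: v1 and v2 differ in one base, v3 is unrelated.
genomes = {
    "v1": "ACGTACGTTAGC",
    "v2": "ACGTACGTTAGG",
    "v3": "TTTTGGGGCCCC",
}
clusters = run_pipeline(genomes)
print(clusters)
```

The point of the staging is that the cheap first step removes most genome pairs before the more expensive similarity calculation runs, which is what makes comparisons across very large collections tractable.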

Compared to existing tools, Vclust not only works significantly faster but also delivers more reliable results—even when analyzing highly fragmented genetic material from environmental samples. The new algorithm produces clusters that closely match the classifications of the International Committee on Taxonomy of Viruses.

Practical Impact and Availability

In the future, Vclust could play a central role in cataloging new viruses, understanding their evolution, and supporting applications in medicine and biotechnology—from virus diagnostics to the targeted use of viruses to combat harmful bacteria.

The software is freely available as open source and can also be used via a web service, even without access to local high-performance computing resources.

About the Project

This project is a collaboration between research institutions in Poland, Germany, and the Netherlands. In addition to funding from the European Union, the work was made possible through support from the Cluster of Excellence “Balance of the Microverse” at the University of Jena.

Information

Original publication:

Zielezinski A, Gudyś A, Barylski J, Siminski K, Rozwalak P, Dutilh BE, Deorowicz S (2025) Ultrafast and accurate sequence alignment and clustering of viral genomes. Nat Methods. https://doi.org/10.1038/s41592-025-02701-7

Contact:

Bas Dutilh, Prof. Dr
Head of the Research Group Viral Ecology and Omics
Professorship of Viral Ecology and Omics
Room 103
Rosalind-Franklin-Straße 1
07745 Jena