pathoSPOT is an open-source bioinformatics pipeline that turns pathogen genome sequences sampled from patients into interactive visualizations of probable transmission scenarios.
Click to explore these visualizations of methicillin-resistant Staphylococcus aureus collected over a 24-month period throughout the Mount Sinai Health System (preprint available).
Dates have been shifted during data anonymization.
Clustered heatmap of related genomes reveals outbreaks over multiple wards.
Figure 2 in Berbel Caban & Pak, et al.
A timeline of spatiotemporal movements and overlaps of patients in the largest cluster.
Figure 3 in Berbel Caban & Pak, et al.
Animated network diagram showing the spatial relationships among related genomes over time.
Figure S1 in Berbel Caban & Pak, et al.
A 7-day overlap in inpatient ward stays precedes a transmission event detected one year later.
Figure S3B in Berbel Caban & Pak, et al.
Interested in viruses? Also see our preprint on influenza A, where pathoSPOT characterized a nosocomial outbreak affecting 66 patients and healthcare workers and identified “patient zero.”
You can use pathoSPOT to analyze your own pathogen genomes and create visualizations similar to the ones above. As a tutorial, we will reproduce the analysis in Berbel Caban & Pak, et al. starting from the raw data (tar.gz).
This dataset contains FASTA sequences for 226 MRSA genomes, gene annotations in BED format, and a relational database (in SQLite format) with metadata for each genome (anonymized patient IDs, collection locations, healthcare encounters for each patient, and more).
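Before running the pipeline, it can help to see what the metadata database actually contains. The following is a minimal sketch in Python that lists every table and its columns in a SQLite file. It makes no assumptions about the real schema: the `list_tables` helper and the `demo_isolates` table are hypothetical names used only for illustration, and the demo runs against an in-memory stand-in rather than the downloaded database.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = {}
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in rows:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        tables[name] = [col[1] for col in cols]
    return tables


# Stand-in database (hypothetical table) so the sketch runs without the download.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_isolates (isolate_id TEXT, collection_unit TEXT)")
schema = list_tables(demo)
print(schema)
```

To inspect the real metadata, pass `sqlite3.connect(path)` to `list_tables`, where `path` points at the SQLite file unpacked from the example dataset.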
pathoSPOT is designed to run on Linux; however, we provide a Vagrant configuration so that anybody, including Mac and Windows users, can quickly create a virtual machine (VM) that runs the pipeline either on their personal computer or on the Amazon EC2 cloud. We'll use VirtualBox to run the VM locally for this example; you'll need 5 GB of disk space and 8 GB of RAM.
$ git clone https://github.com/powerpak/pathospot-compare
Cloning into 'pathospot-compare'...
... More output ...
$ cd pathospot-compare
$ vagrant up
... Much more output ... may want to get some ☕
$ vagrant ssh
If everything worked, you should see
vagrant@stretch:/vagrant$
which is a shell running on your brand new Linux VM. The VM has already downloaded the example dataset, which you will find inside
By default, the VM is configured to run a full analysis on those data, which you can kick off with:
$ rake all
... More output, takes 1/2 to 2 hours. The last line should be ...
WARN: re-invoking parsnp task since the mash clusters were rebuilt
$
pathoSPOT is made of two components. One runs the comparative genomics analysis, and the other drives the visualization engine.
The comparative genomics component prefilters FASTA sequences, clusters them by estimated nucleotide identity, and then creates a multiple sequence alignment for each cluster of genomes to calculate SNP distances.
The visualization component converts the calculated SNP distances, together with metadata on sample collection and patient movements, into interactive heatmap and timeline visualizations viewable in a web browser.
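To make the idea of a SNP distance concrete, here is a minimal sketch in Python that counts differing positions between already-aligned sequences. It is an illustration only, not pathoSPOT's implementation: the function names, the toy genome IDs and sequences, and the choice to skip gaps and ambiguous `N` bases are all assumptions made for this example.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="-N"):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a gap or an ambiguous base
    (characters in `ignore`) are skipped. This is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(alignment):
    """Return {(id_a, id_b): distance} for every unordered pair of genomes."""
    return {
        (id_a, id_b): snp_distance(alignment[id_a], alignment[id_b])
        for id_a, id_b in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical genome IDs; real alignments are far longer.
alignment = {
    "genome_A": "ACGTACGTAC",
    "genome_B": "ACGTACGTAT",
    "genome_C": "ACGAAC-TTT",
}
distances = pairwise_snp_distances(alignment)
for pair, dist in distances.items():
    print(pair, dist)
```

Pairs with small distances are the ones a heatmap of related genomes would group together; what counts as "small" depends on the organism and the study, so no threshold is suggested here.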
pathoSPOT currently supports active surveillance of transmissible pathogens in facilities throughout the Mount Sinai Health System.
Bugs? Please file an issue on our GitHub projects:
How do I cite this? If you use pathoSPOT for your own research, we would appreciate it if you cite:
- Berbel Caban A, Pak TR, Obla A et al. 2020. PathoSPOT genomic surveillance reveals under the radar outbreaks of methicillin resistant S. aureus bloodstream infections. medRxiv (preprint). doi:10.1101/2020.05.11.20098103