Pathogen Sequencing
Phylogenomic Outbreak Toolkit

pathoSPOT is an open-source bioinformatics pipeline that turns pathogen genome sequences sampled from patients into interactive visualizations of probable transmission scenarios.

Try it now

Click to explore these visualizations of methicillin-resistant Staphylococcus aureus collected over a 24-month period throughout the Mount Sinai Health System (preprint available).

Dates have been shifted during data anonymization.

Clustered heatmap of related genomes reveals outbreaks over multiple wards.

Figure 2 in Berbel Caban & Pak, et al.

A timeline of spatiotemporal movements and overlaps of patients in the largest cluster.

Figure 3 in Berbel Caban & Pak, et al.

Animated network diagram showing the spatial relationships among related genomes over time.

Figure S1 in Berbel Caban & Pak, et al.

A 7-day overlap in inpatient ward stays precedes a transmission event detected one year later.

Figure S3B in Berbel Caban & Pak, et al.

Interested in viruses? Also see our preprint on influenza A, where pathoSPOT characterized a nosocomial outbreak affecting 66 patients and healthcare workers and identified “patient zero.”

Get Started

You can use pathoSPOT to analyze your own pathogen genomes and create visualizations similar to the ones above. As a tutorial, we will reproduce the analysis in Berbel Caban & Pak, et al. starting from the raw data (tar.gz).

This dataset contains FASTA sequences for 226 MRSA genomes, gene annotations in BED format, and a relational database (in SQLite format) with metadata for each genome (anonymized patient IDs, collection locations, healthcare encounters for each patient, and more).
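Before running the pipeline, it can help to take a quick look at the extracted dataset. The following is a minimal sketch, not part of pathoSPOT itself, that lists the tables in a SQLite file and counts the records in a FASTA file using only the Python standard library. The function names are illustrative, and the file paths you pass in are whatever you find inside the extracted archive; no particular file names are assumed here.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def count_fasta_records(fasta_path):
    """Count sequence records in a FASTA file by counting '>' header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Note that `count_fasta_records` counts sequence records, not genomes: a draft assembly stored as several contigs will contribute several records.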

pathoSPOT is designed to run on Linux; however, we provide a Vagrant configuration so that anybody, including Mac and Windows users, can quickly create a virtual machine (VM) that runs the pipeline either on their personal computer or on the Amazon EC2 cloud. We'll use VirtualBox to run the VM locally for this example; you'll need 5 GB of disk space and 8 GB of RAM.

To get started, install Vagrant and VirtualBox. Then open your terminal program and run the following commands:

$ git clone https://github.com/powerpak/pathospot-compare
Cloning into 'pathospot-compare'...
... More output ...
$ cd pathospot-compare
$ vagrant up
... Much more output; this step can take a while ...
$ vagrant ssh

If everything worked, you should see the prompt vagrant@stretch:/vagrant$, which is a shell running on your brand new Linux VM. The VM has already downloaded the example dataset, which you will find inside /vagrant/example.

By default, the VM is configured to run a full analysis on those data, which you can kick off with:

$ rake all
... More output, takes 1/2 to 2 hours. The last line should be ...
WARN: re-invoking parsnp task since the mash clusters were rebuilt
$

When it's finished, open http://localhost:8989 in your browser. (It will look exactly like this website, except that the "Try it now" visualizations have now been built and are being served by your own VM.)

How it works

pathoSPOT is made of two components. One runs the comparative genomics analysis, and the other drives the visualization engine.

pathospot-compare

Prefilters FASTA sequences, clusters them by estimated nucleotide identity, and then creates a multiple sequence alignment for each cluster of genomes to calculate pairwise SNP distances.
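To illustrate the last step, here is a minimal sketch of how a pairwise SNP distance can be computed once genomes are aligned. It is not the pipeline's own code. It assumes the alignment is already available as equal-length strings and, as a simplifying assumption, ignores any column where either genome has a gap or an ambiguous base. The genome names and sequences are toy values.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous base and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_distances(alignment):
    """Map each unordered pair of genome names to its SNP distance."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical genome names
alignment = {
    "genome_A": "ACGTACGTAC",
    "genome_B": "ACGTACGTAT",
    "genome_C": "ACGAAC-TAT",
}
distances = pairwise_snp_distances(alignment)
print(distances[("genome_A", "genome_C")])  # 2 (the gapped column is skipped)
```

Genome pairs separated by only a few SNPs are the ones worth examining as possible transmissions; what counts as "a few" depends on the organism and the time span sampled.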

View pathospot-compare on GitHub.

pathospot-visualize

Combines the calculated SNP distances with metadata on sample collection and patient movements to produce interactive heatmap and timeline visualizations viewable in a web browser.
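The timeline view depends on knowing when two patients were in the same place at the same time. As a rough sketch of that idea (not the project's actual implementation), the overlap between two ward stays can be computed as below. The `WardStay` record, its field names, and the example dates are all hypothetical; the end date is treated as the day of departure, so it is not counted.

```python
from datetime import date
from typing import NamedTuple


class WardStay(NamedTuple):
    """One patient's stay on one ward (hypothetical record layout)."""
    patient: str
    ward: str
    start: date
    end: date


def overlap_days(a, b):
    """Number of days two stays overlap on the same ward, or 0 if they do not."""
    if a.ward != b.ward:
        return 0
    latest_start = max(a.start, b.start)
    earliest_end = min(a.end, b.end)
    return max(0, (earliest_end - latest_start).days)


# Hypothetical stays for two patients on the same ward
stay_1 = WardStay("patient_1", "ward_X", date(2020, 1, 1), date(2020, 1, 10))
stay_2 = WardStay("patient_2", "ward_X", date(2020, 1, 3), date(2020, 1, 20))
print(overlap_days(stay_1, stay_2))  # 7
```

Pairing overlaps like this with small SNP distances between the patients' isolates is what lets a shared ward stay be flagged as a plausible transmission opportunity.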

View pathospot-visualize on GitHub.

Team

pathoSPOT is developed, used, and maintained by the Pathogen Surveillance Program and the Bakel Lab at the Icahn School of Medicine at Mount Sinai.

It currently supports active surveillance of transmissible pathogens in facilities throughout the Mount Sinai Health System.

Contributing developers include: Theodore Pak, Mitchell Sullivan, Harm van Bakel, and Elizabeth Webster. On GitHub, you can also view the list of contributors for each project.

Questions? Please contact Theodore Pak or Harm van Bakel.

Bugs? Please file an issue on the relevant GitHub project, pathospot-compare or pathospot-visualize.

How do I cite this? If you use pathoSPOT for your own research, we would appreciate it if you cite: