Last updated:

Note: This tutorial was written for running Roary on the Cooper Lab’s beagle server. It is still a working draft, so please reach out if you get stuck or if anything is unclear.

Other resources:

  • The GitHub repository for Roary can be found here.
  • The official website for Roary can be found here.
  • The publication on Roary can be found here.
  • Another Roary tutorial that I found very helpful is the Roary Pathogen Informatics Training by Pathogen Informatics at the Wellcome Sanger Institute.




Prepare GFFs with sequence information

Roary takes gff files that include sequence information as input. Not all gff files do, so check that yours contain the sequence before running Roary.
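One quick way to check is sketched below. It relies on the GFF3 convention that embedded sequence, when present, follows a line starting with ##FASTA. The function name gff_missing_fasta is made up for this tutorial, and the sketch assumes your files end in .gff3 or .gff.

```shell
# List the gff files in a directory that have no embedded ##FASTA section.
# Prints nothing when every file contains sequence information.
# gff_missing_fasta is a hypothetical helper name, not part of Roary.
gff_missing_fasta() {
    dir="${1:-.}"                        # default to the current directory
    for f in "$dir"/*.gff3 "$dir"/*.gff; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        grep -q '^##FASTA' "$f" || printf '%s\n' "$f"
    done
}
```

For example, gff_missing_fasta /home/nak177/roary/gff/ should print only the files that still need sequence information added.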

Luckily, the gff3 files generated by a breseq run (found in /data/reference.gff3) contain sequence information, so they can be used directly. Put the gff3 files in one directory (/home/nak177/roary/gff/).
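Because every breseq run names its file reference.gff3, copying several of them into one directory would overwrite earlier copies. The sketch below renames each copy after its run directory. The layout <runs_dir>/<sample>/data/reference.gff3 and the function name collect_breseq_gffs are assumptions for illustration, so adjust them to match how your breseq output is organized.

```shell
# Copy each run's data/reference.gff3 into one directory, naming the copy
# after the run directory so the files do not overwrite each other.
# Assumed (hypothetical) layout: <runs_dir>/<sample>/data/reference.gff3
collect_breseq_gffs() {
    runs_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir"
    for gff in "$runs_dir"/*/data/reference.gff3; do
        [ -e "$gff" ] || continue        # skip if the pattern matched nothing
        # Two levels up from the file is the per-sample run directory.
        sample=$(basename "$(dirname "$(dirname "$gff")")")
        cp "$gff" "$dest_dir/$sample.gff3"
    done
}
```

With that layout, collect_breseq_gffs /path/to/breseq_runs /home/nak177/roary/gff/ would leave one <sample>.gff3 per run in the gff directory (the first path is a placeholder).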




Run Roary

Load the Roary module on beagle:

module load roary/roary-3.13.0


Then change to the directory with the gff files:

cd /home/nak177/roary/gff/


Run roary:

roary -f output *.gff3


The wildcard (*) matches every file in the current directory with a gff3 file extension. The -f option names the output directory, so this command creates a directory called “output” in your current directory. The “gene_presence_absence.csv” file in that directory should contain the translation of locus tags between different strains.
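To look up how one locus tag maps across strains, a small helper like the one below can pull the matching row out of the CSV. This is only a sketch: lookup_locus_tag is a made-up name, and it assumes that each row of the file is one gene cluster with the locus tags sitting in the per-strain columns.

```shell
# Print the header row plus every row that mentions the given locus tag.
# lookup_locus_tag is a hypothetical helper name, not part of Roary.
# Matching is by fixed substring, so a tag such as ABC_0001 would also
# match ABC_00010; check the rows that come back.
lookup_locus_tag() {
    tag="$1"
    csv="${2:-output/gene_presence_absence.csv}"
    head -n 1 "$csv"                     # column names, one per strain
    tail -n +2 "$csv" | grep -F -- "$tag"
}
```

Run from the gff directory, lookup_locus_tag followed by one of your own locus tags should show that tag alongside the tags of the same gene in the other strains.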

Note: If you also have miniconda loaded at the same time, you can run into an error when you use the wildcard (*). Make sure that roary is the only module you have loaded when running it.
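A quick check before running roary might look like the sketch below. It assumes the miniconda module on your system has "miniconda" somewhere in its name, which may not be true everywhere, and check_no_miniconda is a made-up helper name. The module list command prints to stderr on many systems, hence the 2>&1.

```shell
# Warn if a miniconda module shows up among the currently loaded modules.
# check_no_miniconda is a hypothetical helper name; the "miniconda"
# pattern is an assumption about how the module is named.
check_no_miniconda() {
    if module list 2>&1 | grep -qi 'miniconda'; then
        echo "miniconda is loaded; unload it before running roary" >&2
        return 1
    fi
}
```

If the check complains, run module list to see the exact module name and remove it with module unload followed by that name.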