Finding gene fusions in RNA-Seq data
Francis RW, Thompson-Wicking K, Carter KW, Anderson D, Kees UR, et al. (2012) FusionFinder: A Software Tool to Identify Expressed Gene Fusion Candidates from RNA-Seq Data. PLoS ONE 7(6): e39987. doi:10.1371/journal.pone.0039987
A fusion transcript is an aberrant RNA molecule comprising exonic sequence from two normally separate genes. Fusion transcripts can be formed either by transcription of a fusion gene, following a chromosomal rearrangement such as a translocation, or by trans-splicing. They have been implicated in both haematological malignancies and solid tumours, including prostate, breast and lung cancers.
FusionFinder is a Perl-based software package for finding fusion transcript candidates in RNA-Seq data.
To run FusionFinder you need a supported operating system and enough disk space for your input and output files. FusionFinder is written in Perl and requires Perl plus the additional modules described below. The FusionFinder protocol also requires a short-read aligner; we recommend Bowtie, which can be obtained and installed from the link below. FusionFinder relies heavily on Ensembl, so access to an Ensembl database or mirror is essential. Installing a local copy of Ensembl speeds up processing considerably (instructions below); this requires MySQL.
1. OS (Windows, Mac OS X, Linux)
Smaller datasets will run on a 32-bit OS with 4 GB of memory
Larger datasets will require more memory and a 64-bit OS
2. Perl Modules (direct links below, or search CPAN at http://search.cpan.org)
Bio::Tools::Run::Alignment::Muscle (install C/CJ/CJFIELDS/BioPerl-Run-1.006900.tar.gz if using cpan)
3. Ensembl API (download the appropriate version for the reference data and the Ensembl database you are connecting to)
4. The multiple sequence aligner Muscle
5. MySQL
A MySQL client (including the devel package if installing with yum, etc.) is required to connect to Ensembl
A MySQL server is required if you want to install a local copy of Ensembl
6. Ensembl Database
By default, FusionFinder connects to the main Ensembl database in the UK.
However, we strongly recommend using the Ensembl resource closest to you to speed up processing.
You can use one of the public Ensembl mirrors listed here or, for the best performance, install your own.
Either way, you need access to at least the Ensembl human Core and human Compara databases.
When using a mirror database, you need to provide its connection details (server hostname, username and password) in the FusionFinder configuration file.
An example configuration file can be found here.
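Before going further it can be useful to confirm that the requirements above are actually visible on your system. The following is a minimal sketch for Linux or Mac OS X, not part of FusionFinder. It assumes the executables are called perl, bowtie, muscle and mysql, and it uses Bio::EnsEMBL::Registry (a module of the Ensembl core Perl API) as a stand-in check for the Ensembl API; both lists are assumptions to adjust for your installation.

```bash
#!/usr/bin/env bash
# Sketch: report which of FusionFinder's external dependencies are missing.
# ASSUMPTION: executable names (perl, bowtie, muscle, mysql) and the choice of
# Bio::EnsEMBL::Registry as a proxy for the Ensembl API; adjust as needed.

# check_tool NAME: succeed if NAME is an executable on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# check_perl_module MODULE: succeed if perl can load MODULE.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# check_all: run every check; the exit status is the number of missing items.
check_all() {
    local missing=0 item
    for item in perl bowtie muscle mysql; do
        check_tool "$item" || missing=$((missing + 1))
    done
    for item in Bio::Tools::Run::Alignment::Muscle Bio::EnsEMBL::Registry; do
        check_perl_module "$item" || missing=$((missing + 1))
    done
    return "$missing"
}
```

After sourcing the file, running check_all prints one found/missing line per dependency and exits with 0 only if nothing is missing.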
Downloading and Installing FusionFinder
You will need to download the software itself and some reference data.
The reference data can be found below.
The current version of FusionFinder is:
FusionFinder version 1.2.1 - second stable public release (29/07/2012)
Recent bug fixes
- Fixes applied in v1.2.1
- Fixed a bug, spotted by Richard Donovan, where the user-defined path to Bowtie was not used in all calls to Bowtie.
Once all system requirements are fulfilled and you have downloaded FusionFinder, simply extract the contents to a suitable local or system-accessible directory.
For example, on Linux: unzip FusionFinder_v1.2.zip
An example configuration file is provided here. The essential part of this file tells the scripts where your Ensembl API is installed. If you have installed a local Ensembl database or want to point to a nearby Ensembl mirror, simply modify the server hostname, username and password details in this file.
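Before editing the connection details in the configuration file, it can help to check that they work and that the server really holds the human Core and Compara databases. The sketch below uses the standard mysql command-line client for this. Its defaults (host ensembldb.ensembl.org, user anonymous, port 3306) are assumptions pointing at the public Ensembl server; check the port against the Ensembl documentation for the release you use, and replace all four values for a local mirror.

```bash
#!/usr/bin/env bash
# Sketch: confirm that an Ensembl MySQL server exposes the human Core and
# Compara databases before entering its details in the FusionFinder config.
# ASSUMPTION: defaults target the public Ensembl server's anonymous account;
# override ENSEMBL_HOST/USER/PASS/PORT for your own mirror.
ENSEMBL_HOST=${ENSEMBL_HOST:-ensembldb.ensembl.org}
ENSEMBL_USER=${ENSEMBL_USER:-anonymous}
ENSEMBL_PASS=${ENSEMBL_PASS:-}
ENSEMBL_PORT=${ENSEMBL_PORT:-3306}

# list_dbs PATTERN: print the databases whose names match the SQL LIKE
# pattern PATTERN, one per line, without a header row.
list_dbs() {
    local pattern=$1
    set -- -h "$ENSEMBL_HOST" -u "$ENSEMBL_USER" -P "$ENSEMBL_PORT" \
        --batch --skip-column-names
    # Only pass a password when one is set (the anonymous account has none).
    if [ -n "$ENSEMBL_PASS" ]; then
        set -- "$@" "--password=$ENSEMBL_PASS"
    fi
    mysql "$@" -e "SHOW DATABASES LIKE '$pattern'"
}

# check_ensembl: fail unless both a human Core and a Compara database exist.
check_ensembl() {
    local core compara
    core=$(list_dbs 'homo_sapiens_core%') || return 1
    compara=$(list_dbs 'ensembl_compara%') || return 1
    if [ -z "$core" ] || [ -z "$compara" ]; then
        echo "Core or Compara database not found on $ENSEMBL_HOST" >&2
        return 1
    fi
    printf 'Core databases:\n%s\nCompara databases:\n%s\n' "$core" "$compara"
}
```

The release numbers in the listed database names should match the Ensembl API version you installed.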
The documentation for each script used in the workflow can be found either online (FF Manual) or as a PDF (FusionFinder1.2.manual.pdf).
You can also view the documentation for any script by running the following at the command line:
perl script_name.pl --help
A complete analysis uses two Perl scripts:
1. fusionfinder.pl - Searches for fusion candidates.
2. make_alignments.pl - Generates multiple alignments of selected fusion candidates.
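A quick way to confirm that the archive extracted correctly is to print the built-in help of both scripts in one go. This is a small sketch; show_help and its optional directory argument are hypothetical names, and the directory defaults to the current one.

```bash
#!/usr/bin/env bash
# Sketch: print the --help text of both FusionFinder workflow scripts.
# show_help [DIR]: DIR (hypothetical) is where FusionFinder was extracted.
show_help() {
    local dir=${1:-.} script
    for script in fusionfinder.pl make_alignments.pl; do
        # Stop early with a clear message if a script is not where expected.
        if [ ! -f "$dir/$script" ]; then
            echo "not found: $dir/$script" >&2
            return 1
        fi
        echo "==== $script ===="
        perl "$dir/$script" --help
    done
}
```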
Example FusionFinder Workflow
The following is a simple two-step workflow. The read file used in the examples is included in the test data.
Step 1. Find candidate gene fusions in your read data
fusionfinder.pl --reads <fastq read file(s)> --config <config file> --cref <coding transcript reference file> --ncref <noncoding transcript reference file> --threads <number of threads to use>
e.g. fusionfinder.pl --reads BCRABL1_testdata_reads.fq --cref human_62_coding --ncref human_62_noncoding --mp_cutoff 1 --config fusionfinder.cnf
Step 2. Generation of multiple alignments for interesting fusion candidates
make_alignments.pl --readsfile <fusionfinder_reads_file.tsv> --g1 <G1 HGNC symbol> --g2 <G2 HGNC symbol> --config <config file>
e.g. make_alignments.pl --g1 BCR --g2 ABL1 --readsfile fusionfinder_reads.tsv --limit 20 --config fusionfinder.cnf
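The two example commands can be chained into a small script. This sketch copies the option values from the examples above and calls each script through perl, as in the --help example. FF_DIR, CONFIG and DRY_RUN are hypothetical variables added for illustration, and it assumes step 1 writes fusionfinder_reads.tsv to the working directory, as the step 2 example implies.

```bash
#!/usr/bin/env bash
# Sketch of the two-step FusionFinder workflow on the test subset.
# ASSUMPTION: FF_DIR, CONFIG and DRY_RUN are hypothetical; option values are
# copied from the documented examples.
FF_DIR=${FF_DIR:-.}                 # directory FusionFinder was extracted to
CONFIG=${CONFIG:-fusionfinder.cnf}  # FusionFinder configuration file

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# find_fusions READS: step 1, search READS for fusion candidates.
find_fusions() {
    run perl "$FF_DIR/fusionfinder.pl" --reads "$1" \
        --cref human_62_coding --ncref human_62_noncoding \
        --mp_cutoff 1 --config "$CONFIG"
}

# align_fusion G1 G2: step 2, build multiple alignments for one candidate.
# ASSUMPTION: step 1 left fusionfinder_reads.tsv in the current directory.
align_fusion() {
    run perl "$FF_DIR/make_alignments.pl" --g1 "$1" --g2 "$2" \
        --readsfile fusionfinder_reads.tsv --limit 20 --config "$CONFIG"
}
```

After sourcing the file, find_fusions BCRABL1_testdata_reads.fq followed by align_fusion BCR ABL1 runs both steps; setting DRY_RUN=1 first prints the commands instead, which is a cheap way to check paths before a long run.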
Test Read Data
The full dataset used in our paper is from published work by Levin and colleagues.
This data can be found below and represents the enriched dataset referred to in their manuscript.
The total processing time for this dataset depends on your system but is approximately 3 hours with a local Ensembl database.
Levin dataset: ~14 million 76-mer reads (1.1 GB compressed; 2.6 GB uncompressed)
A smaller subset of this dataset, consisting of reads from a single fusion, can be found below.
The total processing time for this subset depends on your system but is approximately 2 minutes with a local Ensembl database.
Subset test dataset: ~85 thousand 76-mer reads (6.2 MB compressed; 17 MB uncompressed)
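To check that a downloaded dataset is complete, you can count its reads and read lengths and compare them with the figures above. This sketch assumes standard four-line FASTQ records and decompresses files ending in .gz on the fly; fastq_stats is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: count reads and tabulate read lengths in a FASTQ file.
# ASSUMPTION: plain 4-line FASTQ records; *.gz input is decompressed on the fly.
fastq_stats() {
    case $1 in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk '
        NR % 4 == 2 { n++; count[length($0)]++ }  # line 2 of each record is the sequence
        END {
            printf "reads: %d\n", n
            for (len in count) printf "length %d: %d reads\n", len, count[len]
        }'
}
```

For the subset test dataset you would expect roughly 85 thousand reads, all of length 76, and roughly 14 million for the full Levin dataset.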
Reference Data
These files contain the coding and noncoding reference Bowtie indices used in the protocol, together with the corresponding FASTA files, annotated using the listed Ensembl version.
- Latest (Ensembl 68)
Coding and noncoding transcript references
Alternatively, if you want to use an older Ensembl version that you have access to, you can generate a custom reference sequence using the make_reftrans.pl script distributed with FusionFinder.
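To confirm that a downloaded reference unpacked correctly, you can check that each index basename (for example human_62_coding and human_62_noncoding, as passed to --cref and --ncref in the step 1 example) has all of its Bowtie index files. This sketch assumes the indices use Bowtie 1 small-index naming (.1.ebwt to .4.ebwt plus .rev.1.ebwt and .rev.2.ebwt); large indices with .ebwtl suffixes are not covered, and check_bowtie_index is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check that all Bowtie 1 index files exist for a reference basename.
# ASSUMPTION: Bowtie 1 small-index naming; .ebwtl (large) indices not handled.
check_bowtie_index() {
    local suffix status=0
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        # Report every missing file rather than stopping at the first one.
        if [ ! -f "$1.$suffix" ]; then
            echo "missing: $1.$suffix" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_bowtie_index human_62_coding && check_bowtie_index human_62_noncoding succeeds only if both indices are complete.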