Finding gene fusions in RNA-Seq data


If you use FusionFinder in any of your work, please cite the following publication. Thank you!

Francis RW, Thompson-Wicking K, Carter KW, Anderson D, Kees UR, et al. (2012) FusionFinder: A Software Tool to Identify Expressed Gene Fusion Candidates from RNA-Seq Data. PLoS ONE 7(6): e39987. doi:10.1371/journal.pone.0039987



A fusion transcript is an aberrant RNA molecule comprising exonic sequence from two normally separate genes. Fusion transcripts can be formed either by transcription of a fusion gene, following a translocation event, or by trans-splicing. They have been implicated as a cause of both haematological malignancies and solid tumours, including prostate, breast and lung cancers.

FusionFinder is a Perl-based software package for finding fusion transcript candidates in RNA-Seq data.

System Requirements

To run FusionFinder you need a supported operating system and enough disk space for your input and output files. FusionFinder is written in Perl and requires Perl plus the additional modules described below. The FusionFinder protocol also requires an aligner; we recommend Bowtie, which can be obtained and installed from the link below. FusionFinder relies heavily on Ensembl, so access to an Ensembl mirror is critical. Installing a local copy of Ensembl speeds up processing immensely; instructions can be found below. MySQL is required if you want to install a local version of Ensembl.

Hardware requirements
OS (Windows, MacOSX, Linux)
Smaller datasets will run on a 32-bit OS with 4GB of memory
Larger datasets will require more memory and a 64-bit OS

Software requirements
1. Bowtie
2. Perl Modules (direct links below, or you can search at CPAN)
   Bio::Tools::Run::Alignment::Muscle (install C/CJ/CJFIELDS/BioPerl-Run-1.006900.tar.gz if using cpan)
3. Ensembl API (download the appropriate version for the reference data and the Ensembl database you are connecting to)
4. The multiple sequence aligner Muscle
5. MySQL
   A MySQL client (including the devel package if installing with yum etc.) is required to connect to Ensembl
   MySQL server is required if you want to install a local version of Ensembl
6. Ensembl Database

   FusionFinder uses the UK Ensembl database by default.
   However, we highly recommend using the Ensembl resource closest to you in order to speed up processing.
   You can find some public Ensembl mirrors here
   or, for the best performance, you can install your own.
   Either way, you need access to at least the Ensembl human Core database and the human Compara database.

   When using a mirror database, you will need to give the connection details (server hostname, username and password)
   in the FusionFinder configuration file.

   An example configuration file can be found here.
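Before installing, it can help to confirm that the external programs listed above are actually reachable. The following is a minimal sketch in Python, not part of FusionFinder. The executable names (`perl`, `bowtie`, `muscle`, `mysql`) are assumptions about how the tools are installed on your system, so adjust them as needed. It does not check the Perl modules or the Ensembl API.

```python
import shutil

# Executable names are assumptions; change them to match your installation.
REQUIRED_TOOLS = ["perl", "bowtie", "muscle", "mysql"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```

A tool reported as missing may still be installed in a location that is not on your PATH.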

Downloading and Installing FusionFinder


You will need to download the software itself and some reference data.

Reference data can be found below.

The current version of FusionFinder is:

   FusionFinder version 1.2.1 - second public release, stable version (29/07/2012)

Recent bug fixes

    Fixes applied in v1.2.1
  • Fixed a bug spotted by Richard Donovan where the user-defined path to Bowtie was ignored in some calls to Bowtie.

Install FusionFinder

Once all system requirements are fulfilled and you have downloaded FusionFinder, simply extract the contents to an appropriate local or system-accessible directory.

For example on Linux: unzip


An example configuration file is provided here. The essential role of this file is to tell the scripts where your Ensembl API is installed. Users who install a local Ensembl database, or who wish to point to a local Ensembl mirror, simply need to modify the server hostname, username and password details in this file.


The documentation for each script used in the workflow can be found
online here - FF Manual
or as a PDF - FusionFinder1.2.manual.pdf
You can also view the documentation for any script by running the following at the command line:
perl --help

There are two Perl scripts used in a complete analysis.
1. - Specifically searches for fusion candidates.
2. - Generates multiple alignments of selected fusion candidates.

Example FusionFinder Workflow

This is a simple workflow. The data file used in the examples can be found in the test data.

Step 1. Find candidate gene fusions in your read data

--reads <fastq read file(s)> --config <config file> --cref <coding transcript reference file> --ncref <noncoding transcript reference file> --threads <number of threads to use>

e.g. --reads BCRABL1_testdata_reads.fq --cref human_62_coding --ncref human_62_noncoding --mp_cutoff 1 --config fusionfinder.cnf

Step 2. Generate multiple alignments for interesting fusion candidates

--readsfile <fusionfinder_reads_file.tsv> --g1 <G1 HGNC symbol> --g2 <G2 HGNC symbol> --config <config file>

e.g. --g1 BCR --g2 ABL1 --readsfile fusionfinder_reads.tsv --limit 20 --config fusionfinder.cnf
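If you want to drive the two steps from a pipeline, one option is to assemble the command lines programmatically. The sketch below is an illustration only and is not distributed with FusionFinder. The option names are taken from the examples above, but the script paths are not named on this page, so they are left as arguments you must supply. The function names and the placeholder path are hypothetical.

```python
import shlex


def build_fusion_search_command(script, reads, config, cref, ncref,
                                threads=None, mp_cutoff=None):
    """Build the argument list for Step 1 (fusion candidate search).

    `script` is the path to the search script, supplied by the caller.
    """
    cmd = ["perl", script,
           "--reads", reads,
           "--config", config,
           "--cref", cref,
           "--ncref", ncref]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if mp_cutoff is not None:
        cmd += ["--mp_cutoff", str(mp_cutoff)]
    return cmd


def build_alignment_command(script, readsfile, g1, g2, config, limit=None):
    """Build the argument list for Step 2 (multiple alignment of a fusion)."""
    cmd = ["perl", script,
           "--readsfile", readsfile,
           "--g1", g1,
           "--g2", g2,
           "--config", config]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # "path/to/search_script.pl" is a placeholder, not a real file name.
    step1 = build_fusion_search_command(
        "path/to/search_script.pl",
        reads="BCRABL1_testdata_reads.fq",
        config="fusionfinder.cnf",
        cref="human_62_coding",
        ncref="human_62_noncoding",
        mp_cutoff=1,
    )
    # Print the command for inspection rather than running it.
    print(shlex.join(step1))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting problems.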


Test Read Data

The full dataset used in our paper is from published work by Levin and colleagues.
This data can be found below and represents the enriched dataset referred to in their manuscript.
The total processing time for this dataset will depend on your system but will be approximately 3 hours with a local Ensembl database.

The Levin dataset: ~14 million 76-mer reads (compressed 1.1GB; uncompressed 2.6GB)

A smaller subset of this dataset, consisting of reads from a single fusion, can be found below.
The total processing time for this dataset will depend on your system but will be approximately 2 minutes with a local Ensembl database.

Subset test dataset: ~85 thousand 76-mer reads (compressed 6.2MB; uncompressed 17MB)
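After downloading a test dataset, a quick sanity check is to count the reads and compare the result with the approximate figures quoted above. The following Python sketch assumes an unwrapped FASTQ file with exactly four lines per record. It also assumes that a compressed file, if used, is gzip-compressed with a `.gz` extension; the compression format is not stated here. The function name is hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Assumption: files ending in .gz are gzip-compressed; others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4
```

For example, `count_fastq_reads("BCRABL1_testdata_reads.fq")` should return a number in the region of 85 thousand if that file is the subset dataset and the download is complete.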

Reference Data

These files contain the coding and noncoding reference Bowtie indices used in the protocol, together with the corresponding FASTA files, annotated against the listed Ensembl version.

  • Latest (Ensembl 68)
    Coding and noncoding transcript references
    • FASTA (compressed 58MB; uncompressed 275MB)
    • Bowtie Index (compressed 243MB; uncompressed 323MB)
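Once the index archive is unpacked, you may want to confirm that each index is complete before starting a long run. The sketch below assumes the indices were built with Bowtie 1, whose standard (small) index consists of six `.ebwt` files sharing a common prefix. It also assumes that the prefix matches the value you pass to `--cref` or `--ncref`, for example `human_62_coding`. Both points are assumptions, so check them against the files you actually downloaded.

```python
import os

# Suffixes of the six files that make up a standard (small) Bowtie 1 index.
BOWTIE_INDEX_SUFFIXES = [
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
]


def missing_index_files(prefix):
    """Return the Bowtie 1 index files expected for `prefix` that do not exist."""
    expected = [prefix + suffix for suffix in BOWTIE_INDEX_SUFFIXES]
    return [name for name in expected if not os.path.isfile(name)]
```

An empty list means all six files are present; it does not verify that their contents are intact.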

Archive versions can be found here

Alternatively, if you have access to an older version of Ensembl that you want to use, you can generate a custom reference sequence
using the script distributed with FusionFinder.

Contact us

FusionFinder was written by Richard Francis as part of his PhD in Bioinformatics at the University of Western Australia.