ABySS for Intel® Xeon® Processors

Purpose

This code recipe describes how to get, build, and use the ABySS de novo assembly code for the Intel® Xeon® processor.

Introduction

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, ABySS (Assembly By Short Sequences), a de novo, parallel, paired-end sequence assembler, was designed and developed for short reads.

The single-node version is useful for assembling genomes up to 100 Mbases in size. There is also a parallel version of ABySS implemented using MPI and capable of assembling larger genomes. The script abyss-pe will run a more comprehensive set of tools to process paired-end data.

The most current version of ABySS source files can be downloaded from http://www.bcgsc.ca/platform/bioinfo/software/abyss.

Support Software

Building the ABySS application requires the installation of the Boost* Graph Library and the Google* Sparsehash library. The current version of the Boost library can be found at www.boost.org; the current version of the Sparsehash library can be found at https://code.google.com/p/sparsehash/.

Build and Install Boost Graph Library

Download the current version of the software from www.boost.org.
Unzip and untar the downloaded file into a desired directory; change to the decompressed Boost directory.
Execute the bootstrap.sh shell script with the --prefix parameter for the parent directory in which to install the library files. If you have admin privileges, you may wish to put this in some global directory such as /usr/local.
```
$> ./bootstrap.sh --prefix=/usr/local
```
Install the library files using the b2 script.
```
$> sudo ./b2 install
```

Build and Install Sparsehash library

Download the current version of the software from https://code.google.com/p/sparsehash/.
Unzip and untar the downloaded file into a desired directory; change to the decompressed Sparsehash directory.
Use configure to set up the build environment. The compiler and the root directory for installation, among other options, can be given as command line arguments. For example,
```
$> ./configure CC=icc CXX=icpc --prefix=/usr/local 
```
Use make to build the library.
```
$> make
```
Install the library.
```
$> sudo make install
```
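Before configuring ABySS, it can help to confirm that both support libraries are visible under the chosen prefix. The following is a minimal sketch, not part of the ABySS or library distributions: the function name `check_headers` is hypothetical, and the two header paths it probes are assumptions about where Boost and Sparsehash place their files under `<prefix>/include`.

```shell
#!/bin/sh
# Sketch: report whether one representative header from each support library
# exists under <prefix>/include. The header paths are assumptions about the
# install layout, not something this recipe guarantees.
check_headers() {
    prefix=$1
    status=0
    for header in boost/graph/adjacency_list.hpp sparsehash/sparse_hash_map; do
        if [ -f "$prefix/include/$header" ]; then
            echo "found:   $prefix/include/$header"
        else
            echo "missing: $prefix/include/$header"
            status=1
        fi
    done
    # Non-zero return means at least one header was not found.
    return "$status"
}

# Example call (prefix matches the --prefix value used above):
#   check_headers /usr/local
```

If a header is reported missing, the prefix passed here probably differs from the one given to bootstrap.sh or configure.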

Build and Install the ABySS Software

Download the current version of the software from http://www.bcgsc.ca/platform/bioinfo/software/abyss.
Unzip and untar the downloaded file into a desired directory; change to the decompressed ABySS directory.
Use configure to set up the build environment; include your compiler choice, the location of the include files for the Boost library, any compiler flags you want to use, and the root directory in which you want to install the ABySS programs and documentation.
```
$> ./configure CC=icc CXX=icpc   --with-boost=/usr/local/include     \
             CPPFLAGS=-I/usr/local/include  --prefix=/usr/local 
```
Use make to build the ABySS programs.
```
$> make
```
Install the ABySS programs.
```
$> sudo make install     
```
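After installation, a quick check that the programs used later in this recipe can be found on the PATH may save some confusion. This is a small sketch; the function name `check_tools` is hypothetical, and it only tests that each name resolves, not that the program works.

```shell
#!/bin/sh
# Sketch: report where each named program is found on the current PATH.
# Returns non-zero if any of the names cannot be resolved.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool -> $path"
        else
            echo "$tool not found on PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the programs used in the following sections:
#   check_tools ABYSS abyss-pe
```

If a program is reported as not found, the bin directory under the --prefix given to configure may need to be added to PATH.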

Running ABySS on a Sample Yeast Dataset

As an example, the following steps describe how to download a short yeast dataset (ERR156523) and run the ABySS genome assembly software on the dataset.

Download the two paired-end data files from http://www.ebi.ac.uk/ena/data/view/ERR156523&display=html. Direct URLs for the files are:
```
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR156/ERR156523/ERR156523_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR156/ERR156523/ERR156523_2.fastq.gz 
```
- The files can be left as compressed FASTQ files or uncompressed for input to ABYSS.
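
The download can also be scripted. The sketch below builds the two direct URLs from the run accession and fetches them with wget. The function names `ena_fastq_url` and `fetch_pair` are hypothetical, and the directory pattern (first six characters of the accession, then the full accession) is inferred only from the two URLs listed above; it is an assumption that other accessions follow the same layout.

```shell
#!/bin/sh
# Sketch: build the FTP URL for one mate file of a run accession, following
# the directory pattern visible in the two URLs above. The pattern is an
# assumption and is not checked against the server.
ena_fastq_url() {
    acc=$1
    mate=$2
    dir=$(printf '%s' "$acc" | cut -c1-6)
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/$dir/$acc/${acc}_${mate}.fastq.gz"
}

# Download both mate files into the current directory with wget,
# skipping any file that is already present.
fetch_pair() {
    for mate in 1 2; do
        url=$(ena_fastq_url "$1" "$mate")
        [ -f "${url##*/}" ] || wget "$url"
    done
}

# Example call:
#   fetch_pair ERR156523
```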
The following command will run the ABYSS executable with the data files from the previous step. This assumes that the command is run from the directory in which the data files reside; if this is not the case, add directory information to the file name parameters to locate these files from the execution directory.
```
$> ABYSS -k57 --coverage-hist=coverage.hist \
         -s err156523-bubbles.fa -o err156523-1.fa \
         ERR156523_1.fastq.gz ERR156523_2.fastq.gz
```
- The -k parameter sets the length of the k-mers to be used in the de Bruijn graph. The default maximum value is 64. A good setting is some value over half the length of the input reads.
- The above command will generate the files coverage.hist, err156523-bubbles.fa, and err156523-1.fa.
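
The rule of thumb for -k can be turned into a small helper. The sketch below reads the length of the first read in a FASTQ file and proposes the smallest integer above half that length, capped at a maximum of 64 unless another cap is given. Both function names are hypothetical, and taking the first read as representative of the whole file is an assumption; the result is only a starting point for experimentation.

```shell
#!/bin/sh
# Sketch: length of the first read in a FASTQ file (plain or gzip-compressed).
# Line 2 of a FASTQ file holds the first sequence.
first_read_length() {
    gzip -cdf < "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Smallest integer above half the read length, capped at a maximum k
# (64 unless a second argument is given).
suggest_k() {
    read_len=$1
    max_k=${2:-64}
    k=$(( read_len / 2 + 1 ))
    if [ "$k" -gt "$max_k" ]; then
        k=$max_k
    fi
    echo "$k"
}

# Example call:
#   suggest_k "$(first_read_length ERR156523_1.fastq.gz)"
```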

Running ABySS Paired-End Analysis

An alternate execution that will utilize the ABYSS executable and many other tools in the ABySS suite is available when you have paired-end data (as given above). This uses the abyss-pe script file.

The following command will run the abyss-pe script with the data files from the previous steps. This assumes that the command is run from the directory in which the data files reside; if this is not the case, add directory information to the file name parameters to locate these files from the execution directory.

```
$> abyss-pe name=err156523 k=57 \
         in='ERR156523_1.fastq.gz ERR156523_2.fastq.gz'
```

The assembled contigs output will be stored in the file err156523-contigs.fa (using the name parameter from the command line).
The parameter in specifies the input files. The reads of a pair must be named with the suffixes /1 and /2 to identify the first and second read. The paired reads may be in two separate files, as here, or interleaved in a single file.
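
If a single interleaved file is preferred, two mate files can be merged record by record. The following is a minimal sketch using awk; the function name `interleave_fastq` and the file names in the example call are hypothetical. It assumes uncompressed FASTQ input with exactly four lines per record and both files listing the same pairs in the same order.

```shell
#!/bin/sh
# Sketch: interleave two uncompressed FASTQ files record by record.
# After every 4-line record from the first file, the next 4 lines are read
# from the second file and printed.
interleave_fastq() {
    awk -v mates="$2" '
        { print }
        NR % 4 == 0 {
            for (i = 0; i < 4; i++) {
                if ((getline line < mates) > 0) print line
            }
        }
    ' "$1"
}

# Example call (file names are placeholders):
#   interleave_fastq reads_1.fastq reads_2.fastq > reads_interleaved.fastq
```

The resulting file would then be given as the only entry of the in parameter.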

Build and Install the ABySS Software for Distributed Memory Execution

Download the current version of the software from http://www.bcgsc.ca/platform/bioinfo/software/abyss.
Unzip and untar the downloaded file into a desired directory; change to the decompressed ABySS directory.
Use configure to set up the build environment; include your compiler choice, the location of the include files for the Boost library, the path to the home of the MPI files, any compiler flags you want to use, and the root directory in which you want to install the ABySS programs and documentation.
```
$> ./configure CC=mpiicc CXX=mpiicpc  --with-boost=/usr/local/include     \
            --with-mpi=$MPI_HOME_DIR  CPPFLAGS=-I/usr/local/include       \
            --prefix=/usr/local
```
- The above will use the Intel® MPI compiler script that employs the Intel® compiler. (The Open MPI library is the version recommended by the code authors.)
Use make to build the ABySS programs.
```
$> make
```
Install the ABySS programs.
```
$> sudo make install
```
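The configure command above refers to $MPI_HOME_DIR without saying how it is set. One way to derive it, sketched below, is from the location of the MPI compiler wrapper. The function name `mpi_home_from_wrapper` is hypothetical, and the layout `<home>/bin/<wrapper>` is an assumption; MPI installations differ, so check that the resulting directory really contains the MPI include and lib directories before using it.

```shell
#!/bin/sh
# Sketch: derive an MPI home directory from the full path of a compiler
# wrapper, assuming the wrapper sits in <home>/bin.
mpi_home_from_wrapper() {
    dirname "$(dirname "$1")"
}

# Example call:
#   MPI_HOME_DIR=$(mpi_home_from_wrapper "$(command -v mpiicc)")
#   ls "$MPI_HOME_DIR/include" "$MPI_HOME_DIR/lib"
```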

Running the Distributed Memory Version of ABySS

The following instructions assume that you have downloaded the ERR156523 data files or some other appropriate data set.
The following command will run the ABYSS executable in a distributed memory fashion. The launch is done via the appropriate “run” command used by the MPI library that was used to build the application. The command assumes that it is run from the directory in which the data files reside; if this is not the case, add directory information to the file name parameters to locate these files from the execution directory.
```
$> mpirun -np 8 ABYSS-P -k57 --coverage-hist=coverage.hist \
         -s err156523-bubbles.fa -o err156523-1.fa \
         ERR156523_1.fastq.gz ERR156523_2.fastq.gz
```
- The executable ABYSS-P is the MPI-enabled version of the ABYSS application.
- The above will start 8 processes (-np 8) and divide up the input reads from the data files. Each process will use its set of reads to construct the overall de Bruijn graph.

Running ABySS Paired-End Analysis on a Distributed Memory Platform

An alternate execution that will utilize the ABYSS-P executable and many other tools in the ABySS suite is available when you have paired-end data (as given above). This uses the abyss-pe script file.

The following command will run the abyss-pe script using the ABYSS-P executable. (At the time of this writing, none of the other ABySS tools were written to run on multiple processes.)

```
$> abyss-pe name=err156523 k=57 np=8 \
         in='ERR156523_1.fastq.gz ERR156523_2.fastq.gz'
```

If compiled for distributed execution, you MUST NOT launch the script with something like “mpirun -np 8 abyss-pe”. The abyss-pe driver script will launch the MPI processes.
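
In a job script, a small guard can catch this mistake before any work starts. The sketch below refuses to start abyss-pe when the shell itself appears to have been started by an MPI launcher. The function names are hypothetical, and the detection relies on an assumption about the launcher: that Open MPI exports OMPI_COMM_WORLD_SIZE and that Hydra-based launchers (MPICH, Intel MPI) export PMI_SIZE to the processes they start. Other launchers may need a different test.

```shell
#!/bin/sh
# Sketch: return success if this shell appears to have been started by an
# MPI launcher. The two environment variables are assumptions about what
# the launcher exports.
launched_by_mpi() {
    [ -n "${OMPI_COMM_WORLD_SIZE:-}" ] || [ -n "${PMI_SIZE:-}" ]
}

# Run abyss-pe with the given arguments unless an MPI launcher is detected.
run_abyss_pe() {
    if launched_by_mpi; then
        echo "refusing to run abyss-pe under an MPI launcher; use np= instead" >&2
        return 1
    fi
    abyss-pe "$@"
}

# Example call:
#   run_abyss_pe name=err156523 k=57 np=8 \
#       in='ERR156523_1.fastq.gz ERR156523_2.fastq.gz'
```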

About the Authors

Clay Breshears
Life Sciences Software Architect

Dr. Clay Breshears is a Life Sciences Software Architect for the Intel® Health & Life Sciences group. He currently works to parallelize and optimize genomic and bioinformatics codes. Prior to this, Clay was a Courseware Architect on the Innovative Software Education team, specializing in multi-core and multithreaded programming and training and working with university faculty to incorporate parallel programming as a natural part of the curriculums in Computer Science and other computational science fields of study. During his time with ISE, Clay was the co-host of the popular weekly online show "Parallel Programming Talk."

Sunny Gogar
Software Engineer

Sunny Gogar received a Master’s degree in Electrical and Computer Engineering from the University of Florida, Gainesville and a Bachelor’s degree in Electronics and Telecommunications from the University of Mumbai, India. He is currently a software engineer with Intel Corporation's Software and Services Group. His interests include parallel programming and optimization for Multi-core and Many-core Processor Architectures.
