Blast/Scan Caller

Introduction

This package exports the ExecScan method, which performs a BLAST or pattern scan from a FASTA or pattern file against one or more genomes. The method returns a list of 5-tuples. The first element is a location object for the matching region in the input. The second element is a location object for the matching region in the target genome; this may be a real contig-based location or a location inside an identified feature. The third element is the P-score of the match. For scans, this will always be 0. The fourth and fifth elements are the alignment length and the bit score. For scans, the alignment length will be the entire length and the bit score will be 0.

Data Structures

ToolTable

The tool table is a hash that provides useful information about each BLAST tool. The hash maps each tool name to a hash reference. The fields in each of these hashes are as follows.

db_type

Type of database against which the tool runs: prot for a protein database and dna for a DNA database.

exec

Execution string for the tool. The variable $seqFile is presumed to be the location of the input sequence, $db is the directory, and $options are the user-specified options.
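As a rough illustration of the structure described above, the sketch below builds a small tool table and expands an execution string. The tool names, the blastall command lines, and the build_command helper are assumptions made for this example; the module's actual table entries and its expansion mechanism are not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical tool table: the entries below are illustrative only.
# Single quotes keep $seqFile, $db and $options as literal placeholders.
my %ToolTable = (
    blastn => { db_type => 'dna',
                exec    => 'blastall -p blastn -i $seqFile -d $db $options' },
    blastp => { db_type => 'prot',
                exec    => 'blastall -p blastp -i $seqFile -d $db $options' },
);

# Hypothetical helper: fill the placeholders of a tool's execution string.
sub build_command {
    my ($tool, $seqFile, $db, $options) = @_;
    my $entry = $ToolTable{$tool} or die "Unknown tool: $tool\n";
    my %vars = (seqFile => $seqFile, db => $db, options => $options);
    # Copy the template, then substitute each placeholder with its value.
    (my $command = $entry->{exec}) =~ s/\$(seqFile|db|options)/$vars{$1}/g;
    return $command;
}

print build_command('blastn', 'query.fasta', '/data/contigs', '-e 1e-5'), "\n";
```

Keeping the placeholders as plain text and substituting them explicitly avoids evaluating the template as Perl code, which is one possible design choice among several.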

Public Methods

ExecScan

my @sims = ExecScan($fig, $seqFile, \@genomes, $tool, $options);

Call BLAST or SCAN to search for DNA sequences or features.

fig

A FIG-like object for accessing the data store.

seqFile

Name of a file containing the input sequence. This will be either a FASTA file or a scan pattern file.

genomes

A list of the IDs for the target genomes of the search.

tool

Name of the tool to use.

options

Options to pass to the tool, formatted for the command line.

RETURN

Returns a list of 5-tuples, each consisting of a location from the input, a location in one of the target genomes or features, a match score (with 0 being the best), the alignment length, and the bit score. For some tools, the bit score will be replaced by the matching text.
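The returned tuples can be unpacked positionally. The sketch below uses mock data in place of a real ExecScan call, so it runs without a data store; the location strings are placeholders rather than the module's actual location objects, and best_hits is a hypothetical helper.

```perl
use strict;
use warnings;

# Mock results standing in for the return value of ExecScan.
# Each tuple: input location, target location, score, alignment length, bit score.
my @sims = (
    ['query1_1_300', '83333.1:contigA_1000_1299', 1e-50, 300, 540],
    ['query1_10_90', '83333.1:contigB_500_580',   0.002, 81,  42],
);

# Hypothetical helper: keep hits at or below a score cutoff,
# ordered with the best (lowest) score first.
sub best_hits {
    my ($cutoff, @tuples) = @_;
    return sort { $a->[2] <=> $b->[2] } grep { $_->[2] <= $cutoff } @tuples;
}

for my $sim (best_hits(1e-5, @sims)) {
    # For some tools the fifth element holds the matching text instead.
    my ($queryLoc, $targetLoc, $score, $alignLen, $bitScore) = @$sim;
    print join("\t", $queryLoc, $targetLoc, $score, $alignLen, $bitScore), "\n";
}
```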

Blast Utilities

Canonize

my $newName = CallScanner::Canonize($name, $genomeID);

If the specified name is a contig ID, ensure that it is prefixed with a genome ID.

name

Name to fix up.

genomeID

ID of the genome to be added to the contig ID, if necessary.

RETURN

Returns a fixed-up name.
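One way such a fix-up could work is sketched below. It is not the module's implementation: it assumes that a qualified contig ID has the form genomeID:contigID and that feature IDs begin with fig|, and canonize_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative stand-in for Canonize, under the assumptions stated above.
sub canonize_sketch {
    my ($name, $genomeID) = @_;
    # Leave feature IDs and already-qualified contig IDs unchanged.
    return $name if $name =~ /^fig\|/ || $name =~ /:/;
    # Otherwise treat the name as a bare contig ID and add the genome prefix.
    return "${genomeID}:$name";
}

print canonize_sketch('NC_000913', '83333.1'), "\n";          # 83333.1:NC_000913
print canonize_sketch('83333.1:NC_000913', '83333.1'), "\n";  # unchanged
```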

VerifyDB

CallScanner::VerifyDB($db, $type);

Verify that the specified FASTA file has BLAST databases. If the databases do not exist, they will be created. If they are older than the FASTA file, they will be regenerated.

db

Name of the FASTA file.

type

Type of database desired: prot for protein and dna for DNA.
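The freshness check described for VerifyDB can be sketched as a comparison of file modification times. The example below assumes the index files follow the naming used by the legacy NCBI formatdb tool (.phr, .pin, .psq for protein; .nhr, .nin, .nsq for DNA); db_needs_rebuild is a hypothetical helper, and the module itself may detect stale databases differently.

```perl
use strict;
use warnings;

# Index-file extensions written by the legacy NCBI formatdb tool.
my %IndexExt = (prot => [qw(phr pin psq)], dna => [qw(nhr nin nsq)]);

# Hypothetical helper: return 1 if any index file for $db is missing
# or older than the FASTA file itself, and 0 otherwise.
sub db_needs_rebuild {
    my ($db, $type) = @_;
    my $exts = $IndexExt{$type} or die "Unknown database type: $type\n";
    my $fastaTime = (stat $db)[9];    # element 9 of stat is the mtime
    die "Cannot read FASTA file $db\n" unless defined $fastaTime;
    for my $ext (@$exts) {
        my $indexTime = (stat "$db.$ext")[9];
        return 1 if !defined $indexTime || $indexTime < $fastaTime;
    }
    return 0;
}
```

A caller could run the database builder only when this check returns true, for instance formatdb -i with the FASTA file name and -p T or -p F depending on the database type. Whether the module uses formatdb for this step is an assumption.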