FigGFF

Table of Contents

gff3_for_feature
write_gff3
GFFParser
feature_tree
new()
parse()
parse_fasta()
parse_feature
parse_gff3_directive()
parse_local_directive()
parse_seed_directive()
GFFFeature
contig_length()
contigs()
fasta_data()
methods
Information about the source of the sequence.
genome_id()
genome_name()
project()
taxonomy()

gff3_for_feature

Returns the GFF3 information for a given feature.

The return is a pair ($contig_data, $fasta_sequences) that can be passed into write_gff3().

$contig_data is a hashref mapping a contig name to a list of GFF3 file lines for the sequences in that contig.
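To make the shape of $contig_data concrete, here is a minimal sketch. The contig name, source name, and feature lines are invented for illustration and do not come from FigGFF; only the "contig name to list of GFF3 lines" layout is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical example data: the contig name and GFF3 lines are made up
# purely to illustrate the hashref-of-lists shape described above.
my $contig_data = {
    'contig_1' => [
        join("\t", 'contig_1', 'example', 'gene', 100, 400, '.', '+', '.', 'ID=gene_1'),
        join("\t", 'contig_1', 'example', 'CDS',  100, 400, '.', '+', '0', 'ID=cds_1;Parent=gene_1'),
    ],
};

# Walk the structure: one list of GFF3 lines per contig.
for my $contig (sort keys %$contig_data) {
    my $lines = $contig_data->{$contig};
    printf "%s has %d GFF3 line(s)\n", $contig, scalar @$lines;
}
```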

write_gff3

Write a set of GFF3 per-contig data and fasta sequence data to a file or filehandle.

$genome is the genome these contigs are a part of.
$contig_list is a list of contig-data hashes as returned by gff3_for_feature.
$fast_list is a list of fasta data strings.

GFFParser

A parser for GFF3 files.

new()

Instantiate:

    my $fgff = GFFParser->new($fig);

parse()

Takes a filename as an argument (the code also accepts an already-open filehandle), and returns a file object.

The file object is a reference to a hash with the following keys:

features_by_genome
    An array of all the features in this genome.
feature_index
    A hash with a key of the features by ID and the value being the GFFFeature.
features
    All the features in the genome, as an array with each element being a GFFFeature element.
filename
    The filename of the file that was parsed.
fasta_data
    A hash with the key being the ID and the value being the sequence.

This method now stores the data internally, so you can then access the data as:

    $fgff->features_by_genome->{}
    $fgff->feature_index->{}
    $fgff->features->{}
    $fgff->filename->{}
    $fgff->fasta_data->{}
    $fgff->contig_checksum->{}
    $fgff->genome_checksum->{}
    $fgff->contigs->{}
    $fgff->fig->{}

    sub parse
    {
        my($self, $file) = @_;

        my($fh, $close_handle);

        my $fobj = GFFFile->new($self->fig);
        $self->current_file($fobj);

        if (ref($file) ? (ref($file) eq 'GLOB'
                          || UNIVERSAL::isa($file, 'GLOB')
                          || UNIVERSAL::isa($file, 'IO::Handle'))
            : (ref(\$file) eq 'GLOB'))
        {
            $fh = $file;
        }
        else
        {
            if ($file =~ /\.gz$/) {
                open($fh, "gunzip -c $file |") or confess "can't open a pipe to gunzip $file";
            }
            else {
                open($fh, "<$file") or confess "Cannot open $file: $!";
            }
            $fobj->filename($file);
            $close_handle = 1;
        }

        # Start parsing by verifying this is a gff3 file.
        $_ = <$fh>;

        if (m,^\#gff-version\t(\S+),)
        {
            if ($1 != 3)
            {
                confess "Invalid GFF File: version is not 3";
            }
        }

        # Now parse.
        while (<$fh>)
        {
            chomp;
            next unless ($_); # ignore empty lines

            # Check first for the fasta directive so we can run off and parse that
            # separately.
            if (/^>/)
            {
                $self->parse_fasta($fh, $_);
                last;
            }
            elsif (/^\#\#FASTA/)
            {
                # print "Got fasta directive\n";
                $_ = <$fh>;
                chomp;
                $self->parse_fasta($fh, $_);
                last;
            }
            elsif (/^\#\s/)
            {
                # comment.
                next;
            }
            elsif (/^\#$/)
            {
                # blank line starting with #
                next;
            }
            elsif (/^\#\#(\S+)(?:\t(.*))?/)
            {
                # GFF3 directive.
                $self->parse_gff3_directive($1, $2);
            }
            elsif (/^\#(\S+)(?:\t(.*))?/)
            {
                # Directive.
                if (lc($1) eq "seed")
                {
                    $self->parse_seed_directive($2);
                }
                else
                {
                    $self->parse_local_directive($1, $2);
                }
            }
            elsif (/^([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)$/)
            {
                $self->parse_feature($1, $2, $3, $4, $5, $6, $7, $8, $9);
            }
            else
            {
                die "bad line: '$_'\n";
            }
        }

        foreach my $k (qw[features_by_genome feature_index features filename fasta_data contig_checksum genome_checksum contigs])
        {
            $self->{$k}=$fobj->{$k};
        }
        return $fobj;
    }

feature_tree

Generate and return a feature tree for the features in the GFF3 file. Most features have Parent/Child relationships, e.g. an exon is a child of a gene, and a CDS is a child of an mRNA. This method will return the tree so that you can recurse up and down it.
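The Parent/Child idea can be sketched as follows. This is not the module's feature_tree implementation: build_children_index and descendants are hypothetical helpers, features are plain hashes with made-up id and parent keys rather than GFFFeature objects, and each feature is assumed to have at most one parent.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FigGFF: group feature IDs under the ID
# named as their parent. Features are plain hashes here.
sub build_children_index {
    my (@features) = @_;
    my %children;
    for my $f (@features) {
        next unless defined $f->{parent};
        push @{ $children{ $f->{parent} } }, $f->{id};
    }
    return \%children;
}

# Recurse down the tree from one ID, returning descendant IDs depth-first.
sub descendants {
    my ($children, $id) = @_;
    my @out;
    for my $child (@{ $children->{$id} || [] }) {
        push @out, $child, descendants($children, $child);
    }
    return @out;
}

# Invented example: an exon under a gene, and a CDS under an mRNA.
my $children = build_children_index(
    { id => 'gene_1' },
    { id => 'mrna_1', parent => 'gene_1' },
    { id => 'cds_1',  parent => 'mrna_1' },
    { id => 'exon_1', parent => 'gene_1' },
);
print join(',', descendants($children, 'gene_1')), "\n";
```

Keeping a child index keyed by parent ID makes walking down the tree cheap; walking up would need the reverse mapping as well.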

parse_gff3_directive()

Parses the directives within the files (e.g. headers, flags for FASTA, and so on).

parse_seed_directive()

Parse out seed information that we hide in the headers, e.g. project, name, taxid, and so on. These are our internal representations, but are generally treated as comments by other gff3 parsers.

parse_local_directive()

I haven't seen one of these :)

parse_feature

Reads a feature line and stuffs it into the right places, as appropriate.
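As a rough illustration of what reading a feature line involves, the sketch below splits one tab-separated line into the nine columns that parse() hands to parse_feature and decodes the key=value pairs of the ninth column. split_feature_line is a hypothetical helper, not the module's own code, and the sample line is invented; it does not unescape URL-encoded values.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's parse_feature: split one GFF3
# feature line into its nine columns and decode column nine.
sub split_feature_line {
    my ($line) = @_;
    my @cols = split /\t/, $line, -1;
    return unless @cols == 9;    # not a feature line

    my %feature;
    @feature{qw(seqid source type start end score strand phase)} = @cols[0 .. 7];

    # Column nine holds semicolon-separated key=value pairs.
    my %attributes;
    for my $pair (split /;/, $cols[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        $attributes{$key} = $value;
    }
    $feature{attributes} = \%attributes;
    return \%feature;
}

my $f = split_feature_line(
    join("\t", 'contig_1', 'example', 'CDS', 100, 400, '.', '+', '0', 'ID=cds_1;Parent=mrna_1')
);
print "$f->{type} $f->{start}-$f->{end} parent=$f->{attributes}{Parent}\n";
```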

parse_fasta()

Read the fasta sequence into memory.
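A minimal sketch of reading fasta records into a hash keyed by ID is shown below. It mirrors the way parse() passes the filehandle together with the header line it has already read, but read_fasta is a hypothetical helper and the sequences are invented; the real parse_fasta may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's parse_fasta: collect sequences
# into a hash keyed by the first word of each ">" header line.
sub read_fasta {
    my ($fh, $first_line) = @_;
    my (%seqs, $id);
    my $line = $first_line;
    while (defined $line) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;       # drop stray whitespace inside sequence lines
            $seqs{$id} .= $line;
        }
        $line = <$fh>;
    }
    return \%seqs;
}

# Invented input: the first header has already been consumed by the caller.
my $text = "ACGT\nTTGA\n>contig_2\nGGCC\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $seqs = read_fasta($fh, '>contig_1 example');
close($fh);
print "$_=$seqs->{$_}\n" for sort keys %$seqs;
```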

GFFFeature

A GFFFeature that accesses the data.

methods

fig
seqid
source
type
start
end
score
strand
phase
attributes
genome
fig_id

fasta_data()

Get or set the fasta data. Given an id and some data will set the data for that id. Given an id will return the data for that id. Called without arguments will return a reference to a hash of sequences.

This means that if you give it an id and sequence it will return that sequence. Hmmm.
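The three calling conventions can be sketched with a small stand-in class. SketchFile is hypothetical and exists only for this example; it is not the module's file object, and the real accessor may be written differently.

```perl
use strict;
use warnings;

package SketchFile;    # hypothetical stand-in for the real file object

sub new {
    my ($class) = @_;
    return bless({ fasta_data => {} }, $class);
}

sub fasta_data {
    my ($self, $id, $data) = @_;
    # No arguments: return the whole hash of sequences.
    return $self->{fasta_data} unless defined $id;
    # Id and data: store the data for that id.
    $self->{fasta_data}{$id} = $data if defined $data;
    # Id, with or without data: return what is stored for that id.
    return $self->{fasta_data}{$id};
}

package main;

my $fobj = SketchFile->new;
$fobj->fasta_data('contig_1', 'ACGT');                     # set
print $fobj->fasta_data('contig_1'), "\n";                 # get one sequence
print join(',', sort keys %{ $fobj->fasta_data }), "\n";   # get the whole hash
```

In this sketch, setting a sequence also returns it, which is the behaviour the paragraph above remarks on.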

contigs()

Add a contig to the list, or return a reference to an array of contigs.

contig_length()

Get or set the length of a specific contig.

    my $length=$fob->contig_length($contig, $length);
    my $length=$fob->contig_length($contig);

Information about the source of the sequence.

These are things that we have parsed out of the GFF3 file, or want to add into the GFF3 file. We can use these methods to get or set them as required. In general, if a value is supplied, it will be used as the new value.

genome_id()

Get or set a genome id for this file.

genome_name()

Get or set a genome name for this file.

project()

Get or set the project.

taxonomy()

Get or set the taxonomy.