FIGStatisticalModels

Table of Contents

FIGStatisticalModels
new
mean
median
min
max
stdev
Functional Coupling Scoring
EQUATION
fc_score2probability
fc_bitscore2probability
score2success_prob
linear_logistic_prediction
spline_matrix

FIGStatisticalModels

A package of statistical models for applying to FIG data.

At inception this package handled some methods developed by Jon McAuliffe, but feel free to add other methods here.

new

Instantiate the object. Not necessary, but nice.

mean

Calculate the mean of an array. e.g.

    my $mean=FIGStatisticalModels->mean(\@array);

median

Calculate the median of an array. e.g.

    my $median=FIGStatisticalModels->median(\@array);

This is the median, the middle number in a list.

min

Return the minimum value in an array. e.g.

    my $min=FIGStatisticalModels->min($array);

max

Return the maximum value in an array. e.g.

    my $max=FIGStatisticalModels->max($array);

stdev

Calculate the standard deviation of an array. e.g.

    my $stdev=FIGStatisticalModels->stdev(\@array);
    my $stdev=FIGStatisticalModels->stdev([1,2,3,4]);

Functional Coupling Scoring

Scoring direct functional coupling. These are some methods to score the direct functional coupling using a method developed by Jon McAuliffe and Ross Overbeek. It was coded by Rob Edwards. The approach uses a generalized additive model with a thin plate regression spline smoother with 5 degrees of freedom.

Many of these notes have been extracted directly from discussions between Ross, Jon, and Rob. I am including them here for clarity and posterity.

Offline, a matrix was computed whose rows correspond to values of the log score input on a fine grid. In the ith row, each column corresponds to a basis function of the smoother, evaluated at the ith grid point.

To get the predicted success probability for a given log score, we linearly interpolate between the two nearest grid points, then pass the interpolant and the coefficient vector to a classical linear-logistic prediction routine.
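The interpolation step described above can be sketched roughly as follows. This is only an illustration and not the module's own code: the grid start, the grid step, the tiny three-row matrix, and the helper name interpolate_row are all assumptions made up for the example. The real matrix is the one stored in spline_matrix.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical toy grid: log scores 0.0, 0.5, 1.0, one matrix row per grid point.
# Each row holds basis-function values at that grid point (made-up numbers).
my $grid_start = 0.0;
my $grid_step  = 0.5;
my @matrix = (
    [1, 0.0, 2.0],
    [1, 1.0, 4.0],
    [1, 3.0, 8.0],
);

# Linearly interpolate between the two matrix rows nearest to $logscore.
sub interpolate_row {
    my ($logscore) = @_;
    my $pos = ($logscore - $grid_start) / $grid_step;
    # Clamp to the grid so out-of-range scores fall back to the end rows.
    $pos = 0        if $pos < 0;
    $pos = $#matrix if $pos > $#matrix;
    my $lo = floor($pos);
    my $hi = $lo < $#matrix ? $lo + 1 : $lo;
    my $w  = $pos - $lo;    # weight of the upper row, between 0 and 1
    return [ map { (1 - $w) * $matrix[$lo][$_] + $w * $matrix[$hi][$_] }
             0 .. $#{ $matrix[$lo] } ];
}

my $row = interpolate_row(0.25);
print join(", ", @$row), "\n";    # prints "1, 0.5, 3"
```

The interpolated row would then be handed, together with the coefficient vector, to the linear-logistic prediction routine.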

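EQUATION

The formula can be expressed like this, writing c0, ..., c4 for the fitted coefficients:

    eta(s) = c0 + c1*b1(s) + c2*b2(s) + c3*b3(s) + c4*b4(s)

then for a given score (s) the probability of success is

    p(s) = (1 + exp(-eta(s)))^{-1}

where b1, b2, b3, and b4 are the basis functions of a thin-plate regression spline with 5 fixed degrees of freedom.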

fc_score2probability

Convert a functional coupling score based on E value to a probability.

use: my $probability = $statistic_model->fc_score2probability($score);

fc_bitscore2probability

Convert a BBH bit score and optional tough call flag to a probability. This uses the fourth column of the BBH data - the bit score normalized by length - as the basis of calculating a probability of being a good match.

use: my $probability = $statistic_model->fc_bitscore2probability($bitscore, $toughcall);

Toughcall is a boolean. It is true if the function matches this regexp:

    if ($function =~ /aminotransferase|system|component|oxidase|regulator|cytochrome|specific|permease|transcriptional|transport|dehydrogenase/i)
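As a small illustration, the tough call flag could be computed from a function description before calling fc_bitscore2probability. The helper name is_toughcall and the example function strings are hypothetical; only the regexp comes from the note above.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the function description matches the
# tough-call pattern quoted in the documentation, otherwise 0.
sub is_toughcall {
    my ($function) = @_;
    return $function =~ /aminotransferase|system|component|oxidase|regulator|cytochrome|specific|permease|transcriptional|transport|dehydrogenase/i ? 1 : 0;
}

print is_toughcall("Alcohol dehydrogenase"), "\n";    # prints 1
print is_toughcall("DNA polymerase III"), "\n";       # prints 0
```

The result could then be passed as the second argument, e.g. $statistic_model->fc_bitscore2probability($bitscore, is_toughcall($function)).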

score2success_prob

Convert a score to a probability of success.

linear_logistic_prediction

Given the coefficients of a logistic regression fit and values for the covariates, return the corresponding predicted probability: take the inner product of covariates and coefficients, then pass the result through the inverse link function (the logistic function, h(u) = (1 + exp(-u))^{-1}).
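A minimal sketch of that calculation is shown here. It is not the module's implementation: the function name logistic_prediction_sketch, the argument order, and the numbers are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Inner product of coefficients and covariates, passed through the
# logistic function h(u) = 1 / (1 + exp(-u)).
sub logistic_prediction_sketch {
    my ($coefficients, $covariates) = @_;
    die "coefficient and covariate counts differ\n"
        unless @$coefficients == @$covariates;
    my $u = 0;
    $u += $coefficients->[$_] * $covariates->[$_] for 0 .. $#$coefficients;
    return 1 / (1 + exp(-$u));
}

# Made-up coefficients and covariates: u = 0.5 - 0.5 + 0.5 = 0.5
my $p = logistic_prediction_sketch([0.5, -1.0, 2.0], [1, 0.5, 0.25]);
printf "%.4f\n", $p;    # prints 0.6225
```

An inner product of zero gives a probability of exactly 0.5, and larger inner products push the probability toward 1.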

spline_matrix

This data is the matrix that was calculated offline by Jon McAuliffe using the thin plate regression spline smoother. This matrix was provided by Jon, and I just converted it to this array form for easier coding.