GOREGRESS

NAME

goregress - perform multiple regression on variables in a Candis file

SYNOPSIS

goregress dep_var_data indep_var_data1 ...

DESCRIPTION

Goregress accepts a Candis file from the standard input, performs a multiple (or single) regression on the specified variables, and writes the Candis file, with the results added, to the standard output. In particular, it adjusts the a_j and b in the equation

f(x_j) = sum_j a_j*x_j + b

so as to minimize

chisquared = sum_i (sum_j a_j*x_ji + b - f_i)^2

where sum_j is over the independent variables and sum_i is over the sampled values of f and x_j. The command line argument "dep_var_data" is the name of the one-dimensional variable field containing the f_i, while "indep_var_data1" is the name of the variable field containing the x_1i data, etc. Any (reasonable) number of "indep_var_data*" fields may be specified, and each must have the same size and shape as the "dep_var_data" field. In what follows, ns is the number of samples and nv is the number of independent variables.
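
The minimization described above can be sketched in a few lines of Python. This is only an illustration of ordinary least squares through the normal equations, not the method goregress necessarily uses internally; the function names solve_linear and fit are hypothetical, and the data are passed as plain lists rather than read from a Candis file.

```python
def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    # Augmented matrix [A | y]; copy rows so the caller's data is untouched.
    M = [row[:] + [y[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: independent variables are degenerate")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit(f, xs):
    """Return (a, b) minimizing sum_i (sum_j a_j*x_ji + b - f_i)^2.

    f  : list of ns sampled values f_i
    xs : list of nv lists, xs[j][i] = x_ji
    """
    ns, nv = len(f), len(xs)
    # Each design row is [x_1i, ..., x_nvi, 1]; the trailing 1 carries b.
    rows = [[x[i] for x in xs] + [1.0] for i in range(ns)]
    n = nv + 1
    # Normal equations: (X^T X) z = X^T f, with z = [a_1, ..., a_nv, b].
    A = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    y = [sum(r[p] * fi for r, fi in zip(rows, f)) for p in range(n)]
    z = solve_linear(A, y)
    return z[:nv], z[nv]
```

For data generated exactly from f = 2*x_1 - 3*x_2 + 5, fit should recover a = [2, -3] and b = 5 to within rounding error.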

The dimensions "iv1" and "iv2" are added to the Candis file. Each of these contains the values 0, 1, 2, 3, ... , nv - 1. In addition, the fields "avals", "bval", "rsq", "F", "xcorrs", and "dep_var_data-fit" are added. The first two are the fitted values of a_j and b. "rsq" is the value of R^2, i.e., the percent of variance explained by the fit. "F" is the F statistic, equal to the explained variance over the unexplained variance for ns - 1 degrees of freedom in the numerator and ns - nv degrees of freedom in the denominator. F-test tables yield the probability of the correlation not being null. Finally, "xcorrs" has dimensions iv1 and iv2 and is a table of correlation coefficients between the various independent variables.
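
As a rough illustration of two of these diagnostics, the following Python sketch computes R^2 and the table of correlation coefficients from plain lists. The function names are hypothetical. R^2 is returned here as a fraction between 0 and 1; whether goregress stores "rsq" as a fraction or scaled to percent is not stated above. The F statistic is left out because its exact normalization in goregress is not spelled out here.

```python
import math


def r_squared(f, fitted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_f = sum(f) / len(f)
    ss_tot = sum((v - mean_f) ** 2 for v in f)
    ss_res = sum((v - p) ** 2 for v, p in zip(f, fitted))
    return 1.0 - ss_res / ss_tot


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def xcorrs(xs):
    """nv-by-nv table; entry [p][q] correlates independent variables p and q."""
    nv = len(xs)
    return [[pearson(xs[p], xs[q]) for q in range(nv)] for p in range(nv)]
```

The table is symmetric with ones on the diagonal. Off-diagonal entries close to +1 or -1 indicate nearly collinear independent variables, in which case the individual a_j are poorly determined.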

If bad data values occur in the input fields, they are ignored and the sample size is reduced accordingly.
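
The effect of this rule can be sketched in Python as follows. The sketch rests on two assumptions not stated above: that a value counts as bad when it is NaN or at least as large in magnitude as a hypothetical threshold BAD_LIMIT, and that a sample is dropped when any of its fields (dependent or independent) is bad. The names BAD_LIMIT, is_bad, and drop_bad_samples are illustrative only.

```python
BAD_LIMIT = 1.0e30  # hypothetical bad-data threshold, assumed for illustration


def is_bad(v):
    """True for NaN or for values at or beyond the assumed bad-data threshold."""
    return v != v or abs(v) >= BAD_LIMIT


def drop_bad_samples(f, xs):
    """Keep only the samples i for which f_i and every x_ji are good.

    Returns the reduced f and xs; the new sample size is len() of the
    returned f.
    """
    keep = [
        i for i in range(len(f))
        if not is_bad(f[i]) and not any(is_bad(x[i]) for x in xs)
    ]
    return [f[i] for i in keep], [[x[i] for i in keep] for x in xs]
```

With this filtering, ns in the formulas above refers to the number of samples that survive, not to the original field length.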

BUGS

The names of the new variables might clash with those of existing variables, resulting in the failure of the program.