DayOftheNewDan.com

AWK Linear Regression

December 26, 2012

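A simple AWK program for calculating a least-squares linear regression: it prints the slope, intercept, and R-squared of the best-fit line. Each input line should hold an x value and a y value separated by tabs, spaces, or commas (or any combination thereof); lines that do not split into exactly two fields are skipped.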
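
To see how that field separator behaves, the sketch below feeds single lines through awk with the same FS and prints NF. The helper name `count_fields` is hypothetical and not part of the program. Spaces, commas, tabs, and mixtures all give two fields, but a line that starts with a separator gets an empty first field, so NF is 3 and a rule like `NF == 2` would not match it.

```shell
#!/bin/sh
# Hypothetical helper: print how many fields awk finds in one line
# when FS is the regex used by the regression program.
count_fields() {
    # %b expands backslash escapes such as \t in the argument
    printf '%b\n' "$1" | awk 'BEGIN { FS = "[ ,\t]+" } { print NF }'
}

count_fields '1 2'      # space
count_fields '1,2'      # comma
count_fields '1, 2'     # comma followed by a space
count_fields '1\t2'     # tab
count_fields ' 1 2'     # leading space: an empty first field appears
```

The first four calls should each print 2 and the last should print 3.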

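BEGIN { FS = "[ ,\t]+" }             # split on any run of spaces, commas, or tabs

# Trim leading and trailing separators (and a DOS carriage return); with a
# regex FS, a leading separator would otherwise create an empty first field.
{ sub(/^[ ,\t]+/, ""); sub(/[ ,\t\r]+$/, "") }

NF == 2 { num += 1                   # count only rows with exactly two fields
          x[num] = $1                # index by row count, not NR, so skipped
          y[num] = $2                # lines leave no holes in the arrays
          x_sum  += $1
          y_sum  += $2
          xy_sum += $1 * $2
          x2_sum += $1 * $1
        }

END { if (num < 2) { print "fitline: need at least two points" > "/dev/stderr"; exit 1 }
      mean_x  = x_sum  / num
      mean_y  = y_sum  / num
      mean_xy = xy_sum / num
      mean_x2 = x2_sum / num
      var_x = mean_x2 - mean_x * mean_x          # zero when every x is the same
      if (var_x == 0) { print "fitline: all x values are equal" > "/dev/stderr"; exit 1 }
      slope = (mean_xy - mean_x * mean_y) / var_x
      inter = mean_y - slope * mean_x
      for (i = 1; i <= num; i++) {
          # ^ is the POSIX power operator; ** is not part of POSIX awk
          ss_total    += (y[i] - mean_y)^2
          ss_residual += (y[i] - (slope * x[i] + inter))^2
      }
      printf("Slope      :  %g\n", slope)
      printf("Intercept  :  %g\n", inter)
      if (ss_total > 0)
          printf("R-Squared  :  %g\n", 1 - ss_residual / ss_total)
      else                                       # every y is the same, so R^2 is undefined
          print "R-Squared  :  undefined (all y values are equal)"
    }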
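
For reference, the END block evaluates the usual ordinary least-squares formulas written in terms of sample means. In the notation below, n is num, the barred quantities are mean_x, mean_y, mean_xy and mean_x2, and m and b (names chosen here, not from the program) are slope and inter. The last line checks the formulas against the three-point example run further down.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,\quad
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i,\quad
\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2

m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2},\qquad
b = \bar{y} - m\,\bar{x}

R^2 = 1 - \frac{\sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

% Points (1,1), (2,2), (3,3): \bar{x} = \bar{y} = 2 and \overline{xy} = \overline{x^2} = 14/3
m = \frac{14/3 - 4}{14/3 - 4} = 1,\qquad b = 2 - 1 \cdot 2 = 0,\qquad R^2 = 1 - \frac{0}{2} = 1
```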

I packaged this up into a shell script called fitline. Then I can run it, type the points, and press Ctrl-D to end the input:

$ fitline
1, 1
2, 2
3, 3
Slope      :  1
Intercept  :  0
R-Squared  :  1
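
The fitline script itself is not shown here. As a hypothetical sketch, it could be a small POSIX sh wrapper that hands its arguments to awk, so it reads the named files or, with no arguments, standard input. Only the name `fitline` comes from the post; the condensed variable names and error messages are assumptions, and it is written as a function so it can be sourced and tested.

```shell
#!/bin/sh
# Hypothetical sketch of the fitline wrapper; the original script is not shown.
# fitline [file...]: reads "x y" pairs (spaces, tabs, and/or commas) from the
# given files, or from standard input when no files are given, because awk
# itself falls back to stdin. A standalone script would end with: fitline "$@"
fitline() {
    awk '
    BEGIN { FS = "[ ,\t]+" }
    { sub(/^[ ,\t]+/, ""); sub(/[ ,\t\r]+$/, "") }     # trim edge separators
    NF == 2 {
        n++; x[n] = $1; y[n] = $2
        sx += $1; sy += $2; sxy += $1 * $2; sxx += $1 * $1
    }
    END {
        if (n < 2) { print "fitline: need at least two points" > "/dev/stderr"; exit 1 }
        mx = sx / n; my = sy / n
        vx = sxx / n - mx * mx                            # variance of x
        if (vx == 0) { print "fitline: all x values are equal" > "/dev/stderr"; exit 1 }
        slope = (sxy / n - mx * my) / vx
        inter = my - slope * mx
        for (i = 1; i <= n; i++) {
            sst += (y[i] - my)^2                          # total sum of squares
            ssr += (y[i] - (slope * x[i] + inter))^2      # residual sum of squares
        }
        printf("Slope      :  %g\n", slope)
        printf("Intercept  :  %g\n", inter)
        if (sst > 0) printf("R-Squared  :  %g\n", 1 - ssr / sst)
        else         print "R-Squared  :  undefined (all y values are equal)"
    }' "$@"
}
```

With data that does not lie on a line, `printf '1, 1\n2, 3\n3, 2\n' | fitline` should report a slope of 0.5, an intercept of 1, and an R-squared of 0.25. Passing file names instead, as in `fitline points.csv` (a hypothetical file), works the same way because the arguments go straight to awk.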
