LeastSquares Help

LeastSquares does regression on data supplied by the user. It can do the typical linear regression to find the "best" straight line of the form y = a * x + b, that is, the line that minimizes the sum of the squared vertical differences between the line and the data points. The coefficients a and b are stored in the variables below the canvas. It also calculates R², which is stored in the variable c.

The parameter a is the slope of the regression line, which is often called m in discussions of linear regression. b is the y intercept.
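The calculation of a, b and R² can be sketched in Python. This is only an illustration of ordinary least squares, not the app's own code; the function name linear_fit is made up for this example.

```python
def linear_fit(xs, ys):
    """Return (a, b, r2) for the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # y intercept
    # R² as the squared correlation; taken as 1 when all y values are equal.
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return a, b, r2


# Points lying exactly on y = 2 * x + 3.
a, b, r2 = linear_fit([1, 2, 3, 4], [5, 7, 9, 11])
```

For these four points the fit recovers a = 2, b = 3 and R² = 1, because the data lie exactly on the line.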

It can also do a nontraditional nonlinear curve fit on a certain class of two-parameter curves that can be "linearized". For example, suppose the data points appear to fit an exponential curve of the form y = exp(a * x + b). One can "linearize" the data by use of the transformation ln y. (This technique is suggested, for example, in Applied Numerical Analysis by Curtis F. Gerald, Addison-Wesley Publishing Company, ©1970, page 287.) The transformed data can then be processed by the traditional linear regression technique. After a and b are found, the nonlinear curve y = exp(a * x + b) is (at least in some sense) a good estimate for the data. Again, a and b along with R² for the linearized line are stored as the values a, b, and c. It should be pointed out that, because of the properties of exponents, this also fits curves of the form y = b * exp(a * x), although the value of b will be different: the multiplier in that form equals exp(b) for the fitted b.
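A minimal Python sketch of this exponential fit follows. It is an illustration under stated assumptions, not the app's code: linear_fit and exp_fit are hypothetical helper names, and all y values are assumed to be positive so that the logarithm exists.

```python
import math


def linear_fit(xs, ys):
    """Return (a, b, r2) for the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return a, mean_y - a * mean_x, r2


def exp_fit(xs, ys):
    """Fit y = exp(a * x + b) by fitting a line to the points (x, ln y)."""
    return linear_fit(xs, [math.log(y) for y in ys])


xs = [0, 1, 2, 3]
ys = [math.exp(0.5 * x + 1) for x in xs]   # exact data with a = 0.5, b = 1
a, b, r2 = exp_fit(xs, ys)
multiplier = math.exp(b)   # the same curve written as multiplier * exp(a * x)
```

Note that R² here describes how well the transformed points fit the line, not how well the original points fit the exponential curve.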

The program has 5 modes of operation:
1. Linear (the traditional linear regression)
2. exp(ax + b) as in the above example
3. ln(ax + b) where ln represents the natural log
4. Other where the user can supply the desired formulas for linearization and fitting the curve.
5. Data only scatter chart. (This allows looking at the data in order to determine what kinds of regression might be appropriate.)

Modes 2 and 3 illustrate linearization techniques. y = exp(ax + b) can be linearized by taking the ln of y. On the other hand, y = ln(ax + b) can be linearized by taking exp(y). Basically, the idea is to find an "inverse" of the desired curve. Another example is y = sqrt(ax + b), which can be linearized by squaring y. The following table has some of the possibilities.

curve formula           linearization formula   Note
exp(a * x + b)          Math.log(y)
Math.log(a * x + b)     exp(y)                  1
sqrt(a * x + b)         y**2                    1
(a * x + b) ** 2        sqrt(y)                 2
1/(a * x + b)           1/y                     3
sin(a * x + b)          asin(y)                 4
asin(a * x + b)         sin(y)                  5
tan(a * x + b)          atan(y)                 4
sinh(a * x + b)         asinh(y)
10 ** (a * x + b)       Math.log10(y)           1

Notes:
1: a * x + b > 0
2: y >= 0 and the slope of the curve cannot change sign, i.e. all values must be either left or right of the vertex
3: y values cannot be 0
4: all x values must satisfy -π/2 ≤ a * x + b ≤ π/2 (radians) or -90 ≤ a * x + b ≤ 90 (degrees), i.e. all values must be in a half period where the sin is increasing or decreasing.
5: y values must correspond to a half period where the sin function is increasing or decreasing.

Most of these formulas appear as examples in the app.

When the linearization formula is applied to the curve, the result must be a * x + b. As one might guess from the above examples, the linearization formula is an "inverse" of the curve function. Moreover, the function/inverse function must be 1 to 1, i.e. the curve function must be increasing or decreasing on the domain specified by the x values.
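The requirement can be checked numerically. The following Python sketch is an illustration only: the PAIRS table and the is_linearized helper are hypothetical names, and each pair is written as a function of u = a * x + b and of y.

```python
import math

# (curve, linearization) pairs taken from the table of possibilities.
PAIRS = {
    "exp": (math.exp, math.log),
    "ln": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda y: y ** 2),
    "square": (lambda u: u ** 2, math.sqrt),
    "reciprocal": (lambda u: 1 / u, lambda y: 1 / y),
    "sin": (math.sin, math.asin),
    "sinh": (math.sinh, math.asinh),
}


def is_linearized(curve, linearize, a, b, xs, tol=1e-9):
    """True if linearize(curve(a * x + b)) gives back a * x + b for every x."""
    return all(abs(linearize(curve(a * x + b)) - (a * x + b)) <= tol for x in xs)


xs = [-2, -1, 0, 1, 2]
# a * x + b stays between -0.9 and 1.1, inside [-pi/2, pi/2], so asin undoes sin.
ok = is_linearized(*PAIRS["sin"], a=0.5, b=0.1, xs=xs)
# a * x + b reaches -4 and 4, outside that half period, so the check fails.
bad = is_linearized(*PAIRS["sin"], a=2.0, b=0.0, xs=xs)
```

The second call fails because sin is not 1 to 1 over the range of a * x + b produced by those x values, which is exactly the situation the notes warn about.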

These examples are rather restrictive. For example, for the (a * x + b) ** 2 example the parabola is always concave up and the minimum is always 0. Fortunately, the LeastSquares app has two variables d and e (together with a slider s) that allow generalization. For example, consider
d * (a * x + b) ** 2 + e
The multiplier d is normally +1 or -1. When it is +1, the curve formula allows for a concave up parabola as before. But when it is -1, the curve formula provides a concave down parabola. The variable e pushes the curve up (when positive) or down (when negative). In fact, e is the minimum (when d is +1) or the maximum (when d is -1) of the parabola. The following table lists a few possibilities for using these variables.

curve formula               linearization formula   Note
d * (a * x + b) ** 2 + e    sqrt(d * (y - e))       6
e * sin(a * x + b)          asin(y/e)               7
d/(a * x + b) + e           d/(y - e)               8

Notes:
6: When d is +1, the parabola is concave up; when it is -1, the parabola is concave down. e pushes the curve up or down and is the minimum or maximum. If e is too small or too large (depending on the concavity), some y values may be out of range and are ignored. When this happens, it is best to adjust the value of e so that all the data is used.
7: e is the amplitude of the sine wave. (One could use d for inverting the curve, but it isn't really needed because of the periodicity of the sin curve.) a helps determine the period and b determines the phase shift. If e is too small, some of the data points may be out of range for asin(y/e) and are ignored. When this happens, it is appropriate to increase the value of e.
8: When d is -1, the curve is inverted. e pushes the curve up or down and is the horizontal asymptote. Unfortunately, the formula works well when the data is exact, but if the data is even slightly random the result is very unstable, and hence this formula is not recommended.
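The parabola case of note 6 can be sketched in Python. This is an illustration, not the app's code: linear_fit and fit_parabola are hypothetical names, and the sketch simply drops any point for which d * (y - e) is negative, which is how the text describes out-of-range values being ignored.

```python
import math


def linear_fit(xs, ys):
    """Return (a, b, r2) for the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return a, mean_y - a * mean_x, r2


def fit_parabola(xs, ys, d, e):
    """Fit y = d * (a * x + b) ** 2 + e using sqrt(d * (y - e)).

    Returns (a, b, r2, number of points actually used).
    """
    kept = [(x, math.sqrt(d * (y - e)))
            for x, y in zip(xs, ys) if d * (y - e) >= 0]
    a, b, r2 = linear_fit([x for x, _ in kept], [v for _, v in kept])
    return a, b, r2, len(kept)


xs = [1, 2, 3, 4]
ys = [-1 * (2 * x + 3) ** 2 + 10 for x in xs]   # concave down, maximum 10
a, b, r2, used = fit_parabola(xs, ys, d=-1, e=10)
# With e = -20 (too small for this data) the first point is out of range.
used_with_bad_e = fit_parabola(xs, ys, d=-1, e=-20)[3]
```

With the correct e all four points are used and the fit recovers a = 2 and b = 3; with e = -20 only three points survive, which is the signal to adjust e.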

The variable e can be set in three different ways. The most obvious is to just type the value into the e text box. However, in the examples, e can be set with the s slider because e is given a formula something like (s - 50)/2. The values of s are always integers and can go from 0 to 100, so in this example e can go from -25 to +25. When a formula such as this is used, one can also just type a value from 0 to 100 into the s text box.
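The arithmetic of the slider formula is easy to check. In this small Python sketch, e_from_slider is a hypothetical name for the example formula (s - 50)/2.

```python
def e_from_slider(s):
    """Map the integer slider value s (0 to 100) to e using (s - 50)/2."""
    return (s - 50) / 2


lowest = e_from_slider(0)      # -25.0
middle = e_from_slider(50)     # 0.0
highest = e_from_slider(100)   # 25.0
step = e_from_slider(51) - e_from_slider(50)   # 0.5, the smallest change in e
```

Because s is always an integer, this particular formula can only set e in steps of 0.5.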

Data values can be typed into the x values and y values text boxes, or one can browse local files to import those data values. Either way, it is possible to use expressions like 1/2 or sqrt(5), and the expressions can use the variables d, e and s.

The graphics for this app are based on the grapher app, although many of the input boxes are used in different, dedicated ways.

How does one get the curve and linear formulas into the app?

There are three ways to enter the formulas:

File data - browse

One can use this button to input x and y values from a file, which is particularly helpful when there are lots of data points. Each point is on a separate line using the format
    x value , y value
For example, the point (3, 5) would appear as "3, 5".

The values from the file are stored in the "x values" and "y values" text boxes.
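The file format can be illustrated with a short Python sketch. It is not the app's reader: parse_points is a hypothetical name, the sketch accepts plain numbers only (the app also accepts expressions), and blank lines are skipped as an assumption.

```python
def parse_points(text):
    """Split lines of the form 'x value , y value' into two lists."""
    xs, ys = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # assumption: blank lines are skipped
        x_text, y_text = line.split(",")  # exactly one comma per line
        xs.append(float(x_text))
        ys.append(float(y_text))
    return xs, ys


xs, ys = parse_points("3, 5\n4 , 7.5\n")
```

After the call, xs holds the values for the "x values" box and ys the values for the "y values" box.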

Select the desired output option and the Exact and Random buttons

These items appear only in the "exp(ax + b)", "ln(ax + b)" and "Other" modes.

When using linearization of nonlinear curves, there are 3 output options because the original curve and the linearization line may have quite different scaling:

When using linearization, several things can go wrong. The curve and the linearization formulas may not match properly. The x values might not be appropriate for those formulas even if the formulas match. In order to make sure the curve and linearization formulas agree and the x values are appropriate, LeastSquares provides a couple of tests.

Before using the "Exact" button, make sure you have appropriate x values. (The y values are irrelevant because they will be replaced.) When one clicks the "Exact" button, LeastSquares generates a y value for each x value using the formula y = 2x + 3. The curve generated should pass through each of the plotted points, a should be 2, b should be 3 and c (R²) should be 1 (at least to many decimal places).

When the "Random" button is pressed, the app does the same thing except that, after the y values are calculated, they are randomized by multiplying each value by a random value between 0.9 and 1.1. This time the a, b, and c values should be fairly close to 2, 3, and 1 but not exactly equal to those values. Each time the button is clicked, a different set of y values will be produced and the values will change a little.
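The two tests can be imitated in Python. The sketch below is an illustration under an assumption that the text does not spell out: in a nonlinear mode the y values are taken to be the selected curve evaluated at 2 * x + 3, here exp(2 * x + 3). The names linear_fit, exact_ys and random_ys are hypothetical.

```python
import math
import random


def linear_fit(xs, ys):
    """Return (a, b, r2) for the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return a, mean_y - a * mean_x, r2


def exact_ys(curve, xs):
    """'Exact' test: y values from the curve with a = 2 and b = 3."""
    return [curve(2 * x + 3) for x in xs]


def random_ys(curve, xs, rng):
    """'Random' test: each exact y multiplied by a factor in [0.9, 1.1]."""
    return [curve(2 * x + 3) * rng.uniform(0.9, 1.1) for x in xs]


xs = [i / 2 for i in range(9)]            # 0, 0.5, ..., 4
exact = linear_fit(xs, [math.log(y) for y in exact_ys(math.exp, xs)])
rng = random.Random(1)                    # seeded so the run is repeatable
noisy = linear_fit(xs, [math.log(y) for y in random_ys(math.exp, xs, rng)])
```

The exact fit returns a, b and R² equal to 2, 3 and 1 up to rounding, while the noisy fit returns values close to, but not exactly, those numbers.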

Using your data with formulas from one of the examples

You can replace the example's x and y values by your own values either by typing them in their text box or reading them from a file. If you already have the data points in text boxes, you can select an example's formulas by selecting the example's "Formulas only: no data" radio button.

The Function Boxes

See Special Options below for special features.

Many of the function boxes have dedicated purposes.

  1. x values: x values are separated by commas.
  2. y values: y values are separated by commas.
  3. Fit to: Automatically set to "a * x + b", which is used to plot the (linearized) regression line.
  4. f3: Not dedicated in the "Linear" or "Scatter chart" modes, but in the other 3 modes it automatically stores the linearized values of the x and y values.
  5. f4: Not dedicated in the "Linear" or "Scatter chart" modes, but in the other 3 modes it automatically stores the curve formula.
  6. f5: Not dedicated. Can be used to plot a function (use x as the independent variable) or for special features such as labels as in many of the examples.

Writing the Functions

When writing functions, the operators are +, -, *, /, % and **. You can use ( and ) to control the order of operations and to enclose the arguments of functions. Operators and parentheses can also be used in almost all input expressions.

The % operator means modulus. That is, if we have a % b, we divide a by b as integers and the result is the remainder. For example, 13 % 3 = 1 because 13 divided by 3 is 4 with a remainder of 1. Some more examples: 11 % 5 = 1 and 215 % 10 = 5.

The ** operator means "to the power". Thus, 3**2 = 9, 2**3 = 8, and 10**3 = 1000.

You can use some constants: PI (π = 3.141592653589793), QUARTER_PI, HALF_PI, TWO_PI, Math.E (e = 2.718281828459045).

You can use some values: a, b, c, d, e, and s. The variables a to c have a special use and should not be used for input. They are explained in the Values section.

You may be able to use minX, maxX, minY, maxY for the minimum and maximum allowed values for x and y although these values may change if not specified or if equal spacing is checked.

Some of the functions that can be used follow. (Note: some of these functions normally are not appropriate when defining data.)

Order of evaluation:

  1. Inside () -- first
  2. functions
  3. **
  4. *, / and % (left to right in case of ties)
  5. + and - (left to right in case of ties) -- last
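Python happens to use the same symbols and the same precedence for these operators, so the rules can be tried out there. This is only an analogy with positive operands; how the app treats negative operands of % or a chain such as 2 ** 3 ** 2 is not stated here.

```python
remainder = 13 % 3            # 1: 13 divided by 3 is 4 with a remainder of 1
power = 2 ** 3                # 8
mixed = 2 + 3 * 2 ** 2        # ** first (4), then * (12), then + (14)
grouped = (2 + 3) * 2 ** 2    # parentheses first (5), then ** (4), then * (20)
left_to_right = 20 / 4 * 5    # / and * tie, so left to right: 5.0 * 5 = 25.0
```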

Illegal values like sqrt(-2) and 1/0 are ignored.

After you finish typing a function, press "Enter".

Formulas can contain comments using // and /*...*/.

Special Options

There are 7 special options, of which Words is the only one that is normally useful in LeastSquares. These options can be used to add labels to the graph. If you want a label, you can put it in one of the undedicated function boxes as illustrated in several of the examples.

Minimums and Maximums

Minimum x and Maximum x are the left and right end points of the x axis. If the input boxes are blank, the minimum and maximum are set by the range of the x values.
Minimum y and Maximum y, if specified, are the end points of the y axis. If these fields are blank, appropriate values will be calculated automatically based on the y values of the function.

If "Equal spacing" is true (checked) then the mins and maxs for x and y will be adjusted as needed in order to make the equal spacing possible.

Note: After you finish typing a minimum or maximum, press "Enter". The minimum and maximum values can be formulas which are evaluated the same way functions and values are evaluated. Just don't use x, t, or o in the formulas.

Zoom In and Zoom Out

Sometimes it is useful to zoom in to see a portion of the plot more closely. The Zoom In button does this. The domain is reduced by one half and centered at the last x value. If both the y minimum and y maximum are specified, the range is reduced in a similar manner.

Zoom out increases the domain by a factor of 2.
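The effect on the x axis can be sketched in Python. The zoom helper is a hypothetical name, and the choice of center for zooming out is an assumption (the current midpoint), since the text only gives the center for zooming in.

```python
def zoom(min_x, max_x, center, factor):
    """Return new (min_x, max_x): width scaled by factor, centered at center."""
    half_width = (max_x - min_x) * factor / 2
    return center - half_width, center + half_width


# Zoom in: the domain 0..10 is halved and centered at x = 4.
zoomed_in = zoom(0, 10, center=4, factor=0.5)    # (1.5, 6.5)
# Zoom out: the domain is doubled, here about the midpoint (an assumption).
zoomed_out = zoom(0, 10, center=5, factor=2)     # (-5.0, 15.0)
```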

Equal Spacing

When Equal Spacing is checked, the units on the x and y axes are the same, so a 45° line will look like a 45° line and circles will look like circles.

Allow Motion

For the sake of efficiency, LeastSquares stops redrawing plots when nothing is changed. Checking Allow motion says to keep drawing. However, this is rarely useful in LeastSquares.

Show enlarged canvas and enlargement ratio

The standard canvas is medium sized to allow the user to see the canvas and many of the controls at the same time without having to scroll. This is convenient while entering data and picking a mode. But a larger plot is drawn above the original canvas and controls when the "Show enlarged canvas" item is checked or the "Enlargement ratio" is changed. When using this option, one will normally have to scroll up to see the larger plot and scroll down to see the original canvas and the control items. The enlarged canvas shows a magnified image of the original canvas, except that text elements are the same size as in the original.

Angle Measurement

The Radians and Degrees radio buttons allow the user to specify if the trig functions and trig inverse functions assume radians or degrees. The default is radians.

Values

You can supply values for undedicated values. a is dedicated to the slope of the regression line. b is dedicated to the y intercept of that line.

c is reserved for R², which is a measure of how well the data fits the regression line. The values range from 0 to 1, where 0 means the data and the regression line are completely unrelated and 1 means the data fits the regression line exactly.

d, e and s are not dedicated and can be used as desired. They are useful in special examples as illustrated in the introduction.

The value of a variable is 0 when its input box is blank. Values are evaluated the same way functions are evaluated. Just don't use x, t, or o in the formulas. See below for the special s value. You can use the value of one of the variables in another. For example, the expressions for d and e might be s/10 and 2 * d. Because one can use formulas for the variables a, ..., e, the numerical values for these variables (rounded to 3 decimal places) are shown below the values.

The s Value and its Slider

The slider moves from 0 to 100 and is always an integer. If you move the slider, its value is sent to the s value input box. One can type a value from 0 to 100 in the s value input box to adjust the slider to that value. Anything else typed into that s value box is ignored and will be replaced by the value of the slider.
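The rule for the s input box can be sketched in Python. The name apply_s_input is hypothetical, and treating a non-integer entry such as 4.5 as "anything else" is an assumption.

```python
def apply_s_input(text, slider_value):
    """Return the new s: the typed integer if it is 0 to 100, else the slider value."""
    try:
        value = int(text)
    except ValueError:
        return slider_value          # not an integer: keep the slider's value
    return value if 0 <= value <= 100 else slider_value


accepted = apply_s_input("42", slider_value=10)    # 42
too_large = apply_s_input("150", slider_value=10)  # 10
not_a_number = apply_s_input("abc", slider_value=10)  # 10
```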

If the value range of the slider is not appropriate, you can rescale it with a formula. For example, set the d value to s/10 and then use d in a formula such as d * sin(x) in the f5 box.

Important note: When "Allow motion" is false (not checked) the value of s is set when the mouse releases the slider but the value of s does not track the slider while the slider is moving. However, if "Allow motion" is true (checked) (and at least one function has been defined) the value of s tracks the slider.

Number of Evaluations

The initial value of 200 is almost always adequate as that means the functions are evaluated at about every other pixel in the x direction.

Show Current Setup

This button is useful if you have your own copy of LeastSquares on your computer. If you set up a problem that you want to use again in another session, you can click this button and a dialog box will show the information needed to add the current plot to the setupExamples function in leastSquares.pjs. You can copy the information and paste it into the function. You will have to provide a unique example number and a name for the graph. Formulas are shown in the order of the example numbers.

If a "," is included in a "word" special option w in the function boxes or as a data separator in the x and y value boxes, it will be replaced by "^$". The substitution is required because "," is used to separate items in the data. However, this is not a problem because the "^$" is automatically replaced by "," when used in an example.
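The substitution amounts to a simple round trip, sketched here in Python. The names encode_commas and decode_commas are hypothetical, and the sketch assumes the original text never contains "^$" itself.

```python
def encode_commas(text):
    """Replace each ',' by '^$' so the text can sit in comma-separated setup data."""
    return text.replace(",", "^$")


def decode_commas(text):
    """Restore the commas when the example is used."""
    return text.replace("^$", ",")


stored = encode_commas("1, 2, 3")       # "1^$ 2^$ 3"
restored = decode_commas(stored)        # "1, 2, 3"
```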

Save as a Temp Example

If you click this button, the current setup is saved as a new example, which is added to the list of examples. You will be required to supply a name for the new example. The temporary example will only be available for the rest of the current session.

Save as an Image File Button

After finishing a plot, you can save it as an image file. Click the button. Depending on your browser, you may be asked if you want to save the file. In any case, you will probably find it in your normal download folder with the name leastSquares.jpg. If you save more than once, the multiple copies will normally be numbered. Your browser may have a button to give direct access to this folder.

Error Messages

When the user types invalid information into one of the input boxes, a message is displayed at the top of the plot area. After the error is fixed, the error message will normally be hidden. Occasionally it may be necessary to click anywhere in the plot area to hide the message.



James Brink, 6/15/2021