# Prediction

Prediction enables users to predict a signal based on other correlated signals with linear and non-linear regression models.

## Using the **Prediction** Tool

**Signal to model: ** From the dropdown menu, select the signal to model from the details pane, pinned items or recently accessed sections.

**Input signals: **From the dropdown menu, select at least one signal to serve as independent variable(s) for the regression model.

**Training window: ** Select the fixed window of data on which the regression model will be based.

**Options:** Select the type of regression between Linear, Log, Polynomial (2^{nd} to 9^{th} degree) and Expanded Basis (called Automatic before R20.0.38.00). The Expanded Basis option applies a combination of linear, 2^{nd} and 3^{rd} polynomials and cross-products.

**Available outside this analysis: **Checking this box will allow this condition to be used in other workbench analysis and topics. It will be available for all Seeq users and will display in their search results. Consider using a unique name as the signal will be published to the global name space.

**Advanced Options:**

**Limit to condition within training window:**You can choose to limit the regression only to the data from within the training window and within a condition. For example, choose a condition that identifies when an operation is running and all data outside of the condition (when the unit is not running) will be ignored for the regression analysis.

**Regression method:**You can choose from one of several different regression algorithms:**Ordinary Least Squares (OLS)**can safely be used for most predictions. It calculates a linear regression by minimizing standard error with respect to the target signal. OLS's main disadvantage is that is does not correct for multicollinearity (where one input can be predicted from another input). In most cases, this is fine because the output still provides an accurate linear model. It becomes problematic in if one wishes to use some of the coefficients from the model individually rather than all of them together.*Must go through zero:*You can choose to force the regression to go through zero or the origin (0, 0). This may be used if you are reasonably certain the data you are modeling fits the selected scale and that the target truly is zero when all the inputs are zero (E.G. units produced based on conveyor belt speed). The Sum of Squares statistics are zero-based rather than average-based when this option is used, therefore the R-Squared value is not comparable to the R-Squared of other options.

**Ridge Regression**is one way to reduce issues from collinearity. Ridge includes a basis factor to scale some coefficients toward 0. The coefficients which are scaled are those which have less of an influence on the dependent variable.*Lambda:*The bias-variance trade-off tuning parameter. Values are typically between 0 and 1, but can approach infinity. The higher the lambda, the more strongly the coefficients will get scaled toward 0. A non-zero value is generally inadvisable unless you are dealing with issues from multicollinearity.

**Principal Component Analysis (PCA)**approaches the collinearity differently. Each of the (possibly collinear) input signals will be transformed into a set of (uncorrelated) principal components and are ranked. The worst n components are ignored. After this removal, the regression itself is performed.*Components to Keep*: Keep between 1 and the number of input components. Components are not necessarily the same as an input signal, but are linearly uncorrelated variables calculated from those input signals.

**Variable selection:**You can choose to filter away coefficients with P-values that are higher than the provided number. The P-value tests the impact of the "null hypothesis" for each coefficient. Values approaching 0 indicate that the data contributes significantly the model; values approaching 1 indicate that the contribution*could*be attributed to random noise. Coefficients with a P-value of 0.05 or less are generally considered "statistically significant" for predictions.

**Prediction Model: **You can view the Legend, Coefficients and Statistics for the resulting prediction model. You can also change the model type in "Options" and observe how the prediction model results change within the tool.

*New in R21.042:* adjusted R Squared can now be viewed alongside the standard R Squared calculation. The adjusted R squared value takes into account the number of independent variables used in the model. Use the adjusted R squared to better compare Prediction models with a different number of input variables.

*New in R22.0.44: *Any single input Predictions can be displayed on a Scatter Plot.

*New in R22.0.48: *The coefficients and statistics tables can now be copied and pasted outside of Seeq