
Tuesday, March 31, 2009

Univariate Outliers

Outliers can cause serious problems for any regression-based technique, such as SEM. Because of their distance from the main body of the data, outliers tend to pull the regression line toward them. Outliers can appear in both univariate and multivariate situations, but Tabachnick and Fidell (2001) suggest assessing univariate outliers first. Outliers can be assessed using bivariate scatterplots or boxplots. Examination of the boxplots revealed that some variables have outliers lying between 1.5 and 3 box lengths from the top or bottom edge of the box. The boxplots are in Appendix A. The data were reexamined to check for accuracy. All values were within the expected range and appeared to be a legitimate part of the sample, so they will remain in the analysis.
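The 1.5 and 3 box-length thresholds correspond to the conventional inner and outer fences of a boxplot, where one box length is the interquartile range (IQR). Below is a minimal sketch of that rule, assuming quartiles computed with Python's `statistics.quantiles(..., method="inclusive")`. Statistical packages such as SPSS may define quartiles slightly differently, so borderline cases can disagree. The function name and the sample scores are hypothetical.

```python
from statistics import quantiles


def classify_boxplot_outliers(values):
    """Split values into mild (1.5 to 3 IQRs beyond the box) and extreme (> 3 IQRs) outliers.

    Quartiles use the "inclusive" method; other software may compute them
    differently (assumption for illustration).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # one "box length"
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr      # outer fences
    extreme = [v for v in values if v < outer_low or v > outer_high]
    mild = [v for v in values
            if outer_low <= v < inner_low or inner_high < v <= outer_high]
    return mild, extreme


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]  # hypothetical scores
    mild, extreme = classify_boxplot_outliers(scores)
    print("mild:", mild, "extreme:", extreme)  # mild: [20] extreme: [40]
```

Flagging a value this way only marks it for review. As the post describes, each flagged value still has to be checked for accuracy before deciding whether it stays in the analysis.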

Numerous seminal authors have also described factor analysis as a vital component of a model’s validity. Churchill (1979) stated, “Factor analysis can then be used to confirm whether the number of dimensions can be verified empirically” (p. 69). Nunnally (1978) also stated, “factor analysis is intimately involved with questions of validity…Factor analysis is at the heart of the measurement of psychological constructs” (pp. 112-113). The factor-analytic stage, exploratory factor analysis (EFA), is therefore an assessment of one aspect of model validity.

Principal components analysis (PCA) transforms the original measures into a smaller set of linear combinations, using all of the variance (Pallant, 2003). Factor analysis (FA) is similar; however, it uses a mathematical model and analyzes only shared variance. Tabachnick and Fidell (2001) described the difference between PCA and FA: “If you are interested in a theoretical solution uncontaminated by unique and error variability, FA is your choice. If on the other hand you want an empirical summary of the data set, PCA is the better choice” (p. 611). In addition, PCA yields components, whereas FA yields factors, although the two terms are sometimes used interchangeably. This study will follow the factor analysis guidelines of Mertler and Vannatta (2005).
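The difference between analyzing all of the variance and only shared variance shows up on the diagonal of the matrix being factored. PCA works from the full correlation matrix, whose 1s on the diagonal count all of each variable's variance. Principal axis factoring instead uses a reduced matrix whose diagonal holds communality estimates, the portion of each variable's variance shared with the others. This is a minimal sketch, assuming squared multiple correlations (SMCs) as the initial communalities. SMCs are a common choice, but the post does not say which estimates were used. The helper names and the 3-by-3 matrix are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly so")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def smc(corr):
    """Squared multiple correlation of each variable with all the others."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]


def reduced_matrix(corr):
    """Correlation matrix with SMC communality estimates on the diagonal."""
    h2 = smc(corr)
    p = len(corr)
    return [[h2[i] if i == j else corr[i][j] for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.4],
         [0.5, 1.0, 0.3],
         [0.4, 0.3, 1.0]]  # hypothetical correlations
    total = sum(R[i][i] for i in range(3))           # variance PCA analyzes
    shared = sum(reduced_matrix(R)[i][i] for i in range(3))  # variance FA analyzes
    print(f"PCA diagonal sum: {total:.3f}, FA diagonal sum: {shared:.3f}")
```

The off-diagonal correlations are identical in both matrices. Only the diagonal changes, which is why FA's solution excludes unique and error variance. With nearly singular correlation matrices, SMCs become numerically unstable, so statistical packages handle that case with extra care.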

Factor analysis has two basic requirements: an adequate sample size and sufficiently strong relationships among the measures. The sample size of 286 is close to the 300 recommended by Tabachnick and Fidell (2001) and is judged sufficient. The authors also caution that a factorable correlation matrix should include some correlations in excess of .30; if none are found, the use of factor analysis should be reconsidered.
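The .30 screening rule is easy to apply mechanically. This is a minimal sketch, assuming the rule concerns the magnitude of each correlation, so negative correlations (for example, from items not yet reverse coded) count as well. Each variable pair is examined once, and the diagonal is ignored. The function names and the matrix are hypothetical.

```python
def correlations_above(corr, threshold=0.30):
    """Return (i, j, r) for each unique off-diagonal correlation with |r| > threshold."""
    p = len(corr)
    return [(i, j, corr[i][j]) for i in range(p) for j in range(i + 1, p)
            if abs(corr[i][j]) > threshold]


def share_above(corr, threshold=0.30):
    """Fraction of unique variable pairs whose correlation magnitude exceeds threshold."""
    p = len(corr)
    n_pairs = p * (p - 1) // 2
    return len(correlations_above(corr, threshold)) / n_pairs


if __name__ == "__main__":
    R = [[1.0, 0.5, 0.4],
         [0.5, 1.0, 0.3],
         [0.4, 0.3, 1.0]]  # hypothetical correlations
    print(correlations_above(R))  # [(0, 1, 0.5), (0, 2, 0.4)]
    print(round(share_above(R), 3))  # 0.667
```

An empty list from `correlations_above` corresponds to the case where the authors advise reconsidering factor analysis.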

An inspection of the correlation matrix in Table 3 shows that fewer than half of the values fall below .30, in line with the recommendation of Tabachnick and Fidell (2001). Principal axis factoring with Promax rotation was performed to determine whether the items written for the survey to index the seven constructs (communication, commitment, organizational justice, organizational citizenship, affect-based trust, cognition-based trust, and resistance to change) actually hang together. That is, are the participants’ responses to the communication questions more similar to each other than to their responses to the commitment items? The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO; Kaiser, 1970, 1974) value of .891 exceeds the recommended value of .6, and Bartlett’s Test of Sphericity (Bartlett, 1954) reached statistical significance (χ² = 11,880.86, df = 1431, p < .001), supporting the factorability of the correlation matrix. Examination of the correlation matrix also revealed some collinearity between variable pairs (Determinant = 0.000). For example, the correlation between cmit6 and comu4 was .72.

The first analysis, which retained factors with eigenvalues greater than one, produced eleven factors; however, these made no theoretical sense. For instance, factor 11 included no variables. These results are shown in Appendix B. To determine which questions load on a factor, a cutoff of .278 was chosen (twice the critical value of the correlation coefficient for a sample of 350 at the .01 level). However, when a question’s loading was much greater on one factor than on another, the question was identified as loading only on that factor.

The following questions were reverse coded: Organizational Justice (ORGJ-1), (ORG-2), (ORG-6), (ORG-7), (ORG-9), and (ORG-11), and Resistance to Change (RCHG-2) and (RCHG-6).
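The loading cutoff rule can be reproduced approximately in code. This is a minimal sketch, assuming a normal approximation to the t distribution, because Python's standard library has no t quantile function. It computes the two-tailed critical correlation at α = .01, doubles it, and then assigns each item to the factors whose absolute loading meets the cutoff. For n = 350 the approximation gives about .27, which is close to, but not identical with, the .278 used in the study, so treat it as an approximation. The `dominance` ratio that stands in for "much greater" is a hypothetical parameter because the post gives no number, and the helper names and loadings are illustrative only.

```python
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Approximate two-tailed critical Pearson r for a sample of size n.

    Exact form: r = t / sqrt(t**2 + df), df = n - 2. The standard normal
    quantile stands in for t here, which slightly understates r for small n.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / (z * z + n - 2) ** 0.5


def assign_items(loadings, cutoff, dominance=2.0):
    """Map each item to the factor indices on which |loading| >= cutoff.

    If the largest qualifying loading is at least `dominance` times the next
    largest, the item is kept on that factor only (hypothetical threshold).
    """
    assignments = {}
    for item, row in loadings.items():
        # (|loading|, factor index) pairs that clear the cutoff, largest first.
        hits = sorted(((abs(value), k) for k, value in enumerate(row)
                       if abs(value) >= cutoff), reverse=True)
        if len(hits) > 1 and hits[0][0] >= dominance * hits[1][0]:
            hits = hits[:1]
        assignments[item] = [k for _, k in hits]
    return assignments


if __name__ == "__main__":
    cutoff = 2 * critical_r(350)  # roughly 0.27
    loadings = {  # hypothetical pattern-matrix rows
        "item_a": [0.65, 0.31, 0.05],
        "item_b": [0.40, 0.35, 0.10],
        "item_c": [0.12, 0.20, 0.15],
    }
    print(round(cutoff, 3), assign_items(loadings, cutoff))
```

In this example, `item_a` is kept only on factor 0 because its main loading clearly dominates. `item_b` remains a cross-loader, and `item_c` loads on no factor, which is how a factor can end up with no variables.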