Request

To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Tuesday, March 31, 2009

Cronbach’s Alpha Rule of Thumb

To examine reliability and internal consistency, Cronbach’s alpha tests were conducted on the survey subscales: communication, organizational commitment, organizational justice, organizational citizenship, affect based trust, cognition based trust, and resistance to change. George and Mallery (2003) suggest the following rules of thumb for evaluating alpha coefficients: “> .9 excellent, > .8 good, > .7 acceptable, > .6 questionable, > .5 poor, < .5 unacceptable.” The measure (alpha, or α) assumes that items measuring the same thing will be highly correlated (Welch & Comer, 1988).
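The alpha computation itself is straightforward. As an illustration (a minimal Python sketch, not the SPSS output used in this study), Cronbach’s alpha and the George and Mallery (2003) labels might look like this:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the subscale
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def interpret_alpha(alpha):
    """Label an alpha using the George and Mallery (2003) rule of thumb."""
    for cutoff, label in [(.9, "excellent"), (.8, "good"), (.7, "acceptable"),
                          (.6, "questionable"), (.5, "poor")]:
        if alpha > cutoff:
            return label
    return "unacceptable"
```

Perfectly correlated items yield an alpha of exactly 1.0, consistent with the assumption that items measuring the same construct are highly correlated.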

Number of Components to Extract

There are a number of ways to determine the number of factors to extract. The Kaiser criterion suggests retaining any factors with eigenvalues greater than one. Scanning the Total Variance Explained table (see Appendix C), the first eleven components have eigenvalues (λ) above one. The first two components explain 27.683 and 7.598 percent of the variance, respectively. The total variance explained by the first two factors is 5.281 percent, and the total variance explained by the first eleven factors is 72.012 percent. The Kaiser technique is only reliable, however, when the number of variables is less than thirty and the communalities are greater than .7. Inspection of the communalities shows that twenty-three of the 54 communality coefficients are greater than .7. Communalities (h2) “represent how much of the variance of a measured variable was useful in delineating the extracted factors” (Thompson, 2004, p. 61). These communalities also represent the R-square (R2) between the factor scores (latent variable scores) and scores on the measured variable.
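To make the Kaiser criterion concrete, here is a small illustrative sketch (in Python, not the SPSS procedure used in the study) that counts components with eigenvalues above one and reports the percent of variance each explains; the function names are hypothetical:

```python
import numpy as np

def kaiser_retained(data):
    """Count components with eigenvalues above one (Kaiser criterion)."""
    r = np.corrcoef(data, rowvar=False)      # item correlation matrix
    eigvals = np.linalg.eigvalsh(r)          # eigenvalues of R
    return int((eigvals > 1.0).sum())

def variance_explained(data):
    """Percent of total variance explained by each component, descending."""
    r = np.corrcoef(data, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(r))[::-1]
    return 100.0 * eigvals / eigvals.sum()
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the percentages always sum to 100, which is why a few large components can dominate the Total Variance Explained table.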

The Cattell (1966) technique suggests keeping the factors above the elbow in the scree plot (see Figure 1 in Appendix B). In this study, the scree plot suggests that two factors be retained; breaks also appear at 3, 4, 5, and 7 components. The seven-factor solution supports the theoretical model with seven factors. When this model was specified, however, the majority of the variables loaded onto the first factor and the model made no theoretical sense. Theoretical considerations supported a nine-factor model. The pattern matrix is shown in Table 4.

The items cluster cleanly into the communication, commitment, citizenship, cognition based trust, and resistance to change factors. Nearly all the affect based trust measures hang together, with the exception of abt6 (“This person approaches his/her job with professionalism and dedication.” (ABT-6)), which is strongly correlated with the items that measure cognition based trust. The organizational justice variable includes three subfactors: fairness, employee voice, and justification. The first four fairness items cluster together and appear to measure that concept; the questions are shown below. Survey questions Orgjv6, Orgjv7, Orgjv8, and OrgjJ11 cluster together to form the Employee Voice measure. Similarly, Orgjv5, Orgjv8, and OrgjJ10 load onto Justification. These subscales will be utilized as one scale, organizational justice, in the computation of Cronbach’s alpha.


Univariate Outliers

Outliers can cause serious problems for any regression-based technique such as SEM. Because of their distance from the main data swarm, outliers tend to pull the regression line in their direction. Outliers can appear in both univariate and multivariate form, but Tabachnick and Fidell (2001) suggest assessing univariate outliers first. Outliers can be assessed using bivariate scatterplots or boxplots. Examination of the boxplots revealed that some variables have outliers between 1.5 and three box lengths from the top or bottom edge of the box; the boxplots are in Appendix A. The data were reexamined to check for accuracy. All values were within the expected range and appeared to be a legitimate part of the sample, so they will remain in the analysis.
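The boxplot rule described above can be sketched in a few lines; the 1.5 and 3.0 box-length thresholds correspond to the circled outliers and asterisked extreme values SPSS draws on a boxplot. The function name is illustrative:

```python
import numpy as np

def boxplot_outliers(x, whisker=1.5):
    """Flag values beyond `whisker` box lengths (IQRs) from the box edges.

    whisker=1.5 flags ordinary outliers; whisker=3.0 flags extreme values.
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1                           # one "box length"
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    return (x < lo) | (x > hi)
```

Flagged values are then checked for data-entry accuracy, as was done here, before deciding whether to retain them.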

Factor analysis has also been articulated by numerous seminal authors as a vital component of a model’s validity. Churchill (1979) stated, “Factor analysis can then be used to confirm whether the number of dimensions can be verified empirically” (p. 69). Nunnally (1978) also stated, “factor analysis is intimately involved with questions of validity…Factor analysis is at the heart of the measurement of psychological constructs” (pp. 112-113). The factor-analytic stage (EFA) therefore, is an assessment of one aspect of model validity.

Principal components analysis (PCA) transforms the original measures into a smaller set of linear combinations, with all of the variance being used (Pallant, 2003). Factor analysis (FA) is similar; however, it uses a mathematical model and analyzes only shared variance. Tabachnick and Fidell (2001) described the difference between FA and PCA: “If you are interested in a theoretical solution uncontaminated by unique and error variability, FA is your choice. If on the other hand you want an empirical summary of the data set, PCA is the better choice” (p. 611). In addition, PCA yields components whereas FA provides factors, although the terms are sometimes used interchangeably. This study will follow the factor analysis guidelines of Mertler and Vannatta (2005).

There are two basic requirements for factor analysis: adequate sample size and sufficient strength of the relationships among the measures. The sample size of 286 is close to the 300 recommended by Tabachnick and Fidell (2001) and is sufficient. The authors also caution that a factorable matrix should include correlations in excess of .30; if none are found, the use of factor analysis should be reconsidered.

An inspection of the correlation matrix in Table 3 shows that fewer than half the values are below .3; that is, many correlations exceed the .30 threshold recommended by Tabachnick and Fidell (2001). Principal axis factoring with Promax rotation was performed to see whether the items written for the survey to index the seven constructs (communication, commitment, organizational justice, organizational citizenship, affect based trust, cognition based trust, and resistance to change) actually do hang together. That is, are the participants’ responses to the communication questions more similar to each other than their responses to the commitment items? The Kaiser-Meyer-Olkin (Kaiser, 1970, 1974) Measure of Sampling Adequacy (KMO) value of .891 exceeds the recommended value of .6, and Bartlett’s Test of Sphericity (Bartlett, 1954) reached statistical significance (Chi-square = 11,880.86; df = 1431; p < .001), supporting the factorability of the correlation matrix. Examination of the correlation matrix revealed some collinearity (Determinant = 0.000) between variable pairs; for example, the correlation between cmit6 and comu4 was .72. The first analysis, based on eigenvalues greater than one, resulted in eleven factors; however, these made no theoretical sense (factor 11, for instance, included no variables). These results are shown in Appendix B. To determine which questions load on a factor, a cutoff of .278 was chosen (twice the significant correlation for a sample of 350 at the .01 level). When a question’s loading magnitude was much greater on another factor, however, the question was identified as loading only on that factor. The variables from the following questions have been reverse coded: Organizational Justice (ORGJ-1), (ORG-2), (ORG-6), (ORG-7), (ORG-9), and (ORG-11), and Resistance to Change (RCHG-2) and (RCHG-6).
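For readers who want to see what the KMO and Bartlett statistics actually compute, a rough Python sketch using the standard textbook formulas (not SPSS itself; function names are illustrative) is:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's (1954) test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) / 2
    return stat, df, chi2.sf(stat, df)

def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                  # anti-image (partial) correlations
    np.fill_diagonal(r, 0.0)
    np.fill_diagonal(partial, 0.0)
    return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())
```

A significant Bartlett result rejects the hypothesis that the variables are uncorrelated, and a KMO well above .6 (as here, .891) indicates that partial correlations are small relative to the zero-order correlations, both of which support factorability.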

MVA Analysis

The SPSS missing value analysis (MVA) module was used to analyze the data for both MAR and MCAR data loss using an expectation maximization technique. Little’s MCAR test (Little & Rubin, 2002) resulted in Chi-square = 1852.25 (df = 1778; p = 0.099). This non-significant result indicates that the missing data are MCAR and that the data loss pattern is not systematic.

The SPSS MVA module also incorporates an expectation-maximization (EM) algorithm for generating imputed values to fill in the missing data. Since the data are MCAR, listwise deletion is a better alternative than pairwise deletion, which may cause covariance matrix problems due to unequal numbers of cases (Kline, 2005).
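The difference between listwise and pairwise deletion can be illustrated with a toy pandas example; the variable names below (comu1, cmit1, orgj1) simply echo this study’s naming style and are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical survey items with scattered missing responses.
df = pd.DataFrame({
    "comu1": [4, 5, np.nan, 3, 4, 2],
    "cmit1": [3, np.nan, 4, 3, 5, 2],
    "orgj1": [2, 4, 4, np.nan, 5, 1],
})

# Listwise deletion: drop every case with a missing value on any variable,
# so every statistic rests on the same complete cases.
listwise = df.dropna()

# Pairwise deletion: each covariance uses whichever cases are complete for
# that pair, so different cells can rest on different sample sizes -- the
# source of the covariance matrix problems Kline (2005) warns about.
pairwise_cov = df.cov(min_periods=1)
```

Here listwise deletion keeps only the three fully complete cases, while each cell of the pairwise covariance matrix is computed from four or more cases.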

The AMOS application is unique in that it can be used to analyze data that include missing values. AMOS incorporates a special form of maximum likelihood estimation (Special ML) which partitions all cases with the same missing data patterns. Peters and Enders (2002) found that this method for analyzing datasets with incomplete data “outperformed traditional (available case) methods” (cited in Kline, 2005, p. 56). Tabachnick and Fidell (2001) suggest using both methods (with and without missing data) but favor the EM imputation and listwise methods (if the data loss is ignorable) over mean substitution or pairwise deletion. They state, “The decision about how to handle missing data is important. At best, the decision is among several bad alternatives” (p. 59).

Caution should be exercised with any method when a dataset has a high percentage of missing values (> 5%). Nunnally and Bernstein (1994) suggest that when there is a high percentage of missing values, any of these methods may be unsatisfactory. Incorporating listwise deletion may be the best option for MCAR data, since EM imputation may distort coefficients of association and correlations (Kalton & Kasprzyk, 1982). In the present data set, listwise deletion resulted in a final sample size of 286 respondents.


Missing Values

According to Tabachnick and Fidell (2001), “Missing data is one of the most pervasive problems in data analysis” (p. 58). Missing data can have serious effects on the reliability, validity, and generalizability of the data (Tabachnick & Fidell, 2001). Missing data can indicate lack of knowledge, fatigue, sensitivity, or the respondent’s interpretation of the questionnaire’s relevance. When the number of missing cases is small (< 5%), it is common to exclude those cases from the analysis (Tabachnick & Fidell, 2001). In the present analysis, every variable is missing at least 16% of its responses. The univariate statistics are shown below in Table 2.

Before exploratory factor analysis, it must be determined whether the missing data are systematic (representing bias) or ignorable. Missing data also have other important ramifications, especially in factor analysis: factor analysis using listwise deletion should not be conducted unless the missing data are at least missing completely at random (MCAR).

Normality

Data normality rests on the premise that the data come from one or more normally distributed populations. Characteristics of a distribution can be described by its moments (the averages of its values raised to successive powers). The normal distribution in standard form has a first moment (mean) of zero, a second moment (variance) of one, a third moment (skewness) of zero, and a fourth moment (kurtosis) of three. Many statistical programs, like SPSS, subtract three from the kurtosis value so that a normal distribution is reported as zero. These statistics are based on the distribution as a whole and not on individual cases. Assessment of normality usually focuses on skewness and kurtosis, which are measures of shape; a skewness and kurtosis of zero is indicative of a normal distribution. Skewness is associated with the symmetry of the distribution, while kurtosis is associated with how peaked or flat the distribution is: a kurtosis above zero indicates a peaked distribution, and a negative value indicates a flat one. Some authors suggest that univariate values approaching 2.0 for skewness and 7.0 for kurtosis should be suspect (West et al., 1995; Yuan & Bentler, 1999). The descriptive statistics, including skewness and kurtosis, are shown below in Table 1. Examination of the distributions indicated that only one variable, cmit8, has a high negative skew, -2.179. Computing the log transformation reduced the skew to .851 and the kurtosis to .247. The transformed variable will be used in further analyses.
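A quick way to screen skewness and kurtosis, and to apply the kind of log transformation used for cmit8, is sketched below. Because cmit8 is negatively skewed, the sketch uses the common reflect-and-log variant; that this is the exact transformation applied in the study is an assumption:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def shape_stats(x):
    """Skewness and excess kurtosis (SPSS-style: normal distribution = 0)."""
    return skew(x, bias=False), kurtosis(x, bias=False)

def reflect_and_log(x):
    """Reflect-and-log transform for a negatively skewed variable.

    Reflecting (max + 1 - x) turns the long left tail into a right tail,
    which the log then compresses.
    """
    x = np.asarray(x, dtype=float)
    return np.log(x.max() + 1 - x)
```

Note that a reflected variable reverses its direction of scoring, so its sign must be kept in mind when interpreting later correlations.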

Screening of the Data

Careful analysis of data applicability after collection and before analysis is probably the most time-consuming part of data analysis (Tabachnick & Fidell, 2001). This step is, however, of utmost importance as it provides the foundation for any subsequent analysis and decision-making which rests on the accuracy of the data. Incorrect analysis of the data during purification, including EFA, and before conducting confirmatory SEM analysis may result in poor fitting models or, worse, models that are inadmissible.

Data screening is important when employing covariance-based techniques such as structural equation modelling where assumptions are stricter than for the standard t-test. Many of the parametric statistical tests (based on probability distribution theory) involved in this study assume that: (a) normally distributed data – the data are from a normally distributed population, (b) homogeneity of variance – the variances in correlational designs should be the same for each level of each variable, (c) interval data – data where the distance between any two points is the same and is assumed in this study for Likert data, and (d) independence – the data from each respondent has no effect on any other respondent’s scores.

Many of the common estimation methods in SEM (such as maximum-likelihood estimation) assume: (a) “all univariate distributions are normal, (b) joint distribution of any pair of the variables is bivariate normal, and (c) all bivariate scatterplots are linear and homoscedastic” (Kline, 2005, p. 49). Unfortunately, SPSS does not offer an assessment of multivariate normality, but Field (2005) and others (Kline, 2005; Tabachnick & Fidell, 2001) recommend first assessing univariate normality. The data were checked for plausible ranges, and the examination was satisfactory; no data were out of range.

Assessing Reliability and Validity of Constructs and Indicators

One of the most important advantages offered by latent-variable analyses is the opportunity that is provided to assess the reliability and validity of the study’s variables. In general, reliability refers to consistency of measurement; validity refers to the extent to which an instrument measures what it is intended to measure. For example, a survey is reliable if it provides essentially the same set of responses for a group of respondents upon repeated administration. Similarly, if a scale is developed to measure marketing effectiveness and scores on the scale do in fact reflect respondents’ underlying levels of marketing performance, then the scale is valid. For both reliability and validity, there are a number of different ways that they may be measured.

Indicator reliability. The reliability of an indicator (observed variable) is defined as the square of the correlation (squared multiple correlation, or SMC) between a latent factor and that indicator. For instance, looking at Table 1, the standardized loading for the path between Sympathique and F1 is 0.970 and the reliability is 0.939. Looking at the range of indicator reliabilities, many are relatively high (0.6 and above); however, several are quite low, such as Effacee, with an indicator reliability of 0.313.

Composite reliability has been computed for each latent factor included in the model. This index is similar to coefficient alpha and reflects the internal consistency of the indicators measuring a particular factor (Fornell & Larcker, 1981). Both the composite reliability and the variance extracted estimates are shown in Table 1. Fornell and Larcker (1981) recommend a minimum composite reliability of .60. An examination of the composite reliabilities revealed that all meet that minimum acceptable level.

The variance extracted estimate assesses the amount of variance explained by an underlying factor in relation to the amount of variance due to measurement error. For instance, the variance extracted estimate for F1 was 0.838, meaning that 83.8% of the variance is explained by the F1 construct and 16.2% is due to measurement error. Fornell and Larcker (1981) suggest that constructs should exhibit estimates of .50 or larger; estimates less than .50 indicate that variance due to measurement error is larger than the variance captured by the factor. The variance extracted estimates all meet this minimum threshold, so the validity of the latent constructs is acceptable. It should also be noted that Hatcher (1994) cautions that the variance extracted estimate test is conservative; reliabilities can be acceptable even if variance extracted estimates are less than .50.
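Both indices are simple functions of the standardized loadings. A minimal sketch of the Fornell and Larcker (1981) formulas, assuming standardized indicators so that each error variance is one minus the squared loading (not the study’s actual computation), is:

```python
import numpy as np

def composite_reliability(loadings):
    """Fornell & Larcker (1981) composite reliability from standardized loadings."""
    l = np.asarray(loadings, dtype=float)
    errors = 1.0 - l ** 2                  # indicator error variances
    return l.sum() ** 2 / (l.sum() ** 2 + errors.sum())

def variance_extracted(loadings):
    """Variance extracted estimate: shared variance relative to error variance."""
    l = np.asarray(loadings, dtype=float)
    return (l ** 2).sum() / ((l ** 2).sum() + (1.0 - l ** 2).sum())
```

For example, three indicators each loading 0.9 give a variance extracted estimate of 0.81 and a composite reliability of about 0.93, comfortably above the .50 and .60 cutoffs cited above.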

Convergent validity is present when different instruments are used to measure the same construct and scores from these different instruments are strongly correlated. In contrast, discriminant validity is present when different instruments are used to measure different constructs and the measures of these different constructs are weakly correlated.

In the present study, convergent validity was assessed by reviewing the t-tests for the factor loadings. If all the factor loadings for the indicators are greater than twice their standard errors, the parameter estimates demonstrate convergent validity. That all t-tests are significant shows that all indicators are effectively measuring the same construct (Anderson & Gerbing, 1988). Consider the convergent validity of the indicators that measure F1. The results show that the t-values for these indicators range from -14.480 to 18.510. These results support the convergent validity of Sympathique, Desagreable, Amicale, Souple, Severe, Autoritaire, Compatissante, au coeur tendre, Spontanée, Distante, and Attentive aux autres as measures of F1.

Discriminant validity was assessed through the variance extracted test: the variance extracted estimates for two factors are compared with the square of the correlation between those factors, and discriminant validity is demonstrated if both variance extracted estimates are greater than the squared correlation. In the present study, the correlation between factors F1 and F2 was 0.154, so the squared correlation was 0.024. The correlations and squared correlations are shown in Table 2. The variance extracted estimate was 0.838 for F1 and 0.666 for F2. Because both variance extracted estimates are greater than the square of the interfactor correlation, the test supports the discriminant validity of these two factors. Examination of the other variance extracted estimates and squared correlation coefficients supported discriminant validity throughout the model.
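The pairwise comparison reduces to one inequality, sketched below with the F1/F2 values reported above (the function name is illustrative):

```python
def discriminant_valid(ave_a, ave_b, corr_ab):
    """Fornell-Larcker criterion: both variance extracted estimates must
    exceed the squared correlation between the two factors."""
    return min(ave_a, ave_b) > corr_ab ** 2

# F1/F2 values from the text: AVEs 0.838 and 0.666, correlation 0.154.
print(discriminant_valid(0.838, 0.666, 0.154))  # True: 0.666 > 0.154**2
```

In a full model, this check is repeated for every pair of factors, which is what the examination of Table 2 amounts to.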


References

Anderson, J.C., & Gerbing, D.W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411-423.

Bollen, K.A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.

Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18, 39-50.

Hatcher, L. (1994). A step-by-step approach to using SAS for factor analysis and structural equation modeling. Cary, NC: SAS Institute Inc.

Jöreskog, K.G., & Sörbom, D. (1989). LISREL 7: A guide to the program and application (2nd ed.). Chicago: SPSS Inc.

Dissertation Statistics Samples

Everybody knows how tiring it is to write a dissertation. Completing a dissertation is the final and most important task in receiving your degree. A dissertation is a lengthy piece of discourse imparting new views, approaches, and findings on a topic from any field. No one enjoys facing hurdles while going for a degree, and writing a dissertation is the biggest hurdle one can face. The task of authoring a dissertation is very complex, and it demands dedication and hard work on the part of students. Students learn and study the entire year, yet it is often in the writing of a dissertation that they have many unanswered questions. It is unfortunate that at such crucial times, faculty and advisors are often unavailable and help is very scarce. It is in these times, when faculty and advisors are not available, when unanswered questions continually discourage students, and when the pressure mounts as the work of writing becomes more tedious and time consuming, that statistics samples can rescue struggling students in need of help and guidance.

A dissertation statistics sample is a tool that can help, much as an example of another dissertation can: samples are copies of approved and successful dissertations. A dissertation statistics sample offers the help that every dissertation-writing student needs to overcome the stress of attaining a degree. Samples can be acquired from various services that cater to this particular need. Having dissertation statistics samples in your grasp goes a long way toward completing your dissertation, and samples are especially helpful for those “first timers” who have never written a dissertation before.

Though students are well read and perform research the entire year, they can struggle to decide upon a particular topic on which to base their dissertation. Dissertation statistics samples provide these students with many examples, and the samples can help them come up with an idea for a topic. The samples also give them ideas about dissertation proposals, which would otherwise involve extensive research on the part of the students. Dissertation statistics samples are dissertations that have been approved, signed, and published by the proper authorities, and they are always helpful to students: after reviewing them, students are better able to write their own dissertations and check for faults in their own ideas.

While students get step-by-step guidance from dissertation statistics samples, they also get constructive ideas as to how to enhance the uniqueness of their own dissertation. Consulting dissertation statistics samples does not mean that students are copying ideas and theory from existing dissertation statistics samples. Rather, it means that by having a look at dissertation statistics samples, students get ideas as to how to make their own dissertation an error free dissertation. Dissertation Statistics samples basically help the students in doing the statistics part of their dissertation. With the help of dissertation statistics samples, students can inquire about and interpret their statistical issues with an approved form of guidance. Through dissertation statistics samples, students can approach their own dissertations with a new and unique perspective. It helps them to edit and proofread their own dissertation, and finally, it helps them to feel confident that their own dissertation will be accepted and approved.

Consulting dissertation statistics samples is becoming more and more popular. Students must perform at a top level, and to do so, they must complete a top-notch dissertation. In order to finish in a timely manner, they must seek professional help and guidance; using dissertation statistics samples is one such means of getting help. With this guidance, students will achieve what they set out to achieve when they began their studies.

Thursday, March 26, 2009

Run Test of Randomness

The run test of randomness is a non-parametric test that is widely used to test the randomness of a sample. Although the run test is serviceable, it does not necessarily give an exact result in all cases. For example, it can be used in the stock market: if we want to test whether the prices of a particular company behave randomly or follow a pattern, we can use the run test of randomness. Likewise, if we want to test whether the observations in a sample are independent of one another or follow a pattern, we can use the run test of randomness.

Key concept and terms:

Run: A run is a maximal block of identical adjacent values in a series. For instance, if in a sample M = male and F = female, the first 22 responses might come as MMMMFFFMFFFFMFFFFMMMFF. Starting with the block MMMM and ending with FF, there are 8 runs in this example. The run test for randomness treats the series as binary; in SPSS, the test variable may take two or more values, but it must be numeric or be converted into numeric form.
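Counting runs is just counting changes between adjacent symbols, as a one-line Python sketch reproducing the example above shows:

```python
def count_runs(seq):
    """Count runs: maximal blocks of identical adjacent symbols."""
    # One run to start, plus one more at every change between neighbors.
    return 1 + sum(a != b for a, b in zip(seq, seq[1:])) if seq else 0

print(count_runs("MMMMFFFMFFFFMFFFFMMMFF"))  # 8, matching the example
```

The same counting works on any sequence of comparable values, which is why a numeric series can be reduced to runs once a cut point splits it into two categories.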

Run Test: The run test is based on the laws of probability and can be performed in SPSS very easily. SPSS computes the observed number of runs and gives a critical value for runs, and we can compare the observed value with that critical value. SPSS shows a two-tailed test value by default. For a small sample, an exact test is available to assess significance; for a larger sample, Monte Carlo estimation gives the significance value for testing randomness.
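Under the hood, the large-sample version of the test compares the observed number of runs with its expectation under randomness. A sketch of that z approximation (the standard textbook formula, not SPSS's code; the function name is illustrative) is:

```python
import math

def runs_test(values, cut):
    """Large-sample z approximation to the runs test around a cut point.

    Assumes both sides of the cut point occur at least once.
    """
    signs = [v >= cut for v in values]
    n1 = sum(signs)                # cases at or above the cut point
    n2 = len(signs) - n1           # cases below the cut point
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean_runs = 2 * n1 * n2 / (n1 + n2) + 1
    var_runs = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean_runs) / math.sqrt(var_runs)
    return runs, z
```

A large positive z means too many runs (the series alternates more than chance allows), while a large negative z means too few runs (the values cluster); either extreme leads to rejecting randomness in the two-tailed test.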

Cut Point: The algorithm of the run test for randomness divides the series at a cut point. We can select the mean, median, or mode as the cut point, or specify a custom value.

Type of significance estimate: The significance of the run test of randomness can be tested by using the Exact button available in the SPSS run test dialog.

Assumptions in Run test of randomness:

1. Data order: The run test of randomness assumes that the data are entered in order (not grouped).

2. Numeric data: The run test of randomness assumes that the data are in numeric form. This is a compulsory condition for the run test, because numeric values make it easy to assign runs.

3. Data level: The run test of randomness assumes that the data are ordered. If the data are not in ordered form, the researcher has to assign a dividing value: the mean, median, mode, or a custom cut point. By assigning one of these values, the data can be split into runs.

4. Distribution: The run test of randomness is a non-parametric test, so it does not assume any particular distribution, unlike parametric tests.

Run Test in SPSS: The run test is available in SPSS under the non-parametric tests in the Analyze menu. After selecting this option, drag the variable into the test variable list and select the cut point option. After clicking OK, the results appear, and by examining the significance value we can accept or reject the null hypothesis.
