Request

To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Thursday, March 19, 2009

ANCOVA Assumptions

While examining the differences in the mean values of the dependent variable related to the effect of controlled independent variables, it becomes necessary to take into account the influence of uncontrolled independent variables. In such cases, Analysis of Covariance (ANCOVA) is used. ANCOVA includes at least one categorical independent variable and at least one interval or metric independent variable. The categorical independent variable is called a factor, whereas the metric independent variable is called a covariate.

In ANCOVA, the most common use of a covariate is to remove extraneous variation from the dependent variable, because the effect of the factors is of major concern.

Like ANOVA, ANCOVA makes several assumptions. These assumptions are as follows:

The observations being analyzed, and hence their error terms, should be independent of one another; this holds true for both ANOVA and ANCOVA.

In ANOVA, the dependent variable must have the same variance in each category of the independent variable. When there is more than one independent variable, the variance must be homogeneous within each cell formed by the categorical independent variables. This also holds true for ANCOVA.

In ANOVA, it is assumed that the data upon which the significance test is conducted are obtained by random sampling; this also holds true for ANCOVA.

When analysis of variance is conducted on two or more factors, interactions can arise. An interaction occurs when the effect of an independent variable on the dependent variable differs across the categories, or levels, of another independent variable. If the interaction is significant, it may be ordinal or disordinal; a disordinal interaction may be of a noncrossover or crossover type. In balanced designs, while conducting ANCOVA, the relative importance of each factor in explaining the variation in the dependent variable is measured by omega squared. Multiple comparisons in the form of a priori or a posteriori contrasts can be used for examining differences among specific means.

ANCOVA also makes some additional assumptions beyond those made in ANOVA.

In ANCOVA, the adjusted treatment means are computed, or estimated, on the assumption that the factor-by-covariate interaction is negligible. If this assumption is violated, then the adjustment of the response variable to a common value of the covariate will be misleading.

ANCOVA also carries the assumptions of linear regression, because the covariate adjustment is carried out using a linear regression. The relationship between the covariate and the dependent variable must therefore be linear in the parameters, and the errors at the different levels of the independent variable should follow a normal distribution with mean zero.

ANCOVA also assumes homogeneity of regression coefficients: the regression coefficient relating the covariate to the dependent variable should be the same for every group of the independent variable. If this assumption is violated, the ANCOVA results will be misleading.

Click here for dissertation statistics help.

Wednesday, March 18, 2009

Validity

Validity means drawing accurate or error-free conclusions from the data. Technically, we can say that a measure is valid when the conclusions drawn from a sample can be taken as valid inferences about the population. When we talk about validity, we are talking about four major types:

1. Internal validity

2. External validity

3. Statistical conclusion validity

4. Construct validity

Internal validity: When the relationship between the variables is causal, it is called internal validity. Internal validity refers to the causal relationship between the dependent and the independent variable. In internal validity, we are concerned with the factor responsible for change in the dependent variable. It is related to the design of the experiment, such as when random assignment of treatments is used.

External validity: External validity exists when the causal relationship between the cause and the effect can be generalized or transferred to different people, different treatment variables and different measurement variables.

Statistical conclusion validity: Statistical conclusion validity concerns inferences about the degree of the relationship between two variables. For example, when two variables are studied and we want to draw a conclusion about the strength of the relationship between them, arriving at the correct decision about that strength constitutes statistical conclusion validity. Statistical conclusion validity is threatened by two major types of errors:

Type one error: A Type I error occurs when we reject a null hypothesis that is actually true; that is, we conclude that there is a relationship between the two variables when in reality there is none.

Type two error: A Type II error occurs when we fail to reject a null hypothesis that is false; that is, a relationship between the variables exists, yet we conclude that there is none.

Power analysis is used to assess the ability to detect a relationship in statistical conclusion validity. When assessing statistical conclusion validity, we come across several problems. One of these problems occurs when we use a small sample size, since with a small sample there is a possibility that the result will not be accurate; to overcome this problem, the sample size should be increased. Violation of statistical assumptions is also a threat to statistical validity: if we use a biased value in the analysis, the results may not be accurate, and if the wrong statistical test is applied, the conclusion may not be accurate.
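A power analysis of the kind mentioned above can be run in Python with statsmodels. The scenario below is hypothetical (not from this post): it asks how many subjects per group a two-sample t-test needs to detect a medium effect with 80% power.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical scenario: medium effect size (Cohen's d = 0.5),
# alpha = .05, desired power = .80, equal group sizes
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # about 64 subjects per group
```

Running the same calculation before data collection is the standard way to guard against the small-sample threat described above.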

Construct validity: Construct validity exists when the construct is involved in predicting the relationship for the dependent variable. For example, in structural equation modeling, when we draw the construct, we assume that the factor loadings for the construct should be greater than .7. Cronbach's alpha is used in assessing construct validity: .60 is considered acceptable for exploratory purposes, .70 is considered adequate for confirmatory purposes, and .80 is considered good for confirmatory purposes. If the construct satisfies these assumptions, then it will contribute to predicting the relationship for the dependent variables. Convergent/divergent validation and factor analysis are also used to test construct validity.
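Cronbach's alpha, judged against the thresholds quoted above, can be computed directly from an item-score matrix. A minimal sketch in Python (the respondent scores below are hypothetical):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-respondent, 3-item scale
scores = np.array([
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
])
print(round(cronbach_alpha(scores), 3))   # 0.897 -- "good" by the .80 rule
```

The formula is alpha = k/(k−1) × (1 − Σ item variances / variance of the total score), so highly intercorrelated items push alpha toward 1.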

Relationship between reliability and validity: A test that is unreliable cannot be valid and a test that is valid must be reliable. Reliability is necessary but not a sufficient condition for validity. Thus, validity plays a major role in analysis and in making accurate decisions.

The following are overall validity threats:

1. Insufficient data collected to make valid conclusions

2. Measurement done with too few measurement variables

3. Too much variation in data or outlier in data

4. Wrong selection of sample

5. Inaccurate measurement method taken for analysis

Click here for more assistance with Validity.

Thursday, March 12, 2009

Mann-Whitney in SPSS

The Mann-Whitney test is among the most widely used non-parametric tests, and in SPSS it serves as an alternative to the t-test. Unlike the t-test, the Mann-Whitney test makes no assumptions about the parent population. It tests whether the two sampled populations are equivalent in location. The observations from both groups are combined and ranked; in the case of ties, the average rank is assigned. The number of ties should be kept relatively small in relation to the total number of observations.

If the populations are identical in location, the ranks should be randomly mixed between the two samples. The test calculates the number of times a score from group 1 precedes a score from group 2, and the number of times a score from group 2 precedes a score from group 1. The Mann-Whitney U statistic is the smaller of these two numbers.

To conduct the Mann-Whitney test in SPSS, go to the Analyze menu, click on the “Nonparametric Tests” option, select the “2 Independent Samples” option, and select the test type (Mann-Whitney in this case).

The following are the operations SPSS performs while calculating the Mann-Whitney test:

· SPSS ranks the cases in order of increasing size and computes the test statistic U, which indicates the number of times that a score from group 1 precedes a score from group 2.

· An exact level of significance is computed when there are only a few cases; when there are more than a few cases, U is transformed into a Z statistic and a normal-approximation p value is computed.

· A test statistic is then calculated for each variable.

In order to compute the Mann-Whitney statistic by hand, the following steps are performed:

Let xi (i = 1…n1) and yj (j = 1…n2) be independent samples of sizes n1 and n2 from populations with probability density functions f1( ) and f2( ) respectively. To test the null hypothesis H0 : f1( ) = f2( ), let T be the sum of the ranks of the y’s in the combined ordered sample. The test statistic U is defined in terms of T as follows:

U= n1 n2+ n2 (n2 + 1)/2 – T

If T is significantly larger or smaller than expected, then the null hypothesis is rejected. The problem is finding the distribution of T under the null hypothesis, which is troublesome to obtain directly. However, Mann and Whitney tabulated the distribution of T for small n1 and n2 and showed that T is asymptotically normal. It has been established that under the null hypothesis, U is asymptotically normally distributed as N(µ, σ²), where

µ = E(U) = n1 n2/2 and σ² = V(U) = n1 n2 (n1 + n2 + 1)/12.

Asymptotically normal means that the distribution of the statistic approaches the normal distribution as the size of the sample increases.

Here, Z = (U – µ)/σ, which is asymptotically normal with mean 0 and variance 1. The approximation is fairly good if both n1 and n2 are greater than 8; that is, both independent samples should be larger than 8 for the normal approximation to be reliable.

The Asymptotic Relative Efficiency (ARE) of the Mann-Whitney test relative to the two-sample t-test is greater than or equal to 0.864; for a normal population, the ARE is 0.955. Accordingly, the Mann-Whitney test is regarded as one of the best non-parametric tests for location. The ARE is the limit of the relative efficiency as the size of the sample increases.
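The hand calculation above can be sketched in Python on hypothetical data, with SciPy's implementation used as a cross-check (both sample sizes are 9, so the normal approximation applies):

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata

x = [14, 18, 20, 25, 27, 31, 35, 39, 42]   # group 1, n1 = 9 (hypothetical)
y = [12, 15, 17, 19, 22, 24, 28, 30, 33]   # group 2, n2 = 9 (hypothetical)
n1, n2 = len(x), len(y)

ranks = rankdata(np.concatenate([x, y]))   # combined ranks; ties get average ranks
T = ranks[n1:].sum()                       # T = sum of the ranks of the y's
U = n1 * n2 + n2 * (n2 + 1) / 2 - T        # U = n1*n2 + n2(n2+1)/2 - T

mu = n1 * n2 / 2                           # E(U)
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (U - mu) / sigma                       # normal approximation (n1, n2 > 8)
p = 2 * norm.sf(abs(z))                    # two-sided p value

# Cross-check against SciPy (continuity correction off to match the formula)
U_scipy, p_scipy = mannwhitneyu(x, y, alternative="two-sided",
                                use_continuity=False, method="asymptotic")
```

The manual U and p value agree with SciPy's asymptotic result; with small samples one would instead consult the exact tables (`method="exact"`).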

Click here for further assistance with SPSS.


Independent and Dependent Variables

In order to know independent and dependent variables, one should know what variables are. Variables are properties or characteristics of some event, object, or person that can take on different values or amounts. When researchers are conducting research, they often manipulate variables.

Now, let us discuss independent and dependent variables in detail:

Independent variable(s) are the variables or alternatives that are manipulated (i.e., the levels of these variables are changed by the researcher) and whose effects are measured and compared. They are also called predictor(s), as they predict the values of the dependent, or predicted, variable in the model. In layman’s terms, the independent variable is a variable that stands alone and is not changed by the other variables one is trying to measure. For example, when looking at someone’s age, variables such as what the person eats or how much television he watches do not change the person’s age. In fact, when one is looking for some kind of relationship between variables, one is trying to see whether the independent variable causes some kind of change in the other variables.

The other variable(s) can also be called dependent variable(s). As the name suggests, they are the variables that measure the effect of the independent variable(s) on the test units. In layman’s language, the dependent variables are the variables which are completely dependent on the independent variable(s). They are also called Predicted variable(s) as they are the values to be predicted or assumed by the predictor / independent variables. For example, a student’s score could be a dependent variable because it could change depending on several factors such as how much he studied, how much sleep he got the night before he took the test, or even how hungry he was when he took the test. Usually, when one is looking for a relationship between two things, one is trying to find out what makes the dependent variable change the way it does.

Independent variables are also called “regressors,” “controlled variable,” “manipulated variable,” “explanatory variable,” “exposure variable,” and/or “input variable.” Similarly, dependent variables are also called "response variable," "regressand," "measured variable," "observed variable," "responding variable," "explained variable," "outcome variable," "experimental variable," and/or "output variable."

A few examples can highlight the importance and usage of dependent and independent variables in a broader sense:

If one wants to measure the influence of different quantities of nutrient intake on the growth of an infant, then the amount of nutrient intake can be the independent variable, while the dependent variable can be the growth of an infant measured by height, weight or other factor(s) as per the requirement of the experiment.

If one wants to estimate the cost of living for an individual, then factors such as salary, age, marital status etc. are independent variables. The cost of living for a person is highly dependent on such factors, hence can be designated as the dependent variable.

In the case of the time series analysis, forecasting a price value of a particular commodity is again dependent on various factors as per the study. Suppose we want to forecast the value of gold, for example. In such an instance, the seasonal factor can be an independent variable on which the price value of gold will depend.

In the case of a poor performance of a student in an examination, the independent variables can be the factors like the student not attending classes regularly, the student having poor memory etc., which can reflect the grade of the student. Here, the dependent variable is the test score of the student.

For assistance with your dissertation statistics click here.

Tuesday, March 10, 2009

Sample Dissertation Statistics


A sample of dissertation statistics is a template that dissertation and thesis students can use to help present their findings. This template can be invaluable to students when it comes to working on, writing and finishing a dissertation. This can be especially useful as writing a dissertation is not an easy task and it requires hard work combined with efficiency on the part of the student. And while the most important thing for a graduate or doctoral candidate is the completion of his or her dissertation, the dissertation comes at the end of an academic period, a time when everything seems important yet impossible to finish. Though they may have studied and researched throughout the year, students face problems making sense of their own ideas, even when it comes to deciding the very topic of their dissertation. One aspect of a quantitative dissertation is the Dissertation Statistics. Most students are “first timers” and need thorough and expert guidance from someone skilled. What’s more, teachers may not be available to the students all the time, and in fact, may have even less time at the end of the semester. Dissertation statistics samples can be a blessing for students because most students are not familiar with the writing and formatting skills of thesis research and writing. Dissertation statistics samples can be used by the students as a reference.

Dissertation Statistics Samples can be used for writing the dissertation proposal as well as the actual dissertation itself. Not only can dissertation statistics samples help by assisting the student in deciding a topic, dissertation statistics samples can also be useful when it comes to the terminology and writing style that should be used for the student’s dissertation. Looking at dissertation statistics samples is practical as the samples provide an idea of the research and writing methodology as well as examples of the construction of other parts of the entire dissertation. This can be extremely valuable as it can help in increasing the overall quality and reliability of the student’s own dissertation.

In considering the benefits of consulting dissertation statistics samples, a student may wonder if it is worth it and if his or her dissertation will truly benefit from the use of the dissertation statistics samples. However, this student must remember that the task of writing a dissertation can become much easier with the use of samples, which can provide useful insight in terms of common pitfalls to avoid. Additionally, a student writing a dissertation is likely to have many concerns about things such as graphs, tables, calculations, literature review, citation formats, etc… and the dissertation statistics samples can easily provide solutions to such problems, as long as they come from a reliable source.

Dissertation statistics samples can provide help in many different ways. To begin with, they provide a benchmark on which the student’s entire dissertation can be based. This makes the entire research project much less intimidating. Dissertation statistics samples can also be used as templates to help edit and “clean-up” an already prepared piece of text. It can do the same for tables, charts and graphs. In regards to the interpretation of all aspects of the dissertation, dissertation statistics samples consulting can advise and guide the student in terms of theme, style and format. On another note, they can also lend suggestions as to which statistical procedures are most useful to a specific kind of dissertation. Most importantly, dissertation statistics samples can help the students find their errors and they can suggest improvements. Finally, because dissertation statistics samples come from a reliable source and are in principle, pre-approved, students get the benefit of having a good standard to live up to.

In consulting dissertation statistics samples, students should remember that in terms of guidance, it is most useful to consult more than one dissertation statistics sample. However, using too many dissertation statistics samples can also lead to information overload. It is best to browse through a few dissertation statistics samples and select the right areas and sections for reference. Also, it is important to remember that though dissertation statistics samples can provide the much-needed guidelines, they cannot provide the actual content.

Tuesday, February 17, 2009

Cox Event History

Cox event history is a branch of statistics that deals mainly with death in biological organisms and failure of mechanical systems. It is also sometimes referred to as a statistical method for analyzing survival data. Cox event history is also known by various other names, such as survival analysis, duration analysis or transition analysis. Generally speaking, this technique involves modeling data structured in a time-to-event format. The outcome of this analysis is to understand the probability of the occurrence of an event. This technique was primarily developed for the medical and biological sciences, but it is now also frequently used in engineering and in general statistical and data analysis.

One of the key purposes of the Cox event history technique is to explain the causes behind the differences or similarities between the events encountered by subjects. For instance, Cox regression may be used to evaluate why certain individuals are at a higher risk of encountering some diseases than others. It can thus be effectively applied to studying acute or chronic diseases, hence the interest in medical science. The Cox event history model mainly focuses on the hazard function, which produces the probabilities of an event occurring randomly at any time or at a specific period or instance in time.

The basic Cox event history model can be summarized by the following function:

h(t) = h0(t)e^(b1X1 + b2X2 + … + bnXn)

where: h(t) = hazard rate

h0(t) = baseline hazard function

b’s and X’s = coefficients and covariates.
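The hazard function above is straightforward to evaluate directly. The small Python sketch below (the baseline hazard, coefficients and covariate values are all hypothetical) also illustrates the model's key property: the hazard ratio between two subjects does not depend on t.

```python
import numpy as np

def hazard(t, baseline_hazard, coefs, covariates):
    """Cox model hazard: h(t) = h0(t) * exp(b1*X1 + ... + bn*Xn)."""
    return baseline_hazard(t) * np.exp(np.dot(coefs, covariates))

# Hypothetical constant baseline hazard and two covariates (age, smoker flag)
h0 = lambda t: 0.01
b = np.array([0.03, 0.7])
x_smoker = np.array([60.0, 1.0])
x_nonsmoker = np.array([60.0, 0.0])

# The hazard ratio between the two subjects is exp(0.7), independent of t
ratio = hazard(5, h0, b, x_smoker) / hazard(5, h0, b, x_nonsmoker)
print(round(ratio, 3))   # exp(0.7) ≈ 2.014
```

This time-constant ratio is exactly the proportional-hazards assumption discussed below.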

Cox event history models can be categorized into three types: non-parametric, semi-parametric and parametric.

Non-parametric: The non-parametric model does not make any assumptions about the hazard function or the variables affecting it. Hence, only a limited number of variable types can be handled with a non-parametric model. This type of model involves the analysis of empirical data showing changes over a period of time and cannot handle continuous variables.

Semi-parametric: Just like the non-parametric model, the semi-parametric model does not make any assumptions about the shape of the hazard function or the variables affecting it. What makes this model different is that it assumes that the hazard rates are proportional over time. The shape of the hazard function can also be derived empirically. Semi-parametric models support multivariate analyses and are often considered the method of choice in Cox event history analysis.

Parametric: In this model, the shape of the hazard function and the variables affecting it are determined in advance. Multivariate analyses of discrete and continuous explanatory variables is supported by the parametric model. However, if the hazard function shape is incorrectly estimated, then there are chances that the results could be biased. Parametric models are frequently used to analyze the nature of time dependency. It is also particularly useful for predictive modeling because the shape of the baseline hazard function can be determined correctly by the parametric model.

Cox event history analysis involves certain assumptions. As with every other statistical method or technique, if an assumption is violated, the results will often not be statistically reliable. The major assumption is that, with the passage of time, the effects of the independent variables do not change; in other words, each independent variable should maintain a constant, proportional effect on the hazard rate over time.

In addition, hazard rates are rarely smooth in reality. Frequently they need to be smoothed in order to be useful for Cox event history analysis.

Applications of Cox Event History

Cox event history can be applied in many fields, although initially it was used primarily in the medical and other biological sciences. Today it is an excellent tool for other applications, frequently used as a statistical method where the dependent variables are categorical, especially in socio-economic analyses. For instance, in the field of economics, Cox event history is used extensively to relate macro- or micro-economic indicators in terms of a time series, such as figuring out the relationship between unemployment or employment and time. In addition, in commercial applications, Cox event history can be applied to estimate the lifespan of a certain machine and its breakdown points based on historical data.

Tuesday, January 27, 2009

Multiple Regression

The term multiple regression was first used by Pearson in 1908. Multiple regression is a statistical technique used to evaluate and establish a quantitative relationship between a dependent variable and multiple independent variables. In simple regression, only a single dependent variable can be regressed on a single independent variable. In multiple regression, however, a number of variables, both metric and non-metric, can be involved and regressed on one another. Multiple regression, like other statistical techniques, requires that certain assumptions be valid and fulfilled in order to complete a valid analysis. These assumptions are:

1. The independent variable(s) should be constant, where a repeat sample is involved. This implies that while the dependent variable(s) can change as a treatment is applied, the independent variable(s) should be held constant.

2. The variance of all error terms or residuals related to each variable should be constant.

3. There should be no autocorrelation between the error terms of the independent variables. The existence of autocorrelation can be tested by the Run test or the Durbin-Watson test. Both tests work differently in indicating the presence of autocorrelation but are generally equally acceptable, with some scholars preferring the latter.

4. The number of observations must be greater than the number of parameters to be estimated.

5. There should not be a perfectly linear relationship between the explanatory or independent variables. In case there is, the confidence intervals become wider, leading to a higher possibility that a null hypothesis which should be rejected is accepted. This issue is called multicollinearity and refers to the lack of independence among the independent variables. The existence of multicollinearity can be tested by the VIF (Variance Inflation Factor), which equals 1 divided by (1 minus R²), where R² is obtained by regressing the variable in question on the remaining independent variables. Where multicollinearity exists, the problem has to be eliminated from the dataset in question. There are a number of ways this can be accomplished. A common method is to either drop the variable altogether or append cases in the problematic variable(s). However, another effective method is to use factor scores based on factor analysis, which will club together the correlated variables to produce a more valid result.

6. When performing multiple regression, the resulting error term should have a mean value of ‘Zero’, implying a complete prediction. The presence of a residual will indicate that the regression output has not completely predicted the relationship between the variables in question, and can be [substantially] improved.
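The VIF check from assumption 5 can be sketched with plain NumPy (the data below are simulated purely for illustration): each predictor is regressed on the others and VIF = 1/(1 − R²) is reported.

```python
import numpy as np

def vif(X: np.ndarray, j: int) -> float:
    """Variance inflation factor for column j of predictor matrix X:
    VIF_j = 1 / (1 - R^2), with R^2 from regressing X_j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])
```

Here x1 and x2 produce very large VIFs (a common rule of thumb flags VIF > 10), while the independent x3 stays near 1, so x1 or x2 would be a candidate for dropping or combining.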


The use of multiple regression as a statistical technique involves estimation of coefficients for the various variables in question. There are two key methods of estimation, based on whether the multiple regression is linear or non-linear, although the second method listed below can be used in either case:

  1. Ordinary least squares (OLS): This method was propounded by the German mathematician Carl Friedrich Gauss. It is a point estimation technique, which means that dependent variables are estimated at a particular point rather than in an interval. This method cannot be used in non-linear multiple regression unless the data are modified to become linear. OLS as a technique is based on the principle of minimizing the error term, as opposed to the maximum likelihood method, which is based on probability analysis.

  2. Maximum likelihood method: This too is a point estimation method, but it does not require that the data have a linear relationship. In this method, the error term does not need to be normally distributed. This technique of multiple regression relies on probability as a measure of the extent to which the model has fit the data. It is more mathematical in nature, so before computers became widespread, most researchers preferred the OLS technique; nowadays, computers make this method easy to use.

A key advantage of multiple regression, besides being able to use multiple variables, is the ability to use multiple types of variables. For instance, a metric or numerical variable can be regressed on a non-metric or string variable, and vice versa. In addition, combinations of metric and non-metric variables can be regressed on metric and non-metric variables. Depending on the specific kind of variable in question, different techniques such as discriminant analysis, logistic regression or SEM (Structural Equation Modeling) can be applied.

Click here for dissertation assistance!

Tuesday, January 20, 2009

T-test


A t-test is a statistical technique for comparison of the means of two samples or populations. There are other techniques similar to the t-test for comparison of means, the other popular measure being the z-test. However, a z-test is typically used where the sample size is relatively large, with the t-test being the standard for samples where the size, or ‘n’, is 30 or smaller. Another key feature of the t-test is that it can be used for comparison of no more than two samples; for more than two, ANOVA is the most appropriate alternative. The t-test was developed in the early 20th century by an Englishman, W.S. Gosset. It is also commonly known as Student’s t-test, because the usage of statistical analysis was considered a trade secret by Guinness, Gosset’s employer, forcing him to publish under a pen name instead of his own.

In conducting a t-test, certain key assumptions have to be valid, including the following:

  • Data have to be normally distributed, meaning that there should be no outliers and the mean, median and mode should be the same. In the event that the data are not normal, they have to be normalized by converting them into logarithmic form. The variance of each sample dataset should also be equal.
  • Sample(s) may be dependent or independent, depending on the hypothesis. Where the samples are dependent, repeat measure are typically used. An example of a dependent sample is where observations are taken before and after a treatment.
  • For help assessing the assumptions of a t-test click here

T-tests are widely used in hypothesis testing for comparison of sample means, to determine whether or not they are statistically different from each other. For instance, a t-test may be used to:

  • Determine whether a sample belongs to a certain population
  • Determine whether two different samples belong to the same population or two different populations.
  • Determine whether the correlation between two samples or two different variables is statistically significant.
  • Determine whether, in case of dependent samples, the treatment has been statistically significant.

In order to conduct a t-test, we need to follow certain steps as follows:

  • Set up a Hypothesis for which the t-test is being conducted. The hypothesis is simply a statement that suggests what our expectation of the existing sample(s) is, and determines how the result of the t-test will be interpreted.
  • Select the level of significance and the critical or ‘alpha’ region. Most often, a 95% level of significance is used in non-clinical applications, whereas clinical applications typically use a 99% or higher level of significance. The balance is simply the alpha region, which determines the hypothesis rejection zone or range.
  • Calculation: we obtain the t value by taking the difference between the sample mean and the population mean and dividing it by the standard error of the mean, that is, the sample standard deviation divided by the square root of the number of observations (n). The resulting value is the t statistic.

  • Hypothesis testing: this step involves evaluating the hypothesis set up in step 1 using the obtained t value. The idea is to compare our level of significance, or ‘alpha’ value, with the p value associated with the t statistic. For instance, if the t-test is conducted at 95% significance, the null hypothesis is rejected when the p value is lower than 5%, or .05; in that case our research hypothesis holds true. If not, we fail to reject the null hypothesis and cannot claim that a difference exists.
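The steps above can be sketched for a one-sample t-test in Python with SciPy (the sample data and the population mean of 100 are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: does its mean differ from a population mean of 100?
sample = np.array([104, 98, 110, 102, 97, 105, 108, 99, 101, 106])
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

# Manual check of the calculation step: t = (mean - mu) / (s / sqrt(n))
n = len(sample)
t_manual = (sample.mean() - 100) / (sample.std(ddof=1) / np.sqrt(n))

alpha = 0.05                     # 95% significance level
reject = p_value < alpha         # reject the null when p < .05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject null: {reject}")
```

The manual t value matches SciPy's, and the final comparison of p against alpha is the hypothesis-testing step described above.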

While being a very useful tool in data analysis, the t-test is not without its limitations. For one thing, it was designed for small samples of roughly 30 observations or fewer; with larger samples it becomes nearly indistinguishable from the z-test. In addition, the t-test is a parametric test, which implies that with a markedly non-normal distribution it cannot be applied without making changes to the dataset. In reality, few datasets are perfectly normal, although the t-test is fairly robust to moderate departures from normality. When its assumptions cannot be met, a non-parametric test can be applied instead, such as the Mann-Whitney U test (for independent samples) or the sign test or Wilcoxon signed-rank test (for related or dependent samples).
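As a hedged sketch of the non-parametric alternative for independent samples, the Mann-Whitney U statistic can be computed directly from its definition, by counting how often scores in one group exceed scores in the other. The two groups below are invented; a real analysis would use a library routine such as scipy.stats.mannwhitneyu, which also returns a p-value.

```python
def mann_whitney_u(x, y):
    """U for group x: count pairs where x beats y (ties count half)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical scores from two independent groups
a = [3, 4, 2, 6]
b = [9, 7, 5, 10]
u_a = mann_whitney_u(a, b)  # a small U means group a tends to score lower
```

A useful check on the definition: U for group a plus U for group b always equals the number of pairs, n1 times n2.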

Click here for assistance with conducting T-tests

Thursday, January 8, 2009

Linear Regression Analysis and Logistic Regression Analysis

In this blog I discuss linear regression analysis, aspects of multiple regression, and logistic regression analysis, their function and differences, and SPSS regression analysis interpretation. At Statistics Solutions we hope you glean a few ideas here.

Linear Regression Analysis in SPSS

Linear regression analysis is a statistical analysis technique that assesses the impact of a predictor variable (the independent variable) on a criterion variable (a dependent variable). Importantly, the independent variable must be continuous (interval-level or ratio-level) or dichotomous. The dependent variable must be continuous (interval-level or ratio-level). Dissertation students often have research questions that are appropriate to this technique. For example, a dissertation research question may be what the impact of smoking is on life expectancy. In this example, smoking is the predictor variable and life expectancy is the criterion variable. For Linear Regression Analysis help, CLICK HERE.

Linear Regression Analysis Assumptions

There are three primary assumptions associated with linear regression: absence of outliers, linearity, and constant variance. Linear regression analysis is very sensitive to outliers. The easiest way to identify outliers is to standardize the scores by requesting the z-scores from SPSS. Any score with a z-value outside of the absolute value of 3 is probably an outlier and should be considered for deletion. The assumptions of linearity and constant variance can be assessed in SPSS by requesting a plot of the residuals (“z-resid” on the y-axis) by the predicted values (“z-pred” on the x-axis). If the scatter plot is neither u-shaped, indicating non-linearity, nor cone-shaped, indicating non-constant variance, the assumptions are considered met. For Linear Regression Analysis Assumptions Help, CLICK HERE.
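The outlier screen described above, flagging standardized scores beyond an absolute z of 3, can also be sketched outside of SPSS. This is a hedged illustration; the data and the helper name are invented.

```python
import statistics

def outlier_flags(values, cutoff=3.0):
    """Return the values whose z-score magnitude exceeds the cutoff."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs((v - mean) / s) > cutoff]

# Twenty scores clustered near 50 plus one suspect value of 100
data = [50, 52, 49, 51, 50, 48, 53, 51, 50, 49,
        52, 47, 50, 51, 49, 50, 52, 48, 51, 50, 100]
print(outlier_flags(data))  # only the 100 is flagged
```

One caveat worth knowing: a single extreme value inflates the standard deviation itself, so with very small samples even a gross outlier may not reach |z| = 3.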

Multiple Linear Regression Analysis

Multiple linear regression is a statistical analysis similar to linear regression, with the exception that there can be more than one predictor variable. The assumptions regarding outliers, linearity, and constant variance still need to be met. One additional assumption that needs to be examined is multicollinearity: the extent to which the predictor variables are related to each other. Multicollinearity can be assessed by asking SPSS for the Variance Inflation Factor (VIF). While different researchers have different criteria for what constitutes too high a VIF, a VIF of 10 or greater is certainly reason for pause. If the VIF is 10 or greater, consider collapsing the related variables. For Multiple Linear Regression Analysis Multicollinearity Help, CLICK HERE.
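As a hedged sketch of what the VIF measures: in the special case of exactly two predictors, the R-square from regressing one predictor on the other is just their squared Pearson correlation, so VIF = 1 / (1 − r²). SPSS computes the general version by regressing each predictor on all of the others; the data and function names below are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """With two predictors, R-square of one on the other is r^2,
    so the VIF is 1 / (1 - r^2) for both predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]  # nearly a multiple of x1: collinear
x3 = [5, 1, 4, 2, 3]             # only weakly related to x1

print(vif_two_predictors(x1, x2))  # far above 10: reason for pause
print(vif_two_predictors(x1, x3))  # close to 1: no multicollinearity
```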

Regression Analysis Interpretation

When I speak with dissertation students about their regression analysis, there are four aspects of the SPSS output that I want to interpret. First is the ANOVA. The ANOVA tells the researcher whether the model is statistically significant; that is, whether the F-value has an associated probability of .05 or less. The second thing to look for is the R-square value, also called the coefficient of determination. The coefficient of determination is a number between 0 and 1 which, expressed as a percentage, indicates what percent of the variability in the criterion variable can be accounted for by the predictor variable(s). The third regression analysis aspect to interpret is whether the beta coefficient is statistically significant. The beta’s significance can be found by examining the t-value and the associated significance level of the t-value for that particular predictor. Fourth, you should interpret the beta itself, whether positive or negative. For Linear Regression Analysis Interpretation Help, CLICK HERE.
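To make the interpretation concrete, here is a hedged sketch of a one-predictor regression computed by hand, loosely echoing the smoking and life expectancy example from above. All of the numbers are invented for illustration.

```python
def simple_regression(x, y):
    """OLS slope, intercept, and R-square for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((b - my) ** 2 for b in y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_square = 1.0 - ss_res / ss_tot
    return slope, intercept, r_square

# Hypothetical data: cigarettes per day vs. life expectancy in years
cigs = [0, 10, 20, 30, 40]
life = [80, 77, 73, 70, 66]
slope, intercept, r2 = simple_regression(cigs, life)
# slope = -0.35: each extra cigarette per day predicts about a third
# of a year less life expectancy; r2 near 1 means the predictor
# accounts for nearly all of the variability in the criterion.
```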

Logistic Regression Analysis in SPSS

Logistic regression, also called Binary Logistic Regression, is a statistical analysis technique that assesses the impact of a predictor variable (the independent variable) on a criterion variable (a dependent variable). As in a linear regression analysis, the independent variable must be continuous (interval-level or ratio-level) or dichotomous. The difference is that the dependent variable must be dichotomous (i.e., a binary variable). For example, a researcher may want to know whether age predicts the likelihood of going to a doctor (yes vs. no). For Logistic Regression Analysis Help, CLICK HERE.

Binary Logistic Regression Analysis Interpretation

While binary logistic regression and linear regression analyses differ in their criterion variables, there are other differences as well. In logistic regression, to assess whether the model is statistically significant, you can look at the chi-square test and whether it is statistically significant. The chi-square in logistic regression analysis is analogous to the ANOVA test in linear regression. The next thing to examine is the Nagelkerke R-square statistic, which is somewhat analogous to the R-square value in linear regression analysis. Next, interpret whether the beta coefficient(s) is statistically significant. If so, look at the Exp(B), the odds ratio: for a one-unit change in the predictor, the odds of the outcome occurring are multiplied by Exp(B). For Binary Logistic Regression Analysis Interpretation Help, CLICK HERE.
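A hedged numeric illustration of reading Exp(B): exponentiating the beta coefficient gives the odds ratio. The coefficient value below is invented, not from any real model.

```python
import math

# Hypothetical logistic-regression coefficient for age, in log-odds
# per year, from the doctor-visit example above
b_age = 0.0488
odds_ratio = math.exp(b_age)  # this is Exp(B)

# Exp(B) of about 1.05 means each additional year of age multiplies
# the odds of visiting a doctor by about 1.05, i.e. roughly a 5%
# increase in the odds per year.
print(round(odds_ratio, 2))
```

A beta of exactly 0 gives Exp(B) = 1, meaning the predictor does not change the odds at all; betas below 0 give odds ratios below 1.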

Friday, January 2, 2009

Statistics for your Dissertation Proposal or Thesis Proposal

Tis the season for dissertation proposals!! I'm sure many of you are preparing to start another riveting semester of graduate work and another semester with edge-of-your-seat deadlines – the stuff epic motion pictures are made of!!!

We've all been there. You had plenty of time. You researched and you put off the hard stuff. Now you are facing crunch time. You know who you are… Now you have to hand in the proposal and need help. Maybe you have a couple weeks or maybe you have a couple days. What are you going to do? Read on my friend, read on. Today's post may just save you thousands of dollars and a few years of your life lost from stress.

Statistics for your Dissertation Proposal or Thesis Proposal

Among other things, I am betting you are most concerned about the appropriate statistics for your dissertation or thesis. I have covered this in another blog. Check it out here. In the meantime, I have some recommendations for the graduate student pursuing their thesis or dissertation and working on their proposal.

Know What you Need to Know

Different statistical tests measure different things, so it's important to know what you are trying to find. Are you looking for a relationship or are you looking for differences? Do you need to establish some predictability or are you just seeking to describe something? This will have a direct impact on the type of statistical tests you choose for your dissertation proposal or thesis proposal. There are words associated with certain statistical tests; e.g., "to find a relationship between X and Y" is associated with correlation language. Click here for help determining the type of statistical tests to use with your dissertation proposal or thesis proposal.

Know how the Statistics in your Dissertation are Supposed to be Used

This is similar to the one above, but I thought I would include it. A pretty good percentage of our clients have had their dissertation or thesis proposal approved and are now beginning to work on their results section. The problem is they aren't really sure how the tests they proposed are supposed to be used. You might think that since the proposal has been approved by experts, they would have ensured that the statistical analysis you proposed for your dissertation or thesis is correct. Don't be fooled!

Many, many clients have sent us their approved proposal, listing the statistical analysis to be conducted and the variables to be tested, only to find out that the statistical test they proposed cannot be used with their type of variables. This is embarrassing and time-consuming, but can be avoided with a little due diligence. Click here for help determining how to use statistical tests with your dissertation proposal or thesis proposal.

Know the Types of Variables

There aren't very many types of variables. Take an evening if you have to and become familiar with the different types of variables used in statistical analysis. There are only a few and it will make all the difference in the world when you are choosing the statistical tests for your dissertation proposal or thesis proposal. Some statistical tests are only for continuous variables and some statistical tests are only for nominal variables. Some tests can use both if they are entered a particular way. It will pay to familiarize yourself with these types, before you write your survey questions and propose your analysis. If you are keeping these variable types in mind as you are constructing the survey for your dissertation proposal or thesis proposal, it will make choosing the statistical analysis much easier later on. For help with the types of variables included in your graduate thesis or Ph.D. click here.

Know the Assumptions of the Statistical Tests

Each statistical test used in your dissertation proposal or thesis proposal comes complete with assumptions, to make sure the test accurately measures what it is intended to measure. There's a pretty good chance that the assumptions of the statistical tests you choose to use for your dissertation proposal or thesis proposal won't be met, unless you're gathering a lot of observations. While you won't know for sure if the assumptions of the statistical tests have been met until after you have the data, you can get a pretty good idea without having the data.

For instance, maybe you are proposing to look for differences in GPA between those receiving free/reduced lunch and those not receiving it. If you are researching poor, inner-city schools, you know there is probably going to be a disproportionate number of free/reduced lunch recipients. It's also possible that there will be a disproportionate number of failing schools. For two of the tests that could be used to analyze this difference, the independent samples t-test and the analysis of variance (ANOVA), there is the assumption that the groups have approximately equal standard deviations (homogeneity of variance). We know this isn't likely to be the case and may instead propose a non-parametric equivalent. Click here for help with the assumptions of the statistical analysis being used in your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.
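As a hedged sketch of this kind of pre-check: one informal rule of thumb is that the larger group standard deviation should be no more than about twice the smaller; a formal check would use something like Levene's test in SPSS. The GPA values and the helper below are invented for illustration.

```python
import statistics

def sd_ratio(group_a, group_b):
    """Ratio of the larger group SD to the smaller (rule of thumb: <= 2)."""
    sa, sb = statistics.stdev(group_a), statistics.stdev(group_b)
    return max(sa, sb) / min(sa, sb)

# Hypothetical GPAs for the two lunch-status groups
free_lunch = [1.8, 2.0, 2.1, 2.4, 2.6, 3.0, 3.5, 3.9]
paid_lunch = [3.0, 3.1, 3.2, 3.2, 3.3, 3.4]

ratio = sd_ratio(free_lunch, paid_lunch)
if ratio > 2.0:
    print("SDs look unequal; consider a Mann-Whitney U test instead")
```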

I hope this helps some. I invite you to click here and schedule an appointment to speak with us about helping you with your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation. I've helped thousands upon thousands of graduate students over the last 16 years and can help you.