Request

To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Tuesday, February 17, 2009

Cox Event History

Cox event history is a branch of statistics that deals mainly with time until death in biological organisms or failure in mechanical systems. It is also described as a statistical method for analyzing survival data, and is known by various other names, including survival analysis, duration analysis, and transition analysis. Generally speaking, the technique involves modeling data structured in a time-to-event format, with the goal of understanding the probability that an event occurs. The technique was developed primarily for the medical and biological sciences, but Cox event history is now frequently used in engineering and in general statistical and data analysis as well.

One of the key purposes of the Cox event history technique is to explain the causes behind the differences or similarities between the events encountered by subjects. For instance, Cox regression may be used to evaluate why certain individuals are at higher risk of encountering some diseases than others. It can thus be applied effectively to the study of acute or chronic diseases, hence the interest in medical science. The Cox event history model focuses mainly on the hazard function, which gives the probability of an event occurring at any given time or during a specific period.

The basic Cox event history model can be summarized by the following function:

h(t) = h0(t) e^(b1X1 + b2X2 + ... + bnXn)

where:

h(t) = hazard rate

h0(t) = baseline hazard function

b1...bn, X1...Xn = coefficients and covariates
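To make the formula concrete, here is a minimal Python sketch of evaluating the hazard function at one time point. The baseline hazard, coefficients, and covariate values are invented for illustration only, not estimates from real data:

```python
import math

def cox_hazard(baseline_hazard, coefs, covariates):
    """Evaluate h(t) = h0(t) * exp(b1*X1 + ... + bn*Xn) at one time point."""
    linear_predictor = sum(b * x for b, x in zip(coefs, covariates))
    return baseline_hazard * math.exp(linear_predictor)

# Hypothetical example: two covariates (say, age in decades and a smoker flag)
h0 = 0.02                    # assumed baseline hazard at some time t
coefs = [0.3, 0.8]           # assumed fitted coefficients b1, b2
hazard = cox_hazard(h0, coefs, [6.5, 1])

# Hazard ratio between a smoker and a non-smoker, all else equal:
# exp(b2) -- note the baseline hazard h0(t) cancels out of the ratio.
hazard_ratio = math.exp(coefs[1])
```

This cancellation of h0(t) in hazard ratios is why the covariate effects can be estimated without specifying the baseline hazard.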

Cox event history models can be grouped into three categories: non-parametric, semi-parametric, and parametric.

Non-parametric: The non-parametric model makes no assumptions about the hazard function or the variables affecting it. As a result, only a limited number of variable types can be handled with a non-parametric model. This type of model involves the analysis of empirical data showing changes over time and cannot handle continuous variables.

Semi-parametric: Like the non-parametric model, the semi-parametric model makes no assumptions about the shape of the hazard function or the variables affecting it. What makes this model different is the assumption that the hazard rate is proportional over time. The shape of the hazard function can also be estimated empirically. Semi-parametric models support multivariate analyses and are often considered the most reliable fitting method in Cox event history analysis.

Parametric: In this model, the shape of the hazard function and the variables affecting it are specified in advance. Multivariate analysis of discrete and continuous explanatory variables is supported by the parametric model. However, if the shape of the hazard function is specified incorrectly, the results may be biased. Parametric models are frequently used to analyze the nature of time dependency. They are also particularly useful for predictive modeling, because the shape of the baseline hazard function is fully specified.

Cox event history analysis rests on certain assumptions. As with every other statistical method or technique, violating an assumption will often make the results statistically unreliable. The major assumption is that, over time, the independent variables do not interact with one another; in other words, each independent variable should have a constant effect on the hazard rate over time (the proportional hazards assumption).

In addition, hazard rates are rarely smooth in reality. Frequently they need to be smoothed in order to be useful for Cox event history analysis.

Applications of Cox Event History

Cox event history can be applied in many fields, although initially it was used primarily in the medical and other biological sciences. Today it is an excellent tool for other applications, frequently used as a statistical method where the dependent variable is categorical, especially in socio-economic analyses. In economics, for example, Cox event history is used extensively to relate macro- or micro-economic indicators over time, such as modeling the duration of unemployment spells. In commercial applications, Cox event history can be applied to estimate the lifespan of a machine and its breakdown points based on historical data.

Tuesday, January 27, 2009

Multiple Regression

The term multiple regression was first used by Pearson in 1908. Multiple regression is a statistical technique used to evaluate and establish a quantitative relationship between a dependent variable and multiple independent variables. In simple regression, a single dependent variable is regressed on a single independent variable. In multiple regression, however, a number of variables, both metric and non-metric, can be involved and regressed on one another. Multiple regression, like other statistical techniques, requires that certain assumptions hold in order for the analysis to be valid. These assumptions are:

1. The independent variable(s) should be fixed where repeated sampling is involved. This implies that while the dependent variable can change as a treatment is applied, the independent variable(s) should be held constant.

2. The variance of all error terms or residuals related to each variable should be constant.

3. There should be no autocorrelation among the error terms. The existence of autocorrelation can be tested by the runs test or the Durbin-Watson test. The two tests detect autocorrelation differently but are generally equally acceptable, with some scholars preferring the latter.

4. The number of observations must be greater than the number of parameters to be estimated.

5. There should not be a perfectly linear relationship between the explanatory or independent variables. Where there is, the confidence intervals become wider, increasing the possibility that a hypothesis that should be rejected is instead accepted. This issue is called multicollinearity and concerns the independence of the independent variables. The existence of multicollinearity can be tested with the VIF (Variance Inflation Factor), which equals 1 divided by (1 minus the R-squared obtained by regressing that variable on the other independent variables). Where multicollinearity exists, the problem has to be addressed before proceeding. There are a number of ways to do this. A common method is to drop the offending variable altogether or to add cases for the problematic variable(s). Another effective method is to use factor scores based on factor analysis, which combines the correlated variables to produce a more valid result.

6. The error term should have an expected value of zero. A systematic pattern in the residuals indicates that the regression has not fully captured the relationship between the variables in question and can be substantially improved.
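The VIF check described in assumption 5 can be made concrete with a small sketch. With only two predictors, the R-squared from regressing one on the other is just the squared Pearson correlation between them, so VIF = 1 / (1 − r²). The data below are invented purely for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF in the two-predictor special case: R^2 from regressing x1
    on x2 is simply r^2, so VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is nearly 2*x1, so they are highly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
vif = vif_two_predictors(x1, x2)
```

A VIF this large would signal that one of the two predictors should be dropped or the pair combined, as discussed above.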


The use of multiple regression involves estimating coefficients for the variables in question. There are two key estimation methods, depending on whether the regression is linear or non-linear, although the second method listed below can be used in either case:

  1. Ordinary least squares (OLS): This method was propounded by the German mathematician Carl Friedrich Gauss. It is a point estimation technique, which means that the coefficients are estimated at a particular point rather than in an interval. This method cannot be used for non-linear multiple regression unless the data are transformed to become linear. OLS as a technique is based on the principle of minimizing the sum of squared errors, as opposed to the maximum likelihood method, which is based on probability.

  2. Maximum likelihood method: This too is a point estimation method, but it does not require that the data have a linear relationship, nor that the error term be normally distributed. This technique relies on probability as the measure of how well the model fits the data. It is more mathematically intensive, so before computers became widely available most researchers preferred OLS; today, software makes the maximum likelihood method easy to use.
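The least-squares principle behind OLS can be sketched for the one-predictor case, where the minimizing coefficients have a simple closed form. The data are a made-up toy example that lies exactly on the line y = 1 + 2x:

```python
def ols_simple(x, y):
    """Ordinary least squares for y = a + b*x, via the closed-form
    solution that minimizes the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = my - b * mx          # intercept
    return a, b

# Toy data lying exactly on y = 1 + 2x, invented for illustration.
a, b = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the fit passes through the point of means, the residuals sum to zero here, which is the assumption 6 property noted above.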

A key advantage of multiple regression, besides the ability to use multiple variables, is the ability to use multiple types of variables. For instance, a metric or numerical variable can be regressed on a non-metric or string variable, and vice versa. In addition, combinations of metric and non-metric variables can be regressed on metric and non-metric variables. Depending on the specific kinds of variables in question, related techniques such as discriminant analysis, logistic regression, or SEM (Structural Equation Modeling) can be applied.

Click here for dissertation assistance!

Tuesday, January 20, 2009

T-test


A t-test is a statistical technique for comparing the means of two samples or populations. There are other techniques for comparing means, the other popular measure being the z-test. However, a z-test is typically used where the sample size is relatively large, with the t-test being the standard for samples where the size, or 'n', is 30 or smaller. Another key feature of the t-test is that it can compare no more than two samples; for more than two, ANOVA is the most appropriate alternative. The t-test was developed in the early 20th century by an Englishman, W.S. Gosset. It is also commonly known as Student's t-test, because the use of statistical analysis was considered a trade secret by Guinness, Gosset's employer, forcing him to publish under the pen name 'Student' rather than his own.

In conducting a t-test, certain key assumptions have to be valid, including the following:

  • Data should be approximately normally distributed, with no extreme outliers. If the data are not normal, they may need to be transformed, for example by taking logarithms. The variances of the sample datasets should also be equal.
  • Sample(s) may be dependent or independent, depending on the hypothesis. Where the samples are dependent, repeated measures are typically used. An example of a dependent sample is one where observations are taken before and after a treatment.
  • For help assessing the assumptions of a t-test click here

T-tests are widely used in hypothesis testing for comparison of sample means, to determine whether or not they are statistically different from each other. For instance, a t-test may be used to:

  • Determine whether a sample belongs to a certain population
  • Determine whether two different samples belong to the same population or two different populations.
  • Determine whether the correlation between two samples or two different variables is statistically significant.
  • Determine whether, in case of dependent samples, the treatment has been statistically significant.

In order to conduct a t-test, we follow these steps:

  • Set up a Hypothesis for which the t-test is being conducted. The hypothesis is simply a statement that suggests what our expectation of the existing sample(s) is, and determines how the result of the t-test will be interpreted.
  • Select the level of significance and the critical or 'alpha' region. Most often, a 95% significance level is used in non-clinical applications, whereas clinical applications typically use a significance level of 99% or higher. The balance is the alpha region, which determines the hypothesis rejection zone or range.
  • Calculation: the t statistic is obtained by taking the difference between the sample mean and the population mean and dividing it by the standard error, i.e., the sample standard deviation divided by the square root of the number of observations (n): t = (sample mean − population mean) / (s / √n).

  • Hypothesis testing: this step involves evaluating the hypothesis from step 1 using the obtained t value. The idea is to compare the p-value associated with the t statistic to the chosen alpha level. For instance, if the test is conducted at 95% significance, the null hypothesis is rejected when the p-value is lower than 5%, or .05. If the p-value exceeds alpha, we fail to reject the null hypothesis.
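The calculation step above can be sketched in a few lines of Python. This computes only the t statistic; in practice the corresponding p-value would come from a t distribution table or statistical software. The sample values are invented for illustration:

```python
import math

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), where s is the sample
    standard deviation (computed with n - 1 in the denominator)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))

# Hypothetical sample of 8 observations, tested against a claimed
# population mean of 50.
t_value = one_sample_t([52, 48, 51, 53, 49, 50, 54, 52], 50)
```

A sample mean exactly equal to the hypothesized mean gives t = 0; larger discrepancies relative to the standard error give larger t values.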

While a very useful tool in data analysis, the t-test is not without its limitations. It is designed for small samples of roughly 30 observations or fewer; in large data analysis projects, a z-test or other large-sample method is usually preferred. In addition, the t-test is a parametric test, which implies that it cannot be applied to a non-normal distribution without changes to the dataset. In reality, few datasets are normal without such changes. A non-parametric test can then be applied more effectively, such as the Mann-Whitney U test (for independent samples) or the binomial or Wilcoxon signed-rank test (for related or dependent samples).

Click here for assistance with conducting T-tests

Thursday, January 8, 2009

Linear Regression Analysis and Logistic Regression Analysis

In this blog I discuss linear regression analysis, aspects of multiple regression, and logistic regression analysis, their function and differences, and SPSS regression analysis interpretation. At Statistics Solutions we hope you glean a few ideas here.

Linear Regression Analysis in SPSS

Linear regression analysis is a statistical technique that assesses the impact of a predictor variable (the independent variable) on a criterion variable (the dependent variable). Importantly, the independent variable must be continuous (interval-level or ratio-level) or dichotomous, and the dependent variable must be continuous (interval-level or ratio-level). Dissertation students often have research questions that are appropriate to this technique. For example, a dissertation research question may ask what the impact of smoking is on life expectancy. In this example, smoking is the predictor variable and life expectancy is the criterion variable. For Linear Regression Analysis help, CLICK HERE.

Linear Regression Analysis Assumptions

There are three primary assumptions associated with linear regression: no outliers, linearity, and constant variance. Linear regression analysis is very sensitive to outliers. The easiest way to identify outliers is to standardize the scores by requesting z-scores in SPSS. Any score with a z-value greater than 3 in absolute value is probably an outlier and should be considered for deletion. The assumptions of linearity and constant variance can be assessed in SPSS by requesting a plot of the residuals ("z-resid" on the y-axis) against the predicted values ("z-pred" on the x-axis). If the scatter plot is neither u-shaped (indicating non-linearity) nor cone-shaped (indicating non-constant variance), the assumptions are considered met. For Linear Regression Analysis Assumptions Help, CLICK HERE.

Multiple Linear Regression Analysis

Multiple linear regression is a statistical analysis similar to linear regression, with the exception that there can be more than one predictor variable. The assumptions of no outliers, linearity, and constant variance still need to be met. One additional assumption that needs to be examined is multicollinearity, the extent to which the predictor variables are related to each other. Multicollinearity can be assessed by asking SPSS for the Variance Inflation Factor (VIF). While different researchers have different criteria for what constitutes too high a VIF, a VIF of 10 or greater is certainly reason for pause. If the VIF is 10 or greater, consider collapsing the variables. For Multiple Linear Regression Analysis Multicollinearity Help, CLICK HERE.

Regression Analysis Interpretation

When I speak with dissertation students about their regression analysis, there are four aspects of the SPSS output that I want to interpret. First is the ANOVA. The ANOVA tells the researcher whether the model is statistically significant, i.e., whether the F-value has an associated probability of .05 or less. The second thing to look for is the R-square value, also called the coefficient of determination. The coefficient of determination is a number between 0 and 1 that indicates what proportion of the variability in the criterion variable can be accounted for by the predictor variable(s). The third aspect to interpret is whether each beta coefficient is statistically significant. A beta's significance can be found by examining the t-value and its associated significance level for that particular predictor. Fourth, you should interpret the beta itself, whether positive or negative. For Linear Regression Analysis Interpretation Help, CLICK HERE.

Logistic Regression Analysis in SPSS

Logistic regression, also called Binary Logistic Regression, is a statistical analysis technique that assesses the impact of a predictor variable (the independent variable) on a criterion variable (a dependent variable). As in a linear regression analysis, the independent variable must be continuous (interval-level or ratio-level) or dichotomous. The difference is that the dependent variable must be dichotomous (i.e., a binary variable). For example, a researcher may want to know whether age predicts the likelihood of going to a doctor (yes vs. no). For Logistic Regression Analysis Help, CLICK HERE.

Binary Logistic Regression Analysis Interpretation

While binary logistic regression and linear regression analyses differ in their criterion variables, there are other differences as well. In logistic regression, to assess whether the model is statistically significant, you look at the chi-square test; the chi-square in logistic regression analysis is analogous to the ANOVA test in linear regression. The next thing to examine is the Nagelkerke R-square statistic, which is somewhat analogous to the R-square value in linear regression. Next, interpret whether the beta coefficient(s) are statistically significant. If so, look at the Exp(B), which gives the factor by which the odds of the outcome change for a one-unit change in the predictor. For Binary Logistic Regression Analysis Interpretation Help, CLICK HERE.
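The Exp(B) interpretation can be verified directly with a small sketch. The intercept and coefficient below are hypothetical numbers invented for the age-predicts-doctor-visit example, not output from a fitted model:

```python
import math

def predicted_probability(intercept, coef, x):
    """Logistic model: P(outcome = 1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))

# Hypothetical fitted values: log-odds of visiting a doctor as a
# function of age, invented for illustration.
b0, b1 = -4.0, 0.08
odds_ratio = math.exp(b1)    # Exp(B): odds multiplier per one-unit change

# Check the odds-ratio interpretation at two adjacent ages:
p40 = predicted_probability(b0, b1, 40)
p41 = predicted_probability(b0, b1, 41)
odds40 = p40 / (1 - p40)
odds41 = p41 / (1 - p41)
```

The ratio odds41 / odds40 equals Exp(B) exactly, regardless of which pair of adjacent ages is chosen, which is what makes Exp(B) a convenient single-number summary.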

Friday, January 2, 2009

Statistics for your Dissertation Proposal or Thesis Proposal

Tis the season for dissertation proposals!! I'm sure many of you are preparing to start another riveting semester of graduate work and another semester with edge-of-your seat deadlines – the stuff epic motion pictures are made of!!!

We've all been there. You had plenty of time. You researched and you put off the hard stuff. Now you are facing crunch time. You know who you are… Now you have to hand in the proposal and need help. Maybe you have a couple weeks or maybe you have a couple days. What are you going to do? Read on my friend, read on. Today's post may just save you thousands of dollars and a few years of your life lost from stress.

Statistics for your Dissertation Proposal or Thesis Proposal

Among other things, I am betting you are most concerned about the appropriate statistics for your dissertation or thesis. I have covered this in another blog. Check it out here. In the meantime, I have some recommendations for the graduate student pursuing their thesis or dissertation and working on their proposal.

Know What you Need to Know

Different statistical tests measure different things, so it's important to know what you are trying to find. Are you looking for a relationship or are you looking for differences? Do you need to establish some predictability or are you just seeking to describe something? This will have a direct impact on the type of statistical tests you choose for your dissertation proposal or thesis proposal. There are words associated with certain statistical tests; e.g., "to find a relationship between X and Y" is associated with correlation language. Click here for help determining the type of statistical tests to use with your dissertation proposal or thesis proposal.

Know how the Statistics in your Dissertation are Supposed to be Used

This is similar to the one above but I thought I would include it. A pretty good percentage of our clients have had their dissertation or thesis proposal approved and are now beginning to work on their results section. The problem is they aren't really sure how the tests they proposed are supposed to be used. You might think that since the proposal has been approved by experts, they would have ensured that the statistical analysis you proposed for your dissertation or thesis is correct. Don't be fooled!

Many, many clients have sent us their approved proposal, listing the statistical analysis to be conducted and the variables to be tested, only to find out that the statistical test they proposed cannot be used with their type of variables. This is embarrassing and time-consuming, but can be avoided with a little due diligence. Click here for help determining how to use statistical tests with your dissertation proposal or thesis proposal.

Know the Types of Variables

There aren't very many types of variables. Take an evening if you have to and become familiar with the different types of variables used in statistical analysis. There are only a few and it will make all the difference in the world when you are choosing the statistical tests for your dissertation proposal or thesis proposal. Some statistical tests are only for continuous variables and some statistical tests are only for nominal variables. Some tests can use both if they are entered a particular way. It will pay to familiarize yourself with these types, before you write your survey questions and propose your analysis. If you are keeping these variable types in mind as you are constructing the survey for your dissertation proposal or thesis proposal, it will make choosing the statistical analysis much easier later on. For help with the types of variables included in your graduate thesis or Ph.D. click here.

Know the Assumptions of the Statistical Tests

Each statistical test used in your dissertation proposal or thesis proposal comes complete with assumptions, meant to make sure the test accurately measures what it is intended to measure. There's a pretty good chance that the assumptions of the statistical tests you choose for your dissertation proposal or thesis proposal won't be met, unless you're gathering a lot of observations. While you won't know for sure whether the assumptions have been met until after you have the data, you can get a pretty good idea without having them.

For instance, maybe you are proposing to look for differences in GPA between those receiving free/reduced lunch and those not receiving free/reduced lunch. If you are researching poor, inner-city schools, you know there is probably going to be a disproportionate number of free/reduced lunch recipients. It's also possible that there will be a disproportionate number of failing schools. Two of the tests that could be used to analyze this difference, the independent samples t-test and the analysis of variance (ANOVA), assume that the groups are approximately equal in their standard deviations. We know this isn't the case and may instead propose a non-parametric equivalent. Click here for help with the assumptions of the statistical analysis being used in your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

I hope this helps some. I invite you to click here and schedule an appointment to speak with us about helping you with your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation. I've helped thousands upon thousands of graduate students over the last 16 years and can help you.

Tuesday, December 30, 2008

Statistical Analysis for your Dissertation and Thesis

What types of statistical analysis are appropriate for a dissertation or thesis?

Multivariate statistics are usually appropriate, though not exclusively used. I help graduate students every day with dissertations and theses that utilize simple linear regressions, correlations, and t-tests; however, most institutions and committees want to see multivariate statistics used by their graduate students. That said, here is a very short list of the common ones.

Multiple Regression

Multiple regression for your dissertation or thesis will simply include more than one predictor. The advantage to using this statistical test for your dissertation or thesis is that you include multiple variables in your model predicting your variable of interest. Very rarely – if ever – is it the case that only one variable is responsible for values of another variable. I like an example using the Super Bowl. I may be able to predict a good percentage of Super Bowl victories with salaries, but we all know there are many more factors involved in predicting Super Bowl victories, such as injuries, weather, experience, and strength of schedule. Including multiple predictors makes for a more accurate model. Get help with using multiple regressions for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

Logistic Regression

The logic behind the multiple regression applies to the logistic regression, except that the logistic regression utilizes an odds ratio to predict the occurrence of a dichotomous variable. Get help with using logistic regressions for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

n-way ANOVA (Analysis of Variance) or Factorial ANOVA (Analysis of Variance)

The n in this case is simply referring to the virtually limitless number of independent variables that can be used in an ANOVA. A two-way ANOVA is the equivalent of conducting two ANOVAs or t-tests in one test and is simply a factorial ANOVA. A factorial ANOVA is just an ANOVA with two or more independent variables. An n-way ANOVA or factorial ANOVA could have three, four, five, or more independent variables. This method also allows for not just testing of differences between the groups but also testing of interactions between the independent variables. Get help with n-way ANOVA factorial ANOVA for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

Mixed ANOVA (Analysis of Variance)

Again, the logic behind this test is the same as the n-way ANOVA or factorial ANOVA, but is the equivalent of conducting:

  • a dependent samples t-test or paired samples t-test and an independent samples t-test or two sample t-test at the same time.
  • a repeated measures ANOVA and simple ANOVA at the same time. 

The complexity of the test depends completely on the number of variables involved in the statistical analysis. The effect of conducting a mixed ANOVA is the increase in power from conducting multiple statistical tests in one test, while protecting your alpha in the process. Get help with using mixed ANOVA for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

MANOVA (Multivariate Analysis of Variance)

The same logic again, except this time we are analyzing multiple dependent variables. For example, we may want to test for significant differences in GPA, SAT scores, and ACT scores, by religious affiliation. We can do all of these comparisons at the same time in the same test with the MANOVA or multivariate analysis of variance. This is a favorite of many a professional researcher and committee. Get help with using MANOVA or multivariate analysis of variance for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

ANCOVA (Analysis of Covariance) and MANCOVA (Multivariate Analysis of Covariance)

These have the same benefits and accomplish the same thing as their siblings without the "C" or "Co", but add the capability of controlling for variables that could otherwise invalidate your results. To do this, the ANCOVA and the MANCOVA utilize a control variable. For example, if I wanted to know whether there is a significant difference in GPA between college students, there could be any number of factors causing the difference. But by isolating the effect those factors have on my test, I am able to test for real differences. In this case I might control for socioeconomic status and the number of extracurricular activities. Utilizing a control variable will do a great deal to silence critics of your research who might attribute the differences you found to some extraneous, unidentified, and unaccounted-for variable. Get help with using ANCOVAs (analysis of covariance) or MANCOVAs (multivariate analysis of covariance) for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.

Doubly Multivariate Analysis of Covariance

I just thought I would throw this in here to get you thinking a little about what's possible. If your head's spinning at this point, click here and I will be more than happy to help you with your statistics for your dissertation or thesis.

Monday, December 29, 2008

Bivariate Correlations Continued

To this point I have been writing about what I thought people might be interested in reading, but I thought I would start taking requests… Debussy's Clair de Lune, 50 Cent, Michael Bolton, Britney Spears… okay maybe not Michael Bolton. Post what you would like to see a blog entry about and I will do my best to comply.

It seems a great number of you are interested in bivariate correlation, so here is another entry on this favorite of statistical tests. I also think I could be a bit more comprehensive on the assumptions of ANOVA, so look for that in the very near future. I also plan on covering the assumptions of bivariate correlation. In the meantime…

What is bivariate correlation?

A bivariate correlation is a statistical test that measures the association or relationship between two continuous/interval/ordinal level variables. The test will tell the researcher the strength and direction of the relationship between the two variables, but nothing about causality…but I digress.

How to interpret bivariate correlation

To understand how to interpret a bivariate correlation, we have to first understand what the possible results are. If the correlation is significant, our correlation coefficient will be either positive or negative.  

Positive Correlation Coefficients

A positive correlation coefficient means that the relationship between the two variables is positive and that the variables move in the same direction. It also means that an increase in say… height, corresponds with an increase in weight. Stated another way, "As height increases, weight also increases, or as weight increases, height also increases."  

Negative Correlation Coefficients

A negative correlation coefficient means that the relationship between the two variables is negative and means that the variables move in opposite directions. Using the same example, we would say, "As height increases, weight decreases, or as weight increases, height decreases." 

The sign of the correlation coefficient tells us the nature of the relationship, as in one variable decreasing as one variable is increasing, or both variables increasing or decreasing together, but does not tell us how one variable affects another variable.  

Note that the sign of the correlation – negative or positive – can be interpreted two ways. With a positive correlation, as height increases, weight also increases, or as weight increases, height also increases. Both are correct. For a negative correlation, as weight increases, height decreases, or as height increases, weight decreases. I hope this isn't confusing. If any of you are having trouble, just post a comment and let me know. Better yet, let me do the correlations for you. Get help with how to interpret bivariate correlation coefficients for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation. 

What does the bivariate correlation coefficient mean?

Correlation coefficients range from -1 to +1. If the bivariate correlation coefficient is -1, the relationship between the two variables is perfectly negative, and if the bivariate correlation coefficient is +1, the relationship between the two variables is perfectly positive. The closer the correlation coefficient is to -1 or +1, the stronger the relationship. The closer the correlation coefficient is to 0, the weaker the relationship.  
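To make the range concrete, here is a minimal sketch in Python (assuming SciPy is installed; the height/weight numbers are invented for illustration) that computes a Pearson correlation coefficient:

```python
from scipy.stats import pearsonr

# Invented height (inches) and weight (pounds) data for illustration
height = [60, 62, 63, 65, 66, 68, 69, 71, 72, 74]
weight = [115, 120, 130, 138, 142, 155, 160, 172, 180, 195]

r, p = pearsonr(height, weight)
print(f"r = {r:.2f}, p = {p:.4f}")
# r is close to +1 here, i.e., a strong positive relationship
```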

What is r2?

Often the correlation is interpreted in terms of the amount of variance explained in one variable by another variable – just remember, though, that the correlation is bidirectional and can be interpreted either way. If you are conducting a Pearson correlation, in your output, you will get a Pearson correlation coefficient or a product-moment correlation coefficient. Squaring this gives you…r2. If I have a correlation coefficient of 0.4, then r2 = 0.16 and would be interpreted as, "…16% of the variance in height is explained by weight," and vice versa. Get help with interpreting r2 for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation. 

How to report a Pearson correlation or a Pearson product-moment correlation?

Here is the gem of the entire post and free of charge.  

There was a significant, positive relationship between height and weight, r(98) = 0.40, p < 0.01, indicating that as height increases, weight also increases. Height accounted for 16% of the variance in weight.  

Of course there is more to it than this, such as determining which variable is going to be your dependent variable and which variable is going to be your independent variable, as well as the appropriateness of the test and how it relates to the rest of your thesis or dissertation. Click here for help with writing bivariate correlations for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation.  

What are the degrees of freedom for a bivariate correlation?

The degrees of freedom for a bivariate correlation are n – 2, where n is the sample size. This is the number in the parentheses in the write-up above; note that it is the degrees of freedom, not the sample size.
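Putting the pieces together, here is a hedged sketch in Python (SciPy and NumPy assumed; the data are randomly generated stand-ins) showing where the number in parentheses comes from:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=100)            # n = 100 simulated observations
y = 0.4 * x + rng.normal(size=100)  # y loosely related to x

r, p = pearsonr(x, y)
df = len(x) - 2                     # degrees of freedom = n - 2, not n
print(f"r({df}) = {r:.2f}, p = {p:.3f}")
```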

How do I use the bivariate correlation?

If you were interested merely in the relationship of two variables, you would use the bivariate correlation. If you are interested in the effect of one variable on another variable, you would use a regression. The regression is the same as the correlation, but will tell you the specific impact one variable has on another variable in terms of the unstandardized beta coefficient and the standardized beta coefficient. It will also tell you the equation for the best fit line. We'll cover regressions very soon. Sometimes knowing the relationship of the two variables is enough, however, and if this is the case then bivariate correlation is the statistical test for you. Get help with how to use bivariate correlation for your Master's thesis, Master's dissertation, Ph.D. thesis, or Ph.D. dissertation. We even provide customized videos of your bivariate correlations being conducted.
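The difference can be seen in code. The following sketch (SciPy assumed; invented data for illustration) runs both a correlation and a simple regression on the same two variables:

```python
from scipy.stats import linregress, pearsonr

# Invented data for illustration: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 78]

r, _ = pearsonr(hours, score)    # relationship only
res = linregress(hours, score)   # adds the best-fit line
print(f"r = {r:.2f}")
print(f"slope = {res.slope:.2f}, intercept = {res.intercept:.2f}")
# res.rvalue equals the Pearson r; the regression adds the slope and
# intercept of the best-fit line on top of the correlation
```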


 

Tuesday, December 23, 2008

The Dependent Samples t-test or the Paired Samples t-test

What is a dependent samples t-test or a paired samples t-test?

One of the most common statistical tests, a dependent samples t-test, or a paired samples t-test, is used to find significant mean differences between two groups on a particular measure like SAT scores, ACT scores, GPA, height, or weight. In the case of the dependent samples t-test or a paired samples t-test, the two groups of interest are related in some way, such as siblings, or the same participants measured in a pretreatment vs. posttreatment setting. Either way, the two groups being compared are related somehow. Get help with dependent samples t-test


What is the difference between a dependent samples t-test or a paired samples t-test and an independent samples t-test?

Both tests are used to find significant differences between groups, but the independent samples t-test assumes the groups are not related to each other, while the dependent samples t-test or paired samples t-test assumes the groups are related to each other.

If you're familiar with the tests, a dependent samples t-test or paired samples t-test would be used to find differences within groups, while the independent samples t-test would be used to find differences between groups. Get help with dependent samples t-tests or independent samples t-tests


In a dependent samples t-test or a paired samples t-test, what is the independent variable and what is the dependent variable?

The independent variable and the dependent variable are the same in both the dependent samples t-test and the independent samples t-test. The variable of measure, or the variable of interest, is the dependent variable, and the grouping variable is the independent variable. Get help with dependent samples t-test


Example of dependent samples t-test or a paired samples t-test

The most common use of the dependent samples t-test is in a pretreatment vs. posttreatment scenario where the researcher wants to test the effectiveness of a treatment.

  1. The participants are tested pretreatment, to establish some kind of a baseline measure
  2. The participants are then exposed to some kind of treatment
  3. The participants are then tested posttreatment, for the purposes of comparison with the pretreatment scores

Having both pretreatment scores and the posttreatment scores for the same participants allows us to measure the effectiveness of the treatment, ceteris paribus. Get help with dependent samples t-test
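The three steps above can be sketched with SciPy's paired test (the pre/post scores below are invented for illustration):

```python
from scipy.stats import ttest_rel

# Invented pretreatment and posttreatment scores for the same 8 participants
pre  = [70, 65, 80, 72, 68, 75, 71, 66]
post = [75, 70, 84, 78, 70, 82, 77, 72]

t, p = ttest_rel(post, pre)
print(f"t = {t:.2f}, p = {p:.4f}")
# A significant p (< .05) suggests the treatment changed the scores
```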


Tuesday, December 9, 2008

Analysis of Variance (ANOVA)

An analysis of variance (ANOVA) is a statistical test conducted to examine differences in a continuous variable by a categorical variable. Let’s talk about:

  1. The variables in ANOVA,
  2. The assumptions of ANOVA,
  3. The logic of ANOVA, and
  4. What the ANOVA results indicate.

I am going to limit the conversation to a one-way ANOVA (i.e., an ANOVA with just 1 independent variable).

Variables in ANOVA

The variables in ANOVA: there are two variables in an ANOVA—a dependent variable and an independent variable. For example, let’s imagine we want to examine differences in SAT scores by gender. SAT scores are the ANOVA dependent variable (i.e., the scores depend on the participants), and it’s a continuous variable because the scores range from 200 to 800. Gender is the ANOVA independent variable (i.e., the designation of male or female is independent of the participant). Further, the independent variable is categorical—you are either male or female. (For more on variables, look here.)

The Assumptions of ANOVA

The assumptions of ANOVA: when an ANOVA is conducted, there are three assumptions. The first ANOVA assumption is that of independence—in this example, the males’ scores are unrelated to and unaffected by the females’ scores. This ANOVA assumption cannot be violated; if it is, then a different test needs to be conducted. The second ANOVA assumption is normality—that is, the distributions of the males’ and females’ scores do not deviate markedly from a normal bell curve.

The third ANOVA assumption is homogeneity of variance. This assumption essentially requires that the standard deviations of the males’ and females’ scores are similar (or homogeneous); that is, the males’ standard deviation of 118.15 is not dissimilar from the females’ standard deviation of 101.03 (Table 1). The homogeneity of variance assumption can be assessed with the Levene test. Table 2 shows the resulting Levene test statistic, where a non-significant result (i.e., sig > .05) indicates no difference between the standard deviations, and the assumption is met.


Table 1.

Descriptives: sat

Group     N    Mean       Std. Deviation
male      13   608.5385   118.15288
female    13   506.7692   101.03148
Total     26   557.6538   119.55415


Table 2.

Test of Homogeneity of Variances: sat

Levene Statistic   df1   df2   Sig.
.062               1     24    .806
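A Levene test like the one in Table 2 can be sketched in Python (SciPy and NumPy assumed; the scores below are randomly generated stand-ins for the blog's data, so the statistic will not match Table 2 exactly):

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
# Simulated SAT scores: 13 "males" and 13 "females" with similar spreads
male   = rng.normal(608.5, 118.2, size=13)
female = rng.normal(506.8, 101.0, size=13)

stat, p = levene(male, female)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
# A non-significant p (> .05) would mean the variances are similar
# and the homogeneity of variance assumption is met
```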

The Logic of ANOVA

The logic of ANOVA: the logic of ANOVA is to test whether the males’ mean score (M = 608.54) differs from the females’ mean score (M = 506.77).

What the ANOVA Results Indicate

What the ANOVA results indicate: the ANOVA (Table 3) shows the resulting F-value (F = 5.571) with a significance level of .027, indicating that an F-value this large would occur by chance fewer than 3 times in 100. We can then say there is a statistically significant difference between the male and female scores, with males achieving higher average scores than females.


Table 3.

Source           df   F       Sig.
Between Groups   1    5.571   .027
Within Groups    24
Total            25
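The two-group one-way ANOVA itself can be sketched the same way (SciPy and NumPy assumed; simulated stand-in data, so F will not match Table 3 exactly). Note that with only two groups, a one-way ANOVA is equivalent to an independent samples t-test, with F = t².

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
# Simulated SAT scores: 13 "males" and 13 "females"
male   = rng.normal(608.5, 118.2, size=13)
female = rng.normal(506.8, 101.0, size=13)

f_val, p = f_oneway(male, female)
print(f"F(1, 24) = {f_val:.3f}, p = {p:.3f}")
# df between = k - 1 = 1; df within = N - k = 24, as in Table 3
```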



For customized, confidential help with ANOVA and/or conducting your statistical analysis, please email us at James@StatisticsSolutions.com or call Statistics Solutions Inc. at (877) 437-8622 for a free 30-minute consultation.


Wednesday, November 26, 2008

Statistical Analysis using Independent Samples t-test

This very common method of statistical analysis allows us to test the difference between two independent means. In layman’s terms, this means we are looking for differences between two groups of participants that are not related. In statistical terms, this means that the scores of the two groups are not correlated.

How is the independent samples t-test used?

The independent samples t-test can be used when looking for differences between any two groups on a single measure, e.g. differences on SAT scores by gender, differences on GPA by gender, or differences on ACT scores by ethnicity (African American vs. Caucasian).

What types of variables can be used in an independent samples t-test?

There are two variables in an independent samples t-test, an independent variable and a dependent variable. The independent variable is the grouping variable and must be dichotomous (i.e., two groups). The dependent variable must be continuous/interval; sometimes it’s okay to use ordinal variables, but that is a whole other topic.

What are typical uses of the independent samples t-test?

Often, independent samples t-tests are used to look for differences between control and experimental groups. Some research designs employ both an experimental group and a control group, with measures before a treatment and after a treatment for both groups. While the statistical test for examining differences within the control group before and after the treatment, and within the experimental group before and after the treatment, is a dependent samples t-test, the statistical test for examining differences between the control group and the experimental group before the treatment, and then again after the treatment, is an independent samples t-test.

If I am a researcher hoping that my additional math class as a treatment is effective, I am hoping to find that the control and experimental groups are the same pre-treatment, but different post-treatment. Hopefully, my experimental group would have higher math scores post-treatment than my control group. For information on conducting the independent samples t-test in SPSS, please see the information on www.
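The post-treatment comparison between the control and experimental groups can be sketched with SciPy (the scores below are invented for illustration):

```python
from scipy.stats import ttest_ind

# Invented post-treatment math scores for two unrelated groups
control      = [68, 72, 65, 70, 74, 69, 71, 67]
experimental = [75, 80, 78, 74, 82, 79, 77, 81]

t, p = ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.4f}")
# A significant p (< .05) with a positive t suggests the experimental
# group scored higher post-treatment than the control group
```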