Main menu
The Market Research App has four menu choices for statistical procedures
(Counts, Percents, Means, and Sampling). Most of these procedures are called
inferential because data from a sample is used to make inferences about a
population.
The Counts menu item contains routines to analyze a table of counts, compute Fisher's exact probability for two-by-two tables, use the binomial distribution to predict the probability of a specified outcome, and use the Poisson distribution to test the likelihood of observing a specific number of events.
The Percents menu item is used to compare percents drawn from one sample or
from two samples, and to calculate confidence intervals around a percent.
The Means menu item is used to compare two means to each other, calculate a confidence interval around a mean, compare a sample mean to a population mean, compare two standard deviations to each other, and compare three or more standard deviations.
The Sampling menu item is used to determine the required sample size for a study
and to determine the margin of error for a given sample size.
The Help menu item is used to get this on-line help.
Finite population correction
In a typical research scenario the population is very large compared to the sample.
The researcher draws a small sample from the population, conducts the research
on the sample, and then infers the results back to the entire population. As
long as the sample is small compared to the population (less than 10%), all
the statistical techniques will be accurate. This encompasses most research
studies.
Although it is rare, sometimes the sample is larger than 10% of the population. In
these situations, adjustments must be made to the formulas or they will be
inaccurate. This software has fields to enter the population
size in procedures that are susceptible to this kind of error. The adjustment is
called finite population correction.
If the sample is less than 10% of the population, the correction is
unnecessary. Leave the population size blank to ignore the correction. However,
if the sample size is more than 10% of the population size, the population size
should be specified. Finite population correction is incorporated into all
relevant formulas.
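The correction factor itself is simple. As an illustration only (the formulas the software applies internally are not reproduced here), the textbook correction multiplies a standard error by sqrt((N - n) / (N - 1)), where N is the population size and n is the sample size. A minimal Python sketch:

    import math

    def fpc_factor(population_size, sample_size):
        """Textbook finite population correction: sqrt((N - n) / (N - 1))."""
        N, n = population_size, sample_size
        return math.sqrt((N - n) / (N - 1))

    # A sample of 200 from a population of 1,000 is 20% of the population,
    # so the correction matters here.
    print(round(fpc_factor(1000, 200), 4))   # 0.8949 -- shrinks the standard error

When n is small relative to N, the factor is close to 1 and the correction is negligible, which is why the population size field can simply be left blank in that case.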
Basic concepts
We understand the world by asking questions and searching for answers. Our construction of reality depends on the nature of our inquiry.
All research begins with a question. Some questions are not testable. The classic philosophical example is to ask, "How many angels can dance on the head of a pin?" While the question might elicit profound and thoughtful revelations, it clearly cannot be tested by empirical research.
Defining the goals and objectives of a research project is one of the most important steps in the research process. Do not underestimate the importance of this step. Clearly stated goals keep a research project focused. The process of
goal definition usually begins by writing the broad and general goals of the study. As the process continues, the goals become more clearly defined and the research issues are narrowed.
Exploratory research (e.g., literature reviews, talking to people, and
focus groups) goes hand-in-hand with the goal clarification process. The
literature review is especially important because it obviates the need to
reinvent the wheel for every new research question.
The research question itself can be stated as a hypothesis. A hypothesis
is simply the investigator's belief about a problem. Typically, a researcher
formulates an opinion during the literature review process. If you are uncertain
how to state a hypothesis, begin a sentence with "I believe ..." and then remove
the "I believe" portion. What remains is the hypothesis.
The hypothesis is converted into a null hypothesis in order to make it
testable, because the only way to test a hypothesis is to eliminate alternatives
to the hypothesis. Statistical techniques will enable us to reject or fail to
reject a null hypothesis, but they do not provide us with a way to accept a
hypothesis. Therefore, all hypothesis testing is indirect.
Creating the research design
Defining a research problem provides a format for further investigation. A
well-defined problem points to a method of investigation. There is no one best
method of research for all situations. Rather, there is a wide variety of
techniques for the researcher to choose from. Often, the selection of a
technique involves a series of trade-offs. For example, there is often a
trade-off between cost and the quality of information obtained. Time constraints
sometimes force a trade-off with the overall research design. Budget and time
constraints must always be considered as part of the design process.
Methods of research
There are three basic methods of research: 1) survey, 2) observation, and 3) experiment. Each method has its advantages and disadvantages.
The survey is the most common method of gathering information in the social sciences. It can be a face-to-face interview, telephone, or mail survey. A personal interview is one of the best methods of obtaining personal, detailed, or in-depth information. It usually involves a lengthy questionnaire that the interviewer fills out while asking questions. It allows for extensive probing by the interviewer and gives respondents the ability to elaborate on their answers. Telephone interviews are similar to face-to-face interviews. They are more efficient in terms of time and cost; however, they limit the amount of in-depth probing that can be accomplished and the amount of time that can be allocated to the interview. Mail surveys and online surveys are generally the most cost-effective interview methods. The researcher can obtain opinions, but trying to meaningfully probe those opinions is difficult. Online surveys can be conducted more quickly than mail surveys.
Observation research monitors respondents' actions without directly interacting with them. It has been used for many years by A.C.
Nielsen to monitor television viewing habits. Psychologists often use one-way
mirrors to study behavior. Anthropologists and social scientists often study
societal and group behaviors by simply observing them. The fastest growing form
of observation research has been made possible by the bar code scanners at cash
registers, where purchasing habits of consumers can now be automatically
monitored and summarized.
In an experiment, the investigator changes one or more variables over the course
of the research. When all other variables are held constant (except the one
being manipulated), changes in the dependent variable can be explained by the
change in the independent variable. It is usually very difficult to control all
the variables in the environment. Therefore, experiments are usually
restricted to laboratory models where the investigator has more control over all
the variables.
Sampling
It is incumbent on the researcher to clearly define the target population. There
are no strict rules to follow, and the researcher must rely on logic and
judgment. The population is defined as the entire group of people to whom
the study results will be inferred.
Sometimes, the entire population will be sufficiently small, and the researcher can include the entire population in the study. This type of research is called a
census because data is gathered on every member of the population. When you conduct a census, no inferential statistics are necessary: any difference or relationship you observe is real.
Usually, the population is too large for the researcher to attempt to survey all of its members. A small, but carefully chosen
sample can be used to represent the population. A well-chosen sample reflects the characteristics of the population from which it is drawn.
Sampling methods are classified as either probability or nonprobability. In probability sampling, each member of the population has a known non-zero probability of being selected. Probability methods include random sampling, systematic sampling, and stratified sampling. In nonprobability sampling, members are selected from the population in some nonrandom manner. These include convenience sampling, judgment sampling, quota sampling, and snowball sampling. The advantage of probability sampling is that sampling error can be calculated.
Sampling error (often called margin of error) is the degree to which a sample might differ from the population. When inferring to the population, results are reported plus or minus the sampling error. In nonprobability sampling, the degree to which the sample differs from the population remains
unknown.
Random sampling is the purest form of probability sampling. Each member
of the population has an equal and known chance of being selected. When there
are very large populations, it is often difficult or impossible to identify
every member of the population, so the pool of available subjects becomes
biased.
Systematic sampling is often used instead of random sampling. It is also called an
Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
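A minimal Python sketch of the Nth-name selection just described (the record list and sample size are illustrative):

    import random

    def systematic_sample(records, sample_size):
        """Select every Nth record, starting at a random point in the first interval."""
        interval = len(records) // sample_size        # N, the skip interval
        start = random.randrange(interval)            # random starting record
        return records[start::interval][:sample_size]

    population = list(range(1, 1001))                 # e.g., 1,000 record IDs
    sample = systematic_sample(population, 100)       # every 10th record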
Stratified sampling is a commonly used probability method that is superior to random sampling because it reduces sampling error. A stratum is a subset of the population that shares at least one common characteristic. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select subjects from each stratum until the number of subjects in that stratum is proportional to its frequency in the population. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
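As a sketch of the proportional allocation just described (the strata names and sizes are hypothetical):

    import random

    def stratified_sample(strata, total_sample_size):
        """Proportional allocation: draw from each stratum in proportion to
        its size. strata maps a stratum name to its list of population members."""
        population_size = sum(len(m) for m in strata.values())
        sample = []
        for members in strata.values():
            n = round(total_sample_size * len(members) / population_size)
            sample.extend(random.sample(members, n))  # random sampling within the stratum
        return sample

    strata = {"urban": list(range(800)), "rural": list(range(200))}
    sample = stratified_sample(strata, 100)           # about 80 urban and 20 rural subjects

(Rounding can shift the total sample size by one or two; a production routine would need to handle that.)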
Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, the sample is selected because its members are convenient to reach. This nonprobability method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a probability sample.
Judgment sampling is a common nonprobability method. The researcher selects the sample based on judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes many cities. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
Quota sampling is the nonprobability equivalent of stratified sampling. Like stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.
Snowball sampling is a special nonprobability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from initial subjects to generate additional subjects. While this technique can dramatically lower cost, it comes at the expense of introducing bias because the technique itself reduces the likelihood that the sample will represent a good cross section from the population.
Data collection
There are very few hard and fast rules to define the task of data collection.
Each research project uses a data collection technique appropriate to the
particular research methodology. The two primary goals for both quantitative and
qualitative studies are to maximize response and maximize accuracy.
When using an outside data collection service, researchers often validate the data collection process by contacting a percentage of the respondents to verify that they were actually interviewed. Data editing and cleaning is the process of checking for inadvertent errors in the data. This usually entails using a computer to check for out-of-bounds data.
Quantitative studies employ deductive logic, where the researcher starts with a hypothesis, and then collects data to confirm or refute the hypothesis.
Qualitative studies use inductive logic, where the researcher first designs a study and then develops a hypothesis or theory to explain the results of the analysis.
Quantitative analysis is generally fast and inexpensive. A wide assortment of statistical techniques is available to the researcher. Computer software is readily available to provide both basic and advanced multivariate analysis. The researcher simply follows the preplanned analysis process, without making subjective decisions about the data. For this reason, quantitative studies are usually easier to execute than qualitative studies.
Qualitative studies nearly always involve in-person interviews, and are therefore very labor intensive and costly. They rely heavily on a researcher's ability to exclude personal biases. The interpretation of qualitative data is often highly subjective, and different researchers can reach different conclusions from the same data. However, the goal of qualitative research is to develop a hypothesis--not to test one. Qualitative studies have merit in that they provide broad, general theories that can be examined in future research.
The most important consideration in preparing any research report is the nature of the audience. The purpose is to communicate information, and therefore, the report should be prepared specifically for the readers of the report. Sometimes the format for the report will be defined for the researcher (e.g., a thesis or dissertation), while other times, the researcher will have complete latitude regarding the structure of the report. Reports usually contain an abstract, problem statement, methods section, results section, and a discussion of the results.
Validity
Validity refers to the accuracy or truthfulness of a measurement. Are we
measuring what we think we are? This is a simple concept, but in reality, it is
extremely difficult to determine if a measure is valid. There is no mathematical
or statistical test to ascertain validity. It is always subjective.
Face validity is based solely on the judgment of the researcher. Each
question is scrutinized and modified until the researcher is satisfied that it
is an accurate measure of the desired construct. The determination of face
validity is based on the subjective opinion of the researcher.
Content validity is similar to face validity in that it relies on the judgment of the researcher. However, where face validity only evaluates the individual items on an instrument, content validity goes further in that it attempts to determine if an instrument provides adequate coverage of a topic. Expert opinions, literature searches, and open-ended pretest questions help to establish content validity.
Criterion-related validity can be either predictive or concurrent. When a dependent/independent relationship has been established between two or more variables, criterion-related validity can be assessed. A mathematical model is developed to be able to predict the dependent variable from the independent variable(s).
Predictive validity refers to the ability of an independent variable (or
group of variables) to predict a future value of the dependent variable.
Concurrent validity is concerned with the relationship between two or more
variables at the same point in time.
Construct validity refers to the theoretical foundations underlying a particular scale or measurement. It looks at the underlying theories or constructs that explain a phenomenon. This is also quite subjective and depends heavily on the understanding, opinions, and biases of the researcher.
Reliability
Reliability is synonymous with repeatability. A measurement that yields
consistent results over time is said to be reliable. When a measurement is prone
to random error, it lacks reliability. The reliability of an instrument places
an upper limit on its validity. A measurement that lacks reliability will
necessarily be invalid. There are three basic methods to test reliability:
test-retest, equivalent form, and internal consistency.
A test-retest measure of reliability can be obtained by administering the same instrument to the same group of people at two different points in time. The degree to which both administrations are in agreement is a measure of the reliability of the instrument. This technique for assessing reliability suffers from two possible drawbacks. First, a person may have changed between the first and second measurement. Second, the initial administration of an instrument might in itself induce a person to answer differently on the second administration.
The second method of determining reliability is called the equivalent-form technique. The researcher creates two different instruments designed to measure identical constructs. The degree of correlation between the instruments is a measure of equivalent-form reliability. The difficulty in using this method is that it may be very difficult (and/or prohibitively expensive) to create a totally equivalent instrument.
The most popular methods of estimating reliability use measures of internal consistency. When an instrument includes a series of questions designed to examine the same construct, the questions can be arbitrarily split into two groups. The correlation between the two subsets of questions is called the
split-half reliability. The problem is that this measure of reliability changes depending on how the questions are split. A better statistic, known as Cronbach's alpha, is based on the mean (absolute value) interitem correlation for all possible variable pairs. It provides a conservative estimate of reliability, and generally represents the lower bound to the reliability of a scale of items. For dichotomous nominal data, the KR-20 (Kuder-Richardson) is used instead of Cronbach's alpha.
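The following Python sketch computes Cronbach's alpha using the common raw-score (variance) form, a closely related computational form of the statistic described above; the scores are made up for illustration:

    import numpy as np

    def cronbach_alpha(scores):
        """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of
        totals), where scores is a respondents-by-items array."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1)        # variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)    # variance of the summed scale
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Five respondents answering a three-item scale
    scale = [[4, 5, 4], [3, 4, 3], [5, 5, 5], [2, 3, 2], [4, 4, 4]]
    print(round(cronbach_alpha(scale), 3))            # 0.968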
Statistical significance
What does statistical significance really mean? Many researchers get
excited when they discover a "significant" finding, without really
understanding what it means. When a statistic is statistically significant, it
simply means that you are very sure that the statistic is reliable. It
doesn't mean the finding is important. Don't confuse statistical
significance (reliable) with the non-technical word significant (important).
For example, suppose we give 1,000 people an IQ test, and we ask if there is a
statistically significant difference between male and female scores. The mean
score for males is 98 and the mean score for females is 100. We use a t-test and find that the difference is significant at the .001 level. The big question is, "So what?" The difference between 98 and 100 on an IQ test is a very small difference...so small, in fact, that it's not even important.
Then why did the t-statistic come out significant? Because there was a large sample size. When you have a large sample size, very small differences will be detected as significant. This means that you are very sure that the difference is real (i.e., it didn't happen by fluke). It doesn't mean that the difference is large or important. If we had only given the IQ test to 10 people instead of 1,000, the two-point difference between males and females would not have been statistically significant.
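The effect of sample size is easy to demonstrate. The sketch below assumes a standard deviation of 15 for the IQ scores (the example above does not state one, so the exact p-values are only illustrative) and runs the same two-point comparison at two sample sizes using SciPy:

    from scipy.stats import ttest_ind_from_stats

    # The same 2-point difference (means of 98 vs. 100, sd of 15) at two sample sizes
    for n in (10, 500):                                   # n per group
        result = ttest_ind_from_stats(mean1=98, std1=15, nobs1=n,
                                      mean2=100, std2=15, nobs2=n)
        print(n, round(result.pvalue, 3))                 # roughly 0.77 at n=10, 0.035 at n=500

With 10 people per group the two-point difference is nowhere near significant; with 500 per group, the very same difference is.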
Statistical significance tells how certain you are that a difference or relationship exists. To say that a
statistically significant difference or relationship exists only tells half the story. We might be very sure that a relationship is real, but is it a strong, moderate, or weak relationship? After finding a
statistically significant relationship, it is important to evaluate its strength. Statistically
significant relationships can be strong or weak. Statistically significant differences can be large or small. Whether they reach significance depends largely on your sample size.
When a statistic is not statistically significant it means that the difference
or relationship you are observing is not reliable (i.e., it could be a fluke).
Stop there. Don't fall into the trap of saying that a difference would probably
be significant if the sample size was larger. While it's possible that the
statement would be correct, it's equally possible that the observed difference
was a fluke and would disappear with a larger sample.
One-tailed and two-tailed probabilities
One important concept in significance testing is whether to use a one-tailed or
two-tailed test of significance. The answer is that it depends on your
hypothesis.
When your research hypothesis states (or implies) the direction of the
difference or relationship, then you use a one-tailed probability. For example,
a one-tailed test would be used to test these null hypotheses. In each case, the
null hypothesis (indirectly) predicts the direction of the expected difference:
Females will not score significantly higher than males on an IQ test.
Blue collar workers will not have significantly lower education than white
collar workers.
Superman is not significantly stronger than the average person.
In a two-tailed test, we only ask whether or not there is a difference, without
regard to the nature of the difference. A two-tailed test would be used to test
these null hypotheses:
There will be no significant difference in IQ scores between males and females.
There will be no significant difference between blue collar and white collar
workers.
There is no significant difference in strength between Superman and the average
person.
This software always reports the two-tailed probability. A one-tailed probability is exactly half the value of a two-tailed probability. Thus, if you have a one-tailed research question, you should divide the
reported probability by two.
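For example, using SciPy with illustrative numbers, the reported two-tailed probability is halved only when the observed difference is in the hypothesized direction:

    from scipy.stats import ttest_ind_from_stats

    result = ttest_ind_from_stats(mean1=102, std1=15, nobs1=50,
                                  mean2=98,  std2=15, nobs2=50)
    two_tailed_p = result.pvalue
    one_tailed_p = two_tailed_p / 2   # valid only if the difference is in the predicted direction
    print(round(two_tailed_p, 3), round(one_tailed_p, 3))   # roughly 0.185 and 0.093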
There is a raging controversy (for about the last hundred years) on whether or
not it is ever appropriate to use a one-tailed test. The rationale is that if
you already know the direction of the difference, why bother doing any
statistical tests? The safest bet is to always state your hypotheses so that
two-tailed tests are appropriate.
Type I and Type II errors
There are two types of hypothesis testing errors. The first one is called a
Type I error. This is a very serious error where you wrongly reject the null
hypothesis. Suppose that the null hypothesis is: Daily administrations of drug
ABC will not help patients. Also suppose that drug ABC is really a very bad
drug, and it causes permanent brain damage to people over 60. In your research,
you ask for volunteers, and all of the sample is under 60 years of age. The
sample seems to improve and you reject the null hypothesis. There could be very
serious consequences if you were to market this drug (based on your sample).
Type I errors are often caused by sampling problems.
A Type II error is usually less serious, where you wrongly fail to reject the null hypothesis. Suppose that drug ABC really isn't harmful and does actually help many patients, but several of your volunteers develop severe and persistent psychosomatic symptoms. You would probably not market the drug because of the potential for long-lasting side effects. Usually, the consequences of a Type II error will be less serious than a Type I error.
Procedure for significance testing
Whenever we perform a significance test, it involves comparing a test value that
we have calculated to some critical value for the statistic. It doesn't
matter what type of statistic we are calculating (e.g., a t-statistic, a
chi-square statistic, or an F-statistic); the procedure to test for
significance is the same.
Before doing any statistical tests, decide on what you will use as a
critical alpha level. The critical
alpha level is the error rate that you're willing to accept. An alpha level of
.05 (i.e., you're willing to accept a 5% error) is often cited as some kind of
"magical" number, when in fact, it's just arbitrary (using .05 is simply
tradition).
Choose an alpha level appropriate to your research. Usually, this means that you
choose an alpha level based on the consequences of making a Type I error. A Type
I error is when you wrongly reject the null hypothesis.
When you find a significant difference or relationship, resources will often be
allocated to "fix the problem". If you're wrong (i.e., there really isn't a difference
or relationship), resources will be allocated to fix a problem that doesn't
exist. In other words, time and money will be wasted. The alpha level describes
how certain you want to be of your finding before you allocate resources to fix
the problem.
1. Decide on the critical alpha level you will use (i.e., the error rate
you are willing to accept).
2. Conduct the research.
3. Calculate the test statistic.
4. Compare the probability of the statistic to the critical alpha level.
If the probability is less than the critical alpha level:
Your finding is significant.
You reject the null hypothesis.
The probability is small that the difference or relationship happened
by chance, and p is less than the critical alpha level (p < alpha).
If the probability is higher than the critical alpha level:
Your finding is not significant.
You fail to reject the null hypothesis.
The probability is high that the difference or relationship happened
by chance, and p is greater than the critical alpha level (p > alpha).
In other words, compare the probability of the test statistic to your critical alpha level. If the probability is less than the critical alpha level you've chosen, your finding is statistically significant. If the probability is greater than your critical alpha level, your finding is not statistically significant.
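A minimal Python sketch of this comparison (the function name is illustrative):

    ALPHA = 0.05   # step 1: the critical alpha level, chosen before any testing

    def interpret(p_value, alpha=ALPHA):
        """Step 4: compare the probability of the test statistic to the critical alpha level."""
        if p_value < alpha:
            return "significant: reject the null hypothesis (p < alpha)"
        return "not significant: fail to reject the null hypothesis (p > alpha)"

    print(interpret(0.021))   # significant
    print(interpret(0.210))   # not significant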
Performing multiple tests
Bonferroni's theorem states that as one performs an increasing number of statistical tests, the likelihood of getting an erroneous significant finding (Type I error) also increases. Thus, as we perform more and more statistical tests, it becomes increasingly likely that we will falsely reject a null hypothesis (very bad).
For example, suppose our critical alpha level is .05. If we performed one statistical test, our chance of making a false statement is .05. If we were to perform 100 statistical tests, and we made a statement about the result of each test, we would expect five of them to be wrong (just by fluke). This is a rather undesirable situation for social scientists.
Bonferroni's theorem states that we need to adjust the critical alpha level in order to compensate for the fact that we're doing more than one test. To make the adjustment, take the desired critical alpha level (e.g., .05) and divide by the number of tests being performed, and use the result as the critical alpha level. For example, suppose we had a test with eight scales, and we plan to compare males and females on each of the scales using an independent groups t-test. We would use .00625 (.05/8) as the critical alpha level for all eight tests.
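The adjustment itself is a one-line calculation. A sketch reproducing the eight-scale example:

    def bonferroni_alpha(desired_alpha, num_tests):
        """Divide the desired critical alpha level by the number of tests performed."""
        return desired_alpha / num_tests

    print(bonferroni_alpha(0.05, 8))   # 0.00625, the critical alpha for all eight t-tests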
Bonferroni's theorem should be applied whenever you are conducting two or more tests that are of the same "type" and the same "family". The same "type" means the same kind of statistical test. For example, if you were going to do one t-test, one ANOVA, and one regression, you would not make the adjustment because the tests are all different. The same "family" is a more elusive concept, and there are no hard and fast rules. "Family" refers to a series of statistical tests all designed to test the same (or very closely related) theoretical constructs. The bottom line is that it's up to the individual researcher to decide what constitutes a "family".
Some things are more obvious than others, for example, if you were doing t-tests comparing males and females on a series of questionnaire items that are all part of the same scale, you would probably apply the adjustment, by dividing your critical alpha level by the number of items in the scale (i.e., the number of t-tests you performed on that scale). The probabilities of the tests would be called the
family error rates. However, suppose you have a series of independent questions, each focusing on a different construct, and you want to compare males and females on how they answered each question. Here is where the whole idea of Bonferroni's adjustment becomes philosophical. If you claim that each t-test that you perform is a test of a unique "mini"-hypothesis, then you would not use the adjustment, because you have defined each question as a different "family". In this case, the probability would be called a
statement error rate. Another researcher might call the entire questionnaire a "family", and she would divide the critical alpha by the total number of items on the questionnaire.
In the real world, most researchers do not use Bonferroni's adjustment because they would rarely be able to reject a null hypothesis. They would be so concerned about the possibility of making a false statement that they would overlook many differences and relationships that actually exist. The "prime directive" for social science research is to discover relationships. One could argue that it is better to risk making a few wrong statements than to overlook clear or prominent differences and relationships that do not meet the stricter critical alpha level after applying Bonferroni's adjustment.
Central tendency
The best known measures of central tendency are the mean and median. The
mean average is found by adding the values for all the cases and dividing by the
number of cases. For example, to find the mean age of all your friends, add all
their ages together and divide by the number of friends. The mean average can
present a distorted picture of central tendency if the sample is skewed in any
way.
For example, let's say five people take a test. Their scores are 10, 12, 14, 18, and 94. (The last person is a genius.) The mean would be the sum of the scores (10+12+14+18+94) divided by 5. In this example, a mean of 29.6 is not a good measure of how well people did on the test in general. When analyzing data, be careful of using only the mean average when the sample has a few very high or very low scores. These scores tend to skew the shape of the distribution and will distort the mean.
When you have sampled from the population, the mean of the sample is also your best estimate of the mean of the population. The actual mean of the population is unknown, but the mean of the sample is as good an estimate as we can get.
The median provides a measure of central tendency such that half the sample will be above it and half the sample will be below it. For skewed distributions this is a better measure of central tendency. In the previous example, 14 would be the median for the sample of five people. If there is no middle value (i.e., there are an even number of data points), the median is the value midway between the two middle values.
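The five-score example can be reproduced with Python's standard library:

    from statistics import mean, median

    scores = [10, 12, 14, 18, 94]   # the five test scores above
    print(mean(scores))             # 29.6 -- pulled upward by the one very high score
    print(median(scores))           # 14 -- half the scores fall on either side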
The distribution of many variables follows that of a bell-shaped curve. This is called a
normal distribution. One must assume that data is approximately normally distributed for many statistical analyses to be valid. When a distribution is normal, the mean and median will be equal to each other. If they are not close to each other, the distribution is distorted in some way.
Variability
Variability is synonymous with diversity. The more diversity there is in
a set of data, the greater the variability. One simple measure of diversity is
the range (maximum value minus the minimum value). The range is generally not a
good measure of variability because it can be severely affected by a single very
low or high value in the data. A better method of describing the amount of
variability is to talk about the dispersion of scores away from the mean. The
variance and standard deviation are useful statistics that measure
the dispersion of scores around the mean. The standard deviation is the
square root of the variance. Both statistics measure the amount of diversity in
the data. The higher the statistics, the greater the diversity. As a rule of
thumb (for approximately normally distributed data), 68 percent of all the
scores in a sample will be within plus or minus one standard deviation of the
mean, and 95 percent of all scores will be within two standard deviations of
the mean.
There are two formulas for the variance and standard deviation of a sample. One set of formulas calculates the exact variance and standard deviation of the sample. These statistics are called
biased: they are the exact variance and standard deviation of the sample, but they tend to underestimate the variance and standard deviation of the population.
Generally, we are more concerned with describing the population rather than the sample. Our intent is to use the sample to describe the population. The
unbiased estimates should be used when sampling from the population and inferring back to the population. They provide the best estimate of the variance and standard deviation of the population.
All fields requesting a standard deviation in this software refer to the unbiased estimate. The biased and unbiased estimates of a standard deviation will be nearly identical unless the sample size is very small.
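In practice the difference between the two estimates is just the divisor (n versus n - 1). A NumPy illustration using the five test scores from the central tendency example:

    import numpy as np

    data = np.array([10, 12, 14, 18, 94])
    biased_sd = data.std(ddof=0)     # exact sd of the sample (divides by n)
    unbiased_sd = data.std(ddof=1)   # estimate of the population sd (divides by n - 1)
    print(round(biased_sd, 2), round(unbiased_sd, 2))   # 32.31 vs. 36.12 -- far apart because n is only 5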
Standard error of the mean
The standard error of the mean is used to estimate the range within which we
would expect the mean to fall in repeated samples taken from the population
(i.e., confidence intervals). The standard error of the mean is an estimate of
the standard deviation of the means of those repeated samples.
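A minimal sketch using the usual formula, SEM = s / sqrt(n), with illustrative numbers (an IQ-like scale with an unbiased standard deviation of 15 and a sample of 100):

    import math

    def standard_error_of_mean(sd, n):
        """SEM = unbiased sample standard deviation divided by the square root of n."""
        return sd / math.sqrt(n)

    sem = standard_error_of_mean(15, 100)
    # Approximate large-sample 95% confidence interval around a sample mean of 100
    low, high = 100 - 1.96 * sem, 100 + 1.96 * sem
    print(round(sem, 2), round(low, 2), round(high, 2))   # 1.5  97.06  102.94

The 1.96 multiplier is the large-sample z value; the next topic explains when a t value should be used instead.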
Inferences with small sample sizes
When the sample size is small (less than 30), the z value for the area under the
normal curve is not accurate. Instead of a z value, we can use a t value to
derive the area under the curve. In fact, many researchers always use the t
value instead of the z value. The reason is that the t values are more accurate
for small sample sizes, and they are nearly identical to the z values for large
sample sizes. Unlike the z value, the value of t depends on the number of
cases in the sample, so it changes with the sample size.
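The convergence of t toward z is easy to see by comparing two-tailed critical values at alpha = .05 with SciPy:

    from scipy.stats import norm, t

    print(round(norm.ppf(0.975), 3))                 # z critical value: 1.96 at any sample size
    for n in (5, 15, 30, 1000):
        print(n, round(t.ppf(0.975, df=n - 1), 3))   # 2.776, 2.145, 2.045, 1.962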
Degrees of freedom
Degrees of freedom literally refers to the number of data values that are free
to vary. For example, suppose I tell you that the mean of a sample is 10, and
there are a total of three values in the sample. It turns out that if I tell you
any two of the values, you will always be able to figure out the third value. If
two of the values are 8 and 12, you can calculate that the third value is 10
using simple algebra: (x + 8 + 12) / 3 = 10, so x = 10.
In other words, if you know the mean, and all but one value, you can figure out
the missing value. All the values except one are free to vary. One value is set
once the others are known. Thus, degrees of freedom is equal to n-1.
Systematic and random error
Most research is an attempt to understand and explain variability. When a measurement lacks variability, no statistical tests can be (or need be) performed.
Variability refers to the dispersion of scores.
That is, it describes how much diversity there is in the data. Ideally, when a researcher finds differences between respondents, they are due to true differences on the variable being measured. However, the combination of
systematic and random errors can dilute the accuracy of a measurement. Systematic error is introduced through a constant bias in a measurement. It can usually be traced to a fault in the sampling procedure or in the design of a questionnaire. Random error does not occur in any consistent pattern, and it is not controllable by the researcher.
Formulating hypotheses from research questions
There are basically two kinds of research questions: testable and non-testable.
Neither is better than the other, and both have a place in applied research.
Examples of non-testable questions are:
How do managers feel about the reorganization?
What do residents feel are the most important problems facing the community?
Respondents' answers to these questions could be summarized in descriptive tables and the results might be extremely valuable to administrators and planners. Business and social science researchers often ask non-testable research questions. The shortcoming with these types of questions is that they do not provide objective cut-off points for decision-makers.
In order to overcome this problem, researchers often seek to answer one or more testable research questions. Nearly all testable research questions begin with one of the following two phrases:
Is there a significant difference between ...?
Is there a significant relationship between ...?
For example:
Is there a significant relationship between the age of managers and their attitudes towards the reorganization?
Is there a significant difference between white and minority residents with respect to what they feel are the most important problems facing the community?
A research hypothesis is a testable statement of opinion. It is created from the research question by replacing the words "Is there" with the words "There is", and also replacing the question mark with a period. The hypotheses for the two sample research questions would be:
There is a significant relationship between the age of managers and their attitudes towards the reorganization.
There is a significant difference between white and minority residents with respect to what they feel are the most important problems facing the community.
It is not possible to test a hypothesis directly. Instead, you must turn the hypothesis into a null hypothesis. The null hypothesis is created from the hypothesis by adding the words "no" or "not" to the statement. For example, the null hypotheses for the two examples would be:
There is no significant relationship between the age of managers and their attitudes towards the reorganization.
There is no significant difference between white and minority residents with respect to what they feel are the most important problems facing the community.
All statistical testing is done on the null hypothesis...never the hypothesis. The result of a statistical test will enable you to either 1) reject the null hypothesis, or 2) fail to reject the null hypothesis. Never use the words "accept the null hypothesis".