Psych Stats
Purpose
- Descriptive statistics: describe data sets or extract
characteristics of the group being observed
- Inferential statistics: used to make conclusions (predictions)
about large numbers of individuals when only a small sample
from the larger population is observed.
Deviation
- Measure of variability = an index of diversity in the data distribution
- Deviation = distance from a data point to the mean
- The average (signed) deviation from the mean is always zero, because the positive and negative deviations cancel out.
- Squaring the deviations is the standard approach in statistics for eliminating negative numbers.
- SS: the sum of squared deviations (sum of squared errors) is a good measure of the accuracy of the model, here the mean.
- Average error = SS divided by the number of observations (N)
- Variance = the mean squared deviation, i.e. the average squared error between the mean and the observations made (and so a measure of how well the model fits the actual data).
- Standard Deviation = square root of the mean squared deviation (sqrt of the variance)
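The chain from deviations to standard deviation can be checked by hand on a small data set. The sketch below uses made-up scores and divides by N, as in the notes above; many textbooks divide by N - 1 when estimating a population variance from a sample, which is not shown here.

```python
import math

# Made-up scores, chosen only so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n                      # 40 / 8 = 5.0

# Deviation = distance from each data point to the mean.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)         # positives and negatives cancel

# Squaring removes the signs; SS is the sum of squared deviations.
ss = sum(d ** 2 for d in deviations)        # 9+1+1+1+0+0+4+16 = 32

variance = ss / n                           # mean squared deviation = 4.0
sd = math.sqrt(variance)                    # standard deviation = 2.0

print(sum_of_deviations, ss, variance, sd)
```

Because the standard deviation is back in the original units of measurement, it is usually easier to interpret than the variance.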
Measures of Central Tendency
- Mean = arithmetic mean (i.e. the average)
- Mode: the most frequent score in a distribution
- Mode is unaffected by extreme scores.
- Modes are generally useful only in unimodal distributions.
- The mode is generally best used for qualitative data (i.e. nominal scale), e.g. categories rather than numbers.
- Median: the middle value of the ordered distribution scores
- The median is unaffected by extreme scores.
- Measure of Central Tendency = a measure that is typical of the set of the data
- Data = a measurement collected on a variable as a consequence of
observation.
- Case = a single unit of observation
- Variable = a certain characteristic of a population that can take
different values.
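The claim that the median and mode ignore extreme scores, while the mean does not, can be seen with a small sketch. The scores are made up, and Python's standard `statistics` module is used for the three measures.

```python
import statistics

# Made-up scores; the second list replaces the top score with an extreme one.
scores = [2, 3, 3, 4, 5]
with_extreme = [2, 3, 3, 4, 50]

for data in (scores, with_extreme):
    print(
        data,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

Only the mean moves (from 3.4 to 12.4); the median and the mode both stay at 3.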
Population v Sample
- Population = the entire group of people (or cases) of interest
- Parameters = term for summary properties or measures about
population values
- Sample = small subset of the population
- Statistics = term for summary properties or measures of
sample values
- Arithmetic mean = X with bar on top, or M
Z-Scores
- z-score (or standard score) = the deviation of the i-th case from the mean, divided by the standard deviation: z_i = (X_i - M) / SD
- z-scores are a precise index of how an individual compares with the rest of the group in terms of distance from the mean, measured in standard deviation units.
- They also provide a metric for comparing performance/measurements on completely unrelated scales (Bradman vs Einstein)
- Transforming ANY DISTRIBUTION of raw scores into Z-scores results in a distribution with a MEAN of 0 and a STANDARD DEVIATION of 1
- A negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean.
- For normally distributed scores, the area under the curve between any two z-score values represents the proportion (or percentage, or number) of scores that fall between those two values.
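A minimal sketch of the z-score transformation, using made-up scores and dividing by N as in the Deviation notes. The helper name `z_scores` is illustrative. The last step uses `statistics.NormalDist` and assumes the scores follow a normal distribution, which is what makes the area interpretation valid.

```python
import math
from statistics import NormalDist

def z_scores(scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

scores = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up data: mean 5, SD 2
z = z_scores(scores)

# Whatever the raw distribution, the z-scores have mean 0 and SD 1.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))

# Proportion of a normal distribution lying between z = -1 and z = +1.
area = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(z, z_mean, z_sd, round(area, 4))
```

Under the normality assumption, roughly 68% of scores fall within one standard deviation of the mean.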
Central Limit Theorem
- The sampling distribution = frequency distribution of sample means
- Central Limit Theorem: as the sample size grows, the sampling distribution of the mean approaches a normal distribution, even if the population distribution is not normal (provided the population variance is finite).
- The mean of the sampling distribution equals the population mean.
- Standard Error of the Mean - SEM (or σM) = standard deviation of the sampling distribution = σ / √N, where N is the size of each sample
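The ideas in this section can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, and the 5,000 repetitions are arbitrary choices, and the random seed is fixed so the run is repeatable.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# A clearly non-normal (positively skewed) population.
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50  # size of each sample
# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n)) for _ in range(5_000)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
observed_sem = statistics.pstdev(sample_means)   # SD of the sampling distribution
expected_sem = sigma / math.sqrt(n)              # sigma / sqrt(N)

print(round(mu, 3), round(mean_of_means, 3))
print(round(observed_sem, 3), round(expected_sem, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.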
Confidence Interval
- Confidence interval - an estimated range of values which is likely to
include an unknown population parameter
- boundaries within which we believe the true value of the mean will fall.
Determining 99% confidence interval for population mean
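The worked formula for this step is not in these notes. A common textbook version, assumed here rather than taken from the notes, is M ± z* × SEM, where z* is about 2.576 for 99% confidence. It applies when the population standard deviation is known or the sample is large; with small samples a critical t value is normally used instead. The function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, level=0.99):
    """Large-sample confidence interval for a population mean (z-based)."""
    # Two-tailed critical z, e.g. about 2.576 for a 99% interval.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    sem = sd / math.sqrt(n)          # standard error of the mean
    margin = z_crit * sem
    return sample_mean - margin, sample_mean + margin

# Hypothetical example: mean 100, SD 15, 36 observations.
low, high = confidence_interval(sample_mean=100, sd=15, n=36)
print(round(low, 2), round(high, 2))
```

Here SEM = 15 / 6 = 2.5, so the interval is 100 ± 2.576 × 2.5, roughly 93.56 to 106.44.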
Null Hypothesis
- Null hypothesis H0 - refers to some null or conservative state of affairs; the assumption you would make without evidence to the contrary.
- H0 contains the argument “that the observed results occurred by chance due to fluctuations of sampling”.
- Alternative hypothesis H1 - the complement of the null hypothesis. It is not tested
directly but adopted upon a rejection of the null hypothesis. It usually expresses the experimenter’s belief about the parameter being studied
Testing Null Hypothesis:
- t = standardised difference between two means
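- Significance level is set at α = 0.05; for large samples the two-tailed critical t approaches 1.96 (with small samples the critical t is larger and depends on the degrees of freedom).
- If |t(mean1 - mean2)| > critical t, then p < 0.05 and H0 is rejected.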
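As a sketch of the decision rule, the code below computes t for two independent groups using a pooled variance. The notes do not say which t-test is meant, so this particular form is an assumption, as are the function name and the data. The critical value 2.306 is the two-tailed t-table entry for α = 0.05 with 8 degrees of freedom.

```python
import math
import statistics

def independent_t(group1, group2):
    """t = difference between two means divided by its standard error."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Pooled variance built from the (N - 1) sample variances of each group.
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff

# Made-up data for two small groups.
group1 = [5, 6, 7, 8, 9]
group2 = [1, 2, 3, 4, 5]

t = independent_t(group1, group2)
critical_t = 2.306                 # df = 5 + 5 - 2 = 8, alpha = 0.05, two-tailed
reject_h0 = abs(t) > critical_t

print(round(t, 3), reject_h0)
```

With these numbers the means differ by 4 and the standard error of the difference is 1, so t = 4.0, which exceeds the critical value.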
Symmetric v Skewed
- Unimodal = one mode = a single score that occurs most often
- Negatively skewed distribution: the left tail is longer, observations are clustered towards higher end of the scale
- Positively skewed distribution: the right tail is longer, observations are clustered towards lower end of the scale
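The direction of skew can also be expressed as a number. One common index, not mentioned in the notes and used here only as an illustration, is the average cubed z-score: it is positive when the right tail is longer and negative when the left tail is longer. The data are made up.

```python
import math

def skewness(scores):
    """Average cubed z-score (moment-based skewness, dividing by N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

right_tailed = [1, 2, 2, 3, 10]    # clustered low, long right tail
left_tailed = [1, 8, 9, 9, 10]     # mirror image: clustered high, long left tail

print(skewness(right_tailed))      # positive value
print(skewness(left_tailed))       # negative value of the same size
```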
Percentiles
- Percentile rank = the percentage of scores in a distribution that a specific score is greater than or equal to = (CF/N) * 100, where CF is the cumulative frequency up to and including that score and N is the total number of scores.
- The percentile rank shows how an individual score compares to the other scores in the sample.
- Percentiles are limited because the scores are merely ordered.
- The distance between the scores is not specified.
- Percentile score: the score corresponding to a particular percentile rank.
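A minimal sketch of both ideas, using (CF/N) * 100 with CF counted as the number of scores at or below the score of interest. The function names and the data are illustrative. Several conventions exist for turning a rank back into a score; the one below simply returns the lowest score whose percentile rank reaches the requested value.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are less than or equal to `score`."""
    cf = sum(1 for x in scores if x <= score)   # cumulative frequency
    return cf / len(scores) * 100

def percentile_score(scores, rank):
    """Lowest score whose percentile rank is at least `rank` (one convention)."""
    for x in sorted(scores):
        if percentile_rank(scores, x) >= rank:
            return x
    return None  # only reached if rank is above 100

# Made-up test scores.
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]

print(percentile_rank(scores, 70))    # 5 of 10 scores are <= 70
print(percentile_score(scores, 50))   # score sitting at the 50th percentile
```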
Frequency
- Cumulative frequency = the running total of counts: the count for the current score (or interval) plus the counts for all lower scores, i.e. the number of scores at or below the current score.
- Qualitative variables = attributes of the variable fall into discrete categories (e.g. gender, favorite color, country of birth)
- Quantitative variables = attributes of the variable are assigned values that can be anywhere within a range (e.g. age, weight, height, IQ, speed of driving)
Measurement Scales
- Nominal scale = identity
- Used for categorical/discrete data
- Any case can be placed in one and only one category.
- Numbers used as labels; arbitrary
- Ordinal scale = identity + order
- Used where scores can be ranked / ordered
- There is no objective distance between any two points on the scale.
- Interval scale = identity + order + equidistance
- Measurement at this level allows us to separate objects or events into mutually exclusive categories, arranged in a specific order, and specify the distance between data points
- On this scale numbers are separated by equal-sized intervals but have no meaningful or absolute zero.
- Doesn't support ratios, e.g. an IQ of 140 does not mean twice as intelligent as an IQ of 70.
- Ratio scale = identity + order + equidistance + origin
- separate objects or events into mutually exclusive categories
- arranged in a specific order
- specify the distance between data points
- compare ratios constructed from the data.
Graphing
- Horizontal Axis:
- also called the abscissa, or X axis
- the values of the variable
- Vertical Axis:
- also called the ordinate, or Y axis
- the frequencies, or proportions or percentages
Class Intervals
When choosing a class interval width one aims to produce a concise picture of the data, with minimal loss of information. Generally, use 6 – 12 intervals of equal width
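As a sketch of equal-width class intervals, the function below splits the range of the data into a chosen number of intervals and counts the scores in each. The function name, the default of 6 intervals, and the data are illustrative, and the code assumes the scores are not all identical.

```python
def frequency_table(scores, n_intervals=6):
    """Count scores in equal-width class intervals spanning min..max."""
    low, high = min(scores), max(scores)
    width = (high - low) / n_intervals      # assumes high > low
    counts = [0] * n_intervals
    for x in scores:
        # The maximum score would land one past the end, so clamp it
        # into the last interval.
        idx = min(int((x - low) / width), n_intervals - 1)
        counts[idx] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(n_intervals)
    ]

# Made-up scores ranging from 0 to 60, so 6 intervals gives a width of 10.
scores = [0, 5, 12, 18, 23, 27, 31, 36, 44, 49, 52, 60]

for lower, upper, count in frequency_table(scores):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

Fewer, wider intervals give a more concise picture but hide more detail; more, narrower intervals do the opposite.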
Correlation
- The Pearson product-moment correlation coefficient r is sensitive only to linear relationships.
- Correlation != causation. To test causation, experimental designs are best: systematically manipulate X and measure Y.
- If we know the correlation between two variables (e.g. r = 0.61) and we have a z-score for X, the predicted z-score for Y is the product of the two: predicted zY = r * zX.
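A minimal sketch of both steps: r computed as the average product of paired z-scores (dividing by N, as in the Deviation notes), then a prediction of zY from zX. The function names and the small data set are made up for illustration; the 0.61 is the example value from the notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as the mean product of paired z-scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

def predict_z_y(r, z_x):
    """Predicted z-score on Y is the correlation times the z-score on X."""
    return r * z_x

# Made-up paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))                      # about 0.7746

# With r = 0.61, a case 1.5 SDs above the mean on X is predicted
# to sit 0.915 SDs above the mean on Y.
print(predict_z_y(0.61, 1.5))
```

Because |r| is at most 1, the predicted zY is never further from the mean than zX, which is the idea behind regression toward the mean.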
page revision: 7, last edited: 25 Oct 2012 08:03