Assignment #1: Basic Design, Descriptive Statistics, Frequency Distributions
1) Dr. Dahlberg is interested in word learning in children, in particular mutual exclusivity. Briefly, his theory suggests that when learning word labels for novel objects, children innately believe that an object can possess only one label (they assume mutual exclusivity: a one-to-one mapping between labels and objects). So if you already have a label for a ball but not for a pencil, and I say “pass me the blicket,” you are likely to think that the pencil is a blicket, since the ball is already called a “ball.”
However, Dr. Dahlberg believes that mutual exclusivity is learned from parental interaction rather than being innate. He videotapes parents interacting with their children and identifies parents who use only a single label when referring to an object versus parents who use multiple labels. He then selects 10 children from parents who use multiple labels and 10 children from parents who use single labels for his experiment. Each child is put in a situation with a familiar toy that he/she has a label for and a novel toy he/she has never seen, and is asked “pass me the X,” where X is a novel label like blicket. They do 10 trials with different objects/labels, and Dr. Dahlberg records the number of times each child passed him the novel toy.
Children whose parents use single labels handed him the novel object an average of 8 of the 10 trials, whereas children whose parents use multiple labels only handed him the novel object an average of 6.5 of the 10 trials. He concludes that children’s tendency for mutually exclusive labeling is completely due to parents’ behavior, and not an innate predisposition for exclusivity when learning.
Briefly, respond to the following:
a) What type of study is this (Experiment, Quasi-experiment, Correlational study) and why?
b) Identify the independent and dependent variable. What is the scale of measurement for each?
c) For the independent variable, is it a between-subjects or a within-subjects variable?
d) Identify two confounds in his study, or give two alternative interpretations of his conclusions. If you wanted to fix these problems, how would you do it?
2) In a normal distribution, the mean, median, and mode are all the same value. What is the relation of the mean, median, and mode in a negatively skewed distribution?
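One way to explore question 2 is to compute all three statistics on a small toy sample with a long left tail (any software works; this Python sketch uses a made-up sample, not assignment data):

```python
from statistics import mean, median, mode

# Toy sample with a long left (low) tail, i.e., negatively skewed
sample = [1, 6, 7, 7, 8, 8, 8]

print(mean(sample))    # the mean is pulled toward the tail
print(median(sample))  # 7
print(mode(sample))    # 8
```

Compare the three values and note which direction the mean is pulled relative to the median and mode.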
3) Dr. Merrick is interested in problem solving. She gives 10 subjects a classic missionaries and cannibals problem and records how many minutes it takes them to produce the solution. Either using SPSS or using the old-fashioned technique of calculating by hand, compute the mean, median, mode, and standard deviation. The data follow: 7, 4, 1, 3, 2, 5, 2, 8, 5, 3
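If you want to verify your hand (or SPSS) calculations with other software, Python's statistics module computes these directly. Note that SPSS reports the sample standard deviation (n − 1 in the denominator), which corresponds to stdev below, and that this dataset happens to have more than one modal value:

```python
from statistics import mean, median, multimode, stdev

times = [7, 4, 1, 3, 2, 5, 2, 8, 5, 3]  # minutes to solve the problem

print(mean(times))             # 4
print(median(times))           # 3.5
print(multimode(times))        # all values tied for the highest count
print(round(stdev(times), 2))  # sample SD, n - 1 denominator
```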
For your interest, here are her instructions:
Three missionaries and three cannibals are on one side of a river and need to cross to the other side. The only means of crossing is a boat, and the boat can only hold two people at a time. Devise a set of moves that will transport all six people across the river, bearing in mind the following constraints: The number of cannibals can never exceed the number of missionaries in any location, for the obvious reason that the cannibals will outnumber and eat the missionaries. Remember that someone will have to row the boat back across each time.
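For the curious, the puzzle can also be solved mechanically with a breadth-first search over (missionaries on the start bank, cannibals on the start bank, boat side) states. This Python sketch is my own encoding, not part of the assignment; it checks the safety constraint on both banks:

```python
from collections import deque

def solve(total_m=3, total_c=3, capacity=2):
    """Return a shortest list of crossings (dm, dc) = people in the boat."""
    start, goal = (total_m, total_c, True), (0, 0, False)

    def safe(m, c):
        # Each bank is safe if it has no missionaries or at least as many
        # missionaries as cannibals.
        return (m == 0 or m >= c) and \
               (total_m - m == 0 or total_m - m >= total_c - c)

    moves = [(dm, dc) for dm in range(capacity + 1)
             for dc in range(capacity + 1) if 1 <= dm + dc <= capacity]

    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        (m, c, boat), path = frontier.popleft()
        if (m, c, boat) == goal:
            return path
        for dm, dc in moves:
            # Boat on start bank: people leave; otherwise they return.
            nm, nc = (m - dm, c - dc) if boat else (m + dm, c + dc)
            if 0 <= nm <= total_m and 0 <= nc <= total_c and safe(nm, nc):
                state = (nm, nc, not boat)
                if state not in seen:
                    seen.add(state)
                    frontier.append((state, path + [(dm, dc)]))
    return None
```

Breadth-first search guarantees a shortest solution; for the classic 3-and-3 puzzle that is 11 crossings.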
4) Dr. Riopelle is interested in problem solving under stress. Hence, the obvious study is to have 100 subjects solve an anagram problem while immersed in a pit of snakes. Briefly, subjects are to create as many words as possible within three minutes using the letters from the word statistician. So, valid examples would be can, sat, is, cist, etc. The number of correct words produced in three minutes for each subject can be found in the following file:
https://www.dropbox.com/s/dht0u207t2n1mv4/Assign1a.txt?dl=0
Using SPSS (or any other software), create a frequency table showing the number of subjects who produced each score on the measurement scale. Open a new syntax window and read in the data file. Name the variable something like N_Words. Next, in your syntax window, type:
FREQUENCIES VARIABLES = n_words.
Highlight this syntax, and click on the run arrow (>) at the top of the syntax editor (or you can just hit CTRL+R to run a current selection). In the output viewer, you should now have a frequency table. In a text document, answer the following:
- What is the range for this dataset?
- What is the most frequently occurring score?
- If I selected a person at random from this data set, what is the probability that person would have produced 20 or more words?
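If you prefer to check your SPSS output with other software, the frequency table and the "20 or more words" probability are easy to compute once the scores are in a list. In this Python sketch, the commented-out reading line assumes the file was saved as Assign1a.txt with one score per line, and the helper names and toy data are mine:

```python
from collections import Counter

def frequency_table(scores):
    """(score, count) pairs sorted by score, like the SPSS FREQUENCIES table."""
    return sorted(Counter(scores).items())

def prob_at_least(scores, cutoff):
    """Empirical probability that a randomly chosen subject scored >= cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

# scores = [int(line) for line in open("Assign1a.txt")]  # assumed layout
scores = [18, 20, 22, 17, 20]  # toy data for illustration
print(frequency_table(scores))
print(prob_at_least(scores, 20))  # proportion of subjects at 20 or above
```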
5) Download the data file from https://www.dropbox.com/s/4qmmwtqsuq9tnrf/Assign1b.txt?dl=0
This file contains the frequency of occurrence of 1000 concrete and 1000 abstract nouns in a large sample of the English language.
Read in the data file. The file is delimited, and there are only two variables, in the order type, freq. Type is a nominal dummy variable. Using the VALUE LABELS command, add text labels for the values (1 = abstract, 2 = concrete).
Using the Frequencies procedure, let's next create a histogram with a normal curve. Using the syntax method, type the following into your editor:
FREQUENCIES VARIABLES=freq
/FORMAT=NOTABLE
/STATISTICS=ALL
/HISTOGRAM NORMAL.
In the first line, we specify the variables of interest (only freq here, or whatever you've named it). The second line asks SPSS to suppress the frequency table: since there are a large number of values on this scale, I don't want to see the number of words at each particular frequency; I would rather see the pattern in a histogram. The STATISTICS line requests all possible descriptives; alternatively, I could specify only those of interest (e.g., MEAN, STDDEV). The final line asks for a histogram with a normal curve fit to it. Highlight this syntax and click the run arrow (>) at the top of the syntax editor (or just hit CTRL+R to run the current selection).
Look at the histogram: how would you describe the shape of this distribution in terms of skewness? In the Statistics table, you will notice values for our three measures of central tendency; the mean is heavily affected by the outliers. This is common for a frequency or RT distribution: the scale is ratio, so it cannot go below zero, and there are almost always a few extreme scores pulling the mean away from the bulk of smaller scores (the pattern you are looking at is known as Zipf's law of word frequencies in the English language).
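If you want a number to go with the visual impression, a skewness statistic (the standardized third moment) can be computed directly. This is one common population-style formula, not necessarily SPSS's exact estimator:

```python
from statistics import mean

def skewness(xs):
    """Population-style skewness: mean cubed deviation over the SD cubed."""
    m, n = mean(xs), len(xs)
    s = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

print(skewness([1, 1, 2, 2, 3, 20]))  # positive: a long right (high) tail
```

A positive value indicates a long right tail, a negative value a long left tail, and zero a symmetric distribution.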
Finally, let's examine the mean for abstract and concrete words separately using the Means procedure. On the menu, you can click Analyze > Compare Means > Means (but remember to paste anything you do to the syntax editor for a record of what you did). Or, just type the following into your syntax editor:
MEANS TABLES=freq BY type
/CELLS MEAN COUNT STDDEV.
Highlight and run this syntax to produce a table with the mean and standard deviation for each group of words separately. If you set up your value labels correctly, your table should have the labels abstract and concrete rather than 1 and 2. Which group of words has the higher frequency of occurrence in the English language?
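The MEANS procedure has a direct analogue in other software. For example, in Python with pandas, the whole sequence (read the delimited file, apply value labels, compute cell statistics) looks like this; the column names match the assignment's type and freq variables, but the numbers below are made up for illustration:

```python
import pandas as pd

# df = pd.read_csv("Assign1b.txt", sep=None, engine="python",
#                  names=["type", "freq"])  # assumed file layout
df = pd.DataFrame({"type": [1, 1, 2, 2], "freq": [5, 15, 40, 60]})  # toy data

# Value labels, as in the SPSS VALUE LABELS command
df["type"] = df["type"].map({1: "abstract", 2: "concrete"})

# Equivalent of MEANS TABLES=freq BY type /CELLS MEAN COUNT STDDEV.
print(df.groupby("type")["freq"].agg(["mean", "count", "std"]))
```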