MINISTAT: A Statistical Package for the Commodore 64
Copyright 1989 by Jon Rich, Ph.D.

MINISTAT is a statistical package program which performs both univariate and bivariate inferential and descriptive statistics. A particularly useful feature of this package is that the data need to be entered only once. Once the data file has been set up, one may perform any of the included statistical tests on any of the variables.

A MINISTAT data file is a two dimensional array, or table, of data. One dimension is the variables. These may be subject characteristics, such as sex or race, subject measurements, such as height, test scores, or running speed, or any other characteristic on which subjects vary. The other dimension is the cases, or subject number. Data for a typical MINISTAT file is shown below:

                  Variables
          SEX  RACE  HT.  WT1  WT2
       1    1     1   68  143  140
Case   2    2     2   60  105  103
No.    3    1     3   69  162  153
       4    1     2   70  168  160
       5    2     3   63  115  118
       6    2     1   65  123  125
       7    1     2   69  149  147
       8    2     2   67  145  140
       9    1     3   67  123  119
      10    2     1   64  122  114

These data are from a test of a weight-loss diet. For each of the ten persons in the test, the researcher has recorded the sex (1=male, 2=female), the race (1=Black, 2=White, 3=Oriental), the height in inches, and the weight before (WT1) and after (WT2) the diet. Using MINISTAT, we can answer a number of questions about these data.

STARTING THE PROGRAM

Start the program by entering LOAD "MINISTAT",8 and then entering RUN. At the title screen you will be given the opportunity to toggle the color between black on white and white on blue by pressing the space bar. Choose the color combination which is easiest to read, and then proceed to the main menu by pressing "C". You will then see the main menu, which looks like this:

SELECT
A) SAVE   1) DESC
B) INFO   2) FREQ
C) OLD    3) REGR
D) DIR    4) CHI2
E) NEW    5) T:UR
F) KILL   6) T:RS
G) COMP   7) ALPHA
H) HELP

At this point, there is no data file loaded into the program.
The only options on the menu which will work are "C", which will allow you to retrieve a previously saved file, "D", which will display the catalog of previously saved MINISTAT files, "E", which will allow you to input a new file, "F", which will erase a previously saved file, and "H", which will allow you to view the help files.

SETTING UP THE FILE (Option "E")

To set up a new file, hit "E" at the main menu. You will be asked for a file name. This can be anything you wish that you can easily associate with your study. We will name this file "DIET."

Next you are asked "N VARS?" This means "How many variables are in the file?" In our example there are five variables, so we enter the number 5. We are then asked for N, and we enter 10, meaning there are ten subjects in the study. N must be from 2 to 100, and the number of variables must be from one to 30.

Next, MINISTAT asks, NAMES (y or n)?. This means, "Would you like to name the variables?" If we press N, indicating No, MINISTAT will assign the variables the names V1, V2, etc., and go straight into the data entry section. If we press Y, we will be given an opportunity to assign our own names. For our example, we will press Y. MINISTAT then asks NAME1? and we enter SEX, the name of our first variable. After NAME2 we input RACE, after NAME3 HT., and so on.

Once the file characteristics have been input, we are ready to input the actual data. MINISTAT will ask for the first case of the first variable, continuing down through every case of the first variable, and then go on to subsequent variables. For example, MINISTAT will initially print SEX - CASE 1?, and we will enter a 1, indicating that the sex of the first subject is male. If we make a mistake, we can back up by simply pressing ENTER.
The data entry might look like this:

SEX -- CASE 1: 1
SEX -- CASE 2: 2
SEX -- CASE 3: 3     (this value is a mistake)
SEX -- CASE 4:       (enter, we back up)
SEX -- CASE 3: 1
SEX -- CASE 4: 1
   *
   *
   *
WT2 -- CASE 9: 119
WT2 -- CASE 10:114

SELECTING A PROCEDURE

After the data have been entered, the other options in the menu become available. Letters (A through H) select utility procedures; numbers (1 through 7) select statistical procedures. A procedure is selected by simply pressing the corresponding number or letter -- you do not need to press enter.

Procedures that require that a variable be selected will produce a prompt mark: >?. This mark indicates that a variable name should be entered. Some procedures require that more than one variable be entered and will produce this mark again until all variables have been entered. If you input an unrecognized name, two question marks will be printed.

After a procedure has been executed, you will be asked, if appropriate, AGAIN (y or n)?. If you would like to perform the same procedure with different variables or parameters, type Y. If you want to return to the main menu, type N. Detailed descriptions of each procedure are listed below.

1) DESC

This procedure generates descriptive statistics for any of the variables. If we enter variable WT1, the description looks like this:

MEAN: 135.5        VAR:  393.25
S.D.: 19.830532    S.E.: 6.27096
SUM:  1355         N:    10
MAX:  168          MIN:  105

Here is what each of these statistics means:

N: The total number of subjects in the sample.
SUM: The sum of all the scores or measurements.
MEAN: This is the average value, the sum divided by N.
MAX, MIN: The maximum and minimum. The heaviest person in this sample weighed 168 lbs., the lightest 105 lbs.
VAR: This is the variance of the sample -- to what degree the scores are spread out or clustered together. In this example it equals the sum of the squared deviations from the mean divided by N (3932.5/10), not by N-1.
S.D.: The standard deviation, which is the square root of the variance.
In large samples from a roughly normal distribution, about 68% of the scores will fall within one standard deviation of the mean, and about 95% within two standard deviations.
S.E.: This is the standard error of the mean, which is the standard deviation divided by the square root of N. It estimates the standard deviation of the means of all possible samples of size N.

2) FREQ

This procedure generates a histogram or bargraph. It shows how many subjects fall within each of a number of consecutive values or value ranges of a variable. The program first asks for a variable name, and then for an interval size. Choose an interval size which is a fraction of the total range, but at least equal to the unit of measurement. Using an interval size of 2, height is distributed like this:

60 ******* (1)
62 ******* (1)
64 ************** (2)
66 ************** (2)
68 ********************* (3)
70 ******* (1)

The top bar shows that there is one subject who is at least 60 inches but is shorter than 62 inches. We can see that the modal interval, the one with the most subjects, is the one with subjects who are at least 68 inches tall, but shorter than 70 inches.

3) REGR

This procedure generates a scattergram, a regression equation, a correlation coefficient, and a t-value with associated degrees of freedom. All of these statistics allow us to examine the relationship between two variables.

The scattergram is a plot of the values of one variable against the values of another. A strong positive relationship, as one might expect to find between variables such as height and weight or job prestige and income, will show all of the points tightly clustered in a straight line going from the lower left to the upper right. A weak relationship, such as that between nose length and IQ, would show points scattered about in a more or less random fashion. A strongly negative relationship, as one might find between blood alcohol levels and performance on a driving test, would show points clustered tightly from the upper left down to the lower right.
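The scattergram idea can be illustrated with a small character-cell plot of height against weight before the diet. This is only a sketch in Python, not MINISTAT's own display routine: the grid size, the function name scattergram, and the rounding of points to cells are all assumptions made for illustration. The data are the HT. and WT1 columns of the sample file.

```python
# Illustrative sketch only: grid size, names, and layout are assumptions,
# not a reproduction of MINISTAT's scattergram screen.
HT = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]             # X variable
WT1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]  # Y variable


def scattergram(xs, ys, width=30, height=12):
    """Return the plot as a list of text rows; the top row holds the largest Y."""
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range when every value is identical.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so Y increases upward
    return ["".join(cells) for cells in grid]


rows = scattergram(HT, WT1)
for line in rows:
    print("|" + line)
print("+" + "-" * 30)
```

With these data the points run from the lower left to the upper right, which is the pattern described for a positive relationship. Cases that fall in the same cell share a single mark.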
The first variable entered is the predictor or X variable, shown along the bottom of the graph. The second variable entered is the criterion or Y variable, and is shown along the side.

The regression equation is shown below the scattergram. This is the formula which does the best job of predicting the Y variable from the X variable. The correlation coefficient (R) quantifies the degree of relationship between the two variables. The value of R can range from -1, a perfect negative relationship, through zero, no relationship, to +1, a perfect positive relationship. The t-value along with the degrees of freedom allows one to test if the relationship is strong enough to be generalized beyond the sample to the population in general. The P value shows the level of significance for the t-value, that is, the probability of finding a relationship at least this strong by chance alone if there were no real relationship in the population. A P of less than .05 is generally thought of as significant.

If we enter height (HT.) as our first variable, and weight before the diet (WT1) as our second variable, we get these results:

WT1 = (6.094*HT.) + -267.906
R=0.92  T=6.631  DF=8  P<.001

The regression formula provides a way to predict weight, given a person's height. If someone is five feet, or 60 inches tall, we could predict that they would weigh (6.094*60)-267.906, or 97.7 pounds. The R of .92 is relatively high; it shows us that the relationship is strongly positive, and that we can predict one variable from the other with relatively little error. The t, df, and p values can tell us whether the R is high enough to be generalized to the population from which we drew our sample, or whether it might be a fluke found in this particular sample. P<.001 means that there is less th...
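
The HT. and WT1 regression results quoted above can be checked with a few lines of arithmetic. The following Python sketch assumes the ordinary least-squares formulas for the slope and intercept, the Pearson formula for R, and t = R*sqrt(DF/(1-R*R)) with DF = N-2. The function name simple_regression is hypothetical and this is not MINISTAT's own code. The P value is not computed here, because that would need a t-distribution table or routine.

```python
import math

# Sample-file columns from the DIET example.
HT = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]             # X variable
WT1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]  # Y variable


def simple_regression(xs, ys):
    """Least-squares line, correlation, and t-value (assumed textbook formulas)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return slope, intercept, r, t, df


slope, intercept, r, t, df = simple_regression(HT, WT1)
predicted_60 = slope * 60 + intercept  # predicted weight at 60 inches

print("WT1 = (%.3f*HT.) + %.3f" % (slope, intercept))
print("R=%.2f T=%.3f DF=%d" % (r, t, df))
print("Predicted weight at 60 inches: %.1f" % predicted_60)
```

Under these assumptions the sketch reproduces the slope, intercept, R, T, and DF shown in the example, along with the 97.7 pound prediction for a person 60 inches tall.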