            MINISTAT: A Statistical Package for the Commodore 64
                     Copyright 1989 by Jon Rich, Ph.D.

MINISTAT is a statistical package that computes both univariate and
bivariate descriptive and inferential statistics.  A particularly useful
feature of this package is that the data need to be entered only once.
Once the data file has been set up, one may perform any of the included
statistical tests on any of the variables.

A MINISTAT data file is a two dimensional array, or table, of data.  One
dimension is the variables.  These may be subject characteristics, such as
sex or race, subject measurements, such as height, test scores, or running
speed, or any other characteristic on which subjects vary.  The other 
dimension is the cases, or subject numbers.  Data for a typical MINISTAT
file are shown below:

                    Variables
          SEX  RACE HT.  WT1  WT2
      1    1    1   68   143  140
Case  2    2    2   60   105  103
No.   3    1    3   69   162  153
      4    1    2   70   168  160
      5    2    3   63   115  118
      6    2    1   65   123  125
      7    1    2   69   149  147
      8    2    2   67   145  140
      9    1    3   67   123  119
     10    2    1   64   122  114

These data are from a test of a weight-loss diet.  For each of the ten
persons in the test, the researcher has recorded the sex (1=male, 2=female),
the race (1=Black, 2=White, 3=Oriental), the height in inches, and the
weight before (WT1) and after (WT2) the diet.  Using MINISTAT, we can
answer a number of questions about these data.
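
As a rough illustration (this is not part of MINISTAT itself), the table
above can be held in Python as one list per variable, with one position per
case.  The names NAMES, ROWS, DATA, and value are hypothetical and were
chosen only for this sketch.

```python
# Hypothetical in-memory layout of the DIET data: cases as rows,
# variables as columns, exactly as in the table above.
NAMES = ["SEX", "RACE", "HT.", "WT1", "WT2"]
ROWS = [
    (1, 1, 68, 143, 140),
    (2, 2, 60, 105, 103),
    (1, 3, 69, 162, 153),
    (1, 2, 70, 168, 160),
    (2, 3, 63, 115, 118),
    (2, 1, 65, 123, 125),
    (1, 2, 69, 149, 147),
    (2, 2, 67, 145, 140),
    (1, 3, 67, 123, 119),
    (2, 1, 64, 122, 114),
]

# Transpose the rows (cases) into one list per variable.
DATA = {name: [row[i] for row in ROWS] for i, name in enumerate(NAMES)}


def value(variable, case):
    """Return the value of `variable` for a 1-based case number."""
    return DATA[variable][case - 1]


# Weight change per case; a negative number means weight was lost.
change = [after - before for before, after in zip(DATA["WT1"], DATA["WT2"])]

print(value("HT.", 3))
print(change)
```

Storing the data by variable makes it easy to hand any single column to a
statistical routine, which is how the procedures described later are used.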

STARTING THE PROGRAM

Start the program by entering
LOAD "MINISTAT",8
and then entering RUN.  At the title screen you will be given
the opportunity to toggle the color between black on white
and white on blue by pressing the space bar.  Choose the color
combination which is easiest to read, and then press "C" to proceed to
the main menu, which looks like this:

           SELECT
   A) SAVE          1) DESC
   B) INFO          2) FREQ
   C) OLD           3) REGR
   D) DIR           4) CHI2
   E) NEW           5) T:UR
   F) KILL          6) T:RS
   G) COMP          7) ALPHA
   H) HELP

At this point, there is no data file loaded into the program.  The only
options on the menu which will work are "C", which will allow you to
retrieve a previously saved file, "D", which will display the catalog of
previously saved MINISTAT files, "E", which will allow you to input a new
file, "F", which will erase a previously saved file, and "H", which will
allow you to view the help files.

SETTING UP THE FILE (Option "E")

To set up a new file, hit "E" at the main menu.  You will be asked for
a file name.  This can be anything you wish that you can easily associate
with your study.  We will name this file "DIET."  Next you are asked
"N VARS?" This means "How many variables are in the file?"  In our
example there are five variables, so we enter the number 5.  We
are then asked for N, and we enter 10, meaning there are ten subjects
in the study.  N must be from 2 to 100, and the number of variables
must be from 1 to 30.

Next, MINISTAT asks NAMES (y or n)?  This means, "Would you like to name
the variables?"  If we press N, indicating No, MINISTAT will assign the
variables the names V1, V2, etc., and go straight into the data entry
section. If we press Y, we will be given an opportunity to assign
our own names.  For our example, we will press Y. MINISTAT then asks
NAME1? and we enter SEX, the name of our first variable. After NAME2
we input RACE, after NAME3 HT., and so on.

Once the file characteristics have been input, we are ready to input the
actual data.  MINISTAT will ask for the first case of the first variable,
continuing down through every case of the first variable, and then go on
to subsequent variables.  For example, MINISTAT will initially print 
SEX - CASE 1?, and we will enter a 1, indicating that the sex of the 
first subject is male.  If we make a mistake, we can back up by pressing
ENTER (the RETURN key on the Commodore 64) without typing a value.  The
data entry might look like this:

   SEX -- CASE 1: 1
   SEX -- CASE 2: 2
   SEX -- CASE 3: 3 (this value is a mistake)
   SEX -- CASE 4: (enter, we back up)
   SEX -- CASE 3: 1
   SEX -- CASE 4: 1
         *
         *
         *
   WT2 -- CASE 9: 119
   WT2 -- CASE 10:114
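
The entry order and the back-up behavior described above can be sketched in
Python.  This is only an illustration under stated assumptions, not
MINISTAT's actual code: the function name enter_data is hypothetical, typed
answers are supplied as a list instead of being read from the keyboard, and
an empty answer is assumed to step back exactly one prompt.

```python
def enter_data(names, n_cases, answers):
    """Fill a variable-by-case table from a sequence of typed answers.

    Prompts run through every case of the first variable, then every case
    of the next variable.  An empty answer backs up one prompt (assumed
    behavior, based on the sample session in the text).
    """
    answers = iter(answers)
    total = len(names) * n_cases
    values = [None] * total
    prompts = []          # record of the prompts shown, for illustration
    pos = 0
    while pos < total:
        variable = names[pos // n_cases]
        case = pos % n_cases + 1
        prompts.append(f"{variable} -- CASE {case}:")
        reply = next(answers)          # stands in for keyboard input
        if reply == "":
            pos = max(pos - 1, 0)      # back up, but never before case 1
            continue
        values[pos] = float(reply)
        pos += 1
    table = {
        name: values[i * n_cases:(i + 1) * n_cases]
        for i, name in enumerate(names)
    }
    return table, prompts


# Replays the first six prompts of the sample session, with SEX only.
table, prompts = enter_data(["SEX"], 4, ["1", "2", "3", "", "1", "1"])
print(prompts)
print(table)
```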

SELECTING A PROCEDURE

After the data have been entered, the other options in the menu become
available.  Letters (A through H) select utility procedures; numbers
(1 through 7) select statistical procedures.  A procedure is selected
by simply pressing the corresponding number or letter -- you do not
need to press enter.

Procedures that require that a variable be selected will produce a prompt
mark: >?.  This mark indicates that a variable name should be entered.
Some procedures require that more than one variable be entered and will 
produce this mark again until all variables have been entered.
If you input an unrecognized name, two question marks will be 
printed.

After a procedure has been executed, you will be asked, if appropriate,
AGAIN (y or n)?  If you would like to perform the same procedure with
different variables or parameters, type Y.  If you want to return to the
main menu, type N.  Detailed descriptions of each procedure are listed
below.


1) DESC

This procedure generates descriptive statistics for any of the variables.
If we enter variable WT1, the description looks like this:

MEAN: 135.5      VAR: 393.25
S.D.: 19.830532  S.E.: 6.27096
SUM: 1355        N: 10
MAX: 168         MIN: 105

Here is what each of these statistics means:

N:  The total number of subjects in the sample.
SUM: The sum of all the scores or measurements.
MEAN: This is the average value, the sum divided by N.
MAX, MIN: The maximum and minimum.  The heaviest person in this sample
weighed 168 lbs., the lightest 105 lbs.
VAR: This is the variance of the sample -- to what degree the scores are
spread out or clustered together.  The value shown is the sum of squared
deviations from the mean divided by N (not N-1).
S.D.: The standard deviation, which is the square root of the variance.
In large samples that are roughly normally distributed, about 68% of the
scores will fall within one standard deviation of the mean, and about 95%
within two standard deviations.
S.E.: This is the standard error of the mean, which is the standard
deviation divided by the square root of N.  It estimates the standard
deviation of the means of all possible samples of size N.
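
The statistics defined above can be checked with a short Python sketch.
This is not MINISTAT's own code; the function name describe is hypothetical.
The variance is divided by N because that is what reproduces the VAR value
in the sample output for WT1.

```python
import math

# WT1 (weight before the diet) for the ten cases in the example file.
WT1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]


def describe(scores):
    """Return the eight statistics listed in the DESC output."""
    n = len(scores)
    total = sum(scores)
    mean = total / n
    # Divide by N (not N-1) to match the sample output in the text.
    var = sum((x - mean) ** 2 for x in scores) / n
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)
    return {
        "MEAN": mean, "VAR": var, "S.D.": sd, "S.E.": se,
        "SUM": total, "N": n, "MAX": max(scores), "MIN": min(scores),
    }


for name, result in describe(WT1).items():
    print(f"{name}: {result}")
```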

2) FREQ

This procedure generates a histogram or bar graph.  It shows how many subjects
fall within each of a number of consecutive values or value ranges of a
variable.  The program first asks for a variable name, and then for an
interval size.  Choose an interval size which is a fraction of the
total range, but at least equal to the unit of measurement.  Using an
interval size of 2, height is distributed like this:

60 ******* (1)
62 ******* (1)
64 ************** (2)
66 ************** (2)
68 ********************* (3)
70 ******* (1)

The top bar shows that there is one subject who is at least 60 inches but is 
shorter than 62 inches.  We can see that the modal interval, the one with
the most subjects, is the one with subjects who are at least 68 inches tall,
but shorter than 70 inches.
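
The grouping into intervals can be sketched in Python as follows.  This is
an illustration only: the function name frequency_table is hypothetical, and
starting the first interval at the smallest value is an assumption that
happens to reproduce the height example above.  The seven asterisks per
subject are copied from that sample output.

```python
from collections import Counter

# HT. (height in inches) for the ten cases in the example file.
HT = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]


def frequency_table(scores, width):
    """Return (lower bound, count) pairs for consecutive intervals.

    Each interval holds scores that are at least the lower bound but
    less than the lower bound plus `width`.
    """
    start = min(scores)   # assumed starting point of the first interval
    counts = Counter(int((x - start) // width) for x in scores)
    n_bins = max(counts) + 1
    return [(start + i * width, counts.get(i, 0)) for i in range(n_bins)]


for lower, count in frequency_table(HT, 2):
    print(f"{lower} {'*' * (7 * count)} ({count})")
```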

3) REGR

This procedure generates a scattergram, a regression equation, a
correlation coefficient, and a t-value with associated degrees
of freedom.  All of these statistics allow us to examine the relationship
between two variables.

The scattergram is a plot of the values of one variable against the values
of another.  A strong positive relationship, as one might expect to find
between variables such as height and weight or job prestige and income, will
show all of the points tightly clustered in a straight line going from
the lower left to the upper right.  A weak relationship, such as that
between nose length and IQ, would show points scattered about in a more
or less random fashion.  A strongly negative relationship, as one might
find between blood alcohol levels and performance on a driving test,
would show points clustered tightly from the upper left down to the
lower right.  The first variable entered is the X variable, shown along
the bottom of the graph.  The second variable entered is the criterion
or Y variable, and is shown along the side.

The regression equation is shown below the scattergram.  This is the formula
which does the best job of predicting the Y variable from the X variable.

The correlation coefficient (R) quantifies the degree of relationship between
the two variables.  The value of R can range from -1, a perfect negative
relationship, through zero, no relationship, to +1, a perfect positive
relationship.  The t-value along with the degrees of freedom allows one
to test if the relationship is strong enough to be generalized beyond
the sample to the population in general.  The P value shows the
level of significance for the t-value, that is, the probability of
obtaining a relationship at least this strong by chance alone if there
were no real relationship in the population.  A P of less than .05 is
generally thought of as significant.

If we enter height (HT.) as our first variable, and weight before the
diet as our second variable (WT1), we get these results:

     WT1 = (6.094*HT.) + -267.906
     R=0.92
     T=6.631   DF=8   P<.001

The regression formula provides a way to predict weight, given a person's
height.  If someone is five feet, or 60 inches tall, we could predict that
they would weigh (6.094*60)-267.906, or 97.7 pounds.
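
The regression results and the prediction above can be reproduced with
standard least-squares formulas, sketched here in Python.  This is not
MINISTAT's own code, and the function name regress is hypothetical.  The P
value is left out because it requires the t distribution, which the Python
standard library does not provide.

```python
import math

# HT. and WT1 for the ten cases in the example file.
HT = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]
WT1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]


def regress(x, y):
    """Return slope, intercept, r, t, and degrees of freedom for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    # t statistic for testing whether the correlation differs from zero.
    t = r * math.sqrt(df / (1 - r * r))
    return slope, intercept, r, t, df


slope, intercept, r, t, df = regress(HT, WT1)
print(f"WT1 = ({slope:.3f}*HT.) + {intercept:.3f}")
print(f"R={r:.2f}")
print(f"T={t:.3f}   DF={df}")

# Predicted weight for a person 60 inches tall.
print(f"{slope * 60 + intercept:.1f}")
```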

The R of .92 is relatively high; it shows us that the relationship is 
strongly positive, and that we can predict one variable from the other
with relatively little error.

The t, df, and p values can tell us whether the R is high enough to be
generalized to the population from which we drew our sample, or whether
it might be a fluke found in this particular sample.  P<.001 means that 
there is less th...