ANOVA(1)                           |STAT                    August 22, 1992

NNNNAAAAMMMMEEEE
     anova - multi-factor analysis of variance

SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS
     aaaannnnoooovvvvaaaa [-p] [-w width] [factor names]

DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN
     _a_n_o_v_a does multi-factor analysis of variance on designs with within
     groups factors, between groups factors, or both.  _a_n_o_v_a allows
     variable numbers of replications (averaged before analysis) on any
     factor.  All factors except the random factor must be crossed; some
     nested designs are not allowed.  Unequal group sizes on between groups
     factors are allowed and are solved with the weighted means solution,
     however empty cells are not permitted.

     _I_n_p_u_t _F_o_r_m_a_t.  The input format was designed so that when the user
     specifies the role individual data play in the overall design, _a_n_o_v_a
     figures out the experimental design.  This helps reduce design
     specification errors.  The input to _a_n_o_v_a consists of each datum on a
     separate line, preceded by a list of index labels, one for each
     factor, that specifies the level of each factor at which that datum
     was obtained.  By convention, data are always in the last column, and
     indexes for the one allowable random factor must be in the first.
     Data can be real numbers or integers.  Indexes can be any character
     string, so mnemonic labels can simplify reading the output.  For
     example:
                               fred  3  hard  10
     indicates that "fred" at level "3" of the factor indexed by column two
     and at level "hard" of the factor indexed by column three, scored 10.
     Indexes and data on a line can be separated by tabs or spaces for
     readability.  Data from an experiment consists of a series of lines
     like the one above.  The order of these lines does not matter, so
     additional data can be appended to the end of files.  Replications are
     coded by having more than one line with the same list of leading
     indexes.  With this information, _a_n_o_v_a determines the number of
     factors, the number and names of levels of each factor, and whether a
     factor is between groups or within groups so that error terms for F-
     ratios can be chosen.

     Names of independent and dependent variables can be supplied to _a_n_o_v_a,
     providing mnemonic labels for the output.  These names may be
     truncated in the output.  The names should have unique first
     characters because that is all that is used in parts of F tables.  For
     example, in a three factor design, the call to _a_n_o_v_a:
                   anova  subjects  group  difficulty  errors
     would give the name "subjects" to the random factor, "group" and
     "difficulty" to the next two, and "errors" to the dependent variable.
     If names are not specified, the default name for the random factor is
     RANDOM, for the dependent variable, DATA, and for the independent
     variables, A, B, C, D, etc.

     _O_u_t_p_u_t _F_o_r_m_a_t.  The first part of the output from _a_n_o_v_a includes
     summary statistics and optional error bar plots for each source not
     involving the random factor.  The summary statistics include: cell
     counts, means, standard deviations, and standard errors.  The error
     bar plots place cell means between the grand minimum and grand maximum
     values and use the following characters to identify statistics, which
     are plotted in a specific order:
     Example Plot:
         <    -----(--#--)----- >
     Key:
         -   spanning one standard deviation around the mean,
             but within the minimum and maximum
         (   one standard error below the mean
         )   one standard error above the mean
         <   minimum value
         >   maximum value
         #   mean value
     With small plots and/or low variances the markers may overlap, some of
     the markers may be hidden by others.  In particular, with no
     variability, only the mean will appear.  A useful rule of thumb is
     that if the standard error envelopes of two means do not overlap, then
     they are significantly different, however, a post hoc test should be
     applied to verify this.

     After the summary statistics and plots, a summary of design
     information and an F table testing main effects and interactions
     follow.  Sums of squares, degrees of freedom, mean squares, F ratio
     and significance level are reported for each F test.  The labels for
     interactions are based on the first character of the factor names
     involved, so it is wise to choose factor names with different first
     letters.  For significance testing, one asterisk indicates a result
     significant at the .05 level, two *'s indicate .01, and three *'s
     indicate .001.

OOOOPPPPTTTTIIIIOOOONNNNSSSS
     ----pppp   Show Error Bar Plots for N-Way Tests.  Each use of the -p option
          will request plots for higher-order interactions.  The first will
          request plots for main effects, the next will request them for
          two-way interactions, and so on.
     ----wwww   Width of Error Bar Plots.  This option will let you increase the
          amount of detail in the plots.

DDDDIIIIAAAAGGGGNNNNOOOOSSSSTTTTIIIICCCCSSSS

     _a_n_o_v_a will complain about "Ragged input" if the number of variables in
     its input varies.  _a_n_o_v_a will not print its F tables if it cannot make
     sense out of the the input specification ("Unbalanced factor or Empty
     cell").  This can happen if there are missing data (detected when the
     cell sizes of all the scores for a source do not add up to the
     expected grand total).  Unbalanced factors often are due to a
     typographical error, but the empty cell size message can be due to an
     illegal nested design (only the random factor can be nested).

     _a_n_o_v_a uses a temporary file to store its input and will complain if it
     is unable to create it.  This may be because you are in some other
     user's directory that is "write protected."

EEEEXXXXAAAAMMMMPPPPLLLLEEEE

     An experiment has two experimental factors: difficulty of material to
     be learned, and amount of knowledge a person brings with him or her.
     (This design is due to Naomi Miyake.)  Each person is given two
     learning tasks, one easy and one hard, so task difficulty is a within
     groups factor.  Two people are experts in the task domain, while three
     are novices, so knowledge is a between groups factor with unequal
     group sizes.  The dependent variable is the amount of time it takes a
     person to correctly work through a problem.  Data is formatted as
     follows: in column one is the name of the person (the random factor);
     in column two is the level of the difficulty factor; in column three
     is the level of the knowledge factor; and in column four is the time,
     in seconds, to solve the problem.  Fictitious data follow.

          lucy      easy      novice  12
          lucy      hard      novice  22
          ethel     easy      novice  10
          ethel     hard      novice  15
          ricky     easy      novice  25
          ricky     hard      novice  30
          ernie     easy      expert   7
          ernie     hard      expert  10
          bert      easy      expert  12
          bert      hard      expert  18

     The call to _a_n_o_v_a to analyze the data would probably look like:

                anova subjects difficulty knowledge time < data

     "data" is the name of the file containing the above data.  "subjects"
     is the random factor so indexes for that factor appear in the first
     column.  Data, here called "time", must appear in the last column.
     "difficulty" is a within groups factor because each person appears at
     every level of that factor.  In the third column are indexes for
     "knowledge", a between groups factor, because no person appears at
     more than one level of that factor.

FFFFIIIILLLLEEEESSSS
     UNIX    /tmp/anova.????
     MSDOS   anova.tmp

AAAALLLLGGGGOOOORRRRIIIITTTTHHHHMMMM
     Keppel (1973) _D_e_s_i_g_n _a_n_d _A_n_a_l_y_s_i_s: _A _R_e_s_e_a_r_c_h_e_r'_s _H_a_n_d_b_o_o_k.

WWWWAAAARRRRNNNNIIIINNNNGGGG
     When unequal sized cell designs are used, the cell sizes must be in
     the same proportion across all rows and columns of interactions, or
     there may be marked distortions and the analysis may be invalid.  This
     applies only to designs with more than one between groups factor.  See
     Keppel's discussion of unequal cell designs.

LLLLIIIIMMMMIIIITTTTSSSS
     Use the -L option to determine the program limits.

MMMMIIIISSSSSSSSIIIINNNNGGGG VVVVAAAALLLLUUUUEEEESSSS
     Missing data values (NA) are counted but not included in the analysis.