Chi-square with Minitab

Minitab can do the chi-square test for contingency tables with either raw data or tables of counts. It does not have a command or menu option for the Goodness-of-Fit test, though you can use it as a calculator to get the job done. (Pull down Help, select Help, and search for "goodness" to see an example.)

For contingency tables, we start with the heart attack data. If you have not worked with these data before, you can find a description here. This is a very large data set and so is provided as zip files. (You may need a program such as WinZip to unzip them.) Plain text (with tabs separating entries), Excel, Minitab 14, and Minitab 10 versions of the data are available. If possible, download and unzip the Minitab version of the heart attack data and open it in your version of Minitab. (It may be too big for student versions.)

We would like to see if the mortality rate is different between men and women. From the menus, select Stat > Tables > Cross-tabulation and Chi-square. Select a variable for the rows and a variable for the columns. (Either will work; we put SEX in the rows.) Check Counts (and anything else you want to see). Click on the Chi-Square button and select Chi-Square Analysis (and anything else you want). Keep clicking OK to close windows and soon you should see:

Rows: SEX   Columns: DIED

           0     1    All

F       4298   767   5065
M       7136   643   7779
All    11434  1410  12844

Cell Contents:      Count


Pearson Chi-Square = 148.464, DF = 1, P-Value = 0.000
Likelihood Ratio Chi-Square = 144.921, DF = 1, P-Value = 0.000
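
As a cross-check on the Pearson line above, here is a minimal Python sketch (not Minitab code) that recomputes the statistic from the four counts in the table. The function name pearson_chi_square is made up for this illustration, and the p-value shortcut used here is valid only when DF = 1.

```python
import math

# Counts copied from the Minitab table above: rows F, M; columns DIED = 0, 1.
observed = [[4298, 767],
            [7136, 643]]

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = pearson_chi_square(observed)

# For DF = 1 only, the upper-tail chi-square probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

# Observed death rates for F and M, the quantities being compared.
death_rates = [row[1] / sum(row) for row in observed]

print(f"Pearson Chi-Square = {stat:.3f}, DF = {df}, P-Value = {p_value:.3g}")
print(f"Death rate F = {death_rates[0]:.3f}, M = {death_rates[1]:.3f}")
```

The p-value is far too small to show in three decimals, which is why Minitab prints 0.000.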

With 12,844 observations, getting the table is a lot more work than computing chi-square, and it is best to let the computer do it.

If you have an existing table, Minitab can analyze it. You need to enter the table into columns of the worksheet. For example, here are some data from the University of Texas Southwestern Medical Center reported in De Veaux, Velleman and Bock, Stats: Data and Models, 2nd ed., 2008, Addison Wesley, Boston. (It's the last example in Chapter 26, p. 645.) The disease hepatitis C can be transmitted through needle pricks, including those involved in tattoos. The goal here was to compare infection rates between people with differing tattoo status. A summary of their data is

                        Hepatitis C   No Hepatitis C
Tattoo from parlor               17               35
Tattoo from elsewhere             8               53
No tattoo                        22              491

To get such a summary table into Minitab you could type 17, 8, 22 into (say) c9 and 35, 53, 491 into c10. From the menus, select Stat > Tables > Chi-Square Test (Table in Worksheet). Double click on each column that is part of your table and click on OK. (If you are in a hurry, just type the command.)

MTB > ChiSquare C9 C10.

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           C9     C10  Total
    1      17      35     52
         3.90   48.10
       43.928   3.566

    2       8      53     61
         4.58   56.42
        2.554   0.207

    3      22     491    513
        38.52  474.48
        7.082   0.575

Total      47     579    626

Chi-Sq = 57.912, DF = 2, P-Value = 0.000
2 cells with expected counts less than 5.
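
The three numbers Minitab stacks in each cell (observed count, expected count, chi-square contribution) can be reproduced by hand. The following is a small Python sketch, not Minitab code, with variable names chosen for this illustration. It assumes the table is entered as three rows of two counts, and the p-value shortcut applies only when DF = 2.

```python
import math

# Rows: tattoo from parlor, tattoo from elsewhere, no tattoo.
# Columns: hepatitis C, no hepatitis C (worksheet columns C9 and C10).
observed = [[17, 35],
            [8, 53],
            [22, 491]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Each cell's contribution: (observed - expected)^2 / expected.
contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]

chi_sq = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For DF = 2 only, the upper-tail chi-square probability is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

# Count the cells that Minitab flags as having expected counts below 5.
small_cells = sum(e < 5 for row in expected for e in row)

for obs_row, exp_row, con_row in zip(observed, expected, contributions):
    for o, e, c in zip(obs_row, exp_row, con_row):
        print(f"observed {o:4d}  expected {e:7.2f}  contribution {c:7.3f}")
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3g}")
print(f"{small_cells} cells with expected counts less than 5.")
```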

When you go this route you have no control over what gets printed and have to put up with this congested table. In this case, the information in that crowded table is useful. A rule of thumb is that, for the approximation of discrete counts by the continuous chi-squared distribution to work well, we should have expected counts of at least 5 in each cell. Here we have two violations out of six cells. One reason we check this is that expected counts go into the denominator of each cell's contribution to the sample chi-squared, so a very small expected count can inflate the result. Here we see that one undersized cell contributes 2.554 to the 57.912 total, so that cell is probably not a problem. In contrast, the other undersized cell contributes 43.928, more than all other cells combined.

This is a common problem with survey studies of relatively rare events. Though you seem to have a very large sample of 626 people, only 47 actually had hepatitis C. (This is one of those situations where a much better design would be to study 313 people with hepatitis and 313 without, though that is not always practical.) One common after-the-fact remedy is to see if there are some rows in the table we can combine in a rational way. In this case we could concede that we do not have enough data to answer questions about where people got their tattoos and pool the two tattoo sources together to get a new table with these entries.

             Hepatitis C   No Hepatitis C
Tattoo                25               88
No tattoo             22              491

The analysis is left as an exercise but the conclusion is that the hepatitis rates are quite different for the tattoo vs. no tattoo groups.
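
Before doing the exercise, it may be worth confirming that pooling actually cures the small-expected-count problem. The Python sketch below (not Minitab code, with names chosen for this illustration) rebuilds the pooled table from the original counts and checks the expected counts and the observed hepatitis C rates. It deliberately stops short of computing chi-square.

```python
# Original counts: columns are hepatitis C, no hepatitis C.
parlor = [17, 35]
elsewhere = [8, 53]
no_tattoo = [22, 491]

# Pool the two tattoo sources into a single "tattoo" row.
tattoo = [a + b for a, b in zip(parlor, elsewhere)]
pooled = [tattoo, no_tattoo]

row_totals = [sum(row) for row in pooled]
col_totals = [sum(col) for col in zip(*pooled)]
n = sum(row_totals)

# Expected counts under independence for the pooled 2 x 2 table.
expected = [[r * c / n for c in col_totals] for r in row_totals]
smallest_expected = min(e for row in expected for e in row)

# Observed hepatitis C rate in each group.
rates = [row[0] / sum(row) for row in pooled]

print("Pooled table:", pooled)
print(f"Smallest expected count = {smallest_expected:.2f}")
print(f"Hepatitis C rate: tattoo = {rates[0]:.3f}, no tattoo = {rates[1]:.3f}")
```

If the smallest expected count is at least 5, the rule of thumb discussed above is satisfied for the pooled table.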