Editor's note: Dr. Steven Struhl is vice president, senior methodologist at Total Research, Chicago.
The newly released version 7.5 of the SPSS statistical analysis software suite gives a strong example of how far software has progressed in the last few years. SPSS sets itself a series of tasks, many of them daunting, and then accomplishes them neatly. This integrated set of programs raises the standard for performance in the area of data analysis. It adds some compelling evidence that we have reached a real turning point in computer-based applications. In short, it seems that the "era of really neat software" has begun and the "era of truly awful software" is slowly drawing to a close.
The awful days
The term "era of truly awful software" may not mean much to more recent computer users and those blessed with the ability to forget painful experiences. As recently as four or five years ago, though, many programs simply refused to perform under many conditions, produced dreadful output, communicated only fitfully with their users, and/or managed to crash the entire computer on a regular basis. Even the best software was often incompatible with other programs (that is, refused to work with products from infidel competitors), looked completely different from all other programs, and used idiosyncratic commands and procedures.
The great progress of software came home forcibly when I dusted off my old 386-based laptop computer in a recent rearrangement of my office. This computer had been serving as a handy 7.1 lb. paperweight for some time. Inspired by the clean-up, though, I wanted to see if it could be pressed into more useful service. What I found on it seemed a kind of crazy quilt of programs, with each looking and acting little like the others. Yet this ill-assorted collection was exactly what I had left on the laptop a year or two before - and what I had once been using to produce work.
After I poked around the laptop's directories for a few minutes, the setup started to remind me of a slew of tiny fiefdoms, each with its own rules and laws, and with generally impassable borders. But this was, in the bad old days, as much as we had. Then, ease of use was entirely relative - as Christopher O'Malley observed (in 1992): "Word Perfect is easy to use compared to WordStar, which is really a snap compared to XyWrite, which is a breeze compared to intestinal surgery." (As quoted by David Lubar in It's Not a Bug, It's a Feature, Reading, Mass.: Addison Wesley Publishing Company, 1995.)
Software's new challenges
The fact that software now often works so much better opens an entirely new set of challenges, both for users and your reviewer. Rather than concerning ourselves with the ways in which the program fails or crashes, or what it refuses to do, we can consider the goals of the software, and how much these meet our needs. This makes evaluating the software both more interesting and more complex. We have an opportunity to discuss why we are using the software, which can take somewhat more thought than noting what the program will not do.
The basics about SPSS
If you already use SPSS or know it fairly well, you probably can skip, jump or leap directly to the next section. This introduction is here because it recently became highly apparent that your reviewer should not assume everybody knows SPSS and what it does. More specifically, a major computer-oriented publication ran a review of this product which (not to denigrate a competitive reviewer more than is absolutely needed) spent much of its time "comparing" SPSS and Microsoft Excel. These programs most definitely are not intended to do the same things, although they overlap in some small areas. Excel in fact picked up a few of its many features from SPSS, and recently SPSS seems to have returned the favor. Trying to compare the performance and capabilities of these two programs, though, seems something like comparing a super-tanker and a moving van because they both carry things. Comparisons soon become just as informative as noting that the super-tanker does not fit neatly into most driveways, and that the moving van does not do well crossing most large bodies of water.
In brief, consider using SPSS if you need to find the relationships between items of data, and particularly what sets of values (numbers) have in common, and how sets of values predict (or explain) other sets of values. Providing you have some ideas about what you need to do, even the most powerful spreadsheet will not approach a program like SPSS in these areas. With SPSS, you can find complex inter-relationships in data that will elude the simpler forms of analysis available in spreadsheets.
SPSS also holds more information about the data than any spreadsheet can. For instance, SPSS can hold long labels for the values of a variable in addition to holding a long label for the variable itself. (This means SPSS can retain a label of "excellent" for a numeric response of 5, a label of "good" for a response of 4, and so on. It can have different long labels for the values of different variables. You can then use the long labels in reports and analyses of many types.)
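For readers who think in code, the idea of value labels can be sketched in a few lines of Python. This is only an illustration of the concept, not how SPSS stores anything internally; the variable name `q1`, its variable label, and the labels for codes 3, 2 and 1 are hypothetical (only "excellent" for 5 and "good" for 4 come from the example above).

```python
from collections import Counter

# Hypothetical metadata for one survey question. Only the labels for 5 and 4
# come from the text above; the remaining labels are made up for illustration.
VARIABLE_LABELS = {"q1": "Overall rating of service"}
VALUE_LABELS = {
    "q1": {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "very poor"},
}


def label_for(variable, code):
    """Return the long label for a numeric code, or the code itself as text."""
    return VALUE_LABELS.get(variable, {}).get(code, str(code))


def labelled_counts(variable, responses):
    """Count responses and report them under their long labels."""
    counts = Counter(responses)
    return {label_for(variable, code): n for code, n in counts.items()}


responses = [5, 4, 5, 3, 4, 5]
report = labelled_counts("q1", responses)
print(VARIABLE_LABELS["q1"])
for label, n in report.items():
    print(f"  {label}: {n}")
```

The point is that the data stay numeric (and so remain available for analysis), while the labels travel with the variable and appear whenever results are reported.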
Excel has its own set of distinctive strengths also. It makes sensational-looking tables and forms, which also can do sophisticated calculations with the numbers you enter. Perhaps most impressively, Excel has a built-in programming language that allows its spreadsheets to become programs, complete with menus, controls, and so on. SPSS 7.5 has moved closer to Excel in this area with the inclusion of a new "scripting" feature. This new feature allows you to build customized routines in SPSS that automate analyses and displays. (We will discuss this more later.) Overall, though, SPSS still does not match Excel in ability to make truly customized displays - up to and including interactive ones.
Often I find Excel useful to format results or to create small programs based on analyses conducted in SPSS. The basic analytical power comes from SPSS. The final integration into findings that will get applied still largely relies on Excel - and other programs. Perhaps this is oversimplifying, but this seems to capture the basic difference between the two programs.
SPSS: What comes in the package
SPSS has taken a modular approach to its package for a number of years. Those of you who use SPSS now will be familiar with its organization into a base package (required for all operations) and a number of add-on modules that have special analytical capabilities. Several of these modules themselves contain bundles of programs.
SPSS has steadily added procedures and options over the years, but the basic organization of the program into sets has remained fairly constant. Version 7.5 continues the trend that SPSS has followed in recent releases, adding more analytical power to the base, and more cutting-edge procedures to the add-on modules. Here's a run-down of what you will find in the base and the two largest modules:
If you are involved in the analysis of databases or market research data, then you will find the Base program and Categories - and perhaps Professional Statistics - the "must have" modules. If you do not have another favorite classification tree analysis program, you will find CHAID a highly competent performer, if not quite as full-featured as some competitors. Trends is a highly powerful time series analysis program with many advanced features. Lastly, if you run into more complex analyses, you should find Advanced Statistics highly useful. Overall, SPSS covers nearly everything you will need, leaving just a few items for the wish list for later releases. (See "The SPSS wish list and competitive options" below for a full description.)
How SPSS works
You must have either Windows 95 or Windows NT for SPSS 7.5 to run. If you still have Windows 3.0 or 3.1 (or 3.11 or 3.01, etc.), you will need to stay with SPSS version 6.0 or 6.1. (Incidentally, if you need an excuse to upgrade to Windows 95, this will give you as good a one as any. You will find many advantages to the new operating system, as we discussed in our review in the May 1996 QMRR.) The older SPSS version (6.1) has most of the analytical features of version 7.5, but it does not have the new and improved product architecture, or the same ability to customize output and automate involved analytical procedures.
One of the first facts you will notice about version 7.5 is that it exists as a series of windows, each of which will appear separately identified on the Windows 95 task manager bar. The program does not have a solid "back wall" as do applications such as Excel or Word for Windows. The free floating windows may seem slightly disconcerting at first (you can click your way into another program in the background at times), but then should quickly become familiar. The windows include the familiar data editor (showing the file which you are using in spreadsheet-like form), the syntax editor, and the new output navigator.
The output navigator presents the single most striking improvement in SPSS over the older versions 6.0 and 6.1. It replaces the old text output and "chart carousel" windows that SPSS once used, and very nicely organizes all the types of output SPSS produces. Small "books" go into an organized tree-like display window to the left of the display screen, with each book containing all the output from a procedure. You can label these books yourself, or use the default labels that SPSS provides. Scrolling to a given book in this left window brings you quickly to the exact portion of the analysis that you need. You can move sections of the analysis around by dragging the books to different spots in the tree display. You can edit the output on the spot or save it to revise later. You also can copy and paste all or any part of the output into Windows-based word processing and presentation packages.
As you run longer analyses, the tremendous value of this output navigator becomes more obvious. Everything you have done stays readily at hand, so that you can quickly review and compare results. You can instantly eliminate sections that you decide are superfluous. Charts remain connected to the analyses that generated them. In short, everything becomes more organized and efficient.
Among the types of output SPSS generates, the "pivot table" was introduced in version 7.0, and made still more flexible in this new release. Many types of output now go into these tables, which are quite handsome and professional in appearance. Pivot tables may be familiar to users of Excel, Quattro Pro - and the late, lamented Lotus Improv, which introduced the idea. In a pivot table, you can swap rows and columns with a click of a button. Even more impressive, if you have a table with "nesting" (headings within headings), you can change rows, columns and nestings within each. You may not need to do such fancy maneuvering often, but now you can. You get the tables to pivot simply by pulling around icons that appear in pivot trays at the margins of the table. Everything gets rearranged instantly.
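Stripped of the handsome formatting, the simplest pivot (swapping rows and columns) is a small piece of bookkeeping. The Python sketch below shows the idea on a made-up two-by-two table; the labels and counts are hypothetical, and this is not a description of how SPSS implements its pivot trays.

```python
def pivot(table):
    """Swap the rows and columns of a table stored as a dict of dicts."""
    swapped = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            # Each cell keeps its value; only its two labels trade places.
            swapped.setdefault(col_label, {})[row_label] = value
    return swapped


# Hypothetical crosstab: rows are groups, columns are answers.
table = {
    "Men": {"Yes": 10, "No": 5},
    "Women": {"Yes": 8, "No": 12},
}

pivoted = pivot(table)
for row_label, row in pivoted.items():
    print(row_label, row)
```

Pivoting twice returns the original table, which is why you can rearrange a pivot table freely without losing anything.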
You can change the basic "look" of these tables, as you can in programs like Excel, choosing a scheme with special shading, cell borders and colors. The results look at least as good as the tabular output that appears in professional journals.
Pivot tables, bringing a new level of output to much of the program, exact only a small price in return. They sometimes show slight "hesitation" when you select them for viewing, even on a reasonably fast PC. (Our test machines included a 586-133 desktop computer with 32 MB of RAM and a 586-75 notebook with 16 MB of RAM. The program ran with little discernible speed difference on either for most operations.) This behavior of the output navigator may seem a little disconcerting if you are used to scrolling instantly through several miles of text-based output - as you could in earlier versions of SPSS.
However, only the chart editing function seems to need some speed enhancements. It runs perceptibly slower than the graphing module in Microsoft PowerPoint, for instance, which is already slower than anything your reviewer likes to use regularly. The convenience of having all your output readily at hand more than makes up for any small performance lags, though - at least on a 586-class machine.
Also, since output navigator files contain all sorts of handsome graphical objects, they tend to require more disk space than did the corresponding text-based output. You can, without too much effort, generate a navigator file holding charts, tables and text output the size of an average University of Chicago doctoral dissertation. The navigator saves the entire session and the tree structure organizing it in a special ".spo" file, which can run up to several megabytes. You definitely will want to eliminate any unwanted analyses before saving the navigator document.
As mentioned earlier, SPSS has continued to add new features and procedures, as it has done with every recent new release. In fact, you can see in many details of SPSS (both how it looks and acts) evidence of considerable careful thought and many years of refinement. New procedures in this version include, most notably, tests in the analysis of variance module that will handle distributions with widely different patterns of dispersion (or variance). Among the features, you can get several fine bonuses in the CD-ROM version of the program. SPSS has put this medium's great storage capacity to good use by including files with basic demographic information from Claritas and Wessex, some of which extends down to the census tract level. This allows you to append this data to files you are analyzing where you have a person's ZIP code (or census tract), greatly boosting your ability to profile respondents or database members demographically. The CD-ROM also has the syntax guide online, making it easier to specify details of the analysis as you want them.
Version 7.5 also makes strides toward helping users less familiar with statistics, with a new on-line "statistics coach." Using the coach involves answering a series of questions about your data and the type of chart or table you would like, and then following the directions it gives toward the dialogue boxes of specific procedures. The coach seemed to handle straightforward requests well - but the PC still cannot substitute for knowing the methods available and thinking through what you really need. Also, even if you are a novice, you will need to know what type of data you are analyzing (nominal, continuous, etc.), and a few statistical terms. SPSS also has added a related but simpler "ODBC Wizard" which can prompt you step-by-step through accessing a database and joining multiple database tables.
What remains the same
SPSS retains its basic operating structure, in which you can build up a complex chain of analyses by selecting items from menus or by filling in "dialogue boxes" on screen. The program then generates a series of commands from your choices. You still can choose to run commands once they have been completed by pushing an "OK" button, or can choose to paste these commands into a "syntax window." Pasting the commands into the window has its advantages. First, and most obviously, you can see exactly what your menu choices make the program do, in the program's own terms. Also, you can compare the commands you chose versus all possible options by pushing the handy "syntax checker" button (which appears on the bar at the top of the syntax window). Finally, you can save and recycle the commands you have generated in a text-based ".sps" file. You can keep reusing the pasted commands in any analysis at hand, or call up ".sps" files to use in another analysis.
SPSS allows you to open many command and output files simultaneously in a session. You can cut and paste between any of them. You still are restricted to one data file per session, though. (Unlike Excel, SPSS will not allow you to have many data windows - or spreadsheets - open for viewing and manipulating at the same time.) SPSS does give you remarkable power in merging many files into one unified database, though. You have precise control over how SPSS adds variables or cases - even according to incredibly complex rules. Given this capability, using one data file per analysis seems a minor limitation. Even if you do not do much data analysis, you should find the file management abilities of the base program - to put together data files, to select specific cases according to exact criteria, to exclude data, to sort data, to transform data, and to create new variables from others - remarkably useful.
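The two basic kinds of merge mentioned here, adding cases and adding variables, can be sketched in plain Python. This is a conceptual illustration only, not SPSS syntax; the function names `add_cases` and `add_variables`, the `id` key, and the sample records are all hypothetical.

```python
def add_cases(file_a, file_b):
    """Stack the cases (rows) of two files into one longer file."""
    return file_a + file_b


def add_variables(cases, lookup, key):
    """Attach the variables in `lookup` to each case with a matching key.

    Cases with no match in `lookup` are kept unchanged.
    """
    lookup_by_key = {row[key]: row for row in lookup}
    merged = []
    for case in cases:
        extra = lookup_by_key.get(case[key], {})
        new_fields = {k: v for k, v in extra.items() if k != key}
        merged.append({**case, **new_fields})
    return merged


# Hypothetical data: two survey waves plus a small demographic file.
wave1 = [{"id": 1, "rating": 5}]
wave2 = [{"id": 2, "rating": 3}]
demographics = [{"id": 1, "region": "Midwest"}]

combined = add_cases(wave1, wave2)
merged = add_variables(combined, demographics, "id")
for case in merged:
    print(case)
```

Real merges quickly become more involved than this (duplicate keys, conflicting variable names, cases present in only one file), which is where precise control over the rules earns its keep.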
"Real Stats-Real Easy"
SPSS has used this as a slogan for several years. While I agree that the statistics are real, I don't know if I can agree that the second "real" is real, really. (Sorry, but I couldn't resist.) The statistics are easy in the sense that your PC can, in seconds, rip through mountains of data analysis that the combined math faculties of every American college and university could not solve in decades. The speed of analysis can be amazing. Watching SPSS sprint through, say, a discriminant analysis with 200 variables and 30,000 cases is not likely to replace cable TV as entertainment, but you will be impressed with how quickly it gets done. (All right, perhaps you won't, but it still impressed your reviewer.)
In any event, you now can do more analyses and test more options and alternatives than ever before. The question then becomes whether you should do the analysis at all, not whether you can do the analysis.
With your forbearance, let's try another analogy, hoping that this does not stretch things too far. In a way, having SPSS to analyze your data is something like finding that - instead of buying a hand saw - you can now have an entire millwork factory delivered to your basement. Now, when you go downstairs, you can knock out an oak dining room table with fancy scrollwork and turned legs in about 16 minutes. If you get really ambitious, and you have an afternoon or two to spare, you can make up a pre-fabricated addition for your house. You find only one catch to this marvelous set-up: you have to know how to use the machinery - or at least some of it.
If you run the machinery the wrong way, you can create a pile of sawdust in an instant - or perhaps even burn down a substantial portion of your neighborhood. Assuming you know how to avoid a disaster, you then have to know what to make, and how to make it. You can use your basement mill to produce things that seem fine, but which do not stand up to use, like three-legged chairs, couches without any internal supports, and so on.
At the end of this long analogy, we come back to the powerful analytical capabilities you find in SPSS. With this program, you can generate immense quantities of "output" in nearly no time. But you must have the requisite knowledge and do the needed thinking for this material to perform adequately. No statistics program can advise you against mistakes or misconceptions in basic approach. You may get some guidance on the correct comparison method to use with certain forms of data, but the program will not tell you whether you should draw a given comparison or do something else entirely. With a program as sophisticated and analytically capable as SPSS, you may almost start to think that your PC can think - which (unfortunately or not) is most definitely an illusion. You will still need to know what the computer should be doing, and instruct it accordingly. And (unfortunately), doing this is not always "real easy."
The presentation dilemma
By now, you should have few questions about the ability SPSS has to analyze most forms of data, and to solve many analytical problems. Also, SPSS can do nearly anything you will ever need - in data display and presentation - to get an article into any academic journal. Where SPSS leaves you a few steps from the finished product, though, is in creating the types of presentations that most decision-makers (managers, executives, etc.) need to see. This final distillation of data into so-called "actionable" form remains largely up to you.
SPSS will bring you at most times to "nice neat tables," and at others all the way to "cool charts and graphs." But the last long transition remains squarely outside the program's design. In a way, this seems all we could ever expect. The makers of SPSS are serious scientists and statisticians. They could never understand how frequently we need to present our findings to - and get some action from - people with the attention span of children. (This is getting worse, if anything, with corporate downsizing. Now you encounter more and more people acting like stressed-out and overworked children.)
These comments may more properly belong in the "SPSS wish list," but it seems that the program could go still further in making its output flexible, and in allowing you to extract only those few items your audience will need to see. SPSS has in this release eliminated much of the typewriter-like ASCII-text format that it once produced. For instance, the output from discriminant analysis or factor analysis used to appear as a large block of text which you had to keep in a "fixed font" (like Courier). Now, you get a series of handsomely formatted "objects," but objects that remain fairly fixed in content, if not appearance. You may need to devise complex strategies to get a simple measure that the output does not provide.
Suppose, for example, you decide you want "overall weighted importances" for the variables in a discriminant analysis which has produced several functions (or dimensions). Each variable has a coefficient in each function and the functions each have relative strengths, in terms of how much variance they explain. To get some overall measure, you have to excise the correct section of the output, make sure it has the correct format, paste it into a spreadsheet, multiply the coefficients within each function by the strength or size of the function, and sum. Who's ready to take a quiz about what I just said? I thought so.
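For those who would rather skip the quiz, the arithmetic can be sketched in a few lines of Python. The variable names, coefficients and function strengths below are invented for illustration, and the `use_abs` switch is an added assumption: because coefficients can be negative and may cancel when summed, some analysts take absolute values first. Neither version is offered as the one correct measure.

```python
def weighted_importances(coefficients, function_strengths, use_abs=False):
    """Weight each variable's coefficients by function strength and sum.

    coefficients: dict mapping variable name -> list of coefficients,
        one per discriminant function.
    function_strengths: list of relative strengths, one per function
        (for example, the share of variance each function explains).
    use_abs: if True, use the absolute value of each coefficient
        (an assumption added here, not part of the recipe in the text).
    """
    result = {}
    for variable, coefs in coefficients.items():
        if len(coefs) != len(function_strengths):
            raise ValueError(f"{variable}: expected one coefficient per function")
        total = 0.0
        for coef, strength in zip(coefs, function_strengths):
            value = abs(coef) if use_abs else coef
            total += value * strength
        result[variable] = total
    return result


# Hypothetical two-function solution.
coefficients = {"price": [0.5, 0.25], "quality": [0.25, -0.5]}
strengths = [0.75, 0.25]

importances = weighted_importances(coefficients, strengths)
abs_importances = weighted_importances(coefficients, strengths, use_abs=True)
print(importances)
print(abs_importances)
```

The calculation itself is trivial; the tedium lies in getting the coefficients and function strengths out of the output and into a form like this.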
Anyhow, this is the kind of thing (misguided or not) that you may need to do. The idea is fairly simple (which is why the client asked you to do it, anyhow), but the doing is not. Running up numbers like this can become time consuming and tedious when you have many analyses to complete. Perhaps the new scripting procedures in SPSS 7.5 can automate some tasks like these, but for now the commands available seem mostly to affect formatting of output, rather than producing any new values from the output, even simple ones.
The SPSS wish list and competitive options
As the section above suggested, SPSS still seems to need some progress in letting the user isolate and format just the few key fragments of information coming from a long analysis. The new output navigator and scripting language make several long steps toward this goal, but more control over results produced could only help. Again, seeking major changes in this area seems more like demanding a shift in how the program is conceptualized than claiming it has any actual deficiencies. Indeed, when we look at its operations, SPSS has omitted little that you are likely to need in the analysis of surveys and databases.
Perhaps the largest omission is that SPSS cannot perform multinomial logit analysis (and related logit analyses), which you need for discrete choice modeling (DCM) as it is usually done. The lack of true capability to analyze DCM problems prevents SPSS from handling an entire branch of analysis that has proven remarkably useful for product and service optimization and pricing research.
Finally, the CHAID program could use some updating. It has all the basics for classification tree analysis, and using it certainly is far better than not having any program with these capabilities. It has been surpassed by competitive offerings in recent years, though. In particular, KnowledgeSeeker from Angoss continues to advance its remarkable range of capabilities and features. (We plan to review the KnowledgeSeeker version 4.2 in an upcoming issue.)
SPSS has a growing list of competitors in the Windows environment. However, SPSS remains a top choice for analyzing surveys, databases and other data from which you must make decisions. It strikes an excellent balance between power, features and usability, making it the leading contender to become the analytical software that you use most often.
SPSS is not the reigning heavyweight champion of data analysis, though. This distinction belongs to SAS, as it likely has for some years now. SAS remains a staggering program, in terms of number of procedures and options within procedures. In general, if somebody somewhere is doing some obscure form of statistical analysis, the odds are that you too can do it with SAS. However, SAS does exact more of a price for the power it provides. Some of this price is purely monetary: SAS does not sell its program, but only licenses it - meaning that you must pay an annual fee to continue using it. Licensing fees vary based on the number of users, but they tend to be more expensive than the cost of purchasing and maintaining a program like SPSS. Also, with SAS, you probably will want to keep one or several of the many manuals close at hand while doing an analysis. While the manuals are extremely clear and provide many helpful examples of exactly how to do different procedures, it seems that you need to refer to them fairly often to do what you want. Although SAS has graduated to Windows, it still has much more of the feel of a basic DOS program. You can still use the basic command line structures and SAS manuals from six to 10 years ago. Also, you may find the tremendous arrays of options daunting in themselves. Your reviewer is willing to wager that nobody knows exactly how all the different choices you have in SAS really differ, even in theory - and certainly not in practice.
Two programs formerly competing with SPSS have become companion products, also distributed by SPSS, with each intended for slightly different groups of users. Systat has become the SPSS product for engineering and technical users, and BMDP has become the product for biomedical research. Each has a few special features that their intended audiences find helpful. Since becoming part of the SPSS product line-up, though, each of these programs has taken on a somewhat SPSS-like appearance and has gained in useful SPSS-like capabilities and amenities. I know a few market researchers who prefer Systat to SPSS, but similarities between these programs now seem to outweigh points of difference.
SPSS pricing
SPSS is priced on a modular basis. The base package lists for $695. SPSS offers special pricing on the base with added modules. For instance, you can buy a "Base +2 bundle" for $1,295. Other additional modules usually cost $395 to $495 each. SPSS offers a number of standard discounts (for educational and government users, for instance), and site licenses are available. Special discounts on extra modules are sometimes available; you can call SPSS at the numbers below for more information.
Conclusions
Given the market prices for data analyses, SPSS is in fact a terrific bargain. The incredible usefulness of this program in solving real-world problems, and in making real-world decisions, is beyond dispute. Unless you have highly specialized needs, the SPSS base and two or three additional modules are very likely to provide all the analytical power you need. The program provides the best balance achieved by any software so far between analytical power and ease of use. You may need to fuss with the output somewhat to reduce it to a form that most audiences can understand and use, but that is the only qualification to a very strong recommendation. Provided you have moved to Windows 95 or the Windows NT operating system, you will find a great deal to like in SPSS version 7.5. And if you have not yet upgraded your operating system, this program could provide precisely the reason you needed.