Editor’s note: Tom Anderson is managing partner of Anderson Analytics LLC, a Stamford, Conn., research firm. The information in this article relating to the Starwood Hotels and Resorts Worldwide Inc. text mining project was originally presented at the 2005 SPSS Decisions conference. The information relating to Web scraping and text mining in the leisure industry will be presented at the ESOMAR leisure research conference in Rome on November 5-7, 2006.

Would you spend hundreds of thousands of dollars to collect valuable customer comments and not read them? It seems unlikely, yet we have found that most companies do essentially that, most likely because the sheer volume of text data available for analysis is simply unmanageable.

However, thanks to advances in text mining software from companies such as SPSS, ClearForest and Leximancer, analyzing thousands of open-ended customer comments is finally possible.

The challenge

Starwood is one of the world’s largest hotel and leisure companies. It conducts its hotel and leisure business both directly and through its subsidiaries. Its brand names include St. Regis, the Luxury Collection, Sheraton, Westin, W, Four Points by Sheraton, Le Méridien, and aloft.

The hotel industry is highly competitive. Customer choices are generally based on quality and consistency of room, restaurant and meeting facilities and services, attractiveness of locations, availability of a global distribution system, price, the ability to earn and redeem loyalty program points and other factors. 

Starwood management believes that brand strength is among the most important factors contributing to its position as a leader in the lodging and vacation ownership industry, and that it provides a foundation for the company’s business strategy. Key to the firm’s brand strength is the success of its upscale and luxury brands in capturing market share from competitors by aggressively cultivating new customers while maintaining loyalty among active travelers.

To manage and maintain quality across its brands, Starwood operates a global guest satisfaction program that collects about 1 million guest responses per year. While the satisfaction surveys contain several rating-scale questions, about a third of them (more than 300,000 guests) also include verbatim comments and suggestions. This text contains valuable insights into how Starwood might further increase guest satisfaction and drive loyalty. However, reading, coding and analyzing this amount of text data was seen as impossible.

Text mining

To reap those insights, Starwood turned last year to text mining and analysis. Codes, or “verbatim concepts,” were extracted from the vast amount of text data and then used as variables in modeling a large database. Looking at guest text responses in aggregate, the extracted verbatim concepts predicted key measures such as overall satisfaction, “return to brand” and “return to hotel” with more than 80 percent accuracy.
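The article does not say which models Starwood used. The Python sketch below is a minimal illustration that assumes each comment has already been coded into 0/1 verbatim-concept flags. It fits a logistic regression, a swapped-in technique chosen only for illustration, to predict a binary outcome such as “return to brand,” then reports accuracy. The concept names and the eight toy records are invented.

```python
import math

# Hypothetical concept columns; 1 = concept appears in the guest's comment.
CONCEPTS = ["friendly_staff", "clean_room", "noise"]

# Invented toy data: concept flags and whether the guest would return to the brand.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0],
     [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]


def score(b, w, x):
    """Linear score: bias plus the weights of the concepts present."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression with batch gradient descent; returns (bias, weights)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score(b, w, xi)))  # predicted probability
            err = p - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def accuracy(b, w, X, y):
    """Share of records where the predicted class (score > 0) matches the label."""
    hits = sum((score(b, w, xi) > 0) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


b, w = train_logistic(X, y)
print(dict(zip(CONCEPTS, (round(v, 2) for v in w))))
print("training accuracy:", accuracy(b, w, X, y))
```

On real survey data, accuracy should be measured on held-out guests rather than on the records used for fitting. The sign and size of each fitted weight indicate which concepts push guests toward or away from returning.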

The analysis also allowed Starwood to see how decisions on capital investment (such as replacing hotel ventilation systems to reduce noise issues) affect guest satisfaction and likelihood of returning to the brand.

Starwood also explored the viability of a verbal satisfaction index, to be used either on its own or alongside the common overall satisfaction score. This score, made up of verbatim concepts weighted by regression coefficients, may represent the future of customer satisfaction research and benefit several industries.
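The article describes the index only as verbatim concepts with regression weights. One plausible reading is a linear score, index = b0 + w1·x1 + ... + wK·xK, where xk is 1 if concept k appears in a comment and 0 otherwise. The Python sketch below implements that reading as an assumption. The weights and intercept are invented for illustration and are not Starwood's.

```python
# Invented regression weights per verbatim concept (illustration only).
WEIGHTS = {"friendly_staff": 0.8, "clean_room": 0.6, "noise": -0.9, "slow_checkin": -0.5}
INTERCEPT = 7.0  # hypothetical baseline for a comment with no weighted concept


def verbal_satisfaction_index(concepts, weights=WEIGHTS, intercept=INTERCEPT):
    """Intercept plus the weight of each distinct concept found in one comment."""
    return intercept + sum(weights.get(c, 0.0) for c in set(concepts))


def mean_index(coded_comments, **kwargs):
    """Average index over many coded comments, e.g. all comments for one hotel."""
    scores = [verbal_satisfaction_index(c, **kwargs) for c in coded_comments]
    return sum(scores) / len(scores)


hotel_comments = [{"friendly_staff", "noise"}, {"clean_room"}, {"slow_checkin", "noise"}]
print(round(mean_index(hotel_comments), 2))
```

Because the score is computed from text alone, it could in principle be calculated while a guest is still on property, which is the use case described in the quote that follows.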

“Starwood Hotels and Resorts were delighted to see the aggregate voice of the customer,” says Rebecca Gillan, Starwood vice president, global market research and guest satisfaction. “Understanding the key words that drive verbal satisfaction can provide another important tool for general managers to ensure that a guest’s stay is a great one, and being better able to judge how satisfied a guest is while they are still at the hotel provides another opportunity to make the guest’s experience a positive one, which is the most important factor in the decision to return to the hotel and ultimately to drive true preference for Starwood’s brands.”

Three phases

In each text mining project there are typically three phases.

• Phase one is data collection and preparation. Before the text coding process can begin, the text data must be put into a tabular database structure similar to an Excel spreadsheet. If the text data of interest resides on the Internet, special screen-scraping or Web-scraping software must be used to harvest it. Commercial applications are available, but many companies involved in text mining also have their own software for this. The software visits the Web sites of interest and builds a database using parameters you specify.

• Phase two is the coding of verbatim concepts of interest. Most commercial coding software comes with at least one predefined dictionary. In its simplest form, coding software only counts words. However, next-generation software also looks at syntax and other linguistic qualifiers. Good coding software should be able to automatically identify double negatives, for example.

Each industry or product category obviously has its own vocabulary. Therefore, at least one custom dictionary is usually built for each project. If you are using consultants for your text mining project, it is important that they communicate with your management during the coding phase so that verbatim concepts are coded properly. Otherwise, there are bound to be misclassifications in the dictionaries.

• Once the coding scheme has been built, phase three can begin. This is when the actual text or data mining takes place. Your data has now been turned into a numerical format. Depending on your analytical software, various exploratory procedures from the data mining/knowledge discovery discipline, such as Web graphs, CHAID and neural nets, can be run on the data to understand how verbatim concepts are related and what your customers view as negative or positive.

Once the data is understood, it is advisable to develop and test some hypotheses in a structured way. Without a structured analytical approach, much time can be wasted at this point. It is important to know which questions management most wants answered and to understand the quickest and most accurate way to answer them with the data.
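Phases two and three can be illustrated with a toy dictionary coder. This is a deliberately simple Python sketch, not how any commercial coding package works internally. The custom dictionary, the negator list and the three-word look-back window are all hypothetical. Each comment becomes one numeric row (1 = concept mentioned, -1 = mentioned but negated, 0 = absent), which is the numerical format that data mining procedures take as input. An even number of negators, as in “never not clean,” is treated as a double negative and cancels out.

```python
import re

# Hypothetical custom dictionary: verbatim concept -> trigger words.
DICTIONARY = {
    "noise": {"noise", "noisy", "loud"},
    "clean_room": {"clean", "spotless"},
    "friendly_staff": {"friendly", "helpful", "courteous"},
}
# Hypothetical negation words checked in a short window before each trigger.
NEGATORS = {"not", "no", "never", "wasn't", "isn't", "weren't"}


def tokenize(text):
    """Lower-case word tokens; straight apostrophes are kept so "wasn't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def code_comment(text, dictionary=DICTIONARY, window=3):
    """Map each concept found to 1 (asserted) or -1 (negated).

    Negators among the `window` tokens before a trigger are counted; an even
    count (a double negative such as "never not clean") cancels out. If a
    concept appears more than once, its last occurrence wins.
    """
    tokens = tokenize(text)
    codes = {}
    for i, tok in enumerate(tokens):
        for concept, triggers in dictionary.items():
            if tok in triggers:
                negations = sum(t in NEGATORS for t in tokens[max(0, i - window):i])
                codes[concept] = -1 if negations % 2 else 1
    return codes


def to_matrix(comments, concepts):
    """Turn raw comments into numeric rows (0 = concept absent) for data mining."""
    return [[code_comment(c).get(k, 0) for k in concepts] for c in comments]


concepts = ["noise", "clean_room", "friendly_staff"]
comments = [
    "The staff were friendly but the room was not clean",
    "Room was never not clean, though the hallway was loud",
]
for row in to_matrix(comments, concepts):
    print(row)
```

A fixed look-back window is crude. It can pick up a negator from an earlier clause, which is one reason the dictionaries should be reviewed with management during the coding phase.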

Understand the issues

To gain the information advantage in the new information world, where data is a commodity and any customer or guest can become a brand evangelist or terrorist by blogging or posting to Web sites, it is crucial to leverage new technology to monitor and understand the issues that truly drive brand equity and customer satisfaction.

Surveys and focus groups are just one source of customer text data. Other important sources include, but are not limited to, call center and sales force records, customer e-mail complaints and suggestions, Web site submission forms, blogs and Web discussion boards.

Perhaps the most challenging source of text-based information about your company and its competitors is the Web, because it is dynamic and at least partially out of the control of your marketing and PR departments. Starwood has long known the value of monitoring and responding to customer comments on sites relevant to frequent travelers, and it has a full-time employee known as “the Starwood Lurker” who frequently posts on the popular flyertalk.com site.

Ready to get serious

If your company is ready to get serious about listening to your customers, here are some tips on text mining:

1. First, identify the sources of customer comments and data that your company may not be properly monitoring or reacting to. Prioritize these sources in terms of competitive advantage and importance to customers. If customers expect your company to react or reply to this information, it may also be necessary to set up a system to respond to some of these issues.

2. While an estimated 80 percent of all data is in text format, and this data usually provides a rich source of fresh insights, it is important to have specific goals in mind before beginning your text mining project. Specifically, you should think about which supporting variables are available. For instance, will you be able to identify the source of the text data? The overall topic category? Is date/time information appropriate and available? If so, these considerations need to be incorporated into the data collection and processing phase as well as into the analytical framework. This may save countless hours of work later on.

3. Finally, investigate which software or vendor is most appropriate for you. Vendors should have experience in your field and should recommend what supporting software is necessary, or even customize or build collection software from scratch if needed. They should also be able to speak candidly about how the analysis will be done and what results you should expect.
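The supporting variables in tip 2 are easiest to use later if they are captured alongside each comment from the start. Below is a minimal Python sketch of such a record, plus a simple volume count by source and month. The field names, source labels and sample comments are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommentRecord:
    """One customer comment plus supporting variables captured at collection time."""
    text: str
    source: str            # hypothetical labels, e.g. "survey", "call_center", "web_forum"
    topic: str             # overall topic category, if known
    collected_at: datetime


def volume_by_source_and_month(records):
    """Count comments per (source, "YYYY-MM") so channels can be compared over time."""
    return Counter((r.source, r.collected_at.strftime("%Y-%m")) for r in records)


records = [
    CommentRecord("Room was noisy", "survey", "room", datetime(2006, 3, 2)),
    CommentRecord("Great lobby bar", "web_forum", "dining", datetime(2006, 3, 15)),
    CommentRecord("Slow check-in", "survey", "front_desk", datetime(2006, 4, 1)),
    CommentRecord("Loud air conditioning", "survey", "room", datetime(2006, 3, 20)),
]
print(volume_by_source_and_month(records))
```

Keeping source, topic and timestamp on every record means later analyses can be sliced by channel or period without going back to re-collect the data.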