■■
Originating number
■■
Terminating number
■■
Location where the call was placed
■■
Account number of the person who originated the call
■■
Call duration
■■
Time and date
Although the analysis did not use the account number, it plays an important role in this data because the data did not otherwise distinguish between business and residential accounts. Accounts for larger businesses have thousands of phones, while most residential accounts have only a single phone.
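The account number's role can be made concrete with a short sketch. Assume each call record carries the fields listed above; the field names in `CallRecord` are hypothetical. One plausible way to separate business from residential accounts is to count how many distinct phones originate calls on each account. The cutoff `business_threshold=10` is an illustrative assumption. The text says only that larger businesses have thousands of phones and that most residential accounts have one.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One call detail record; field names are hypothetical."""
    originating: str      # originating number
    terminating: str      # terminating number
    location: str         # where the call was placed
    account: str          # account of the person who originated the call
    duration_min: float   # call duration in minutes
    start: datetime       # time and date of the call


def phones_per_account(calls):
    """Count the distinct originating numbers seen on each account."""
    phones = defaultdict(set)
    for call in calls:
        phones[call.account].add(call.originating)
    return {account: len(numbers) for account, numbers in phones.items()}


def label_accounts(calls, business_threshold=10):
    """Label each account 'business' or 'residential' by its phone count.

    business_threshold is an illustrative assumption, not a value from the text.
    """
    return {
        account: "business" if count >= business_threshold else "residential"
        for account, count in phones_per_account(calls).items()
    }
```

The account number identifies the party who originated the call. For that reason, this sketch attributes only originating numbers to an account.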
Analyses without Graph Theory
Prior to using link analysis, the marketing department used a single measurement for segmentation: minutes of use (MOU), the number of minutes each month that a customer spends on the cellular phone. MOU is a useful measure, since there is a direct correlation between MOU and the amount billed to the customer each month. The correlation is not exact, because MOU does not take into account discount periods and calling plans that offer free nights and weekends, but it is a good guide nonetheless.
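Here is a minimal sketch of how MOU might be derived from call records. It sums call minutes per phone number per calendar month. It assumes calls arrive as hypothetical `(originating, terminating, duration_min, start)` tuples. It also assumes that both parties to a wireless call use airtime, so a call counts toward the MOU of each subscriber on either end. A carrier's actual billing rules may differ.

```python
from collections import defaultdict
from datetime import datetime


def minutes_of_use(calls, subscribers):
    """Total call minutes per (subscriber number, (year, month)).

    calls: iterable of (originating, terminating, duration_min, start) tuples.
    Assumption: a call counts toward the MOU of every subscriber on either end.
    """
    mou = defaultdict(float)
    for originating, terminating, minutes, start in calls:
        month = (start.year, start.month)
        for number in (originating, terminating):
            if number in subscribers:
                mou[(number, month)] += minutes
    return dict(mou)


# Hypothetical numbers; 555-7777 is not a subscriber, so it gets no MOU.
calls = [
    ("555-0101", "555-0202", 3.0, datetime(2003, 10, 1, 9, 0)),
    ("555-0202", "555-0101", 2.0, datetime(2003, 10, 5, 18, 0)),
    ("555-0101", "555-7777", 45.0, datetime(2003, 11, 5, 8, 0)),
]
mou = minutes_of_use(calls, {"555-0101", "555-0202"})
```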
The marketing group also had external demographic data for prospective customers. They could also distinguish between individual customers and business accounts. Beyond MOU, though, their only understanding of customer behavior was the total amount billed and whether customers paid their bills in a timely manner. They were leaving a lot of information on the table.

1 The authors would like to thank their colleagues Alan Parker, William Crowder, and Ravi Basawi for their contributions to this section.

470643 c10.qxd 3/8/04 11:16 AM Page 344
344 Chapter 10
A Comparison of Two Customers
Figure 10.11 illustrates two customers and their calling patterns during a typical month. These two customers have similar MOU, yet the patterns are strikingly different. John’s calls generate a small, tight graph, while Jane’s explodes with many different calls. If Jane is happy with her wireless service, her use will likely grow and she might even influence many of her friends and colleagues to switch to the wireless provider.
Looking at these two customers more closely reveals important differences.
Although John racks up 150 to 200 MOU every month on his car phone, his use of his mobile telephone consists almost exclusively of two types of calls:
■■
On his way home from work, he calls his wife to let her know what time to expect him. Sometimes they chat for three or four minutes.
■■
Every Wednesday morning, he has a 45-minute conference call that he takes in the car on his morning commute.
The only person who has John’s car phone number is his wife, and she rarely calls him when he is driving. In fact, John has another mobile phone that he carries with him for business purposes. When driving, he prefers his car phone to his regular portable phone, although his car phone service provider does not know this.
[Figure: call graphs for John and Jane, with each edge labeled by its minutes of use]
Figure 10.11 John and Jane have about the same minutes of use each month, but their behavior is quite different.
Jane also racks up about the same usage every month on her mobile phone.
She has four salespeople reporting to her who call her throughout the day, often leaving messages on her mobile phone voice mail when they do not reach her in the car. Her calls include calls to management, potential customers, and other colleagues. Her calls, though, are always quite short: almost always a minute or two, since she is usually scheduling meetings. Working in a small business, she is sensitive to privacy and to the cost of the calls, so out of habit she uses land lines for longer discussions.
Now, what happens if Jane and John both get an offer from a competitor?
Who is more likely to accept the competing offer (or churn, in the vocabulary of wireless telecommunications companies)? At first glance, we might suspect that Jane is the more price-sensitive of the two and therefore the more susceptible to another offer. However, a second look reveals that changing carriers would be a big inconvenience for Jane if it required her to change her telephone number. (In the United States, number portability has been a long time coming. It finally arrived in November 2003, shortly before this edition was published, perhaps invalidating many existing churn models.) By looking at the number of different people who call her, we see that Jane is quite dependent on her wireless telephone number; she uses features like voice mail and stores important numbers in her cell phone. The number of people she would have to notify creates inertia that keeps her from changing providers. John has no such inertia and might have no allegiance to his wireless provider, as long as a competing provider can offer uninterrupted service for his 45-minute call on Wednesday mornings.
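The inertia argument can be sketched as a simple count: for each number, how many different numbers call it. This treats inertia as the number of distinct inbound callers. That is one reading of the paragraph above, not a definition from the text. The toy records for John and Jane are hypothetical.

```python
from collections import defaultdict


def inertia_scores(calls):
    """Number of distinct other numbers that called each number.

    calls: iterable of (originating, terminating) pairs. Counting distinct
    inbound callers as "inertia" is an assumption, not the authors' formula.
    """
    callers = defaultdict(set)
    for originating, terminating in calls:
        if originating != terminating:
            callers[terminating].add(originating)
    return {number: len(numbers) for number, numbers in callers.items()}


# Hypothetical toy data: John's wife calls him; four sales reps call Jane.
calls = [
    ("wife", "john"), ("john", "wife"), ("john", "conference-bridge"),
    ("rep1", "jane"), ("rep2", "jane"), ("rep3", "jane"), ("rep4", "jane"),
    ("rep1", "jane"), ("jane", "manager"),
]
scores = inertia_scores(calls)
print(scores["jane"], scores["john"])  # 4 1
```

Repeat calls from the same person (rep1 calls twice) do not raise the score. The measure reflects how many people would need to learn a new number, not how often the phone rings.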
Jane also has a lot of influence. Since she talks to so many different people, they will all know whether she is satisfied or dissatisfied with her service. She is a customer that the cellular company wants to keep happy. However, she is not a customer that traditional methods of segmentation would have identified.
The Power of Link Analysis
Link analysis played two roles in this analysis of cellular phone data. The first was visualization. Seeing the graphs that represent individual call patterns makes patterns such as inertia or influence much more obvious. Visualizing the data makes it possible to see patterns that lead to further questions. For this example, we chose two profitable customers considered similar by previous segmentation techniques. Link analysis showed their specific calling patterns and suggested how the customers differ. On the other hand, visualization does not scale to the whole customer base: looking at the call patterns for all customers at the same time would require drawing a graph with hundreds of thousands or millions of nodes and hundreds of millions of edges.
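For the visualization role, drawing one customer's neighborhood at a time is practical even when drawing the whole graph is not. This minimal sketch assumes call records are `(originating, terminating, minutes)` tuples. It emits the calls that touch a single number in Graphviz DOT format, labeling each edge with its total minutes. The output can be rendered with a command such as `dot -Tpng`. The function name `ego_graph_dot` is hypothetical.

```python
from collections import defaultdict


def ego_graph_dot(calls, center):
    """Return a Graphviz DOT digraph of the calls that involve `center`.

    calls: iterable of (originating, terminating, minutes) tuples.
    Each edge is labeled with the total minutes for that ordered pair.
    """
    minutes = defaultdict(float)
    for originating, terminating, mins in calls:
        if center in (originating, terminating):
            minutes[(originating, terminating)] += mins
    lines = [f'digraph "{center}" {{']
    for (originating, terminating), total in sorted(minutes.items()):
        lines.append(f'  "{originating}" -> "{terminating}" [label="{total:g} MOU"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical records; Jane's call is ignored when drawing John's graph.
calls = [("john", "wife", 4.0), ("wife", "john", 1.0),
         ("john", "bridge", 45.0), ("jane", "manager", 2.0)]
print(ego_graph_dot(calls, "john"))
```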
Second, link analysis can apply the concepts generated by visualization to larger sets of customers. For instance, a churn reduction program might avoid targeting customers who have high inertia, or make sure to target customers with high influence. This requires traversing the call graph to calculate the inertia or influence for all customers. Such derived characteristics can play an important role in marketing efforts.
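This sketch illustrates the second role. One pass over the call records builds two sets for every number. The first is the set of distinct callers, an assumed measure of inertia. The second is the set of distinct contacts in either direction, an assumed measure of influence. A hypothetical targeting rule then keeps two kinds of customers: those who are easy to lose (low inertia) and those worth keeping happy (high influence). The thresholds `max_inertia` and `min_influence` are placeholders, not values from the text.

```python
from collections import defaultdict


def graph_features(calls):
    """One pass over (originating, terminating) pairs -> features per number.

    inertia:   distinct numbers that call this number (assumed measure)
    influence: distinct numbers it talks to in either direction (assumed measure)
    """
    callers = defaultdict(set)
    contacts = defaultdict(set)
    for originating, terminating in calls:
        if originating == terminating:
            continue
        callers[terminating].add(originating)
        contacts[originating].add(terminating)
        contacts[terminating].add(originating)
    return {
        number: {"inertia": len(callers[number]), "influence": len(linked)}
        for number, linked in contacts.items()
    }


def churn_targets(features, customers, max_inertia=5, min_influence=20):
    """Pick customers for a retention offer (thresholds are hypothetical).

    Target customers with high influence, and customers whose low inertia
    makes them easy to lose; skip high-inertia, low-influence customers.
    """
    no_calls = {"inertia": 0, "influence": 0}
    return [
        c for c in sorted(customers)
        if features.get(c, no_calls)["influence"] >= min_influence
        or features.get(c, no_calls)["inertia"] <= max_inertia
    ]
```

A customer who never appears in the call records gets zero inertia and is therefore targeted. That follows the same logic: nothing in the call graph ties such a customer to the carrier.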
Different marketing programs might suggest looking for other features in the call graph. For instance, suppose the company wants to promote the ability to place conference calls: who would be the best prospects? One idea would be to look for groups of customers who all call each other. Stated as a graph problem, such a group is a fully connected subgraph (a clique). In the telephone industry, these subgraphs are called “communities of interest.” A community of interest may represent a group of customers who would be interested in the ability to place conference calls.
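Finding communities of interest amounts to listing the maximal cliques in the call graph. This sketch uses the Bron-Kerbosch algorithm with pivoting, a standard clique-listing method; the text does not say which method was used. It treats any call between two numbers as an undirected link. Listing all maximal cliques can take exponential time in the worst case, though sparse graphs such as call graphs are usually far more tractable. The `min_size=3` cutoff is an assumption, since a two-person group needs no conference call. The NetworkX library offers the same enumeration as `networkx.find_cliques`.

```python
from collections import defaultdict


def undirected_graph(calls):
    """Treat any call between two distinct numbers as an undirected link."""
    adj = defaultdict(set)
    for a, b in calls:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: yield every maximal fully connected subgraph."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended, so it is a maximal clique
            return
        # Pivot on the vertex with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def communities_of_interest(calls, min_size=3):
    """Maximal cliques with at least min_size members (cutoff is an assumption)."""
    adj = undirected_graph(calls)
    return [clique for clique in maximal_cliques(adj) if len(clique) >= min_size]
```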
Lessons Learned
Link analysis is an application of the mathematical field of graph theory. As a data mining technique, link analysis has several strengths:
■■
It capitalizes on relationships.
■■
It is useful for visualization.
■■
It creates derived characteristics that can be used for further mining.
Some data and data mining problems naturally involve links. As the two case studies about telephone data show, link analysis is very useful for telecommunications: a telephone call is a link between two people. Opportunities for link analysis are most apparent in fields where the links are explicit, such as telephony, transportation, and the World Wide Web. Link analysis is also appropriate in other areas where the connections are less clearly manifested, such as physician referral patterns, retail sales data, and forensic analysis for crimes.
Links are a very natural way to visualize some types of data. Direct visualization of the links can be a big aid to knowledge discovery. Even when patterns are found automatically, visualizing the links helps to explain what is happening. Link analysis offers an alternative way of looking at data, different from the formats of relational databases and OLAP tools. Links may suggest important patterns in the data, but interpreting the significance of those patterns requires a person.
Link analysis can lead to new and useful data attributes. Examples include calculating an authority score for a page on the World Wide Web and calculating the sphere of influence for a telephone user.
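The text does not specify how an authority score is computed. One well-known method is Kleinberg's HITS algorithm, sketched here as a power iteration. A node's authority score is the sum of the hub scores of the nodes that link to it. A node's hub score is the sum of the authority scores of the nodes it links to. Both vectors are renormalized each round. Running a fixed `iterations=50`, rather than testing for convergence, is a simplifying assumption.

```python
import math
from collections import defaultdict


def hits(edges, iterations=50):
    """Kleinberg's HITS: return (hub, authority) score dicts for a directed graph.

    edges: iterable of (source, target) links, e.g. page -> page it links to.
    """
    edges = list(set(edges))  # ignore duplicate links
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes linking in.
        new_auth = defaultdict(float)
        for s, t in edges:
            new_auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: new_auth[n] / norm for n in nodes}
        # Hub: sum of authority scores of the nodes linked to.
        new_hub = defaultdict(float)
        for s, t in edges:
            new_hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: new_hub[n] / norm for n in nodes}
    return hub, auth
```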
Although link analysis is very powerful when applicable, it is not appropriate for all types of problems. It is not a prediction or classification tool, like a neural network, that takes data in and produces an answer. Many types of data are simply not appropriate for link analysis. Its strongest use is probably in finding specific patterns, such as the types of outgoing calls, which can then be applied to data. These patterns can be turned into new features of the data, for use in conjunction with other directed data mining techniques.