Facebook, Amazon, Google and LinkedIn each owe part of their success to their use of complex networks. Scholarly interest in networks has grown dramatically over the last 10 years, driven by the growth of online social networks and by access to detailed network data in volumes that were previously unavailable.
Economists are becoming increasingly aware of how pervasive network information is, and they are rapidly developing new techniques to turn this information into knowledge. For example, over the last several years forensic data analysts have, in certain circumstances, begun combining economic network analysis with advanced econometrics to detect patterns of potentially illicit behavior in transaction data. Common examples include:
- Fraudulent behavior in healthcare claims data
- Financial Ponzi schemes
- Payments made from an unmarked account
- Money laundering activity
In an investigation, attorneys and forensic specialists often seek to identify the symptoms of the illicit behavior (usually the misappropriation of money) and then try to trace those symptoms back to their driver (the person performing the misappropriation). This article highlights how the new science of network analysis can reveal previously unknown relationships, influences and commonalities among the actual drivers of the illicit behavior, even before the symptoms are investigated.
Depending on the scope of the investigation or the allegations, identifying the perpetrator (the disease) may be more important than identifying the fraudulent or illicit transactions (the symptoms). After all, as headline news stories make clear, the misappropriation of money is usually caused directly by individuals, not by random chance.
Before describing how these network analyses work and why they differ from previously used data analytics, we offer a quick introduction to the innovative theory of economic networks. We conclude with a case study from an actual qui tam False Claims Act matter, in which network analysis helped the attorneys determine that a small number of doctors within a healthcare facility were allegedly working together to inflate medical claims and that this behavior was not systematic throughout the facility.
In its most basic form, a network is any collection of items in which pairs of these items are connected in some manner. These connections could be between doctors at a medical facility making similar diagnoses, bank account holders receiving money from common routing numbers, consumers with similar online purchases, etc. Individual members of a network have certain characteristics that describe them within the context of the network. For example, in a social network a member can be central to the network (very social) or can be peripheral (a loner). A member can be the sole link between several groups of members (the diplomat) or can belong very strongly to only one group in the network (the club president).
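To make this vocabulary concrete, here is a minimal Python sketch that stores a network as an adjacency map built from connected pairs. It assumes, for illustration only, that degree (number of connections) indicates how central or peripheral a member is, and that a "sole link" is a member whose removal splits the network into more pieces (what graph theory calls an articulation point). The member names and pairs are hypothetical.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Build an undirected adjacency map from (member, member) pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-connections
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def count_components(adj, removed=None):
    """Count connected groups of members, optionally ignoring one member."""
    seen, components = set(), 0
    for start in adj:
        if start == removed or start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over this group
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr != removed and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components


def sole_links(adj):
    """Members whose removal disconnects part of the network."""
    base = count_components(adj)
    return sorted(m for m in adj if count_components(adj, removed=m) > base)


# Hypothetical network: two tight groups joined through Cat and Dan,
# plus a peripheral member (Gus) attached only to Fay.
pairs = [("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan"),
         ("Dan", "Eve"), ("Eve", "Fay"), ("Dan", "Fay"), ("Fay", "Gus")]
net = build_network(pairs)
degree = {member: len(nbrs) for member, nbrs in net.items()}
print(degree)
print(sole_links(net))
```

Here Gus has degree 1 (the loner), and `sole_links` returns Cat, Dan and Fay: Cat and Dan are the only bridge between the two groups, and Fay is Gus's only connection. Removing each member and recounting is simple but slow on large networks; a depth-first-search articulation-point algorithm does the same job in linear time.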
In studying the ways that networks can be used to detect illicit behavior or to suggest targeted areas of investigation for attorneys, it is important to understand the concept of homophily (pronounced HOME-ah-filly). This is the propensity of network members to select connections with other members who have similar characteristics.
For the purposes of an investigation, this concept is important in designing a data analytic approach to identifying the fingerprint, or DNA, of a fraudulent scheme or illicit behavior. The assumption underlying the success of this type of analysis is that perpetrators operate with some degree of regularity, leaving a pattern in the transaction data that can be revealed only through network analysis. This is often a reasonable assumption because of the strong and pervasive interaction between individuals and their social networks:
“Homophily is the principle that a contact between similar people occurs at a higher rate than among dissimilar people. The pervasive fact of homophily means that cultural, behavioral, genetic or material information that flows through networks will tend to be localized. Homophily implies that distance in terms of social characteristics translates into network distance, the number of relationships through which a piece of information must travel to connect two individuals.” 
This tendency for “birds of a feather to flock together” is well understood by attorneys and other forensic investigators, who search for accomplices through email traffic, social media posts, Facebook friends and interviews with close colleagues and others in the same department. The results of a network analysis can also help determine whether the illicit behavior was carried out sporadically by a rogue employee or systematically by an administrator at the top.
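One way to put a number on homophily, offered here as an illustrative sketch rather than as part of the analysis described in this article, is Newman's assortativity coefficient for a categorical attribute. It compares the share of connections joining members with the same label against the share expected if connections ignored labels. A value near 0 suggests no homophily, positive values toward 1 suggest strong homophily, and negative values suggest members connect preferentially across labels. The edges and department labels below are hypothetical.

```python
from collections import Counter


def assortativity(edges, label):
    """Newman's categorical assortativity for an undirected edge list.

    Returns (observed, expected, r): the observed share of same-label edges,
    the share expected under random mixing, and the coefficient r.
    """
    observed = sum(label[a] == label[b] for a, b in edges) / len(edges)
    # Each edge has two ends; a_k is the share of edge ends carrying label k.
    ends = Counter()
    for a, b in edges:
        ends[label[a]] += 1
        ends[label[b]] += 1
    total_ends = 2 * len(edges)
    expected = sum((n / total_ends) ** 2 for n in ends.values())
    # Undefined (division by zero) if every edge end carries the same label.
    r = (observed - expected) / (1 - expected)
    return observed, expected, r


label = {"A": "sales", "B": "sales", "C": "sales",
         "D": "ops", "E": "ops", "F": "ops"}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("C", "D")]
observed, expected, r = assortativity(edges, label)
print(f"observed {observed:.2f}, expected {expected:.2f}, r = {r:.2f}")
```

Five of the six hypothetical connections stay within a department, well above the roughly 51 percent expected from random mixing, which gives r of about 0.66.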
Investigators are often concerned with the fraud-generating process because people, not invoices or insurance claims, commit fraud. Misstated claims, invoices and receipts may be symptoms of the fraud, but they are only its result. Detecting the fraud-generating process, or root cause, through statistical analysis can be difficult because fraudulent behavior is often:
- Well considered
- Purposefully concealed
- Evolving in form over time
- Poorly captured in the available data
While statistical analyses alone cannot prove that a transaction is fraudulent, they can identify potentially suspicious activity for further review.
Alvarez & Marsal (A&M) Network Analysis Case Study: False Claims Act
A&M was recently engaged to assist on a qui tam False Claims Act case involving the identification and quantification of fraudulent medical claims. The case involved more than 140,000 medical claims, from which a sample of 2,500 claims was selected and reviewed for suspicious activity. Of these 2,500 medical claims, 161 were identified as false claims. We initially used more traditional approaches to identify and predict the fraudulent claims. Traditional unsupervised analyses, such as correlations, tabulations and charts, produced few or no useful results. Even more sophisticated supervised predictive models, such as logistic regression, performed poorly and had little ability to identify the fraudulent claims.
The primary reason these techniques performed poorly is that they fail to incorporate any information about the primary driver of fraud: people. New methods were needed to build this notion of people as the primary driver of fraud into our predictive model. The key insight came when we discovered that our data identified, for each and every claim, the patient and each doctor who treated that patient. With this doctor / patient / claim-level information, we created a network of doctors, in which each doctor is a member of the network and each connection between a pair of doctors indicates that the two doctors treated at least one patient in common.
When doctors treat the same patient, they are likely to interact with one another and therefore to know each other. This information allowed us to create a network graph of doctors detailing their relationships and the extent of their interactions and, more importantly, how these interactions related to the fraudulent claims. Below is a representative example of the network of doctors, in which the doctors associated with patients with fraudulent claims are marked as red nodes.
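The construction just described is what network analysts call projecting a two-mode (doctor and patient) network onto one mode (doctors). A minimal Python sketch follows, assuming each claim record carries a claim ID, a patient ID and a treating-doctor ID; the record layout and sample rows are hypothetical, not the actual case data. Counting distinct shared patients for each pair also yields an edge weight, one possible measure of how much two doctors interact.

```python
from collections import defaultdict
from itertools import combinations


def doctor_network(claims):
    """Project (claim_id, patient_id, doctor_id) rows onto a doctor network.

    Returns {(doctor_a, doctor_b): number of distinct shared patients}.
    """
    doctors_by_patient = defaultdict(set)
    for _claim_id, patient, doctor in claims:
        doctors_by_patient[patient].add(doctor)
    weights = defaultdict(int)
    for doctors in doctors_by_patient.values():
        # Every pair of doctors who treated this patient gets a connection.
        for a, b in combinations(sorted(doctors), 2):
            weights[(a, b)] += 1
    return dict(weights)


claims = [("C1", "P1", "DrA"), ("C2", "P1", "DrB"), ("C3", "P2", "DrA"),
          ("C4", "P2", "DrB"), ("C5", "P2", "DrC"), ("C6", "P3", "DrD"),
          ("C7", "P1", "DrA")]
print(doctor_network(claims))
```

Because doctors are stored per patient in a set, repeat claims by the same doctor for the same patient (C1 and C7) do not inflate the weights. DrD, whose only patient saw no other doctor, has no connections and so does not appear in the edge map.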
Doctor Network defined by common patients:
Building a doctor network from the actual claims data reveals patterns associated with the fraud to which the more traditional methods are blind. One obvious feature of this doctor network is homophily: doctors who are associated with fraudulent claims are more likely to be connected to other doctors who are also associated with fraudulent claims. In other words, “fraud begets fraud.” In fact, 16 of the 18 doctors associated with fraudulent claims were connected to at least one other doctor who was also associated with fraudulent claims; only two were not.
A&M then extracted this network information (how the doctors knew one another) and integrated it back into our original predictive model, a logistic regression. With the network information added, our results improved considerably. The original model identified only one of the 161 (0.6 percent) fraudulent medical claims, while the network-enhanced model correctly identified 43 of the 161 (27 percent). The network-enhanced model also improved its positive predictive ability, that is, the probability that a claim the model predicts to be fraudulent is truly fraudulent. This is described here as the “true given true rate” and is more widely known as positive predictive value (PPV) or precision. The original model predicted eight fraudulent claims, of which only one was truly fraudulent, a positive predictive ability of 1/8, or 12.5 percent; the network-enhanced model predicted 65 fraudulent claims, of which 43 were correct, a positive predictive ability of 43/65, or 66 percent.
These network enhancements improved the model’s performance and its ability to identify the fraudulent claims. They transformed the original model, which offered almost no further avenues of investigation into the fraud, into a tool that can greatly assist forensic professionals in deciding where to investigate next.
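A count like 16 of 18 can be reproduced mechanically from a doctor network and a set of flagged doctors. The Python sketch below does that and, as an additional check that goes beyond what is described here, estimates a permutation p-value: how often randomly chosen sets of doctors of the same size show at least as many linked flagged doctors. The doctor IDs, edges and flags are hypothetical.

```python
import random


def linked_flagged(edges, flagged):
    """Flagged doctors connected to at least one other flagged doctor."""
    linked = set()
    for a, b in edges:
        if a != b and a in flagged and b in flagged:
            linked.update((a, b))
    return linked


def permutation_p_value(edges, doctors, flagged, trials=2000, seed=0):
    """Share of random relabelings with at least as many linked flagged doctors."""
    rng = random.Random(seed)
    observed = len(linked_flagged(edges, flagged))
    pool = sorted(doctors)
    hits = sum(
        len(linked_flagged(edges, set(rng.sample(pool, len(flagged))))) >= observed
        for _ in range(trials)
    )
    return hits / trials


doctors = [f"D{i}" for i in range(1, 9)]
edges = [("D1", "D2"), ("D2", "D3"), ("D3", "D4"), ("D4", "D5"),
         ("D5", "D6"), ("D6", "D7"), ("D7", "D8")]
flagged = {"D1", "D2", "D3", "D8"}
linked = linked_flagged(edges, flagged)
print(f"{len(linked)} of {len(flagged)} flagged doctors have a flagged neighbor")
print(f"permutation p-value: {permutation_p_value(edges, doctors, flagged):.3f}")
```

The `doctors` list should include every doctor in the network, including unconnected ones, so that the random draws reflect the full population the flags could have fallen on.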
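The percentages above follow from two ratios: recall (the share of the 161 false claims a model flags) and positive predictive value (the share of flagged claims that are truly false). The short Python sketch below recomputes them from the counts given in the text. The comparison with the base rate of 161 false claims in 2,500 sampled claims is an added illustration and assumes the models were evaluated on that sample.

```python
def recall_and_ppv(true_positives, predicted_positives, actual_positives):
    """Recall = TP / actual positives; PPV (precision) = TP / predicted positives."""
    return (true_positives / actual_positives,
            true_positives / predicted_positives)


SAMPLE_SIZE = 2500   # claims reviewed
ACTUAL_FALSE = 161   # claims found to be false in the sample
base_rate = ACTUAL_FALSE / SAMPLE_SIZE

results = {}
for model, tp, predicted in [("original", 1, 8), ("network-enhanced", 43, 65)]:
    recall, ppv = recall_and_ppv(tp, predicted, ACTUAL_FALSE)
    results[model] = (recall, ppv)
    # Lift: how much more often a flagged claim is false than a random claim.
    print(f"{model}: recall {recall:.1%}, PPV {ppv:.1%}, "
          f"lift over base rate {ppv / base_rate:.1f}x")
```

Rounded, the output reproduces the 0.6, 27, 12.5 and 66 percent figures. A randomly chosen sampled claim is false only about 6.4 percent of the time, so under the stated assumption a claim flagged by the network-enhanced model is roughly ten times more likely than that to be false.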
Miller McPherson, Lynn Smith-Lovin, and James M. Cook, “Birds of a Feather: Homophily in Social Networks,” Annual Review of Sociology 27:415–44 (2001), p. 416.
Specifically, network statistics such as each doctor’s degree (number of connections), betweenness (number of shortest paths in the network that pass through a member), closeness (based on the average distance from a member to all other members in the network) and modularity (which measures the strength of groups, clusters or communities) were integrated into our original predictive model.
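As a rough illustration of the statistics listed in the note above, the Python sketch below computes degree, closeness and betweenness for every member of an unweighted, undirected network using only the standard library. Closeness here is the inverse of the average distance to the members a doctor can reach, one of several conventions in use, and betweenness follows Brandes' algorithm (paths tied for shortest share credit fractionally). Modularity-based community detection is left out; libraries such as NetworkX provide it. How the doctor-level statistics were attached to individual claims is not described, so the sketch stops at a per-doctor feature table with hypothetical doctor IDs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable member."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Inverse average distance to reachable members (0 if isolated)."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness(adj):
    """Brandes' algorithm: shortest paths passing through each member."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected: each path was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


adj = {"DrA": {"DrB"}, "DrB": {"DrA", "DrC"}, "DrC": {"DrB", "DrD"},
       "DrD": {"DrC"}, "DrE": set()}
close, between = closeness(adj), betweenness(adj)
features = {d: {"degree": len(adj[d]), "closeness": close[d],
                "betweenness": between[d]} for d in adj}
for doctor, row in features.items():
    print(doctor, row)
```

In this chain of four doctors plus one isolated doctor, DrB and DrC each sit on two shortest paths and have the highest closeness, while DrE scores zero on every statistic.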