August 19, 2015

Electronically Stored Information Explosion

Recent years have borne witness to an explosion of electronically stored information (ESI). By some estimates, more than 99 percent of the world’s data, much of it in the corporate sphere, has been created in only the last three years.[1] And the creation of new data is expected to continue apace.

At the same time, increasingly byzantine regulatory requirements and the ever-present threat of litigation have put mounting strains on the records management functions of modern corporations. These trends, combined with steadily decreasing costs for storing ESI, have resulted in a pervasive “keep everything” attitude regarding the storage of electronic files. Stockpiling of data is common among legal and compliance managers, who regard the costs of storage as small compared to the potential costs of sanctions that can result from violation of complicated retention rules and standards. Stockpiling is also common among business managers, who feel that because storage is inexpensive, they can and should retain anything that might be useful in the future.

Data stockpiling has resulted in corporate data stores that are rife with obsolete and redundant data. Figures 1 and 2 summarize a large random sample of files from the shared drives of a large multinational corporation and show a high degree of obsolescence and redundancy.

Figure 1. Obsolescence. Time since files have been accessed as a percentage of all files. Only 47 percent of the files in this example have been accessed in the last year.

Figure 2. Redundancy. Duplication of files, as a percentage of all files. Only 63 percent of files in the sample are unique. The remaining 37 percent have at least one duplicate.
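
The two measures in Figures 1 and 2 can be approximated for any directory tree. The following is a minimal sketch, not the method used to produce the figures: the function names `survey` and `file_digest` are illustrative, duplicates are assumed to mean files with byte-identical content, and last-access times are taken from the file system, which may record them only approximately.

```python
import hashlib
import os
import time
from collections import Counter

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey(root, now=None):
    """Summarize obsolescence and redundancy for all files under root."""
    now = time.time() if now is None else now
    total = 0
    accessed_last_year = 0
    digests = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            total += 1
            # st_atime is the last-access time. Some systems update it lazily
            # or not at all, so treat the result as an approximation.
            if now - os.stat(path).st_atime <= SECONDS_PER_YEAR:
                accessed_last_year += 1
            digests.append(file_digest(path))
    copies = Counter(digests)
    # A file is unique when no other file shares its content hash.
    unique = sum(1 for d in digests if copies[d] == 1)
    if total == 0:
        return {"files": 0, "pct_accessed_last_year": 0.0, "pct_unique": 0.0}
    return {
        "files": total,
        "pct_accessed_last_year": 100.0 * accessed_last_year / total,
        "pct_unique": 100.0 * unique / total,
    }
```

Calling `survey` on the root of a shared drive returns the file count and the two percentages. The sketch reads the access time before hashing each file, because reading a file can itself update that timestamp. It does not handle unreadable files or broken links, which a production scan would need to do.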


The Costs and Risks of Data Stockpiling

While it is true that the hardware required to store vast volumes of data has become relatively inexpensive, retaining such data indefinitely and indiscriminately is not without costs or risks. The costs and risks manifest themselves in multiple ways:

  1. The Pace of New Data Creation – It is nearly certain that the price of electronic data storage will continue to drop. At the same time, the pace at which new data is being created continues to rise. Between 2010 and 2014, the per-gigabyte cost of hard drive storage fell by an average of approximately 23 percent per year.[2] Still, the total amount of data stored is expected to double every two years through 2020,[3] which is equivalent to growth of roughly 41 percent per year and outpaces recent declines in storage costs. At those rates, raw storage spending would still rise by roughly 9 percent per year (1.41 × 0.77 ≈ 1.09). Taken together, these trends suggest that without a mitigating strategy in place, IT departments will see continuing increases in expenditures for electronic storage and related systems such as backup and recovery.
  2. E-Discovery Management – As electronic files and communications have become the primary form of information storage for corporations, e-discovery is now the dominant component of the discovery process. Collection, processing, and review of ESI represent some of the major costs of litigation. New techniques such as Technology Assisted Review and Predictive Coding show great promise for increasing the efficiency, and thereby reducing the cost, of e-discovery. Yet each additional piece of ESI that makes its way through the e-discovery process represents an incremental cost. As the volume of data stored by corporations continues to grow, sorting through these vast troves of information to identify materials that may be responsive to incoming legal discovery requests has already become, and will increasingly be, a daunting and expensive task.
  3. Unnecessary Discovery Production – Organizations that stockpile data also bear a significant risk of unnecessarily producing damaging materials in response to legal discovery requests. Today’s e-discovery procedures typically include a process by which opposing parties agree to “search parameters” that describe the characteristics of documents to be produced. In most cases, these parameters specify that a document must be produced if it: (1) was held by one of an agreed upon set of persons (“custodians”), (2) was created or modified within an agreed upon date range, and (3) contains any of a set of agreed upon keywords, word fragments, or word combinations (“search terms”). Production of documents that satisfy these parameters (with certain exclusions providing for confidentiality, attorney-client privilege, and other conditions) is compulsory, even if the documents being produced contradict the arguments of the producing party. Production of a damaging document is unfortunate but sometimes inevitable. However, production of a damaging document that should not have been retained is inexcusable.
  4. Increasing Frequency of Data Breaches – The frequency and scale of hacking incidents and cybersecurity breaches of corporate and government networks have increased dramatically. Most recently, suspected hackers breached the network of the Office of Personnel Management, a U.S. government agency, gaining access to the personal information of as many as 22 million government employees.[4] According to a 2015 Ponemon Institute study, the total cost of data breaches has increased by 23 percent since 2013.[5] The more data an organization retains, the more there is to steal. This is a particularly significant issue for organizations that store large amounts of proprietary information, personally identifiable information, or protected health information, including, but not limited to, financial institutions and healthcare companies.
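
The three-part search-parameter test described in item 3 can be sketched as a simple filter. This is an illustration under stated assumptions, not a description of any e-discovery product: the `Document` and `SearchParameters` types and the `is_responsive` function are hypothetical, each document is assumed to carry a single created-or-modified date, search terms are matched as case-insensitive substrings, and exclusions such as privilege are not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    custodian: str   # person who held the document
    modified: date   # created-or-modified date (single field, by assumption)
    text: str


@dataclass(frozen=True)
class SearchParameters:
    custodians: frozenset
    start: date
    end: date
    search_terms: tuple


def is_responsive(doc, params):
    """Apply the three agreed tests; a document must pass all of them."""
    held_by_custodian = doc.custodian in params.custodians
    in_date_range = params.start <= doc.modified <= params.end
    text = doc.text.lower()
    # Plain substring matching also covers word fragments such as "rebat".
    has_term = any(term.lower() in text for term in params.search_terms)
    return held_by_custodian and in_date_range and has_term
```

Because the test is purely mechanical, every retained document that passes it enters review regardless of whether the organization had any reason to keep it. A document that was deleted under a consistently applied policy before any preservation duty arose never enters the candidate set at all.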

Changes to the Federal Rules of Civil Procedure

Even in light of these costs and risks, legal managers may still argue that data stockpiling is warranted; the rules and obligations governing data retention are vague and, in the realm of civil litigation, the consequences associated with their violation can be severe, including monetary sanctions, adverse inferences, summary judgment, or dismissal of a legal action. Fortunately, proposed changes to the Federal Rules of Civil Procedure, scheduled to take effect on December 1, 2015, should alleviate some of these concerns, particularly the changes to Rule 37(e) (sometimes known as the “Safe Harbor” rule). A complete analysis of the proposed changes to Rule 37(e) goes beyond the scope of this writing, but the principal effects of those changes are: (1) to reserve the most severe sanctions for cases in which information is willfully destroyed (“with the intent to deprive”) and (2) to apply sanctions only if the information cannot be found elsewhere through additional discovery.[6] These changes significantly lower the legal risks associated with the destruction of old and redundant data.

Defensible Deletion

In combination, the foregoing suggests that any organization, large or small, would benefit from the implementation of a policy of defensible deletion.

Defensible deletion is the mirror image of document retention and a critical component of a cohesive information governance strategy. Whereas document retention policies concern themselves with what data must be retained for legal or regulatory reasons, a well-designed defensible deletion policy identifies data whose retention is not necessary and, moreover, whose destruction is mandatory. It does so in a way that is clearly documented, consistently applied, and compliant with governing regulations, and that is thus “defensible” should questions arise about the destruction of data in a legal or regulatory context.

The specific manner in which defensible deletion is implemented will vary widely by organization, but an implementation generally consists of three components:

  1. A governing body consisting of enterprise-wide stakeholders whose responsibility is to guarantee that deletion activities are consistent with legal and regulatory standards, ensure the preservation of needed and valuable materials, provide implementation options, and guide execution.
  2. A well-documented and up-to-date policy that unambiguously defines categories of data to be deleted and describes deletion procedures. This is particularly important if charges of spoliation are to be effectively countered.
  3. Controlling technology to catalog, categorize, and delete data in accordance with policies and to provide required reporting. This is an area of growing focus and innovation in the IT industry, as corporate managers are becoming increasingly aware of the pitfalls of data stockpiling.
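
As a rough illustration of the third component, the sketch below applies a written policy to a catalog of records and produces an audit trail. Everything in it is hypothetical: the `Record` type, the `plan_deletions` function, the category names, and the retention periods are invented for the example and do not reflect any actual regulation or product. The sketch only plans deletions; it never removes a file.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    path: str
    category: str
    last_modified: date
    on_legal_hold: bool = False


# Hypothetical retention periods in days; real values come from the written policy.
RETENTION_DAYS = {"invoice": 7 * 365, "draft": 365, "email": 3 * 365}


def plan_deletions(records, today, retention_days=RETENTION_DAYS):
    """Return one audit entry per record: the decision and the reason for it."""
    audit_log = []
    for record in records:
        limit = retention_days.get(record.category)
        if record.on_legal_hold:
            # A legal hold always overrides the retention schedule.
            decision, reason = "retain", "legal hold"
        elif limit is None:
            # Uncategorized data is kept until the policy covers it.
            decision, reason = "retain", "category not covered by policy"
        elif today - record.last_modified > timedelta(days=limit):
            decision, reason = "delete", f"older than {limit} days"
        else:
            decision, reason = "retain", "within retention period"
        audit_log.append({"path": record.path, "category": record.category,
                          "decision": decision, "reason": reason})
    return audit_log
```

Recording a reason for every record, including those retained, is what supports the "clearly documented, consistently applied" standard: the log shows that each decision followed from the policy and not from the content of the document.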


The notion that modern organizations can indiscriminately save all data because storage is inexpensive is both misguided and potentially harmful. Preserving data without regard to retention obligations or business value is expensive and carries significant risks related to legal discovery and cybersecurity. Implementation of an effective strategy for defensible deletion can lower costs and minimize risk while addressing concerns about sacrificing valuable business materials or running afoul of regulatory and legal requirements.


  1. Leo Leung, “99.8 percent of the world’s data was created in the last two years,” Tech Expectations, October 23, 2014. Accessed August 8, 2015.
  2. “Average Cost of Hard Drive Storage,” Statistic Brain, November 11, 2014. Accessed August 8, 2015.
  3. Charles McLellan, “Storage in 2014: An overview,” ZDNet, U.S. Edition, January 8, 2014. Accessed August 8, 2015.
  4. Mike Levine and Jack Date, “22 Million Affected by OPM Hack, Officials Say,” ABC News, July 9, 2015. Accessed August 8, 2015.
  5. Ponemon Institute, “2015 Cost of Data Breach Study: Global Analysis,” May 2015, p. 1. Accessed August 8, 2015.
  6. See Letter to Hon. John A. Boehner, Speaker of the House of Representatives, from John G. Roberts, Chief Justice of the Supreme Court of the United States, April 29, 2015, containing Proposed Amendments to the Federal Rules of Civil Procedure, pp. 25–26 (Rule 37(e)). Accessed August 8, 2015.