No matter how many precautions corporations take to avoid lawsuits, the specter of litigation is an omnipresent and potentially costly threat to businesses everywhere. These days, a court order for all documents related to a dispute can entail months of combing through electronically stored information (ESI). The process of finding, securing, aggregating, and reviewing ESI is called eDiscovery. eDiscovery today is a fast-growing and expensive service, but one that firms must use in the event of corporate litigation. By employing machine learning and the appropriate business model, businesses can fundamentally improve a process that has been necessary but all too costly.

Firms that anticipate litigation, or are already in it, employ eDiscovery processors in accordance with court mandates. Field experts use the Electronic Discovery Reference Model (EDRM), which involves nine distinct steps, to break the process into its component stages. Important steps include:

  1. Identification & Aggregation of all forms of electronically stored information (ESI) that a firm has produced over a given time period.
  2. Collection and Preservation of all ESI in a format readable by eDiscovery reviewers.
  3. Review of all relevant materials by paralegals (essentially, combing through data to find potential evidence of wrongdoing).
  4. Production of said evidence to a court of law.

The point of eDiscovery is to go from volume to relevance. With each successive step, an effective eDiscovery processor can winnow down the total amount of data ultimately presented to the courts and expedite the rest of the litigation process.
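The volume-to-relevance funnel can be made concrete with a small sketch. The Python below is illustrative only: the `Document` fields, the stage functions, and the keyword filter standing in for human review are all assumptions made for this example, not part of the EDRM or of any vendor's product. It shows how each stage passes fewer documents to the next.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical minimal record for one piece of ESI
    doc_id: str
    custodian: str
    created: date
    text: str


def identify(docs, custodians, start, end):
    """Keep documents from the named custodians inside the relevant date range."""
    return [d for d in docs if d.custodian in custodians and start <= d.created <= end]


def deduplicate(docs):
    """Drop exact duplicates by hashing the whitespace-trimmed, lowercased text."""
    seen, unique = set(), []
    for d in docs:
        digest = hashlib.sha256(d.text.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def review(docs, terms):
    """Stand-in for human review: keep documents that mention any search term."""
    return [d for d in docs if any(t in d.text.lower() for t in terms)]


def funnel(docs, custodians, start, end, terms):
    """Run the stages in order and record how many documents survive each one."""
    counts = {"collected": len(docs)}
    docs = identify(docs, custodians, start, end)
    counts["identified"] = len(docs)
    docs = deduplicate(docs)
    counts["deduplicated"] = len(docs)
    docs = review(docs, terms)
    counts["produced"] = len(docs)
    return docs, counts


# Made-up sample data for illustration
corpus = [
    Document("1", "alice", date(2014, 3, 1), "Pricing agreement draft attached."),
    Document("2", "alice", date(2014, 3, 1), "pricing agreement draft attached. "),
    Document("3", "bob", date(2014, 4, 9), "Lunch on Friday?"),
    Document("4", "carol", date(2014, 5, 2), "Re: pricing agreement"),
    Document("5", "alice", date(2011, 1, 5), "Old pricing agreement"),
]

produced, counts = funnel(
    corpus, {"alice", "bob"}, date(2014, 1, 1), date(2014, 12, 31), ["pricing"]
)
print(counts)
```

In practice each stage is far more involved, but the shape is the same: the count shrinks at every step, and the expensive review stage only sees what the cheaper stages let through.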

The increased computerization of the business world has already led to the production and accumulation of vast amounts of data. New data-gathering technologies render old forms of data collection obsolete within years or months, and few law firms or corporate lawyers are able to sustain in-house eDiscovery services. This explosion in raw data shows no signs of letting up; 90 percent of the data in the world today has been created within the last two years. By 2020, estimated annual global data production will be almost forty zettabytes (one zettabyte is equal to one billion terabytes). This increase in overall data will lead to larger cases and more business for the eDiscovery market.

Beyond the sheer size of the data being produced, a further case for the future growth of the eDiscovery space lies in the projected drivers of data production. One such driver is social media. As platforms like Twitter and Facebook continue to attract larger user bases, more data is being created, and eDiscovery firms will need to parse this specific form of ESI quickly. Of polled law firms, 50.6 percent were involved in at least one matter that contained social media data in the past year, and 19.1 percent were engaged in three or more such matters. The rise of personal devices in the workplace presents a range of data collection issues stemming from a more diverse set of operating systems. The anticipated Internet of Things (a network of physical devices communicating with each other without need for direct human supervision) is expected to consist of 50 billion objects by 2020 and to generate vast amounts of data, thereby increasing the amount of data that litigators and interested parties may have to sort through in eDiscovery. The bottom line is that more data, and more varied data, requires more comprehensive eDiscovery processes.

Many established law firms still have not incorporated eDiscovery into their array of offered services. In fact, many tried; as courts began sanctioning the use of eDiscovery in more corporate cases, major law firms began purchasing sophisticated software to bring eDiscovery services in-house. But these law firms lacked the necessary IT infrastructure and technical expertise to administer the process for clients. The main problem was that these firms treated eDiscovery as a product to be sold as software rather than a process to be rendered as a service by trained professionals. Recently, consulting and technology giants have had more success at incorporating eDiscovery into their offerings. It remains to be seen whether firms such as Xerox and Deloitte can sustain more success in this niche space than historically technophobic law partnerships. Regardless of how the value chain is arranged, corporate clients are by far its least powerful players; from blue chip to microcap and below, demand for eDiscovery in the event of legal action is close to perfectly inelastic. The reason is simple: a corporation can do little to avoid some forms of litigation, and can only make sure that it has an effective and uniform means of saving its own data in order to minimize costs.

Though the eDiscovery process has put a strain on corporations involved in civil or criminal cases, one innovation may be able to bring down its cost. Recently, several firms such as Kroll Ontrack, Daegis, CDS and kCura have experimented with machine learning in order to cut down on the most expensive and time-consuming part of the eDiscovery process: review. Without technology-assisted review (TAR), the review process is conducted by trained paralegals who are responsible for manually combing through all ESI to search for data that could potentially be relevant to the case. With TAR, a computer can begin to pick up on trends in the reviewers' decisions and “learn” which keywords and patterns signal relevance, an approach known as predictive coding. Predictive coding has been somewhat controversial in the eyes of the court. According to the 2014 changes to the Federal Rules of Civil Procedure (FRCP), predictive coding programs are judged based on the following four metrics:

  1. Precision: How effective is the program at parsing the important from the inconsequential, and is that accuracy superior to that of manual review?
  2. Can the program operate similarly well on a diverse range of projects?
  3. Does the program provide time and cost efficiencies compared to rival automated processes and manual review?
  4. Can the output of the TAR eDiscovery program truly stand as valid evidence in a court of law?

Programs that are able to meet these benchmarks can cut an estimated 55 percent from the total cost of the eDiscovery process and consequently undercut all would-be competitors.
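To make the idea of a program that "learns" from reviewers concrete, here is a minimal sketch in Python. It assumes a naive Bayes text classifier, which is only one possible technique; the article does not say which algorithms the vendors named above actually use. The sample documents and labels are invented for illustration. The sketch trains on a small seed set coded by human reviewers, predicts relevance for held-out documents, and then scores the predictions by precision (the first metric in the list above) and recall.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit a tiny naive Bayes model from (text, is_relevant) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True when the relevant class has the higher log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids taking log(0)
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]


def precision_recall(predicted, actual):
    """Precision: share of flagged docs that are relevant. Recall: share of relevant docs flagged."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Seed set: documents a human reviewer has already coded (invented examples)
seed = [
    ("pricing agreement with competitor", True),
    ("confidential pricing call notes", True),
    ("lunch menu for friday", False),
    ("office party friday", False),
]

# Held-out documents with known labels, used only to measure quality
held_out = [
    ("competitor pricing call", True),
    ("friday lunch plans", False),
    ("agreement notes", True),
    ("party menu", False),
    ("notes from dentist call", False),
]

model = train(seed)
predictions = [predict(model, text) for text, _ in held_out]
precision, recall = precision_recall(predictions, [label for _, label in held_out])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The last held-out document shows why precision matters: it shares the words "notes" and "call" with relevant seed documents, so the model flags it even though it is irrelevant. A larger and more varied seed set is the usual remedy, which is also why measured precision on held-out documents, rather than vendor claims, is what a court or client would want to see.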

The eDiscovery space is brimming with opportunity. Clients will continue to need eDiscovery services as our world continues to churn out ever-greater volumes and varieties of data. Though effectively melding TAR with the eDiscovery process may not sound as flashy as developing the next Flappy Bird, for any entrepreneur interested in expanding markets, cutting-edge technology, and the future of big data, eDiscovery presents a practical opportunity.