Thomson Reuters News & Insight
Featured Content from WESTLAW


Predictive coding: What it is and what you need to know about it

2/25/2013

 By Lauren Aguiar and Jonathan A. Friedman   

(Lauren Aguiar is a partner at Skadden, Arps, Slate, Meagher & Flom LLP in New York.  Jonathan Friedman, an associate in the firm’s New York office, provided assistance with the article.)   

The technologies behind technology-assisted review, or “predictive coding,” have emerged only over the last several years, but recent legal developments may signal a new chapter for electronic discovery in civil litigation.  Indeed, the first judicial opinion regarding predictive coding was issued just last year.

WHAT IS PREDICTIVE CODING?

Much like the recommendation technologies employed by Netflix and Pandora, predictive coding relies on a human to code a sample of documents, called a “seed set.”  A computer algorithm then identifies properties of those documents and evaluates the remaining documents, looking for similar characteristics and predicting how they should be coded.  A human reviewer examines those predictions and confirms or refines them through iterative rounds of further coding.  Ideally, this “training” will lead the software to a point where its predictions are accurate enough to produce a set of documents with high recall and high precision, meaning, respectively, that the set includes nearly all of the responsive documents and few nonresponsive ones.
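To make the workflow concrete, the following is a minimal sketch in Python, not a depiction of any vendor's product.  The word-counting "classifier," the sample documents and all function names are assumptions made for illustration; commercial predictive coding tools use far more sophisticated algorithms.  The sketch trains on a human-coded seed set, predicts the coding of the remaining documents, measures precision and recall, and then runs one refinement round in which the reviewer's corrections are fed back into training.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into words."""
    return text.lower().split()


def train(coded_docs):
    """Learn word weights from human-coded (text, is_responsive) pairs."""
    weights = Counter()
    for text, is_responsive in coded_docs:
        for word in set(tokenize(text)):
            weights[word] += 1 if is_responsive else -1
    return weights


def predict(weights, text):
    """Predict 'responsive' when the document's words score above zero."""
    return sum(weights[word] for word in set(tokenize(text))) > 0


def precision_recall(predicted, actual):
    """Precision: share of predicted-responsive documents that are responsive.
    Recall: share of responsive documents that were predicted responsive."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical seed set coded by a human reviewer.
seed_set = [
    ("merger pricing agreement draft", True),
    ("pricing discussion with competitor", True),
    ("lunch menu for friday", False),
    ("office holiday party schedule", False),
]

# Hypothetical remaining documents.  The second element is the coding a
# human reviewer would assign; it is used here only to score predictions.
remaining = [
    ("revised pricing agreement", True),
    ("friday party menu", False),
    ("competitor call notes", True),
    ("draft schedule for holiday party", False),
    ("settlement terms", True),
]
texts = [text for text, _ in remaining]
actual = [label for _, label in remaining]

# Round 1: train on the seed set and predict the rest.
weights = train(seed_set)
first_round = [predict(weights, text) for text in texts]
p1, r1 = precision_recall(first_round, actual)

# Round 2: the reviewer corrects wrong predictions, and those corrected
# documents join the training data before the software predicts again.
corrections = [
    (text, label)
    for text, label, guess in zip(texts, actual, first_round)
    if guess != label
]
weights = train(seed_set + corrections)
second_round = [predict(weights, text) for text in texts]
p2, r2 = precision_recall(second_round, actual)

print(f"round 1: precision={p1:.2f} recall={r1:.2f}")
print(f"round 2: precision={p2:.2f} recall={r2:.2f}")
```

In this toy example the first round misses the “settlement terms” document because none of its words appeared in the seed set, so recall falls short even though precision is perfect.  Once the reviewer's correction is added to the training data, the second round captures it.  Real protocols measure recall on samples the software has not been trained on, which this sketch omits for brevity.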

RECENT PRECEDENT

Da Silva Moore v. Publicis Groupe (2012)  

In Da Silva, Magistrate Judge Peck of the Southern District of New York considered the parties’ disputes regarding the defendant’s proposed protocol and production requirements, after the parties had agreed to implement predictive coding.  He concluded that predictive coding was appropriate because it advanced “the just, speedy, and inexpensive” resolution of the case, as dictated by Rule 1 of the Federal Rules of Civil Procedure, in light of the parties’ agreement, the large volume of electronically stored information to be reviewed, the superiority of predictive coding over available alternatives, the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and the transparent process proposed by the defendant.

The transparency consideration is particularly interesting because the defendant agreed to produce the entire seed set consisting of several thousand documents (both responsive and nonresponsive), except for privileged documents, so the plaintiffs could review these documents and their coding.  After coding the seed set, defendant’s counsel would conduct seven rounds of iterative review.  Invoking principles of proportionality, the court suggested that if, after these rounds, the null set (the documents the computer predicted to be nonresponsive) included relevant documents that “d[id] not add anything to the case,” it might not matter, but if so-called “smoking gun” or “hot” documents remained in the null set, more training of the software might be necessary.  The court was unconcerned with potential ambiguities in the standard for relevance because the proposed level of transparency would allow the plaintiffs to review the coding and raise issues with the court as needed.  Essentially, Magistrate Judge Peck decided to analyze the protocol’s adequacy by reviewing the results ex post.

Global AeroSpace v. Landow Aviation, L.P. (2012)  

In Landow, the Virginia Circuit Court authorized the defendants to utilize predictive coding over the plaintiffs’ opposition.  The defendants argued that in this case, predictive coding offered greater time savings, recall and precision.  The plaintiffs disputed the software’s accuracy and declined to participate in the process, arguing that the production was too small to warrant an iterative review process.  Judge Chamblin, however, analogized the issue to disputes of an earlier era, when parties argued about whether junior associates should be allowed to review documents instead of more senior lawyers.  In Chamblin’s view, both scenarios involve decisions about the practicality of reviewing large document sets.  The court concluded that the defendants could proceed with their preferred method because it offered potential cost savings and a quality of review at least as good as what could be expected from human review, but noted that the plaintiffs could raise subsequent objections as discovery progressed.  In January 2013, the defendants reportedly completed production, and the deadline for the plaintiffs to raise their objections passed without incident.  As a consequence of the plaintiffs’ tacit acceptance of the process, it appears that the court will not have the opportunity to review the adequacy of the results.

Kleen Products LLC v. Packaging Corp. of America (2012)  

In Kleen, the plaintiffs sought to force predictive coding on the defendants even though the defendants had already produced documents using keyword searches.  Rather than weigh the merits of one technology over another, Magistrate Judge Nolan of the Northern District of Illinois emphasized the need for the parties to cooperate in crafting a keyword search protocol so the case could progress.  She also endorsed Sedona Principle 6, which states that “[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”  At the court’s urging, the parties ultimately reached a stipulation by which the plaintiffs withdrew their initial challenge to the defendants’ use of keyword searching but reserved their rights to object to the defendants’ methodology and to demand that predictive coding be used in later requests.

In re Actos (Pioglitazone) Products Liability Litigation (2012)  

In Actos, the parties had already stipulated to a predictive coding protocol, and the Western District of Louisiana court incorporated the parties’ agreement in the case management order, which included a “Search Methodology Proof of Concept” describing the parties’ agreement that both parties would have the opportunity to code all of the documents in the seed set, responsive or not, except for those subject to privilege.  Likewise, the order dictated that the parties would meet and confer regarding any conflicting coding decisions.  Actos appears to follow Da Silva in forcing the parties to make joint decisions about relevance in training the software, thus extending the theme of cooperation seen in the earlier cases.   

EORHB, Inc. v. HOA Holdings LLC (2012)  

In October 2012, Vice Chancellor Laster of the Delaware Court of Chancery endorsed the use of predictive coding in the context of a non-expedited indemnification proceeding.  He ordered the parties to show cause if they did not want to use such coding, and also ordered them to agree on a single discovery provider that could warehouse both parties’ documents and “maintain the integrity of both side’s [sic] documents.”  He found these proceedings to be an “ideal” context for predictive coding because “these types of indemnification claims can generate a huge amount of documents” but are not expedited.  His view was that it was better to employ technology-assisted review instead of “burning lots of hours with people reviewing.”

PRACTICAL TAKEAWAYS

Across these cases, two themes emerge.  First, Rules 1 and 26(b)(2)(C)(iii) of the Federal Rules of Civil Procedure are important in predictive coding decisions.  Rule 1 states that procedural decisions should be made “to secure the just, speedy, and inexpensive determination of every action and proceeding,” while Rule 26(b)(2)(C)(iii) allows a court to limit discovery if “the burden or expense of the proposed discovery outweighs its likely benefit, considering the needs of the case . . . and the importance of the discovery in resolving the issues.”  While not all of the judges addressed the virtues of one technology over another, all emphasized and approved of parties’ cooperation to facilitate the discovery process.  Second, the cases make clear that the standards being applied to predictive coding issues are not uniform and, at present, are being resolved on a case-by-case basis.  In Kleen, the parties cooperated so that proceedings on other pressing issues could progress, while in the other cases, the parties and the courts tested the waters with predictive coding, subject to the potential for additional hearings on disputed issues.   

Currently, predictive coding is a replacement neither for predecessor technologies nor for human review.  Rather, all available technologies should be seen as tools that might be useful to practitioners when deployed under the proper circumstances.  In fact, predictive coding may prove helpful in contexts beyond document production that require the review of large sets of electronically stored information (e.g., in witness preparation or summary judgment briefing).  Finally, selecting a trustworthy and well-reputed e-discovery vendor may also be crucial to deciding among the divergent technologies, price structures, products and services that are available in any given case.



© 2013 Thomson Reuters