Monthly Archives: May 2015

Using Extrapolated Precision for Performance Measurement

This is a brief overview of my paper “Information Retrieval Performance Measurement Using Extrapolated Precision,” which I’ll be presenting on June 8th at the DESI VI workshop at ICAIL 2015 (slides now available here).  The paper provides a novel method for extrapolating a precision-recall point to a different level of recall, and advocates making performance comparisons by extrapolating results for all systems to the same level of recall if the systems cannot be evaluated at exactly the same recall (e.g., some predictive coding systems produce a binary yes/no prediction instead of a relevance score, so the user cannot select the recall that will be achieved).

High recall (finding most of the relevant documents) is important in e-discovery for defensibility.  High precision is desirable to ensure that there aren’t a lot of non-relevant documents mixed in with the relevant ones (i.e., high precision reduces the cost of review for responsiveness and privilege).  Making judgments about the relative performance of two predictive coding systems knowing only a single precision-recall point for each system is problematic—if one system has higher recall but lower precision for a particular task, is it the better system for that task?

There are various performance measures like the F1 score that combine precision and recall into a single number to allow performance comparisons.  Unfortunately, such measures often assume a trade-off between precision and recall that is not appropriate for e-discovery (I’ve written about problems with the  F1 score before).  To understand the problem, it is useful to look at how F1 varies as a function of the recall where it is measured.  Here are two precision-recall curves, with the one on the left being for an easy categorization task and the one on the right being for a hard task, with the F1 score corresponding to each point on the precision-recall curve superimposed:

[Figure f1_compare2: precision-recall curves with F1 superimposed, easy task (left) and hard task (right)]

If we pick a single point from the precision-recall curve and compute the value of F1 for that point, the resulting F1 is very sensitive to the precision-recall point we choose.  F1 is maximized at 46% recall in the graph on the right, which means that the trade-off between precision and recall that F1 deems to be reasonable implies that it is not worthwhile to produce more than 46% of the relevant documents for that task because precision suffers too much when you push to higher recall.  That is simply not compatible with the needs of e-discovery.  In e-discovery, the trade-off between precision (cost) and recall required should be dictated by proportionality, not by some performance measure that is oblivious to the value of the case.  Other problems with the F1 score are detailed in the paper.
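To make the sensitivity concrete, here is a minimal sketch that computes F1 = 2PR/(P+R) at every point along a precision-recall curve and reports the recall where F1 peaks.  The curve used here (toy_precision) is a made-up straight line chosen only for illustration; it is not the data behind the figure, and it is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def toy_precision(recall):
    """Hypothetical precision-recall curve: precision falls linearly from 0.5 to 0.

    This is an assumption for illustration only, not a curve from the paper.
    """
    return 0.5 * (1.0 - recall)


# Evaluate F1 at recall levels 1%, 2%, ..., 99% along the toy curve.
recalls = [i / 100 for i in range(1, 100)]
best_recall = max(recalls, key=lambda r: f1_score(toy_precision(r), r))

# Compare F1 at the peak with F1 at recall levels more typical of e-discovery.
for r in (0.25, best_recall, 0.75, 0.90):
    p = toy_precision(r)
    print(f"recall={r:.2f}  precision={p:.3f}  F1={f1_score(p, r):.3f}")
```

For this toy curve F1 peaks near 41% recall and is much lower at 75% or 90% recall, even though every point belongs to the same system.  Two systems measured at different recall levels could therefore get very different F1 scores without either one being better.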

The strong dependence that F1 has on recall as we move along the precision-recall curve means that it is easy to draw wrong conclusions about which system is performing better when performance is measured at different levels of recall.  This strong dependence on recall occurs because the contours of equal F1 are not shaped like precision-recall curves, so a precision-recall curve will cut across many contours.   In order to have the freedom to measure performance at recall levels that are relevant for e-discovery (e.g., 75% or higher) without drawing wrong conclusions about which system is performing best, the paper proposes a performance measure that has constant-performance contours that are shaped like precision-recall curves, so the performance measure depends much less on the recall level where the measurement is made than F1 does. In other words, the proposed performance measure aims to be sensitive to how well the system is working while being insensitive to the specific point on the precision-recall curve where the measurement is made.  This graph compares the constant-performance contours for F1 to the measure proposed in the paper:


Since the constant-performance contours are shaped like typical precision-recall curves, we can view this measure as being equivalent to extrapolating the precision-recall point to some other target recall level, like 75%, by simply finding an idealized precision-recall curve that passes through the point and moving along that curve to the target recall.  This figure illustrates extrapolation of precision measurements for three different systems at different recall levels to 75% recall for comparison:


Finally, here is what the performance measure looks like if we evaluate it for each point in the two precision-recall curves from the first figure:


The blue performance curves are much flatter than the red F1 curves from the first figure, so the value is much less sensitive to the recall level where it is measured.  As an added bonus, the measure is an extrapolated estimate of the precision that the system would achieve at 75% recall, so it is inversely proportional to the cost of the document review needed (excluding training and testing) to reach 75% recall.  For more details, read the paper or attend my talk at DESI VI.
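The link between precision at 75% recall and review cost follows directly from the definitions of precision and recall, and can be sketched in a few lines.  The document counts and precision values below are hypothetical, and the sketch assumes cost is proportional to the number of documents reviewed (excluding training and testing, as noted above).

```python
def documents_to_review(total_relevant, recall, precision):
    """Number of documents that must be reviewed to reach the given recall.

    recall    = relevant documents found / total relevant documents
    precision = relevant documents found / documents reviewed
    so documents reviewed = recall * total_relevant / precision.
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    if not 0 <= recall <= 1:
        raise ValueError("recall must be in [0, 1]")
    return recall * total_relevant / precision


# Hypothetical collection with 10,000 relevant documents and a 75% recall target.
total_relevant = 10_000
target_recall = 0.75

# Hypothetical extrapolated precision values for two systems at 75% recall.
for name, precision_at_target in (("System A", 0.50), ("System B", 0.25)):
    n = documents_to_review(total_relevant, target_recall, precision_at_target)
    print(f"{name}: precision {precision_at_target:.2f} -> review about {n:,.0f} documents")
```

Halving the precision at the target recall doubles the number of documents that must be reviewed, which is why a precision value extrapolated to a common recall level is a convenient basis for comparing systems.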

SSD Storage Can Lose Data When Left Without Power

I came across this article today, and I think it is important for everyone to be aware of it.  It says that SSDs (solid-state drives), which are becoming increasingly popular for computer storage due to their fast access times and ability to withstand being dropped, “need consistent access to a power source in order for them to not lose data over time. There are a number of factors that influence the non-powered retention period that an SSD has before potential data loss. These factors include amount of use the drive has already experienced, the temperature of the storage environment, and the materials that comprise the memory chips in the drive.”  Keep that risk in mind if computers are powered down during a legal hold.  The article gives details about how long the drives are supposed to retain data while powered down.

Highlights from the NorCal eDiscovery & Information Governance Retreat 2015

The NorCal eDiscovery & Information Governance Retreat is part of the series of retreats held by Chris La Cour’s company, Ing3nious.  This one was held at the Meritage Resort & Spa in Napa, California.  As always, the venue was beautiful, the food was good, and the talks were informative.  You can find all of my photos from the retreat and the nearby Skyline Wilderness Park here.  My notes below offer a few highlights from the sessions I attended.  There were often two sessions occurring simultaneously, so I couldn’t attend everything.

Keynote: Only the Paranoid Survive: What eDiscovery Needs to Survive the Big Data Tsunami

The keynote was by Alex Ponce de Leon from Google.  He made the point that there is a difference between Big Data, which can be analyzed, and “lots and lots of data.”  For information governance, lots of data is a problem.  The excitement over Big Data (he showed this graph and this one) is turning people into digital hoarders: they are saving things that will never be useful, which causes problems for ediscovery.  He mentioned that DuPont analyzed the documents they had to review for a case and found that 50% of them should have been discarded according to their retention policy, resulting in $12 million in document review that wouldn’t have been necessary if the retention policy had been followed (this article discusses it).  Legal and ediscovery people need to take the lead in getting companies to not keep everything.

Establishing In-House eDiscovery Playbooks, Procedures, Tool Selection, and Implementation

There was some discussion about corporations acquiring e-discovery tools and whether that caused concerns from outside counsel about what was being done since they must sign off on it.  Ben Robbins of LinkedIn said they haven’t had significant problems with that.  The panel emphasized the importance of documenting procedures and making sure that different types of matters were addressed individually.

Cybersecurity…it’s what’s for dinner. So, what’s the recipe and who’s the head chef?

I couldn’t attend this one.

A Look Back on Model eDiscovery Orders

Judge Rader’s e-discovery model order (here is a related article), which limits discovery to five custodians and five search terms per custodian, was discussed.  It was motivated by a need to curtail patent trolls in the Eastern District of Texas who were using ediscovery costs as a weapon.  It was mentioned that discovery of backups may become more feasible as people move away from using tape for backups.  Producing reports rather than raw databases was discussed, with the point being made that standard reports are usually okay, but custom reports often don’t match the requesting party’s expectations and cause conflicts.  Model orders go out the window when dealing with government agencies–many want everything.

Information Governance and Security: Keeping Security in Sight

I couldn’t attend this one.

How to Leverage Information Governance for Better eDiscovery

I couldn’t attend this one.

Avoiding Land Mines in TAR

I was on this panel, so I didn’t take notes.

Managing BYOC/D and Wearables in International eDiscovery and Investigations

I couldn’t attend this one.

Social Media – eDiscovery’s “friend”?

An employee may see a social media account as personal, but it must be preserved (possibly for years).  Need to remind the employee of the hold.  Don’t friend represented opposition, but okay to friend witnesses if you are up front about why.  Lawyers can friend judges, but not if they have a case before them.  You should read your judge’s tweets to see if there is a sign of bias.  Getting data from a social media company is difficult.  Look to see if jurors are tweeting about the case.

Inside the Threat Matrix: Cyber Security Risks, Incident Response, and the Discovery Impact

I couldn’t attend this one.

Resolving the Transparency Paradox

TAR 1.0 has a lot of foreign concepts like “stabilization” (optimal training), whereas TAR 2.0 (continuous active learning) is more like traditional review.  Hal Marcus of Recommind mentioned that when he surveyed the audience at another event, many said they had used predictive coding but few disclosed doing so.  The panel discussed allowing the requesting party to provide a seed set to make them feel better about using TAR, or raising the possibility of using TAR early on to see if there is pushback.  The Coalition of Technology Resources for Lawyers has a database of case law on predictive coding that was mentioned.

Judicial Panel

Judges now get ediscovery.  They see a lack of communication.  Responding parties object to everything.  Judges are unlikely to interfere when the parties have a thought-out ediscovery plan.  Inside counsel are taking more control to reduce costs.  The RAND study “Where the Money Goes” was mentioned.  Regarding cost shifting, an attorney may choose to pay to have more control.