September | 2016 | Clustify Blog – eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

The 2016 Northeast eDiscovery & IG Retreat was held at the Ocean Edge Resort & Golf Club. It was the third annual Ing3nious retreat held on Cape Cod. The retreat featured two simultaneous sessions throughout the day in a beautiful location. My notes below provide some highlights from the sessions I was able to attend. You can find additional photos here.

Peer-to-Peer Roundtables
The retreat started with peer-to-peer roundtables where each table was tasked with answering the question: Why does e-discovery suck (gripes, pet peeves, issues, etc.) and how can it be improved? Responses included:

How to drive innovation? New technologies need to be intuitive and simple to get client adoption.
Why are e-discovery tools only for e-discovery? Should be using predictive coding for records management.
Need alignment between legal and IT. Need ongoing collaboration.
Handling costs. Cost models and comparing service providers are complicated.
Info governance plans for defensible destruction.
Failure to plan and strategize e-discovery.
Communication and strategy. It is important to get the right people together.
Why not more cooperation at meet-and-confer? Attorneys who are not comfortable with technology are reluctant to talk about it. Asymmetric knowledge about e-discovery causes problems–people who don’t know what they are doing ask for crazy things.

Catching Up on the Implementation of the Amended Federal Rules
I couldn’t attend this one.

Predictive Coding and Other Document Review Technologies–Where Are We Now?
It is important to validate the process as you go along, for any technology. It is important to understand the client’s documents. Pandora is more like TAR 2.0 than TAR 1.0, because it starts giving recommendations based on your feedback right away. The 2012 RAND study found this e-discovery cost breakdown: 73% document review, 8% collection, and 19% processing. A question from the audience about pre-culling with keyword search before applying predictive coding spurred some debate. Although it wasn’t mentioned during the panel, I’ll point out William Webber’s analysis of the Biomet case, which shows pre-culling discarded roughly 40% of the relevant documents before predictive coding was applied. There are many different ways of charging for predictive coding: amount of data, number of users, hose (total data flowing through) or bucket (max amount of data allowed at one time). Another barrier to use of predictive coding is lack of senior attorney time (e.g., to review documents for training). Factors that will aid in overcoming barriers: improving technologies, Sherpas to guide lawyers through the process, court rulings, influence from general counsel. Need to admit that predictive coding doesn’t work for everything, e.g., calendar entries. New technologies include anonymization tools and technology to reduce the size of collections. Existing technologies that are useful: entity extraction, email threading, facial recognition, and audio-to-text. Predictive coding is used in maybe less than 1% of cases, but email threading is used in 99%.
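To see why pre-culling matters, it helps to multiply the recall of each stage. The sketch below is my own illustration, not something presented at the session: the 60% culling recall follows from the roughly 40% loss noted above, while the 75% recall for the predictive coding stage and the function name `end_to_end_recall` are purely hypothetical.

```python
def end_to_end_recall(culling_recall: float, tar_recall: float) -> float:
    """Fraction of all relevant documents that survive both stages.

    Documents discarded by keyword culling can never be found by the
    later predictive coding stage, so the two recalls multiply.
    """
    for name, value in (("culling_recall", culling_recall),
                        ("tar_recall", tar_recall)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return culling_recall * tar_recall


# Culling that discards about 40% of relevant documents keeps about 60%.
culling_recall = 0.60
# Hypothetical recall reached by predictive coding on the culled set.
tar_recall = 0.75

overall = end_to_end_recall(culling_recall, tar_recall)
print(f"End-to-end recall: {overall:.0%}")
```

Under these assumed numbers, fewer than half of the relevant documents in the original collection would be produced, even though the recall measured on the culled set looks respectable.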

It’s All Greek To Me: Multi-Language Discovery Best Practices
Native speakers are important. An understanding of relevant industry terminology is important, too. The ALTA fluency test is poor–the test is written in English and then translated to other languages, so it’s not great for testing ability to comprehend text that originated in another language. Hot documents may be translated for presentation. This is done with a secure platform that prohibits the translator from downloading the documents. Privacy laws make it best to review in-country if possible. There are only 5 really good legal translation companies–check with large firms to see who they use. Throughput can be an issue. Most can do 20,000 words in 3 days. What if you need to do 200,000 in 3 days? Companies do share translators, but there’s no reason for good translators to work for low-tier companies–good translators are in high demand. QC foreign review to identify bad reviewers (need proficient managers). May need to use machine translation (MT) if there are millions of documents. QC the MT result and make sure it is actually useful–in 85% of cases it is not good enough. For CJK (Chinese, Japanese, Korean), MT is terrible. The translation industry is $40 billion. Google invested a lot in MT but it didn’t help much. One technology that is useful is translation memory, where repeated chunks of text are translated just once. People performing review in Japanese must understand the subtlety of the American legal system.
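The translation memory idea can be sketched in a few lines. This is only a toy illustration under my own assumptions: it matches segments exactly (commercial tools typically also offer fuzzy matching), it splits on sentence-ending punctuation, and the `translate` callable stands in for a human translator or MT engine. The class and function names are hypothetical.

```python
import re


def split_segments(text):
    """Split text into sentence-like segments after ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class TranslationMemory:
    """Translate each distinct segment once and reuse the stored result."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical: source segment -> translation
        self._memory = {}            # exact-match store of finished segments
        self.calls = 0               # how many segments actually needed translating

    def translate_document(self, text):
        translated = []
        for segment in split_segments(text):
            if segment not in self._memory:
                self._memory[segment] = self._translate(segment)
                self.calls += 1
            translated.append(self._memory[segment])
        return " ".join(translated)


# Stand-in "translator" so the sketch runs; a real one would be a person or MT.
tm = TranslationMemory(translate=str.upper)
doc1 = "Please see attached. This message is confidential."
doc2 = "Meeting moved to Friday. This message is confidential."
out1 = tm.translate_document(doc1)
out2 = tm.translate_document(doc2)
print(tm.calls)  # the shared confidentiality footer is translated only once
```

The two documents contain four segments in total, but only three are sent to the translator because the footer repeats.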

Top Trends in Discovery for 2016
I couldn’t attend this one.

Measure Twice, Discover Once
Why measure in e-discovery? So you can explain what happened and why, for defensibility. Also important for cost management. The board of directors may want reports. When asked for more custodians you can show the cost and expected number of relevant documents that will be added by analyzing the number of keyword search hits. Everything gets an ID number for tracking and analysis (USB drives, batches of documents, etc.). Types of metrics ordered from most helpful to most harmful: useful, no metric, not useful, and misleading. A simple metric used often in document review is documents per hour per reviewer. What about document complexity, content complexity, number and type of issue codes, review complexity, risk tolerance instructions, number of “defect opportunities,” and number coded correctly? Many Six Sigma ideas from manufacturing are not applicable due to the subjectivity that is present in document review.
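As a rough illustration of the custodian estimate described above, here is a minimal sketch. The panel did not give a formula, so the approach is my assumption: multiply the new custodian's keyword hits by the fraction of hits that turned out to be relevant for custodians already reviewed, and convert hits to cost using documents per hour. All of the numbers and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustodianEstimate:
    expected_relevant: float  # relevant documents likely to be added
    review_hours: float       # reviewer time needed to review the hits
    review_cost: float        # review_hours times the hourly rate


def estimate_custodian(keyword_hits, hit_precision, docs_per_hour, hourly_rate):
    """Estimate the yield and review cost of adding one custodian.

    hit_precision is the fraction of keyword hits found relevant for
    custodians already reviewed (an assumed proxy for the new custodian).
    """
    if keyword_hits < 0 or docs_per_hour <= 0 or hourly_rate < 0:
        raise ValueError("counts and rates must be non-negative, docs_per_hour > 0")
    if not 0.0 <= hit_precision <= 1.0:
        raise ValueError("hit_precision must be between 0 and 1")
    hours = keyword_hits / docs_per_hour
    return CustodianEstimate(
        expected_relevant=keyword_hits * hit_precision,
        review_hours=hours,
        review_cost=hours * hourly_rate,
    )


# Hypothetical inputs: 12,000 hits, 15% of hits relevant so far,
# 50 documents per hour per reviewer, $60 per reviewer hour.
est = estimate_custodian(keyword_hits=12_000, hit_precision=0.15,
                         docs_per_hour=50, hourly_rate=60.0)
print(est)
```

An estimate like this inherits the weaknesses of documents per hour as a metric, so it is best treated as a starting point for the conversation rather than a prediction.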

Information Governance and Data Privacy: A World of Risk
I couldn’t attend this one.

The Importance of a Litigation Hold Policy
I couldn’t attend this one.

Alone Together: Where Have All The Model TAR Protocols Gone?
If you are disclosing details, there are two types: inputs (search terms used to train, shared review of training docs) and outputs (target recall or disclosure of recall). Don’t agree to a specific level of recall before looking at the data–if prevalence is low it may be hard. Plaintiff might argue for TAR as a way to overcome cost objections from the defendant. There is concern about lack of sophistication from judges–there is “stunning” variation in expertise among federal judges. An attorney involved with the Rio Tinto case recommends against agreeing on seed sets because it is painful and focuses on the wrong thing. Sometimes there isn’t time to put eyes on all documents that will be produced. Does the TAR protocol need to address dupes, near-dupes, email threading, etc.?
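One reason low prevalence may make a recall commitment hard (my own illustration, not a point the panel spelled out) is that measuring recall from a random sample requires finding enough relevant documents in that sample. The sketch below assumes a simple random sample of the whole collection and a hypothetical target of 400 relevant documents; the function name is also hypothetical.

```python
import math


def sample_size_for_relevant(target_relevant: int, prevalence: float) -> int:
    """Approximate random-sample size expected to contain target_relevant
    relevant documents, given the fraction of the collection that is relevant."""
    if target_relevant <= 0:
        raise ValueError("target_relevant must be positive")
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    # Round before taking the ceiling to absorb floating-point noise.
    return math.ceil(round(target_relevant / prevalence, 9))


# Hypothetical target: 400 relevant documents in the validation sample.
sizes = {p: sample_size_for_relevant(400, p) for p in (0.10, 0.01)}
for prevalence, size in sizes.items():
    print(f"prevalence {prevalence:.0%}: sample about {size:,} documents")
```

Under these assumptions the validation sample is ten times larger at 1% prevalence than at 10%, which is effort worth knowing about before agreeing to a recall target.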

Information Governance: Who Owns the Information, the Risk and the Responsibility?
I couldn’t attend this one.

Bringing eDiscovery In-House — Savings and Advantages
I was on this panel, so I didn’t take notes.


Highlights from the Northeast eDiscovery & IG Retreat 2016