Tag Archives: Electronic discovery

TAR vs. Keyword Search Challenge

During my presentation at the NorCal eDiscovery & IG Retreat I challenged the audience to create keyword searches that would work better than technology-assisted review (predictive coding) for two topics.  Half of the room was tasked with finding articles about biology (science-oriented articles, excluding medical treatment) and the other half searched for articles about current law (excluding proposed laws or politics).  I ran one of the searches against TAR in Clustify live during the presentation (Clustify’s “shadow tags” feature allows a full document review to be simulated in a few minutes using documents that were pre-categorized by human reviewers), but couldn’t do the rest due to time constraints.  This article presents the results for all the queries submitted by the audience.

The audience had limited time to construct queries (working together in groups), they weren’t familiar with the data set, and they couldn’t do sampling to tune their queries, so I’m not claiming the exercise was comparable to an e-discovery project.  Still, it was entertaining.  The topics are fairly simple, so a large percentage of the relevant documents can be found with a simple search using a few broad terms.  For example, a search for “biology” would find 37% of the biology documents, and a search for “law” would find 71% of the law articles.  The trick is to find the relevant documents without pulling in too many of the non-relevant ones.

To evaluate the results, I measured the recall (percentage of relevant documents found) from the top 3,000 and top 6,000 hits on the search query (3% and 6% of the population respectively).  I’ve also included the recall achieved by looking at all docs that matched the search query, just to see what recall the search queries could achieve if you didn’t worry about pulling in a ton of non-relevant docs.  For the TAR results I used TAR 3.0 trained with two seed documents (one relevant from a keyword search and one random non-relevant document) followed by 20 iterations of 10 top-scoring cluster centers, so a total of 202 training documents (no control set needed with TAR 3.0).  To compare to the top 3,000 search query matches, the 202 training documents plus 2,798 top-scoring documents were used for TAR, so the total document review (including training) would be the same for TAR and the search query.
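To make the arithmetic concrete, here is a minimal sketch of the recall-at-cutoff calculation (hypothetical inputs, not Clustify’s actual code):

    # A minimal sketch (not Clustify's code) of the recall measurement used in
    # this article.  "ranked" is a list of document IDs sorted best-first by the
    # query or TAR score; "relevant" is the set of IDs reviewers marked relevant.
    def recall_at_k(ranked, relevant, k):
        """Fraction of all relevant documents appearing in the top k hits."""
        found = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
        return found / len(relevant)

    # Toy example; the article's cutoffs were 3,000 and 6,000 (3% and 6% of the
    # 100,000-document population).
    ranked = [7, 3, 9, 4, 1, 8]      # hypothetical ranking, best first
    relevant = {3, 4, 5}             # hypothetical reviewer judgments
    print(recall_at_k(ranked, relevant, 4))  # prints 0.666... (2 of 3 in top 4)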

The search engine in Clustify is intended to help the user find a few seed documents to get active learning started, so it has some limitations.  If an audience query included a phrase, I converted it to an AND search enclosed in parentheses.  If a query included a wildcard, I converted it to a parenthesized OR search by looking at the matching words in the index and selecting only the ones that made sense (i.e., I made the queries better than they would have been with an actual wildcard).  I noticed that a lot of irrelevant words matched the wildcards.  For example, “cell*” in a biology search would match cellphone, cellular, cellar, cellist, etc., but I excluded such words.  I highly recommend that people using keyword search check what their wildcards are actually matching–you may be pulling in a lot of irrelevant words.  I removed a few words from the queries that weren’t in the index (so all of the words shown actually had an impact).  When there is an “a” and “b” version of a query, the “a” version is the audience’s query as-is, and the “b” version was tweaked by me to retrieve more documents.
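The wildcard handling I describe amounts to something like the following sketch (the vocabulary and exclusion list here are made-up examples):

    import fnmatch

    # Hypothetical sketch of expanding a wildcard against the index vocabulary,
    # then dropping the matches that are irrelevant to the topic.
    def expand_wildcard(pattern, vocabulary, exclude=()):
        """Return indexed words matching the pattern, minus excluded words."""
        return sorted(w for w in vocabulary
                      if fnmatch.fnmatch(w, pattern) and w not in exclude)

    vocab = {"cell", "cells", "celled", "cellphone", "cellular", "cellar", "cellist"}
    terms = expand_wildcard("cell*", vocab,
                            exclude={"cellphone", "cellular", "cellar", "cellist"})
    print("(" + " OR ".join(terms) + ")")  # prints: (cell OR celled OR cells)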

The tables below show the results, with the actual queries displayed below the tables and discussion of the results at the end.  A dash indicates that the query matched fewer than 3,000 documents, so the top-3,000 recall already reflects all of its matches.

Biology Recall

Query   Total Matches   Top 3,000   Top 6,000   All Matches
1               4,407       34.0%       47.2%       47.2%
2              13,799       37.3%       46.0%       80.9%
3              25,168       44.3%       60.9%       87.8%
4a                 42        0.5%           —        0.5%
4b              2,283       20.9%           —       20.9%
TAR                         72.1%       91.0%
Law Recall

Query   Total Matches   Top 3,000   Top 6,000   All Matches
5a              2,914       35.8%           —       35.8%
5b              9,035       37.2%       49.3%       60.6%
6                 534        2.9%           —        2.9%
7              27,288       32.3%       47.1%       79.1%
TAR                         62.3%       80.4%

[Chart: recall of TAR vs. keyword search for the biology topic]

[Chart: recall of TAR vs. keyword search for the law topic]

1) organism OR microorganism OR species OR DNA

2) habitat OR ecology OR marine OR ecosystem OR biology OR cell OR organism OR species OR photosynthesis OR pollination OR gene OR genetic OR genome AND NOT (treatment OR generic OR prognosis OR placebo OR diagnosis OR FDA OR medical OR medicine OR medication OR medications OR medicines OR medicated OR medicinal OR physician)

3) biology OR plant OR (phyllis OR phylos OR phylogenetic OR phylogeny OR phyllo OR phylis OR phylloxera) OR animal OR (cell OR cells OR celled OR cellomics OR celltiter) OR (circulation OR circulatory) OR (neural OR neuron OR neurotransmitter OR neurotransmitters OR neurological OR neurons OR neurotoxic OR neurobiology OR neuromuscular OR neuroscience OR neurotransmission OR neuropathy OR neurologically OR neuroanatomy OR neuroimaging OR neuronal OR neurosciences OR neuroendocrine OR neurofeedback OR neuroscientist OR neuroscientists OR neurobiologist OR neurochemical OR neuromorphic OR neurohormones OR neuroscientific OR neurovascular OR neurohormonal OR neurotechnology OR neurobiologists OR neurogenetics OR neuropeptide OR neuroreceptors) OR enzyme OR blood OR nerve OR brain OR kidney OR (muscle OR muscles) OR dna OR rna OR species OR mitochondria

4a) statistically AND ((laboratory AND test) OR species OR (genetic AND marker) OR enzyme) AND NOT (diagnosis OR treatment OR prognosis)

4b)  (species OR (genetic AND marker) OR enzyme) AND NOT (diagnosis OR treatment OR prognosis)

5a) federal AND (ruling OR judge OR justice OR (appellate OR appellant))

5b) ruling OR judge OR justice OR (appellate OR appellant)

6) amendments OR FRE OR whistleblower

7) ((law OR laws OR lawyer OR lawyers OR lawsuit OR lawsuits OR lawyering) OR (regulation OR regulations) OR (statute OR statutes) OR (standards)) AND NOT pending

TAR beat keyword search across the board for both tasks.  The top 3,000 documents returned by TAR achieved higher recall than the top 6,000 documents for any keyword search.  In other words, if documents will be reviewed before production, TAR achieves better results (higher recall) with half as much document review compared to any of the keyword searches.  The top 6,000 documents returned by TAR achieved higher recall than all of the documents matching any individual keyword search, even when the keyword search returned 27,000 documents.

Highlights from Ipro Innovations 2018

The 17th annual Ipro Innovations conference was held at the Talking Stick Resort.  It was well-organized, with two and a half days of informative talks and fun activities.  Early in the day everyone met in a large hall for the talks; later in the day there were seven simultaneous breakout sessions.  There were many sessions in computer labs where attendees could gain first-hand experience with the Ipro software.  I could only attend the tail end of the conference because I was at the NorCal eDiscovery & IG Retreat earlier in the week.  I’ve included my notes below.  You can find my full set of photos here.

The keynote on the final day was delivered by Afterburner, a consulting firm promoting a “Flawless Execution” methodology based on military strategy.  Their six steps of mission planning are: 1) determine the mission objective, 2) identify the threats, 3) identify your available and required resources, 4) evaluate lessons learned, 5) develop a course of action, and 6) plan for contingencies.  The audience participated in exercises illustrating how easily attention can be channelized, meaning that you focus on one thing at the expense of everything else.  Channelized attention was the cause of a commercial airliner crash.  To avoid being distracted by minor things (deadlines, cost, etc.), keep track of what is most important to pay attention to (customers).

Tiana Van Dyk described her firm’s 1.5-year transition from Summation to Ipro’s Eclipse, including moving 325 cases over.  Substantial time and preparation are needed to avoid problems and overcome resistance to change.  Staff should not be allowed to access the new system without undergoing training.  Case studies are useful to convince people to use new analytics tools.  Start small with new analytics tools (email threading and near-dupe), then use clustering to remove some junk (football and LinkedIn emails), and finally TAR.  Use sampling to demonstrate that things are working.  Learn everything you can about the technology you have.  Missteps can set you back terribly, causing bad rumors and fear.  Continuous communication is important to minimize panic when there is a problem.

There were also talks on new functionality in the Ipro software.  I gave a short presentation on how Ipro’s transition to the Clustify engine would improve TAR.  There were several opportunities for Ipro customers to give feedback about the functionality they would like to see.

Highlights from the NorCal eDiscovery & IG Retreat 2018

The 2018 NorCal eDiscovery & IG Retreat was held at the Carmel Valley Ranch, location of the first Ing3nious retreat in 2011 (though the company wasn’t called Ing3nious at the time).  It was a full day of talks, with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room.  Attendees were free to attend talks from either track.  Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions.  My full set of photos is available here.

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We challenged the audience to create a keyword search that would work better than TAR.  Results are posted here.

Information Governance In The Age Of Encryption And Ephemeral Communications
Facebook Messenger has an ephemeral mode, though it is currently only available to Facebook executives.  You can be forced to decrypt data (despite the 5th Amendment) if it can be proven that you have the password.  Ephemeral communication is a replacement for in-person communication, but it can look bad (like you have something to hide).  53% of email is read on mobile devices, but personal devices often aren’t collected.  Slack is useful for passing institutional knowledge along to new employees, but general counsel wants things deleted after 30 days.  Some ephemeral communication tools have archiving options.  You may want to record some conversations in email–you may need them as evidence in the future.  Are there unencrypted copies of encrypted data in some locations?

Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary.  The plaintiff should be skeptical about claims of burden–has appropriate culling been performed?  Do a meet and confer as early as possible.  Examine data for a few custodians and see if more are needed.  A data dump is when a lot of non-relevant docs are produced (e.g., due to a broad search or a search that matches an email signature).  Do sampling to test search terms.  Be explicit about what production formatting you want (e.g., searchable PDF, color, metadata).

Emerging Technology And The Impact On eDiscovery
There may be a lack of policy for new data sources.  Text messages and social media are becoming relevant for more cases.  Your Facebook info can be accessed through your friends.  Fitbit data may show whether a person could have committed the murder.  IP addresses can reveal whether email was sent from home or work.  The change to the Twitter character limit may break some collection tools–QC early on to detect such problems.  Vendors should have multiple tools.  Communicate about what tech is involved and what you need to collect.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Cloud computing (infrastructure, storage, productivity, and web apps) will cause conflict between EU privacy law and US discovery.  AWS provides lots of security options, but it can be difficult to get right (must be configured correctly).  Startups aim to build fast and don’t think enough about how to get the data out.  Are law firm clients looking at cloud agreements and how to export data?  Free services (Facebook, Gmail, etc.) spy on users, which makes them inappropriate for corporate use where privacy is needed.  Slack output is one long conversation.  What about tools that provide a visualization?  You may need the data, not just a screenshot.  Understand the limits of repositories–Office 365 limits you to 10GB of PST at a time.  What about versioning storage?  It is becoming more common as storage prices decline.  Do you need to collect all versions of a document?  “Computer ate my homework” excuses don’t fare well in court (e.g., production of privileged docs due to a bad mouse click, or missing docs matching a keyword search because they weren’t OCRed).  GDPR requires knowing where the users are (not where the data is stored).  Employees don’t want their private phones collected, so sandbox work stuff.

Employing Intelligence – Both Human And Artificial (AI) – To Reduce Overall eDiscovery Costs
You need to talk to custodians–the org chart doesn’t really tell you what you need to know.  Search can show who communicates with whom about a topic.  To discover a custodian who is involved but not known to the attorney, look at the data and interview the ground troops.  Look for a period when there is a lack of communication.  Use sentiment analysis (including emojis).  Watch for strange bytes in the review tool–they may be emojis that can only be viewed in the original app.  Automate legal holds as much as possible.  Escalate to a manager if the employee doesn’t respond to the hold in a timely manner.  Filter on metadata to reduce the amount that goes into the load file.  Sometimes things go wrong with the software (trained on biased data, not finding relevant spreadsheets, etc.).  QC to ensure the human element doesn’t fail.  Use phonetic search on audio files instead of transcribing before search.  Analyze data as it comes in–you may spot months of missing email.  Do proof of concept when selecting tools.

Practical Discussion: eDiscovery Process With Law Firms, In-House And Vendor
Stick with a single vendor so you know it is done the same way every time.  Figure out what your data sources are.  Get social media data into the review platform in a usable form (e.g., Skype).  Finding the existence of cloud data stores requires effort.  How long is the cloud data being held (Twitter only holds the last 100 direct messages)?  The company needs to provide the needed apps so employees aren’t tempted to go outside to get what they need.

Highlights from DESI VII / ICAIL 2017

DESI (Discovery of Electronically Stored Information) is a one-day workshop within ICAIL (International Conference on Artificial Intelligence and Law), which is held every other year.  The conference was held in London last month.  Rumor has it that the next ICAIL will be in North America, perhaps Montreal.

I’m not going to go into the DESI talks based on papers and slides that are posted on the DESI VII website, since you can read that content directly.  The workshop opened with a keynote by Maura Grossman and Gordon Cormack, where they talked about the history of TREC tracks that are relevant to e-discovery (Spam, Legal, and Total Recall), the limitation on the recall that can be achieved due to ambiguous relevance (reviewer disagreement) for some documents, and the need for high recall when it comes to identifying privileged documents or documents where privacy must be protected.  When looking for privileged documents, it is important to note that many tools don’t make use of metadata.  Documents that are missed may be technically relevant but not really important — you should look at a sample to see whether they are important.

Between presentations based on submitted papers there was a lunch where people separated into four groups to discuss specific topics.  The first group focused on e-discovery users.  Visualizations were deemed “nice to look at” but not always useful — does the visualization help you to answer a question faster?  Another group talked about how to improve e-discovery, including attorney aversion to algorithms and whether a substantial number of documents could be missed by CAL after the gain curve had plateaued.  Another group discussed dreams about future technologies, like better case assessment and redacting video.  The fourth group talked about GDPR and speculated that the UK would obey GDPR.

DESI ended with a panel discussion about future directions for e-discovery.  It was suggested that a government or consumer group should evaluate TAR systems.  Apparently, NIST doesn’t want to do it because it is too political.  One person pointed out that consumers aren’t really demanding it.  It’s not just a matter of optimizing recall and precision — process (quality control and workflow) matters, which makes comparisons hard.  It was claimed that defense attorneys were motivated to lobby against the federal rules encouraging the use of TAR because they don’t want incriminating things to be found.  People working in archiving are more enthusiastic about TAR.

Following DESI (and other workshops conducted in parallel on the first day), ICAIL had three more days of paper presentations followed by another day of workshops.  You can find the schedule here.  I only attended the first day of non-DESI presentations.  There are two papers from that day that I want to point out.  The first is Effectiveness Results for Popular e-Discovery Algorithms by Yang, David Grossman, Frieder, and Yurchak.  They compared performance of the CAL (relevance feedback) approach to TAR for several different classification algorithms, feature types, feature weightings, and with/without LSI.  They used several different performance metrics, though they missed the one I think is most relevant for e-discovery (review effort required to achieve an acceptable level of recall).  Still, it is interesting to see such an exhaustive comparison of algorithms used in TAR / predictive coding.  They’ve made their code available here.  The second paper is Scenario Analytics: Analyzing Jury Verdicts to Evaluate Legal Case Outcomes by Conrad and Al-Kofahi.  The authors analyze a large database of jury verdicts in an effort to determine the feasibility of building a system to give strategic litigation advice (e.g., potential award size, trial duration, and suggested claims) based on a data-driven analysis of the case.
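As an aside, the effort-based metric I have in mind is easy to sketch (hypothetical inputs; this is an illustration, not code from the paper):

    # Review effort required to achieve a target recall: count how far down a
    # ranked list you must review before a target fraction of the relevant
    # documents has been seen.  "ranked" and "relevant" are hypothetical inputs.
    def effort_for_recall(ranked, relevant, target=0.75):
        needed = target * len(relevant)
        found = 0
        for effort, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                found += 1
                if found >= needed:
                    return effort  # number of documents reviewed
        return None  # ranking never reaches the target recall

    ranked = [7, 3, 9, 4, 1, 8]      # hypothetical ranking, best first
    relevant = {3, 4, 1}             # hypothetical reviewer judgments
    print(effort_for_recall(ranked, relevant, target=2/3))  # prints 4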

Highlights from the Northeast IG Retreat 2017

The 2017 Northeast Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  After round table discussions, the retreat featured two simultaneous sessions throughout the day.  My notes below provide some highlights from the sessions I was able to attend.

Enhancing eDiscovery With Next Generation Litigation Management Software
I couldn’t attend this one.

Legal Tech and AI – Inventing The Future
Machines are currently only good at routine tasks.  Interactions with machines should allow humans and machines to do what they do best.  Some areas where AI can aid lawyers: determining how long litigation will take, suggesting cases you should reference, telling how often the opposition has won in the past, determining appropriate prices for fixed-fee arrangements, recruiting, and determining which industry to focus on.  AI promises to help with managing data (e.g., targeted deletion), not just e-discovery.  Facial recognition may replace plane tickets someday.

Zen & The Art Of Multi-Language Discovery: Risks, Review & Translation
I couldn’t attend this one.

NexLP Demo
The NexLP tool emphasizes feature extraction and use of domain knowledge from external sources to figure out the story behind the data.  It can generate alerts based on changes in employee behavior over time.  Companies should have a policy allowing the scanning of emails to detect bad behavior.  It was claimed that using AI on emails is better for privacy than having a human review random emails, since it keeps human eyes away from emails that are not relevant.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Are Managed Services Manageable?
I couldn’t attend this one.

Cyber And Data Security For The GC: How To Stay Out Of Headlines And Crosshairs
I couldn’t attend this one.

The Office Is Out: Preservation And Collection In The Merry Old Land Of Office 365
Enterprise 5 (E5) has advanced analytics from Equivio.  E3 and E1 can do legal hold but don’t have advanced analytics.  There are options available that are not on the website, and there are different builds — people are not all using the same thing.  Search functionality works on limited file types (e.g., Microsoft products).  Email attachments are OK if they are from Microsoft products.  It will not OCR PDFs that lack embedded text.  What about emails attached to emails?  Previously, it only went one layer deep on attachments.  The latest versions say they are “relaxing” that, but it is unclear what that means (how deep?).  Users control sync — are we really searching everything?  Make sure you involve IT, privacy, info governance, etc. if considering a transition to 365.  Be aware of data that is already on hold if you migrate to 365.  Start by migrating a small group of people who are not often subject to litigation.  Test each data type after conversion.

How To Make Sense Of Information Governance Rules For Contractors When The Government Itself Can’t?
I couldn’t attend this one.

Judges, The Law And Guidance: Does ‘Reasonableness’ Provide Clarity?
This was primarily about the impact of the new Federal Rules of Civil Procedure.  Clients are finally giving up on putting everything on hold.  Tie document retention to business needs — you shouldn’t have to worry about sanctions.  Document everything (e.g., why you chose specific custodians to hold).  Accidentally missing one custodian out of a hundred is now OK.  Some judges acknowledge the new rules but then ignore them.  Boilerplate objections to discovery requests need to stop — keep notes on why you made each objection.

Beyond The Firewall: Cybersecurity & The Human Factor
I couldn’t attend this one.

The Theory of Relativity: Is There A Black Hole In Electronic Discovery?
The good about Relativity: everyone knows it, it has plug-ins, and moving from document to document is fast compared to previous tools.  The bad: TAR 1.0 (federal judiciary prefers CAL).  An audience member expressed concern that as Relativity gets close to having a monopoly we should expect high prices and a lack of innovation.  Relativity One puts kCura in competition with service providers.

The day ended with a wine social.

Highlights from Ipro Innovations 2017

The 16th annual Ipro Innovations conference was held at the Talking Stick Resort.  It was a well-organized conference with over 500 attendees, lots of good food and swag, and over two days’ worth of content.  Sometimes, everyone attended the same presentation in a large hall.  Other times, there were seven simultaneous breakout sessions.  My notes below cover only the small subset of the presentations that I was able to attend.  I visited the Ipro office on the final day.  It’s an impressive, modern office with lots of character.  If you are wondering whether the Ipro people have a sense of humor, you need look no further than the signs for the restrooms.

The conference started with a summary of recent changes to the Ipro software line-up, how it enables a much smaller team to manage large projects, and stats on the growing customer base.  They announced that Clustify will soon replace Content Analyst as their analytics engine.  In the first phase, both engines will be available and will be implemented similarly, so the user can choose which one to use.  Later phases will make more of Clustify’s unique functionality available.  They also announced an investment by ParkerGale Capital.  Operations will largely remain unchanged, but there may be some acquisitions.  The first evening ended with a party at Top Golf.

Ari Kaplan gave a presentation entitled “The Opportunity Maker,” where he told numerous entertaining stories about business problems and how to find opportunities.  He explained that doing things that nobody else does can create opportunities.  He contacts strangers from his law school on LinkedIn and asks them to meet for coffee when he travels to their town — many accept because “nobody does that.”  He sends postcards to his clients when traveling, and they actually keep them.  To illustrate the value of putting yourself into the path of opportunity, he described how he got to see the Mets in the World Series.  He mentioned HelpAReporter.com as a way to get exposure for yourself as an expert.

One of the tracks during the breakout sessions was run by The Sedona Conference and offered CLE credits.  One of the TSC presentations was “Understanding the Science & Math Behind TAR” by Maura Grossman.  She covered the basics like TAR 1.0 vs. 2.0, human review achieving roughly 70% recall due to mistakes, and how TAR performs compared to keyword search.  She mentioned that control sets can become stale because the reviewer’s concept of relevance may shift during the review.  People tend to get pickier about relevance as the review progresses, so an estimate of the number of relevant docs taken on a control set at the beginning may be too high.  She also warned that making multiple measurements against the control set can give a biased estimate about when a certain level of performance is achieved (sidenote: this is because people watch for a measure like F1 to cross a threshold to determine training completeness, which is not the best way to use a control set).  She mentioned that she and Cormack have a new paper coming out that compares human review to TAR using better-reviewed data (Tim Kaine’s emails) that addresses some criticisms of their earlier JOLT study.

There were also breakout sessions where attendees could use the Ipro software with guidance from the staff in a room full of computers.  I attended a session on ECA/EDA.  One interesting feature that was demonstrated was checking the number of documents matching a keyword search that did not match any of the other searches performed — if the number is large, it may not be a very good search query.
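That check is simple to picture; here is a minimal sketch with hypothetical doc-ID sets (not the Ipro implementation):

    # Documents matched by one query but by none of the others; a large count
    # may mean the query needs scrutiny.  Inputs are hypothetical doc-ID sets.
    def unique_hits(query_results, name):
        others = set().union(*(ids for q, ids in query_results.items() if q != name))
        return query_results[name] - others

    results = {"q1": {1, 2, 3, 9}, "q2": {2, 3, 4}, "q3": {3, 5}}
    print(unique_hits(results, "q1"))  # prints {1, 9}: docs only q1 found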

Another TSC session I attended was by Brady, Grossman, and Shonka on responding to government and internal investigations.  Often (maybe 20% of the time) the government is inquiring because you are a source of information, not the target of the investigation, so it may be unwise to raise suspicion by resisting the request.  There is nothing similar to the Federal Rules of Civil Procedure for investigations.  The scope of an investigation can be much broader than civil discovery.  There is nothing like Rule 502 (protecting privilege) for investigations.  The federal government is pretty open to the use of TAR (it doesn’t want to receive a document dump), though the DOJ may want transparency.  There may be questions about how some data types (like text messages) were handled.  State agencies can be more difficult.

The last session I attended was the analytics roundtable, where Ipro employees asked the audience questions about how they were using the software and solicited suggestions for how it could be improved.  The day ended with the Salsa Challenge (as in food, not dancing) and dinner.  I wasn’t able to attend the presentations on the final day, but the schedule looked interesting.

Highlights from the East Coast eDiscovery & IG Retreat 2015

This was the second year that Ing3nious has held a retreat on the east coast, with other events organized by Chris LaCour held in California going back five years.  The event was held at the Wequassett Resort in Cape Cod.  As always, the event was well-organized and the location was beautiful.  Luckily, the weather was fantastic.  My notes below only capture a small amount of the information presented.  There were often two simultaneous sessions, so I couldn’t attend everything.

Keynote: Away with Words: The Myths and Misnomers of Conventional Search Strategies

Thomas Barnett started the keynote by asking the audience to suggest keyword searches to find items discussing the meaning of existence.  He then said that he had in mind “to be, or not to be” and pointed out that it contains only stop words.  He then described unsupervised (clustering) and supervised (predictive coding) machine learning.  He talked about entity extraction, meaning the identification of dates and names of people and companies in a document.  He talked about sentiment analysis and how a person might change their language when they are doing something wrong.  He also pointed out that a product may have different names in different countries, which can make it easy to miss things with keyword search.

Advancing Discovery: What if Lawyers are the Problem?

I couldn’t attend this one.

Turbulent Sea in the Safe Harbor.  Is There a Lifeboat for Transfers of EU Data to the US?

Max Schrems complained to the Irish Data Protection Commissioner 22 times about the Safe Harbor Privacy Principles failing to protect the privacy of E.U. citizens’ data when companies move the data to the U.S.  After Snowden released information on NSA data collection, Schrems complained a 23rd time.  Ultimately, a judge found the Safe Harbor to be invalid.

Companies must certify to the Department of Commerce that they will adhere to the Safe Harbor Privacy Principles.  Many e-discovery service providers were pressured to certify so they could bring data to the U.S. for discovery, even though e-discovery usage of the data would involve serious privacy violations.

Some argue that there is no other legal mechanism that could work for bringing data to the U.S. because the U.S. government can pick up everything, so no guarantees about privacy can be made.  The best option would be to get consent from the person, but it must be done in a very clear manner specifying what data is involved and who will see it.  An employer asking an employee for consent would be seen as coercive.  It will be hard to get consent from someone if you are investigating them for criminal activity.

There is really no way to move data from Europe to the U.S. for litigation without violating the law.  Consent would be required not just from the custodian but from everyone in the emails.  Some countries (France, Germany, and Switzerland) have blocking statutes that make taking the data a criminal offense.

Ethics: eDiscovery, Social Media, and the Internet of Things

I couldn’t attend this one.

Understanding the Data Visualization Trend in Legal

I was on this panel, so I didn’t take notes.  I did mention Vischeck, which allows you to see what your graphics would look like to a color-blind person.

Information Governance – How Do You Eat an Elephant?

I couldn’t attend this one.

Email Laws, IG Policies and the “Smoking Gun”

There has been confusion over what should be considered a record.  In the past, emails that were considered to be records were printed and stored.  Now email should be considered to be a record by default.  30-day retention policies are hard to defend.  Keep deleted emails for 60 days and use analytics to identify emails that employees should not have deleted so they can be saved.  Use automated logging to show compliance.

Protecting Enterprise Data Across Partners, Providers and the Planet

I couldn’t attend this one.

Defeating Analysis Paralysis – Strategies and Success Stories for Implementing IG Policies and Using TAR / Data Analytics

Berkeley Research Group finds that most companies are still keeping everything.  The longer data is kept, the less value it has to the company and the more risk it poses (e-discovery cost, and privacy issues if there is a breach).  Different departments within the company may want different retention rules.  Breaches cost the company in lawsuits and in reputation.  The E.U. requires breach notification within 24 hours.

Having employees tag documents gives low-quality tags (they aren’t lawyers), but retention based on those tags is good enough to satisfy the court.  Need employees to follow the retention policy, so keep it simple.  Some speculate that insurance providers may end up driving info governance by forcing their clients to do it.

The Coalition of Technology Resources for Lawyers found that 56% of legal departments are reporting that they use analytics.  Clustering can help with investigation and determining search terms.  Look at email domain names (e.g., nytimes.com) to cull.  Note that email journaling keeps everything.  Analytics technology has improved, so if you were disappointed in the past you might want to try it again.
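As a rough illustration of culling by sender domain (a simplified sketch with hypothetical inputs, not any particular product’s implementation):

    from collections import Counter
    from email.utils import parseaddr

    # Bucket messages by sender domain so obvious bulk senders (news alerts,
    # newsletters) can be set aside before review.
    def sender_domain(from_header):
        addr = parseaddr(from_header)[1]  # "Jo <jo@nytimes.com>" -> "jo@nytimes.com"
        return addr.rsplit("@", 1)[-1].lower() if "@" in addr else ""

    headers = ["Alerts <alerts@nytimes.com>", "bob@acme.com", "Ann <ann@acme.com>"]
    print(Counter(sender_domain(h) for h in headers))
    # prints: Counter({'acme.com': 2, 'nytimes.com': 1})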

How Automated Digital Discovery is Changing eDiscovery as We Know It

I couldn’t attend this one.

Creating Order Out of Chaos: Framing and Taming Data Discovery Challenges in Expedited Matters

This panel started by walking through a (hypothetical?) investigation of a head of operations who left and joined a competitor in violation of a non-compete agreement that was determined to be unenforceable.  Did he transfer company data to the competitor?

Look for evidence that USB devices were used on the company laptop.  Unfortunately, you can’t tell what was copied onto them.  Look for attempts to hide what was done, such as removal of USB insertion data from the current registry (but failing to remove from the registry snapshot).  Look at the WiFi connection history for connections to the competitor’s network.  It is very important to explain the situation to the forensics person and communicate with him/her frequently about what you each have found in order to develop a picture of what actually happened.

If you hire someone from a competitor and there is suspicion that they took data from their previous employer, ambush them and take all their devices before they have a chance to destroy anything.  This will show the judge that you were not complicit.

When investigating someone who quit on bad terms, look for deals with “special terms” or side letter deals — they may be a sign of fraud.  Be careful about any applicable European laws.  Europe says you can’t move the data to the U.S., but the SEC doesn’t care.  Can you use a review tool in the U.S. with the data in Europe?  Officially, no, but it is less bad than moving the data.  Everyone says you can’t produce the data from Europe, but everyone does.

Make sure your agreements are up to date and are written by the attorney that will litigate them.

Just Patch and Pray?

A study by Verizon found that 90% of breaches are caused by employees.  Info governance can reduce risk.  Keeping everything is risky due to e-discovery, risk of breach, and having to explain loss of old data to customers.

Email problems include bad passwords, use of the same password on multiple websites (so one hacked site can give access to others), and malware getting inside the network via email attachments.  2-factor authentication is recommended.  Don’t send an email to the SEC with a BCC to the client, or the client might hit reply-all and say something problematic — instead, email only the SEC and forward a copy to the client later.

Mobile technology can create discovery headaches, needs to be managed/updated/wiped remotely, and can easily be lost.  Encrypt, audit, and apply anti-malware.  BYOD should be limited to enterprise-ready devices.  Avoid insecure WiFi.  Control access to enterprise data.  Secure data in transit.  Ensure that devices get updated/upgraded.

Unaware or non-compliant employees need training.  When training to spot phishing emails, services can test the employees by sending phishing emails that report who clicked on them.

Vendors and third parties that handle enterprise data can be a problem.  Regulators require vendor oversight.  Limit access to necessary systems.  Segregate sensitive data.  Beware of payroll vendors and the possibility of identity theft from the data they hold.  Make sure cybersecurity insurance policy covers vendors.

Employees want data access from anywhere.  Encrypting email is hard — better to use collaborative workspaces.  Home networks should be protected.  Don’t use the neighbor’s Internet connection.

After having a breach, 39% of companies still don’t form a response plan.  There is no federal data breach notification law, but many states have such laws.  You may need to notify employees, customers, and the attorney general in some specific time frame.  Also notify your insurance company.

Mergers & Acquisitions: Strategy and Execution Concerns

I couldn’t attend this one.