
Highlights from IG3 West 2019

IG3 West was held at the Pelican Hill Resort in Newport Coast, California.  It consisted of one day of product demos followed by one day of talks.  The talks were divided into two simultaneous sessions throughout the day, so I could only attend half of them.  My notes below provide some highlights from the talks I attended.  You can find my full set of photos here.

Technology Solution Update from Corporate, Law Firm and Service Provider Perspective
How do we get the data out of the free version of Slack?  It is hard to get the data out of Office 365.  Employees are bringing in technologies such as Slack without going through the normal decision-making process.  IT and legal don’t talk to each other enough.  When doing a pilot of legal hold software, don’t use it on a custodian who is on actual hold because something might go wrong.  Remember that others know much less than you, so explain things as if you were talking to a third grader.  Old infrastructure is a big problem.  Many systems haven’t really been tested since Y2K.  Business continuity should be a top priority.

Staying on Pointe: The Striking Similarities Between Ballet and eDiscovery
I wasn’t able to attend this one.

Specialized eDiscovery: Rethinking the Notion of Relevancy
Does traditional ediscovery still work?  The traditional ways of communicating and creating data are shrinking; WeChat and WhatsApp are now popular.  Prepare the client for litigation by helping the client find all sources of data and format the exotic data.  The requesting party may want native format (instead of PDF) to get the meta data, but keep in mind that you may have to pay for software to examine data that is in exotic formats.  Slack meta data is probably useless (there is no tool to analyze it).  Be careful about Ring doorbells and home security systems recording audio (e.g., recording a contractor working in your home) — recording audio is illegal in some areas if you haven’t notified the person being recorded.  Chat, voice, and video are known problems.  Emojis with skin-tone modifiers and legacy data are less-known problems.  Before you end up in litigation, make sure IT people are trained on where data is and how to produce it.  If you are going to delete data (e.g., to reduce the risk of high ediscovery costs in the future), make sure you are consistent about it (e.g., delete all emails after 3 months unless they are on hold).  Haphazard deletion is going to raise questions.  Even if you are consistent about deletion, you may still encounter a judge who asks why you didn’t just save everything, since doing so is easier.  Currently, people don’t often go after text messages, but it depends on the situation.  Some people only text (no emails).  Oddest sources of data seen: a Venmo comment field indicating why a payment was made, and chat from an online game.
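A consistent deletion rule is one that can be written down and applied mechanically. Below is a minimal Python sketch of the example policy (delete email older than three months unless it is on hold). The `Email` record and its fields are hypothetical, "three months" is approximated as 90 days, and holds are assumed to apply per custodian; real holds may be scoped by matter, date range, or topic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import AbstractSet, Iterable, List


@dataclass
class Email:
    """Hypothetical message record; real mail systems expose different fields."""
    message_id: str
    custodian: str
    received: datetime


RETENTION = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def eligible_for_deletion(email: Email, now: datetime,
                          custodians_on_hold: AbstractSet[str]) -> bool:
    """Same rule for every message: older than the retention period and not on hold."""
    if email.custodian in custodians_on_hold:
        return False  # a legal hold always overrides the retention schedule
    return now - email.received > RETENTION


def deletion_batch(emails: Iterable[Email], now: datetime,
                   custodians_on_hold: AbstractSet[str]) -> List[str]:
    """IDs the policy would delete in this run, returned so the run can be logged."""
    return [e.message_id for e in emails
            if eligible_for_deletion(e, now, custodians_on_hold)]
```

Logging each returned batch gives a record that deletion followed the written policy rather than someone's judgment on the day.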

SaaS or Vendor – An eDiscovery Conversation
I wasn’t able to attend this one.

Ick, Math!  Ensuring Production Quality
I moderated this panel, so I didn’t take notes.  You can find the slides here.

Still Looking for the Data
I wasn’t able to attend this one.

Data Breach: Incident Response Notification
I wasn’t able to attend this one.

“Small” Data in the Era of “Big” Data
Data minimization reduces the data that can be misused or leaked by deleting it or moving it to more secure storage when it is no longer needed.  People need quick access to the insights from the data, not the raw data itself.  Most people no longer see storage cost as a driver for data minimization, though some do (it can be annoying to add storage when maintaining your own secure infrastructure).  A survey by CTRL found that most people say IT should be responsible for the data minimization program.  Legal/compliance should have a role, too.  When a hacker gets into your system, he/she is there for over 200 days on average — lots of time to learn about your data.  Structured data is usually well managed/mapped (85%), but unstructured is not (15%).  Ephemeral technology solves the deletion problem by never storing the data.  Social engineering is one of the biggest ways that data gets out.

Mobile Device Forensics 2020: An FAQ Session Regarding eDiscovery and Data Privacy Considerations for the Coming Year
It is now possible for a visit to the wrong website to get your phone jailbroken and malware installed on it.  iOS sync can spread your data to other devices, so you may have text messages on your computer.  A woman found out about her husband’s affair from his FitBit by noticing his heart rate increased at 4:30am.  Time of death can also be found from a FitBit by when the heart stopped.  No increase in heart rate before a murder suggests the victim knew the murderer.  Wage and hour litigation uses location tracking.  Collecting app data from a phone may not give you everything you want since the app may store some data on the server.  Collection software may only handle certain versions of an app.  Use two collection tools and see if the results match.  Someone had 1.3 million WeChat chats on one phone.  iTunes is going away — you will be forced to use iCloud instead.  iTunes backup gives more data than iCloud (e.g., deleted messages).  Some of the email might be on the phone, while some might be on the server.  Who owns the data in the cloud?  Jailbreaking is possible again, which gives real access to the data.  When there is a litigation hold and you have the device, use a forensic tool on it.  When you don’t have the device, use the backups.  Backups may be incomplete — the user chooses what to back up (e.g., may not back up photos).  If malware gets onto the device, how do you know if the user really sent the text message?  Text message slang the kids use: “kk” = okay (kk instead of k because k will auto-correct to I), and “k.” = whatever (angry).  The chat in Clash of Clans and other games has been used by ISIS and criminals to communicate.  Google’s Project Zero found that China was using an iOS bug to attack people from a particular religious group.
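Comparing the output of two collection tools can be partly automated once both exports are reduced to a common shape. A minimal sketch, assuming each tool's chat export has already been converted into hypothetical (timestamp, sender, body) tuples with timestamps in the same format and time zone; if the tools disagree on timestamp format, every message will look like a mismatch.

```python
import hashlib
from typing import Iterable, List, Tuple

Message = Tuple[str, str, str]  # (timestamp, sender, body): hypothetical normalized export


def fingerprint(timestamp: str, sender: str, body: str) -> str:
    """Hash a normalized message so the same message from two tools compares equal."""
    normalized = "\x1f".join(
        [timestamp.strip(), sender.strip().lower(), " ".join(body.split())]
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def compare_collections(tool_a: Iterable[Message],
                        tool_b: Iterable[Message]) -> Tuple[List[Message], List[Message]]:
    """Messages found by only one tool; exact duplicates within a tool collapse to one."""
    a = {fingerprint(*m): m for m in tool_a}
    b = {fingerprint(*m): m for m in tool_b}
    only_a = [a[k] for k in sorted(a.keys() - b.keys())]
    only_b = [b[k] for k in sorted(b.keys() - a.keys())]
    return only_a, only_b
```

A difference does not say which tool is right; it says which messages deserve a closer look on the device.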

The Human Mind in the Age of Intelligent Machines
I wasn’t able to attend this one.

 

Highlights from IG3 Mid-Atlantic 2019

The first Mid-Atlantic IG3 was held at the Watergate Hotel in Washington, D.C.  It was a day and a half long, with a keynote followed by two concurrent sets of sessions.  I’ve provided some notes below from the sessions I was able to attend.  You can find my full set of photos here.

Big Foot, Aliens, or a Culture of Governance: Are Any of Them Real?
In 2012, 12% of companies had a chief data officer, but now 63.4% do.  Better data management can give insight into the business.  It may also be possible to monetize the data.  Cigna has used Watson, but you do have to put work into teaching it.  Remember the days before GPS, when you had to keep driving directions in your head or use printed maps.  Data is now more available.

Practical Applications of AI and Analytics: Gain Insights to Augment Your Review or End It Early
Opposing counsel may not even agree to threading, so getting approval for AI can be a problem.  If the requesting party is the government, they want everything and they don’t care about the cost to you.  TAR 2.0 allows you to jump into review right away with no delay for training by an expert, and it is becoming much more common.  TAR 1.0 is still used for second requests [presumably to produce documents without review].  With TAR 1.0 you know how much review you’ll have to do if you are going to review the docs that will potentially be produced, whereas you don’t with TAR 2.0 [though you could get a rough estimate with additional sampling].  Employees may utilize code words, and some people such as traders use unique lingo — will this cause problems for TAR?  It is useful to use unsupervised learning (clustering) to identify issues and keywords.  Negotiation over TAR use can sometimes be more work than doing the review without TAR.  It is hard to know the size of the benefit that TAR will provide for a project in advance, which can make it hard to convince people to use it.  Do you have to disclose the use of TAR to the other side?  If you are using it to cull, rather than just to prioritize the review, probably.  Courts will soon require or encourage the use of TAR.  There is a proportionality argument that it is unreasonable to not use it.  Data volumes are skyrocketing.  90% of the data in the world was created in the last 2 years.

Is There Room for Governance in Digital Transformation?
I wasn’t able to attend this one.

Investigative Analytics and Machine Learning; The Right Mindset, Tools, and Approach can Make all the Difference
You can use e-discovery AI tools to get the investigation going.  Some people still use paper, and the meta data from the label on the box containing the documents may be all you have.  While keyword search may not be very effective, the query may be a starting point for communicating what the person is looking for so you can figure out how to find it.  Use clustering to look for outliers.  Pushing people to use tech just makes them hate you.  Teach them in a way that is relatable.  Listen to the people that are trying to learn and see what they need.  Admit that tech doesn’t always work.  Don’t start filtering the data down too early — you need to understand it first.  It is important to be able to predict things such as cost.  Figure out which people to look at first (tiering).  Convince people to try analytics by pointing out how it can save time so they can spend more time with their kids.  Tech vendors need to be honest about what their products can do (users need to be skeptical).

CCPA and New US Privacy Laws Readiness
I wasn’t able to attend this one.

Ick, Math! Ensuring Production Quality
I moderated this panel, so I didn’t take notes.

Effective Data Mapping Policies and Avoiding Pitfalls in GDPR and Data Transfers for Cross-Border Litigations and Investigations
I wasn’t able to attend this one.

Technology Solution Update From Corporate, Law Firm and Service Provider Perspective
I wasn’t able to attend this one.

Selecting eDiscovery Platforms and Vendors
People often pick services offered by their friends rather than doing an unbiased analysis.  A common sequence is an RFI, then an RFP, then a POC to see what you really get out of the system.  Does the vendor have experience in your industry?  What is billable vs. non-billable?  Are you paying for peer QC?  What does data in/out mean for billing?  Do a test run with the vendor before making any decisions for the long term.  Some vendors charge by the user, instead of, or in addition to, charging based on data volume.  What does “unlimited” really mean?  Government agencies tend to demand a particular way of pricing, and projects are usually 3-5 years.  Charging a lot for a large number of users working on a small database really annoys the customer.  Per-user fees are really a Relativity thing, and other platforms should not attempt them.  Firms will bring data in house to avoid user fees unless the data is too big (e.g., 10GB).  How do dupes impact billing?  Are they charging to extract a dupe?  Concurrent user licenses were annoying, so many moved to named user licenses (typically 4 or 5 to one).  Concurrent licenses may have a burst option to address surges in usage, perhaps resetting the license count to the new level.  Some people use TAR on all cases while others in the firm/company never use it, so keep that in mind when licensing it.  Forcing people to use an unfamiliar platform to save money can be a mistake since there may be a lot of effort required to learn it.

eDiscovery Support and Pricing Model — Do we have it all Wrong?
Various pricing models: data in/out + hosting + reviewers, based on number of custodians, or bulk rate (flat monthly fee).  Redaction, foreign language, and privilege logs used to be separate charges, but there is now pressure to include them in the base fee.  Some make processing free but compensate by raising the rate for review.  RFP / procurement is a terrible approach for ediscovery because you work with and need to like the vendor/team.  Ask others about their experience with the vendor, though there is now less variability in quality between the vendors.  Encourage the vendor to make suggestions and not just be an order-taker.  Law firms often blame the vendor when a privileged document is produced, and the lack of transparency about what really happened is frustrating.  The client needs good communication with both the law firm and the vendor.  Law firms shouldn’t offer ediscovery services unless they can do it as well as the vendors (law firms have a fiduciary duty).

Still Looking for the Data
I wasn’t able to attend this one.

Recycling Your eDiscovery Data: How Managing Data Across Your Portfolio can Help to Reduce Wasteful Spending
I wasn’t able to attend this one.

Ready, Fire, Aim!  Negotiating Discovery Protocols
The Mandatory Initial Discovery Pilot Program in the Northern District of Illinois and the District of Arizona requires production within 70 days from filing in order to motivate both sides to get going and cooperate.  One complaint about this is that people want a motion to dismiss to be heard before getting into ediscovery.  You can’t get away with saying “give us everything” under the pilot program since there is not enough time for that to be possible.  Nobody wants to be the unreasonable party under such a tight deadline.  The Commercial Division of the NY Supreme Court encourages categorical privilege logs.  You describe the category, say why it is privileged, and specify how many documents were redacted vs. withheld in their entirety.  Make a list of third parties that received the privileged documents (not a full list of all from/to).  It can be a pain to come up with a set of categories when there is a huge number of documents.  When it comes to TAR protocols, one might disclose the tool used or whether only the inclusive email was produced.  Should the seed set size or elusion set size be disclosed?  Why is the producing party disclosing any of this instead of just claiming that their only responsibility is to produce the documents?  Disclosing may reduce the risk of having a fight over sufficiency.  Government regulators will just tell you to give them everything exactly the way they want it.  When responding to a criminal antitrust investigation you can get in trouble if you standardize the timezone in the data.  Don’t do threading without consent.  A second request may require you to provide a list of all keywords in the collection and their frequencies.  Be careful about orders requiring you to produce the full family — this will compel you to produce non-responsive attachments.
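A keyword frequency list of the kind a second request may demand is easy to compute, though the request itself decides what counts as a keyword and whether "frequency" means total occurrences or the number of documents containing the term. A minimal sketch that reports both, assuming simple lowercase word tokens; an actual request may specify different tokenization, stop words, or phrase handling.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

TOKEN = re.compile(r"[a-z0-9']+")  # assumption: lowercase alphanumeric word tokens


def term_frequencies(documents: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (total occurrences, number of documents containing the term) per term."""
    occurrences: Counter = Counter()
    doc_counts: Counter = Counter()
    for text in documents:
        tokens = TOKEN.findall(text.lower())
        occurrences.update(tokens)
        doc_counts.update(set(tokens))  # each document counts once per term
    return occurrences, doc_counts


def frequency_report(documents: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Rows of (term, occurrences, documents), most frequent first, ties alphabetical."""
    occurrences, doc_counts = term_frequencies(documents)
    return sorted(((t, occurrences[t], doc_counts[t]) for t in occurrences),
                  key=lambda row: (-row[1], row[0]))
```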

Document Review Pricing Reset
A common approach is hourly pricing for everything (except hosting).  This may be attractive to the customer because other approaches require the vendor to take on the risk that the labor will be more than expected, and the vendor will build that into the price.  If the customer doesn’t need predictable cost, they won’t want to pay (implicitly) for insurance against a cost overrun.  It is a choice between predictability of cost and lowest cost.  Occasionally review is priced on a per-document basis, but it is hard to estimate what the fair price is since data can vary.  Per-document pricing puts some pressure on the review team to better manage the process for efficiency.  Some clients are asking for a fixed price to handle everything for the next three years.  A hybrid model has a fixed monthly fee with a lower hourly rate for review, and the lower hourly rate makes paying for extra QC review less painful.  Using separate vendors and review companies can have a downside if reviewers sit idle while the tech is not ready.  On the other hand, if there are problems with the reviewers it is nice to have the option to swap them out for another review team.
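The choice between straight hourly billing and the hybrid model comes down to simple arithmetic: the fixed fee pays for itself once enough review hours are billed at the discounted rate. The sketch below finds that break-even point; every rate and fee in it is a hypothetical placeholder, not a figure from the panel.

```python
def hourly_cost(hours: float, rate: float) -> float:
    """Straight hourly pricing: every review hour billed at the full rate."""
    return hours * rate


def hybrid_cost(hours: float, months: int, monthly_fee: float, rate: float) -> float:
    """Hybrid pricing: fixed monthly fee plus a discounted hourly review rate."""
    return months * monthly_fee + hours * rate


def break_even_hours(months: int, monthly_fee: float,
                     hourly_rate: float, hybrid_rate: float) -> float:
    """Review hours at which both models cost the same; above this, hybrid is cheaper."""
    if hybrid_rate >= hourly_rate:
        raise ValueError("hybrid rate must be lower than the straight hourly rate")
    return months * monthly_fee / (hourly_rate - hybrid_rate)


# Hypothetical example: 6 months, $10,000/month fee, $60/h straight vs. $45/h hybrid.
hours = break_even_hours(6, 10_000, 60, 45)
print(hours)  # 4000.0
```

With those made-up numbers the hybrid model is cheaper once review exceeds 4,000 hours, and extra QC hours push a project further past that point.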

Finding Common Ground: Legal & IT Working Together
I wasn’t able to attend this one.

Highlights from IG3 West 2018

The IG3 West conference was held by Ing3nious at the Paséa Hotel & Spa in Huntington Beach, California.  This conference differed from other recent Ing3nious events in several ways.  It was two days of presentations instead of one.  There were three simultaneous panels instead of two.  Between panels there were sometimes three simultaneous vendor technology demos.  There was an exhibit hall with over forty vendor tables.  Due to the different format, I was only able to attend about a third of the presentations.  My notes are below.  You can find my full set of photos here.

Stop Chasing Horses, Start Building Fences: How Real-Time Technologies Change the Game of Compliance and Governance
Chris Surdak, the author of Jerk: Twelve Steps to Rule the World, talked about changing technology and the value of information, claiming that information is the new wealth.  Facebook, Amazon, Apple, Netflix, and Google together are worth more than France [apparently he means the sum of their market capitalizations is greater than the GDP of France, though that is a rather apples-to-oranges comparison since GDP is an annualized number].  We are exposed to persistent ambient surveillance (Alexa, Siri, Progressive Snapshot, etc.).  It is possible to detect whether someone is lying by using video to detect blood flow to their face.  Car companies monetized data about passengers’ weight (measured due to air bags).  Sentiment analysis has a hard time with sarcasm.  You can’t find emails about fraud by searching for “fraud” — discussions about fraudulent activity may be disguised as weirdly specific conversations about lunch.  The problem with graph analysis is that a large volume of talk about something doesn’t mean that it’s important.  The most important thing may be what’s missing.  When RadioShack went bankrupt, its remaining value was in its customer data — remember them asking for your contact info when you bought batteries?  A one-word change to FRCP 37(e) should have changed corporate retention policies, but nobody changed.  The EU’s right to be forgotten is virtually impossible to implement in reality (how to deal with backup tapes?) and almost nobody does it.  Campbell’s has people ship their DNA to the company so it can make diet recommendations to them.  With the GDPR, consent nullifies the protections, so it doesn’t really protect your privacy.

AI and the Corporate Law Department of the Future
Gartner says AI is at the peak of inflated expectations and a trough of disillusionment will follow.  Expect to be able to buy autonomous vehicles by 2023.  The economic downturn of 2008 caused law firms to start using metrics.  Legal will take a long time to adopt AI — managing partners still have assistants print stuff out.  Embracing AI puts a firm ahead of its competitors.  Ethical obligations are also an impediment to adoption of technology, since lawyers are concerned about understanding the result.

Advanced TAR Considerations: A 500 Level Crash Course
Continuous Active Learning (CAL), also called TAR 2.0, can adapt to shifts in the concept of relevance that may occur during the review.  There doesn’t seem to be much difference in the efficiency of SVM vs. logistic regression when they are applied to the same task.  There can be a big efficiency difference between different tasks.  TAR 1.0 requires a subject-matter expert for training, but senior attorneys are not always readily available.  With TAR 1.0 you may be concerned that you will be required to disclose the training set (including non-responsive documents), but with TAR 2.0 there is case law that supports that being unnecessary [I’ve seen the argument that the production itself is the training set, but that neglects the non-responsive documents that were reviewed (and used for training) but not produced.  On the other hand, if you are talking about disclosing just the seed set that was used to start the process, that can be a single document and it has very little impact on the result.].  Case law can be found at predictivecoding.com, which is updated at the end of each year.  TAR needs text, not image data.  Sometimes keywords are good enough.  When it comes to government investigations, many agencies (FTC, DOJ) use/accept TAR.  It really depends on the individual investigator, though, and you can’t fight their decision (the investigator is the judge).  Don’t use TAR for government investigations without disclosing that you are doing so.  TAR can have trouble if there are documents having high conceptual similarity where some are relevant and some aren’t.  Should you tell opposing counsel that you’re using TAR?  Usually, but it depends on the situation.  When the situation is symmetrical, both sides tend to be reasonable.  When it is asymmetrical, the side with very little data may try to make things expensive for the other side, so say something like “both sides may use advanced technology to produce documents” and don’t give more detail than that (e.g., how TAR will be trained, who will do the training, etc.) or you may invite problems.  Disclosing the use of TAR up front and getting agreement may avoid problems later.  Be careful about “untrainable documents” (documents containing too little text) — separate them out, and maybe use meta data or file type to help analyze them.  Elusion testing can be used to make sure too many relevant documents weren’t missed.  One panelist said 384 documents could be sampled from the elusion set, though that may sometimes not be enough.  [I have to eat some crow here.  I raised my hand and pointed out that the margin of error for the elusion has to be divided by the prevalence to get the margin of error for the recall, which is correct.  I went on to say that with a sample of 384 giving ±5% for the elusion you would have ±50% for the recall if prevalence was 10%, making the measurement worthless.  The mistake is that while a sample of 384 technically implies a worst case of ±5% for the margin of error for elusion, it’s not realistic for the margin of error to be that bad for elusion because ±5% would occur if elusion was near 50%, but elusion is typically very small (smaller than the prevalence), causing the margin of error for the elusion to be significantly less than ±5%.  The correct margin of error for the recall from an elusion sample of 384 documents would be ±13% if the prevalence is 10%, and ±40% if the prevalence is 1%.  So, if prevalence is around 10% an elusion sample of 384 isn’t completely worthless (though it is much worse than the ±5% we usually aim for), but if prevalence is much lower than that it would be.]
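The arithmetic behind that bracketed note can be written out. If a fraction d of the N documents is left unproduced (the discard set), the prevalence is ρ, and the elusion (fraction of the discard set that is relevant) is e, then e·d·N relevant documents are missed out of ρ·N, so recall = 1 - e·d/ρ, and the margin of error on e gets multiplied by d/ρ (roughly 1/ρ when most documents are discarded). A minimal Python sketch, assuming the simple normal-approximation (Wald) interval, ignoring the uncertainty in the prevalence estimate itself, and using hypothetical values for d and e. Because those assumptions may differ from the ones behind the ±13% and ±40% figures above, its numbers are illustrative rather than a reproduction of them.

```python
import math

Z_95 = 1.96  # two-sided 95% confidence under the normal approximation


def elusion_margin(elusion: float, n: int, z: float = Z_95) -> float:
    """Wald margin of error for an elusion rate estimated from n sampled discards."""
    return z * math.sqrt(elusion * (1 - elusion) / n)


def recall_from_elusion(elusion: float, prevalence: float, discard_fraction: float) -> float:
    """Recall = 1 - (relevant docs left in the discard set) / (all relevant docs)."""
    return 1 - elusion * discard_fraction / prevalence


def recall_margin(elusion: float, n: int, prevalence: float, discard_fraction: float,
                  z: float = Z_95) -> float:
    """The elusion margin scaled by discard_fraction / prevalence."""
    return elusion_margin(elusion, n, z) * discard_fraction / prevalence


def worst_case_sample_size(margin: float, z: float = Z_95) -> float:
    """Sample size for a given margin when the proportion is 0.5 (the worst case)."""
    return z * z * 0.25 / margin ** 2


# Hypothetical project: 10% prevalence, 90% of documents discarded, 1% elusion.
recall = recall_from_elusion(0.01, 0.10, 0.9)  # about 0.91
margin = recall_margin(0.01, 384, 0.10, 0.9)   # about 0.09, i.e., roughly ±9 points
```

Wald intervals behave poorly for proportions near zero, so an exact (Clopper-Pearson) interval is the safer choice when elusion is very small.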

40 Years in 30 Minutes: The Background to Some of the Interesting Issues we Face
Steven Brower talked about the early days of the Internet and the current state of technology.  Early on, a user ID was used to tell who you were, not to keep you out.  Technology was elitist, and user-friendly was not a goal.  Now, so much is locked down for security reasons that things become unusable.  Law firms that prohibit access to social media force lawyers onto “secret” computers when a client needs something taken down from YouTube.  Emails about laws against certain things can be blocked due to keyword hits for the illegal things being described.  We don’t have real AI yet.  The next generation beyond predictive coding will be able to identify the 50 key documents for the case.  During e-discovery, try searching for obscenities to find things like: “I don’t give a f*** what the contract says.”  Autonomous vehicles won’t come as soon as people are predicting.  Snow is a problem for them.  We may get vehicles that drive autonomously from one parking lot to another, so the route is well known.  When there are a bunch of inebriated people in the car, who should it take commands from?  GDPR is silly since email bounces from computer to computer around the world.  The Starwood breach does not mean you need to get a new passport — your passport number was already out there.  To improve your security, don’t try to educate everyone about cybersecurity — you can eliminate half the risk by getting payroll to stop responding to emails asking for W2 data that appear to come from the CEO.  Scammers use the W2 data to file tax returns to get the refunds.  This is so common the IRS won’t even accept reports on it anymore.  You will still get your refund if it happens to you, but it’s a hassle.

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We did the TAR vs. Keyword Search Challenge again.  The results are available here.

After the Incident: Investigating and Responding to a Data Breach
Plan in advance, and remember that you may not have access to the laptop containing the plan when there is a breach. Get a PR firm that handles crises in advance.  You need to be ready for the negative comments on Twitter and Facebook.  Have the right SMEs for the incident on the team.  Assume that everything is discoverable — attorney-client privilege won’t save you if you ask the attorney for business (rather than legal) advice.  Notification laws vary from state to state.  An investigation by law enforcement may require not notifying the public for some period of time.  You should do an annual review of your cyber insurance since things are changing rapidly.  Such policies are industry specific.

Employing Technology/Next-Gen Tools to Reduce eDiscovery Spend
Have a process, but also think about what you are doing and the specifics of the case.  Restrict the date range if possible.  Reuse the results when you have overlapping cases (e.g., privilege review).  Don’t just look at docs/hour when monitoring the review.  Look at accuracy and get feedback about what they are finding.  CAL tends to result in doing too much document review (want to stop at 75% recall but end up hitting 89%).  Using a tool to do redactions will give false positives, so you need manual QC of the result.  When replacing a patient ID with a consistent anonymized identifier, you can’t just transform the ID because that could be inverted, resulting in a HIPAA violation.
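The point about patient IDs is that a consistent replacement must not be computable by anyone who holds the original IDs. The sketch below contrasts an unkeyed hash, which is consistent but can be reversed by hashing every possible ID, with an HMAC keyed by a secret that never leaves the producing party. This is one common technique offered as an illustration, not what the panel prescribed; the 16-hex-character truncation and `PT-` prefix are arbitrary choices, and whether any given scheme satisfies HIPAA is a legal question the sketch does not answer.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def naive_pseudonym(patient_id: str) -> str:
    """Unkeyed hash: consistent, but anyone can recompute it."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()[:16]


def invert_naive(pseudonym: str, id_space: Iterable[str]) -> Optional[str]:
    """Brute-force attack: hash every plausible ID until one matches."""
    for candidate in id_space:
        if naive_pseudonym(candidate) == pseudonym:
            return candidate
    return None


class KeyedPseudonymizer:
    """Consistent pseudonyms from HMAC-SHA256 with a secret key held back from production."""

    def __init__(self, key: Optional[bytes] = None):
        # A fresh random key unless one is supplied (e.g., to stay consistent across
        # productions); the key must be protected as carefully as the IDs themselves.
        self._key = key if key is not None else secrets.token_bytes(32)

    def __call__(self, patient_id: str) -> str:
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256)
        return "PT-" + digest.hexdigest()[:16]
```

Six-digit IDs give only a million candidates, so `invert_naive` recovers them quickly; the keyed version cannot be recomputed without the key.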

eDiscovery for the Rest of us
What are ediscovery considerations for relatively small data sets?  During meet and confer, try to cooperate.  Judges hate ediscovery disputes.  Let the paralegals hash out the details — attorneys don’t really care about the details as long as it works.  Remote collection can avoid travel costs and hourly fees while keeping strangers out of the client’s office.  The biggest thing they look for from vendors is cost.  Need a certain volume of data for TAR to be practical.  Email threading can be used at any size.
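Threading works on small collections because it needs no training data, only message headers. A minimal sketch, assuming each message's Message-ID and In-Reply-To header have already been extracted into a dictionary; commercial threading tools also use the References header, subjects, and text comparison (e.g., to identify inclusive emails), none of which this attempts.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_root(msg_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow In-Reply-To links upward while the parent is in the collection."""
    seen = set()
    while parent_of.get(msg_id) in parent_of and msg_id not in seen:
        seen.add(msg_id)  # guard against malformed reply cycles
        msg_id = parent_of[msg_id]
    return msg_id


def group_threads(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group messages under the earliest collected message in their reply chain."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for msg_id in parent_of:
        threads[find_root(msg_id, parent_of)].append(msg_id)
    return dict(threads)
```

A reply whose parent was never collected becomes the root of its own thread, which is itself a hint that something may be missing.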

Does Compliance Stifle or Spark Innovation?
Startups tend to be full of people fleeing big corporations to get away from compliance requirements.  If you do compliance well, that can be an advantage over competitors.  Look at it as protecting the longevity of the business (protecting reputation, etc.).  At the DoD, compliance stifles innovation, but it creates a barrier against bad guys.  They have thousands of attacks per day and are about 8 years behind normal innovation.  Gray crimes are an area for innovation — examples include manipulation (influencing elections) and tanking a stock IPO by faking a poisoning.  Hospitals and law firms tend to pay, so they are prime targets for ransomware.

Panels That I Couldn’t Attend:
California and EU Privacy Compliance
What it all Comes Down to – Enterprise Cybersecurity Governance
Selecting eDiscovery Platforms and Vendors
Defensible Disposition of Data
Biometrics and the Evolving Legal Landscape
Storytelling in the Age of eDiscovery
Technology Solution Update From Corporate, Law Firm and Service Provider Perspective
The Internet of Things and Everything as a Service – the Convergence of Security, Privacy and Product Liability
Similarities and Differences Between the GDPR and the New California Consumer Privacy Act – Similar Enough?
The Impact of the Internet of Things on eDiscovery
Escalating Cyber Risk From the IT Department to the Boardroom
So you Weren’t Quite Ready for GDPR?
Security vs. Compliance and Why Legal Frameworks Fall Short to Improve Information Security
How to Clean up Files for Governance and GDPR
Deception, Active Defense and Offensive Security…How to Fight Back Without Breaking the Law?
Information Governance – Separating the “Junk” from the “Jewels”
What are Big Law Firms Saying About Their LegalTech Adoption Opportunities and Challenges?
Cyber and Data Security for the GC: How to Stay out of Headlines and Crosshairs

Highlights from the Northeast eDiscovery & IG Retreat 2018

The 2018 Northeast eDiscovery and Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room. Attendees could attend talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.

Strategies For Data Minimization Of Legacy Data
Backup and archiving should be viewed as separate functions.  When it comes to spoliation (FRCP Rule 37), reasonableness of the company’s data retention plan is key.  Over-preservation is expensive.  There are not many cases on Rule 37 relating to backup tapes.  People are changing their behavior due to the changes in the FRCP, especially in heavily regulated industries such as healthcare and financial services.  Studies find that typically 70% of data has no business value and is not subject to legal hold or retention requirements for compliance.  When using machine learning, you can focus on finding what to keep or what to get rid of.  It is often best to start with unsupervised machine learning.  Be mindful of destructive malware.  To mitigate security risks, it is important to know where your data (including backup tapes) is.  If a backup tape goes missing, do you need to notify customers (privacy)?  To get started, create a matrix showing what you need to keep, keeping in mind legal holds and privacy (GDPR).  Old backup tapes are subject to GDPR.  Does the right to be forgotten apply to backup tapes?  There is currently no answer.  It would be hard to selectively delete data from the tapes, so maybe have a process that deletes during the restore.  There can be conflicts between U.S. ediscovery and GDPR, so you must decide which is the bigger risk.
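The delete-during-restore idea can be sketched as a filter on the restore path: keep a list of data subjects whose erasure requests were honored in live systems, and drop their records whenever a backup is restored. This is a minimal illustration, assuming each restored record carries a subject identifier (real tapes hold volumes and files, so the record-level view is itself an assumption); it does not settle whether this satisfies the GDPR, which the panel said is an open question.

```python
from typing import AbstractSet, Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical restored record tagged with the data subject it belongs to."""
    subject_id: str
    payload: str


def restore_with_erasures(backup: Iterable[Record], erased_subjects: AbstractSet[str],
                          suppressed: List[str]) -> Iterator[Record]:
    """Yield records from a backup, skipping subjects who were erased in live systems."""
    for record in backup:
        if record.subject_id in erased_subjects:
            suppressed.append(record.subject_id)  # audit trail of what was withheld
            continue
        yield record
```

Because the filter runs at restore time, the suppression list has to be kept for as long as the backups it applies to.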

Preparing A Coordinated Response To Government Inquiries And Investigations
You might find out that you are being investigated when the FBI or another investigator approaches one of your employees — get an attorney.  Reach out to the investigator, take it seriously, and ask for a timeline.  You may receive a broad subpoena because the investigator wants to ensure they get everything important, but you can often get them to narrow it.  Be sure to retain outside counsel immediately.  In one case a CEO negotiated search terms with a prosecutor without discussing custodians, so they had to search all employees.  The prosecutor can’t handle a huge volume of data, so it should be possible to negotiate a reasonable production.  In addition to satisfying the subpoena, you need to simultaneously investigate whether there is an ongoing problem that needs to be addressed.  Is your IT group able to forensically preserve and produce the documents?  You don’t want to mess up a production in front of a regulator, so get expertise in place early.  Data privacy can be an issue.  When dealing with operations in Europe, it is helpful to get employee consent in advance — nobody wants to consent during an investigation.  Beware of data residing in disparate systems in different languages.  Google Translate is not very good (e.g., you have to be careful about slang).  Employees may try to cover their tracks.  In one case an employee was using “chocolate” as an encoded way to refer to a payment.  In another case an employee took a hammer to a desktop computer, though the hard drive was still recoverable.  Look for gaps in email or anomalous email volume.  Note that employees may use WhatsApp or Signal to communicate.  The DOJ expects you to be systematic (e.g., use analytics) about compliance.  See what data is available, even if it wasn’t subpoenaed, since it may help your side (email usually doesn’t).
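Looking for gaps or anomalous email volume is easy to automate once a custodian's sent dates are extracted. A minimal sketch; the three-day gap threshold and the z-score cutoff of 3 are arbitrary assumptions (a weekend plus a holiday will trip the gap check), and a large spike inflates the standard deviation, so a robust measure such as the median absolute deviation may work better on real data.

```python
from datetime import date, timedelta
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def daily_counts(sent_dates: Iterable[date], start: date, end: date) -> Dict[date, int]:
    """Emails per day for one custodian, including days with zero emails."""
    counts = {start + timedelta(days=i): 0 for i in range((end - start).days + 1)}
    for d in sent_dates:
        if d in counts:
            counts[d] += 1
    return counts


def find_gaps(counts: Dict[date, int], min_days: int = 3) -> List[Tuple[date, date]]:
    """Runs of at least min_days consecutive days with no email."""
    gaps: List[Tuple[date, date]] = []
    run: List[date] = []
    for day in sorted(counts):
        if counts[day] == 0:
            run.append(day)
            continue
        if len(run) >= min_days:
            gaps.append((run[0], run[-1]))
        run = []
    if len(run) >= min_days:  # a gap that runs to the end of the period
        gaps.append((run[0], run[-1]))
    return gaps


def volume_outliers(counts: Dict[date, int], z: float = 3.0) -> List[date]:
    """Days whose volume is more than z population standard deviations from the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [day for day, n in sorted(counts.items()) if abs(n - mu) / sigma > z]
```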

Digging Into TAR
I moderated this panel, so I didn’t take notes. We challenged the audience to create a keyword search that would work better than technology-assisted review. Results are posted here.

Implementing Information Governance – Nightmare On Corporate America Street?
You need to weigh the value of the data against the risk of keeping it.  What is your business model?  That will dictate information governance.  Domino’s was described as a technology company that happens to distribute hot bread.  Unstructured data has the biggest footprint and the most rapid growth.  Did you follow your policies?  Your insurance company may be very picky about that when looking for a reason not to pay out.  They may pay out and then sue you over the loss.  Fear is a good motivator.  Threats from the OCC or FDIC over internal data management can motivate change.  You can quantify risk because the cost of having a data breach is now known. Info governance is utilization awareness, not just data management.  Know where your data is.  What about the employee that creates an unauthorized AWS account?  This is the “shadow ecosystem” or “shadow IT.”  One company discovered they had 50,000 collaborative SharePoint sites they didn’t know about.  For info governance standards see The Sedona Conference and EDRM.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Artificial intelligence (AI) should not merely analyze; it should present a result in a way that is actionable.  It might tell you how much two people talk, their sentiment, and whether there are any spikes in communication volume.  AI can be used by law firms for budgeting by analyzing prior matters.  There are concerns about privacy with AI.  Many clients are moving to the cloud.  Many are using private clouds for collaboration, not necessarily for utilizing large computing power.  Office 365 is of interest to many companies.  There was extensive discussion about the ediscovery analytics capabilities being added from the Equivio acquisition, and a demo by Marcel Katz of Microsoft.  The predictive coding (TAR) capability uses simple active learning (SAL) rather than continuous active learning (CAL).  It is 20 times slower in the cloud than running Equivio on premises.  There is currently no review tool in Office 365, so you have to export the predictions and do the review elsewhere.  Mobile devices create additional challenges for ediscovery.  The time when a text message is sent may not match the time when it is received if the receiving device is off when the message is sent.  Technology needs to be able to handle emojis.  There are many different apps with many different data storage formats.
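
The SAL/CAL distinction comes down to which documents are chosen for training. In simple active learning, the system asks for judgments on the documents it is least sure about, and training eventually stops so the ranking can be used; in continuous active learning, reviewers keep looking at the highest-ranked documents, and every judgment feeds back into training. A minimal sketch of just the selection step, assuming relevance scores between 0 and 1 from some classifier (the scores are made up, and a real system would retrain after each batch).

```python
def select_sal(scores, k):
    """Simple active learning (uncertainty sampling): the k unreviewed
    documents whose scores are closest to 0.5."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:k]


def select_cal(scores, k):
    """Continuous active learning (relevance feedback): the k unreviewed
    documents the model currently ranks as most likely relevant."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


scores = {"d1": 0.97, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.88}
print(select_sal(scores, 2))  # ['d2', 'd4']
print(select_cal(scores, 2))  # ['d1', 'd5']
```

With CAL, the documents used for training are the ones reviewers need to see anyway, so there is no separate training phase that has to end before review.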

The ‘Team Of Teams’ Approach To Enterprise Security And Threat Management
Fast response is critical when you are attacked.  Response must be automated because a human response is not fast enough.  It can take 200 days to detect an adversary on the network, so assume someone is already inside.  What are the critical assets, and what threats should you look for?  What value does the data have to the attacker?  What is the impact on the business?  What is the impact on the people?  Know what is normal for your systems.  Is a large data transfer at 2:00am normal?  Simulate a phishing attack and see if your employees fall for it.  In one case a CEO was known to be in China for a deal, so someone impersonating the CEO emailed the CFO to send $50 million for the deal.  The money was never recovered.  Have processes in place, like requiring a signature for amounts greater than $10,000.  If a company is doing a lot of acquisitions, it can be hard to know what is on their network.  How should small companies get started?  Change passwords, hire an external auditor, and make use of open source tools.
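
Knowing what is normal can start with something as simple as a per-hour baseline of outbound transfers. A toy sketch, assuming a log of (timestamp, bytes) pairs; the 3x factor is arbitrary, and production tools model much more (user, destination, day of week) and, as the panel stressed, respond automatically.

```python
from datetime import datetime

MB = 1_000_000


def build_baseline(history):
    """Map each hour of the day to the largest transfer (in bytes) seen at that hour."""
    baseline = {}
    for when, size in history:
        baseline[when.hour] = max(baseline.get(when.hour, 0), size)
    return baseline


def is_unusual(when, size, baseline, factor=3.0):
    """Flag a transfer more than `factor` times larger than anything seen
    at that hour before; hours with no history flag any transfer."""
    return size > factor * baseline.get(when.hour, 0)


# Hypothetical history of outbound transfers: (timestamp, bytes).
history = [
    (datetime(2019, 5, 1, 10, 0), 50 * MB),
    (datetime(2019, 5, 1, 14, 30), 80 * MB),
    (datetime(2019, 5, 2, 2, 15), 1 * MB),
]
baseline = build_baseline(history)
print(is_unusual(datetime(2019, 5, 3, 2, 0), 5_000 * MB, baseline))  # True
print(is_unusual(datetime(2019, 5, 3, 10, 5), 60 * MB, baseline))    # False
```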

From Data To GRC Insight
Governance, risk management, and compliance (GRC) needs to become centralized and standardized.  Practicing incident response as a team results in better responses when real incidents happen.  Growing data means growing risk.  Beware of storage of Social Security numbers and credit card numbers.  Use encryption and limit access based on role.  Detect emailing of spreadsheets full of data.  Know what the cost of HIPAA violations is and assign the risk of non-compliance to an individual.  Learn about the NIST Cybersecurity Framework.  Avoid fines and reputational risk, and improve the organization.  Transfer the risk by having data hosted by a company that provides security.  Cloud and mobile can have big security issues.  The company can’t see traffic on mobile devices to monitor for phishing.
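
Detecting spreadsheets full of sensitive numbers can start with pattern matching on the attachment’s text. A minimal sketch, assuming the spreadsheet has already been converted to text; the patterns only cover common formats (ddd-dd-dddd for SSNs, 13 to 16 digit card numbers), the Luhn checksum filters out random digit runs, and the threshold is arbitrary. Dedicated data loss prevention tools handle many more formats and contexts.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(number):
    """Luhn checksum used by payment card numbers; rejects most random digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    return {
        "ssn": SSN.findall(text),
        "card": [m for m in CARD.findall(text) if luhn_ok(m)],
    }


def looks_like_bulk_pii(text, threshold=10):
    """True if an attachment's text contains at least `threshold` hits."""
    hits = find_sensitive(text)
    return len(hits["ssn"]) + len(hits["card"]) >= threshold


# 4111 1111 1111 1111 is a widely used test card number; the SSNs are made up.
sheet = "Name,SSN,Card\nAnn,123-45-6789,4111 1111 1111 1111\nBob,987-65-4321,4111111111111112\n"
print(find_sensitive(sheet))
print(looks_like_bulk_pii(sheet, threshold=3))  # True
```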

 

Highlights from the South Central eDiscovery & IG Retreat 2018

The 2018 South Central eDiscovery and Information Governance Retreat was held at Lakeway Resort and Spa, outside of Austin.  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room.  Attendees could choose talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.

Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary.  Make requests broad and make the other side reveal what they actually have.  Ask for “all communications” rather than “all Office 365 emails” or you may miss something (for example, they may use Slack).  The collection may be 1% responsive.  How can it be culled defensibly?  Ask for broad search terms, get hit rates, and then adjust.  The hit rates don’t tell how many documents were actually relevant, so use sampling.  When searching for references to a patent, search for “123 patent” instead of just “123” to avoid false positives (patent references often use just the last 3 digits).  This rarely happens, but you might get the producing party to disclose top matches for the queries and examine them to give feedback on desired adjustments.  You should have a standard specification for the production format you want, and you should get it to the producing party as soon as possible, or you might get 20,000 emails produced in one large PDF that you’ll have to waste time dissecting, and meta data may be lost.  If keyword search is used during collection, be aware that Office 365 currently doesn’t OCR non-searchable content, so it will be missed.  Demand that the producing party OCR before applying any search terms.  In one production there were a lot of “gibberish” emails returned because the search engine was matching “ING” to all words ending in “ing” rather than requiring the full word to match.  If ediscovery disputes make it to the judge, it’s usually not a good thing since the judge may not be very technical.
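
Both search problems described in this talk, the “ING” hits and the bare “123”, come down to substring versus whole-word matching. A minimal sketch using Python’s `re` module on an invented snippet; the helper names are hypothetical, and review platforms apply their own tokenization and query syntax, so test how your tool actually behaves.

```python
import re


def substring_hits(term, text):
    """Naive matching: counts the term anywhere, including inside other words."""
    return text.lower().count(term.lower())


def word_hits(term, text):
    """Whole-word matching: the term may not touch a letter or digit on either side."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


text = ("The ING account was closed. Testing and filing are ongoing. "
        "The '123 patent is asserted. Meet in room 123. Invoice 41234 is unrelated.")

print(substring_hits("ing", text), word_hits("ING", text))  # 4 1
print(substring_hits("123", text), word_hits("123", text))  # 3 2
print(word_hits("123 patent", text))                          # 1
```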

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We challenged the audience to create a keyword search that would work better than TAR.  Results are posted here.

Beyond eDiscovery – Creating Context By Connecting Disparate Data
Beyond the custodian, who else had access to this file?  Who should have access, and who shouldn’t?  Forensics can determine who accessed or printed a confidential file.  The Windows registry tracks how users access files.  When you print, an image is stored.  Figure out what else you can do with the tech you have.  For example, use SharePoint workflows to help with ediscovery.  Predictive coding can be used with structured data.  Favorite quote: “Anyone who says they can solve all of my problems with one tool is a big fat liar.”

Improving Review Efficiency By Maximizing The Use Of Clustering Technology
Clustering can lead to more consistent review by ensuring the same person reviews similar documents and reviews them together.  The requesting party can use clustering to get an overview of what they’ve received.  Image clustering identifies glyphs to determine document similarity, so it can detect things like a Nike logo, or it can be sensitive to the location on the page where the text occurs.  It is important to get the noise (e.g., email footers) out of the data before clustering.  Text messages and spreadsheets may cause problems.  Clustering can be used for ECA or keyword generation, where it is not making final determinations for a document.  It can reveal abbreviations scientists are using for technical terms.  It can also be used to identify clusters that can be excluded from review (not relevant).  It can be used to prioritize review, with more promising clusters reviewed first.  Should you tell the other side you are using clustering to come up with keywords?  No, you are just inviting controversy.
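
The advice to remove noise before clustering is easy to demonstrate: boilerplate shared by every email can make unrelated messages look alike. A minimal sketch using word counts, cosine similarity, and simple single-pass (leader) clustering; the footer pattern, the 0.5 threshold, and the documents are invented, and commercial clustering tools use more robust algorithms.

```python
import math
import re
from collections import Counter

# Hypothetical boilerplate to strip before clustering (e.g., email footers).
NOISE = [re.compile(r"this email is confidential", re.IGNORECASE),
         re.compile(r"sent from my \w+", re.IGNORECASE)]


def tokens(text, strip_noise=True):
    lines = text.splitlines()
    if strip_noise:
        lines = [ln for ln in lines if not any(p.search(ln) for p in NOISE)]
    return Counter(re.findall(r"[a-z0-9]+", " ".join(lines).lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.5, strip_noise=True):
    """Single-pass clustering: each document joins the first cluster whose
    leader (first member) is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_vector, member_ids)
    for doc_id, text in docs.items():
        vec = tokens(text, strip_noise)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((vec, [doc_id]))
    return [members for _, members in clusters]


footer = "\nThis email is confidential and intended only for the recipient."
docs = {
    "e1": "Pricing for the widget contract renewal" + footer,
    "e2": "Widget contract renewal pricing update" + footer,
    "e3": "Lunch order for the team offsite" + footer,
    "e4": "Team offsite lunch order changes" + footer,
}
print(leader_cluster(docs))                     # [['e1', 'e2'], ['e3', 'e4']]
print(leader_cluster(docs, strip_noise=False))  # [['e1', 'e2', 'e3', 'e4']]
```

With the footer stripped, the pricing emails and the lunch emails form separate clusters; left in, the shared footer is enough to pull all four into one cluster.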

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Migration to Office 365 and other cloud offerings can cause problems.  Data can be dumped into the cloud without tracking where it went.  Figuring out how to collect from the cloud can be difficult.  Microsoft is always changing Office 365, making it difficult to stay on top of the changes.  Favorite quote: “I’m always running to keep up.  I should be skinnier, but I’m not.”  Office 365 is supposed to have OCR soon.  What if the cloud platform gets hacked?  There can be throttling issues when collecting from OneDrive by downloading everything (not using Microsoft’s tool).  Rollout of cloud services should be slow to make sure everyone knows what should be put in the cloud and what shouldn’t, and to ensure that you keep track of where everything is.  Be careful about emailing passwords since they may be recorded; use ephemeral communications instead of email for that.  Personal devices cause problems because custodians don’t like having their devices collected.  Policy is critical, but it is not a cure-all.  Policy must be surrounded by communication and re-certification to ensure it is followed.  Gmail is not a good solution for restricting data location since attachments are copied to the local disk when they are viewed.

Achieving GDPR Compliance For Unstructured Content
Some technology was built for GDPR while other tech was built for some other purpose like ediscovery and tweaked for GDPR, so be careful.  For example, you don’t want to have to collect the data before determining whether it contains PII.  The California privacy law taking effect in 2020 is similar to GDPR, so U.S. companies cannot ignore the issue.  Backup tapes should be deleted after 90 days.  They are for emergencies, not retention.  Older backups often don’t work (e.g., referenced network addresses are no longer valid).

Escalating Cyber Risk From The IT Department To The Boardroom
One very effective way to change a company’s culture with respect to security is to break people up into white vs. black teams and hold war games where one team attacks and the other tries to come up with the best way to defend against it.  You need to point out both the risk and how to fix it to get the board’s attention.  Show the board a graph with the expected value lost in a breach on the vertical axis and cost to eliminate the risk on the horizontal axis; points lying above the 45-degree line are risks that should be eliminated (doing so saves money).  On average, a server breach costs 28% of operating costs.  Investors may eventually care if someone on the board has a security certification.  It is OK to question directors, but don’t call out their b.s.  The board cares most about what the CEO and CFO are saying.  Ethical problems tend to happen when things are too siloed.
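
The board chart boils down to one comparison per risk: a point lies above the 45-degree line when its expected loss exceeds the cost of eliminating it. A minimal sketch, assuming expected loss is estimated as probability times impact (the talk did not say how the estimate is made); the risks and dollar figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    breach_probability: float  # assumed likelihood of the breach over the period
    breach_impact: float       # dollars lost if the breach happens
    mitigation_cost: float     # dollars to eliminate the risk (horizontal axis)

    @property
    def expected_loss(self) -> float:
        """Expected value lost in a breach (vertical axis)."""
        return self.breach_probability * self.breach_impact


def worth_eliminating(risks):
    """Risks above the 45-degree line, largest net saving first."""
    above = [r for r in risks if r.expected_loss > r.mitigation_cost]
    above.sort(key=lambda r: r.expected_loss - r.mitigation_cost, reverse=True)
    return [r.name for r in above]


risks = [
    Risk("unpatched VPN", 0.30, 2_000_000, 150_000),
    Risk("legacy file share", 0.05, 1_000_000, 80_000),
    Risk("no MFA on email", 0.20, 5_000_000, 40_000),
]
print(worth_eliminating(risks))  # ['no MFA on email', 'unpatched VPN']
```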

Highlights from the NorCal eDiscovery & IG Retreat 2018

The 2018 NorCal eDiscovery & IG Retreat was held at the Carmel Valley Ranch, the location of the first Ing3nious retreat in 2011 (though the company wasn’t called Ing3nious at the time).  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room.  Attendees could choose talks from either track.  Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions.  My full set of photos is available here.

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We challenged the audience to create a keyword search that would work better than TAR.  Results are posted here.

Information Governance In The Age Of Encryption And Ephemeral Communications
Facebook Messenger has an ephemeral mode, though it is currently only available to Facebook executives.  You can be forced to decrypt data (despite the 5th Amendment) if it can be proven that you have the password.  Ephemeral communication is a replacement for in-person communication, but it can look bad (like you have something to hide).  53% of email is read on mobile devices, but personal devices often aren’t collected.  Slack is useful for passing institutional knowledge along to new employees, but general counsel wants things deleted after 30 days.  Some ephemeral communication tools have archiving options.  You may want to record some conversations in email–you may need them as evidence in the future.  Are there unencrypted copies of encrypted data in some locations?

Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary.  The plaintiff should be skeptical about claims of burden–has appropriate culling been performed? Do a meet and confer as early as possible.  Examine data for a few custodians and see if more are needed. A data dump is when a lot of non-relevant docs are produced (e.g., due to a broad search or a search that matches an email signature).  Do sampling to test search terms.  Be explicit about what production formatting you want (e.g., searchable PDF, color, meta data).
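
Sampling turns a hit count into an estimate of how many hits are actually relevant. A minimal sketch, assuming reviewers code a simple random sample of one term’s hits; the Wilson score interval is one standard choice for a confidence interval on a proportion, and the document IDs and counts are invented.

```python
import math
import random


def sample_hits(hit_ids, n, seed=0):
    """Simple random sample of a search term's hits for human review."""
    rng = random.Random(seed)
    return rng.sample(hit_ids, min(n, len(hit_ids)))


def wilson_interval(relevant, reviewed, z=1.96):
    """Approximate 95% Wilson score interval for the fraction of hits that are relevant."""
    if reviewed == 0:
        return (0.0, 1.0)
    p = relevant / reviewed
    denom = 1 + z * z / reviewed
    center = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z * z / (4 * reviewed * reviewed)) / denom
    return (center - half, center + half)


hits = [f"DOC-{i:05d}" for i in range(12_000)]  # hypothetical hit list for one term
to_review = sample_hits(hits, 100)
# Suppose reviewers mark 30 of the 100 sampled documents relevant:
low, high = wilson_interval(30, 100)
print(f"precision {low:.0%} to {high:.0%}, "
      f"about {low * len(hits):,.0f} to {high * len(hits):,.0f} relevant hits")
```

Here the term’s precision is plausibly about 22% to 40%. Sampling the hits only measures precision; estimating how many relevant documents a term missed requires sampling the documents it did not hit.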

Emerging Technology And The Impact On eDiscovery
There may be a lack of policy for new data sources.  Text messages and social media are becoming relevant for more cases.  Your Facebook info can be accessed through your friends.  Fitbit may show whether the person could have committed the murder. IP addresses can reveal whether email was sent from home or work. The change to the Twitter character limit may break some collection tools–QC early on to detect such problems.  Vendors should have multiple tools.  Communicate about what tech is involved and what you need to collect.
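
Classifying an originating IP address as home or work is a membership test against the company’s known network ranges. A minimal sketch using Python’s standard `ipaddress` module; the ranges are placeholders (reserved documentation blocks), extracting the address from the email’s Received headers is not shown, and VPNs or forged headers can muddy the answer.

```python
import ipaddress

# Hypothetical office egress ranges (reserved documentation blocks);
# real ones would come from the company's network team.
OFFICE_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def sent_from_office(ip):
    """True if the originating IP address falls inside a known office range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OFFICE_NETWORKS)


for ip in ["203.0.113.45", "198.51.100.200", "192.0.2.10", "2001:db8::1"]:
    print(ip, sent_from_office(ip))
```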

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Cloud computing (infrastructure, storage, productivity, and web apps) will cause conflict between EU privacy law and US discovery.  AWS provides lots of security options, but it can be difficult to get right (must be configured correctly).  Startups aim to build fast and don’t think enough about how to get the data out.  Are law firm clients looking at cloud agreements and how to export data?  Free services (Facebook, Gmail, etc.) spy on users, which makes them inappropriate for corporate use where privacy is needed.  Slack output is one long conversation.  What about tools that provide a visualization?  You may need the data, not just a screenshot.  Understand the limits of repositories–Office 365 limits you to 10GB of PST at a time.  What about versioning storage?  It is becoming more common as storage prices decline.  Do you need to collect all versions of a document?  “Computer ate my homework” excuses don’t fare well in court (e.g., production of privileged docs due to a bad mouse click, or missing docs matching a keyword search because they weren’t OCRed).  GDPR requires knowing where the users are (not where the data is stored).  Employees don’t want their private phones collected, so sandbox work stuff.

Employing Intelligence – Both Human And Artificial (AI) – To Reduce Overall eDiscovery Costs
You need to talk to custodians–the org chart doesn’t really tell you what you need to know.  Search can show who communicates with whom about a topic.  To discover custodians who are involved but unknown to the attorney, look at the data and interview the ground troops.  Look for a period when there is a lack of communication.  Use sentiment analysis (including emojis).  Watch for strange bytes in the review tool–they may be emojis that can only be viewed in the original app.  Automate legal holds as much as possible.  Escalate to a manager if the employee doesn’t respond to the hold in a timely manner.  Filter on meta data to reduce the amount that goes into the load file.  Sometimes things go wrong with the software (trained on biased data, not finding relevant spreadsheets, etc.).  QC to ensure the human element doesn’t fail.  Use phonetic search on audio files instead of transcribing before search.  Analyze data as it comes in–you may spot months of missing email.  Do proof of concept when selecting tools.
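
One common source of “strange bytes” is an encoding mismatch: an emoji stored as UTF-8 but displayed as if it were Latin-1 turns into a few junk characters. App-specific emojis that only render in the original app are a separate problem this does not solve. A minimal sketch that reproduces the mismatch and flags likely emoji code points; the code point ranges are a rough assumption, not the full Unicode emoji definition.

```python
# Rough emoji blocks; an assumption, not the full Unicode emoji definition.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF), (0x1F1E6, 0x1F1FF)]


def mojibake(text):
    """What UTF-8 text looks like when a tool wrongly decodes it as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def find_emojis(text):
    """Code points in the rough emoji ranges, so they can be counted or searched."""
    return [ch for ch in text if any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)]


print(repr(mojibake("\U0001F600")))  # 'ð\x9f\x98\x80'
# A thumbs-up with a skin-tone modifier is two code points.
print([hex(ord(c)) for c in find_emojis("ok \U0001F44D\U0001F3FD see you \u2615")])
# ['0x1f44d', '0x1f3fd', '0x2615']
```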

Practical Discussion: eDiscovery Process With Law Firms, In-House And Vendor
Stick with a single vendor so you know it is done the same way every time.  Figure out what your data sources are.  Get social media data into the review platform in a usable form (e.g., Skype).  Finding the existence of cloud data stores requires effort.  How long is the cloud data being held (Twitter only holds the last 100 direct messages)?  The company needs to provide the needed apps so employees aren’t tempted to go outside to get what they need.

Highlights from the SoCal eDiscovery & IG Retreat 2017

The 2017 SoCal eDiscovery & IG Retreat was held at the Pelican Hill Resort in Newport Coast, California.  The format was somewhat different from other recent Ing3nious retreats, having a single session at a time instead of two sessions in parallel.  My notes below provide some highlights.  I’ve posted my full set of photos from the conference and nearby Crystal Cove here.

How Well Can Your Organization Protect Against Encrypted Traffic Threats?
Companies should be concerned about encrypted traffic, because they don’t know what is leaving their network.  Get encryption keys for cloud services the company uses so you can check outgoing content and block all other encrypted traffic; if something legitimate breaks, employees will let you know.  It is important to distinguish personal Dropbox use from corporate use.  Make sure you have a policy that says all corporate devices are subject to inspection and monitoring.  The CSO should report to the CEO rather than the CIO; otherwise too much ends up being spent on technology with too little spent on risk reduction.  Security tech must be kept up to date.  Some security vendors are using artificial intelligence.  The board of directors needs to be educated about their fiduciary duty to provide oversight for security, as established in a 1996 case in Delaware (see this article).  In what country is the backup of your cloud data stored?  That could be important during litigation.  The amount of unstructured data companies have can be surprising, and represents additional risk.  When the CSO reports to the board, he/she should speak in terms of risk (don’t use tech speak).  Build in security from the beginning when starting new projects.  GDPR violations could bring penalties of up to 4% of revenue. Guidance papers on GDPR are all over 40 pages long.  “Internet of Things” devices (e.g., refrigerators) are typically not secure.  Use DNS to detect attempts by IoT devices to call out.  IoT is collecting data about you to sell.  The book Future Crimes by Marc Goodman was recommended.
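
Using DNS to spot IoT devices calling out can be as simple as comparing each device’s lookups against the short list of domains it should need. A minimal sketch, assuming you can export (device IP, queried domain) pairs from your DNS resolver’s logs; the device inventory and domains are hypothetical (.example is a reserved test domain).

```python
# Hypothetical inventory: the domains each IoT device is expected to contact.
ALLOWED = {
    "10.0.5.20": {"updates.fridge.example"},
    "10.0.5.21": {"api.camera.example", "time.camera.example"},
}


def suspicious_queries(dns_log):
    """Return (device_ip, domain) lookups outside that device's allowlist.

    Subdomains of an allowed domain are accepted; unknown devices are always flagged.
    """
    flagged = []
    for device, domain in dns_log:
        name = domain.rstrip(".").lower()
        allowed = ALLOWED.get(device, set())
        if not any(name == a or name.endswith("." + a) for a in allowed):
            flagged.append((device, domain))
    return flagged


log = [
    ("10.0.5.20", "updates.fridge.example"),
    ("10.0.5.20", "cdn.updates.fridge.example."),
    ("10.0.5.20", "collect.ads.example"),
    ("10.0.5.21", "api.camera.example"),
    ("10.0.5.99", "status.unknown.example"),
]
print(suspicious_queries(log))
```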

Using Technology To Reduce eDiscovery Spend
Artificial intelligence (AI) can be used before collection to reduce data volume.  Have a conversation about what’s really needed and use ECA to cull by date, topic, etc.  Process data from key players first.  It is important for project managers to know the data.  Parse out domain names, see who is talking to whom, see which folders people really have access to, and get rid of bad file types.  Image the machine of the person who will be leaving, then tell them you will be imaging the machine in the near future and see what they delete.  Use sentiment analysis and see if sentiment changes over time.  Use clustering to identify stuff that can be culled (e.g., stuff about the NFL).  Use clustering, rather than random sampling, to see what the data looks like.  Redaction of things like social security numbers can be automated.
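
Parsing out domain names and seeing who talks to whom can start with a count of sender and recipient domains. A minimal sketch using Python’s standard `email.utils` parsers; the message dictionaries are a hypothetical stand-in for From/To headers already extracted during processing.

```python
from collections import Counter
from email.utils import getaddresses, parseaddr


def domain_of(address):
    return address.rpartition("@")[2].lower()


def domain_pairs(messages):
    """Count sender-domain to recipient-domain pairs across messages."""
    pairs = Counter()
    for msg in messages:
        sender = domain_of(parseaddr(msg["from"])[1])
        for _, rcpt in getaddresses(msg["to"]):
            pairs[(sender, domain_of(rcpt))] += 1
    return pairs


messages = [
    {"from": "Ann <ann@corp.example>", "to": ["bob@corp.example", "Carl <carl@vendor.example>"]},
    {"from": "dee@corp.example", "to": ["carl@vendor.example"]},
    {"from": "carl@vendor.example", "to": ["Ann <ann@CORP.example>"]},
]
for (src, dst), n in domain_pairs(messages).most_common():
    print(f"{src} -> {dst}: {n}")
```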

It’s All Greek To Me: Multi-Language Document Review from Shakespeare To FCPA
Examples were given of Craigslist ads seeking temporary people for foreign language document review, showing that companies performing such reviews may not have capable people on staff.  Law firms are relying on external providers to manage reviews in languages in which they are not fluent. English in Singapore is not the same as English in the U.S. (different expressions) — cultural context is important.  There are 6,900 languages around the world.  Law firms must do diligence to ensure a language expert is trustworthy.  Law firms don’t like being beta testers for technologies like TAR and machine translation.  Communications in Asia are often not in text file format (e.g., chat applications) and can involve hundreds of thousands of non-standard emojis (how to even render them?).  Facebook got a Palestinian man arrested by mistranslating his “good morning” to “attack them” (see this article).  One speaker suggested Googling “fraudulent foreign language reviewers” (the top match is here).  There was skepticism about the ALTA language proficiency test.

Artificial Intelligence – Facial Expression Analytics As A Competitive Advantage In Risk Mitigation
Monitoring emotional response can provide an advantage at trial.  Universal emotions: joy, sadness, surprise, fear, anger, disgust, and contempt.  The lawyer should avoid causing sadness since that is detrimental to being liked — let the witness do it.  Emotional response can depend on demographics.  For example, the contempt response depends on age, and women tend to show a larger fear response.  Software can now detect emotion from facial photos very quickly.  One panelist warned against using the iPhone X’s authentication via face recognition because Apple has software for detecting emotion and could monitor your mood.  80% of what a jury picks up on is non-verbal.  Analyze video of depositions to look for ways to improve.  Senior people at companies refuse to believe they don’t come across well, but they often show signs of contempt at questions they deem to be petty.  There is no facial expression for deception — look for a shift in expression.  Realize that software may not be making decisions in the same way as a human would.  For example, a neural network that did a good job of distinguishing wolves from dogs was actually making the decision based on the presence or absence of snow in the background.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Bridging The Gap Between Inside And Outside Counsel: Next Generation Strategies For Collaborating On Complex Litigation Matters
Communicate about what you actually need or they may collect everything regardless of date or custodian, resulting in high costs for hosting.  Insourcing is a trend: the company keeps the data in house (reduce cost and risk) and provides outside counsel with access.  This means imposing technology on the outside counsel.  One benefit of insourcing is that in-house counsel learns about the data, which may help with future cases.  Another trend is disaggregation, where legal tasks are split up among different law firms instead of using a single firm for everything.  It is important to ensure that technologies being used are understood by all parties from the start to avoid big problems later.  Paralegals can be good at keeping communication flowing between the outside attorney and the client.  Tech companies that want people to adopt their products need to help outside counsel explain the benefits to clients.

Cyber And Data Security For The GC: How To Stay Out Of Headlines And Crosshairs
I couldn’t attend this panel because I had to catch my flight.

 

Highlights from the NorCal IG Retreat 2017

The 2017 NorCal Information Governance Retreat was held by Ing3nious at the Quail Lodge & Golf Club in Carmel Valley, California.  After round table discussions, the retreat featured two simultaneous sessions throughout the day. My notes below provide some highlights from the sessions I was able to attend.  I’ve posted additional photos here.

The intro to the round table discussions included some comments on the evolution of the Internet, the importance of searching for obscenities to find critical documents or to identify data that has been scrubbed (it is implausible that there are no emails containing obscenities for a failing project), the difficulty of searching for “IT” (meaning information technology rather than the pronoun), and the inability of many tools to search for emojis.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

How Well Can Your Organization Protect Against Encrypted Traffic Threats?
I couldn’t attend this.

IG Analytics And Infonomics: The Future Is Now
I couldn’t attend this.

Breaches Happen. Going On The Cyber Offense With Deception
Breach stories that were mentioned included Equifax, Target, an employee that built their own (insecure) tunnel to get data out to their home, and an employee that carried data out on a microSD card.  In the RSA / Lockheed Martin breach, a Lockheed contractor was fooled by a phishing email, illustrating how hard it is to keep attackers out.  Email is a very common source of breaches.  A big mistake is not knowing that you’ve been breached.  People put honeypots outside the firewall to detect attacks. It’s better to use deception technology, which puts decoys inside the firewall.

Social Media And Website Information Governance
There has been some regulation of social media, especially for certain industries.  The SEC in 2012 required financial institutions to archive it.  The FTC has been enforcing paid endorsement disclosure guidelines (e.g., Kim Kardashian’s endorsement of a morning sickness drug).  Collecting evidence from social media is tricky.  A screenshot could be photoshopped, so how do you prove it is legitimate?  You should collect a screenshot, source code, meta data, and a digital signature with time stamp.  Corporate policy on social media use will depend on the kind of company and the industry it is in.  There should also be a policy on monitoring employees’ social media use.  Companies using an internal social media system are asking for problems.  How will they police/discipline improper usage?  If an employee posts “Why haven’t I seen John lately?” and another replies that John has cancer, you have a problem.  Does a company social media system really improve productivity?  Can you find out who posted something anonymously on public social media?  If they posted from Starbucks or a library, probably not (finding the IP address won’t reveal the person’s identity).  This strategy worked for a bad review of a doctor that was thought to be from another doctor: 1) file in Federal court and get a court order to get the user’s IP address from the social media website, 2) go back to the judge and get a court order to get the ISP to give the identity of the person using that IP address at that time, 3) there is a motion to quash, which confirms that the right person was found (otherwise they wouldn’t bother to fight it).
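
Fingerprinting each captured item and sealing the set with a timestamp is what lets you show later that a screenshot or page source was not altered. A minimal sketch using Python’s standard library; the HMAC with a shared secret is a stand-in for a true digital signature, and a production workflow would use certificate-based signing and a trusted timestamp service (e.g., RFC 3161) rather than the local clock. File names and contents are invented.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _seal(manifest, key):
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def evidence_record(files, key):
    """Hash each captured item and seal the manifest, including the capture time."""
    manifest = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": {name: hashlib.sha256(data).hexdigest() for name, data in files.items()},
    }
    manifest["seal"] = _seal(manifest, key)
    return manifest


def verify(manifest, files, key):
    """True only if the manifest is unaltered and every file still matches its hash."""
    body = {k: v for k, v in manifest.items() if k != "seal"}
    if not hmac.compare_digest(_seal(body, key), manifest["seal"]):
        return False
    return all(name in files and hashlib.sha256(files[name]).hexdigest() == digest
               for name, digest in body["sha256"].items())


key = b"demo-secret"  # hypothetical; keep real keys out of source code
capture = {
    "post.png": b"\x89PNG placeholder bytes",
    "post.html": b"<html>review text</html>",
    "metadata.json": b'{"author": "anon123"}',
}
record = evidence_record(capture, key)
print(verify(record, capture, key))  # True
edited = dict(capture, **{"post.html": b"<html>edited text</html>"})
print(verify(record, edited, key))   # False
```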

Bridging The Gap Between Inside And Outside Counsel: Next Generation Strategies For Collaborating On Complex Litigation Matters
I couldn’t attend this.

Preventing Inadvertent Disclosure In A Multi-Language World
Start by identifying teams and process.  Be aware of cultural differences.  Be aware of technological issues — there are 2 or 3 alternatives to MS Word that you might encounter for documents in Korean.  Be aware of laws against removing certain documents from the country.  There was disagreement among panel members about whether review quality of foreign documents was better in the U.S. due to reviewers better understanding U.S. law.  Viewing a document in the U.S. that is stored on a server in the E.U. is not a valid work-around for restrictions on exporting the documents.  Review in the U.S. is much cheaper than reviewing overseas (about 1/5 to 1/10 of the cost).  Violation of GDPR puts 4% of revenue at risk, but a U.S. judge may not care.  Take only what you need out of the country.  Many tools work best when they are analyzing documents in a single language, so use language identification and separate documents by language before analysis.  TAR may not work as well for non-English documents, but it does work.
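
Separating documents by language before analysis needs some form of language identification. A minimal sketch that only looks at writing systems, using the Unicode character names in Python’s standard `unicodedata` module; this is script detection, not true language identification (it cannot tell English from French, or Chinese from Japanese kanji), so a real pipeline would use a proper language identifier. The documents are invented.

```python
import unicodedata
from collections import Counter, defaultdict


def dominant_script(text):
    """Most common script among a document's letters, e.g. LATIN, HANGUL, CYRILLIC."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in text if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def partition_by_script(docs):
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[dominant_script(text)].append(doc_id)
    return dict(groups)


docs = {
    "d1": "Quarterly sales report",
    "d2": "분기별 매출 보고서",
    "d3": "Отчёт о продажах",
    "d4": "Meeting notes for the Seoul office",
}
print(partition_by_script(docs))
# {'LATIN': ['d1', 'd4'], 'HANGUL': ['d2'], 'CYRILLIC': ['d3']}
```

Mixed-language documents land with whichever script dominates, which is one reason to review how the split was made before analysis.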

What’s Your Trust Point?
I couldn’t attend this.

Legal Tech And AI – Inventing The Future
Humans are better than computers at handling low-probability outlier events, because there is a lack of training data to teach machines to handle such things.  It is important for the technology to be easy for the user to interact with.  Legal clients are very cost-averse, so a free trial of new tech is attractive.

The Cloud, New Technologies And Other Developments In Trade Secret Theft
I couldn’t attend this.

Are You Prepared For The Impact Of Changing EU Data Privacy On U.S. Litigation?
I couldn’t attend this.

IG Policy Pain Points In E-Discovery
Deletion of data that is not on hold 60 days after an employee leaves the company may not get everything since other custodians may have copies.  You may find that employees have archived their emails on a local hard drive.  Be clear about data ownership: wiping the phone of an employee that left the company may hit their personal data.  The general counsel is often not involved in decisions like BYOD (treated as an IT decision), but they should be.  Realize that having more data about employee behavior (e.g., GPS tracking) makes the company more responsible.  You rarely need the employee’s phone since there is little data cached there (data is on mail servers, etc.).  You should do info governance compliance testing to ensure that employees are following the procedures.  Policies must be realistic; there won’t be perfect separation of work and personal activity.  Flouted rules may be worse than no rules.  Keep personal data separate (personal folder, personal email address, use phone for accessing Facebook).  When doing an annual cleanup, what about the data from the employee who left the company?  A study showed that 85% of stored data is ROT (redundant, obsolete, or trivial).  Have a checklist that you follow when an employee leaves; don’t wipe the computer without copying stuff you may need.
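
A 60-day deletion rule is easy to encode; the harder part, as the panel noted, is that other custodians may keep copies. A minimal sketch, assuming each item records its custodian and legal-hold status; exact hashing only finds identical copies, so edited or forwarded versions would need near-duplicate detection. The items, custodians, and dates are invented.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    item_id: str
    custodian: str
    content: bytes
    on_hold: bool = False


def deletion_candidates(items, departures, today, grace_days=60):
    """Items of departed custodians past the grace period and not on legal hold."""
    return [it.item_id for it in items
            if it.custodian in departures
            and today - departures[it.custodian] >= timedelta(days=grace_days)
            and not it.on_hold]


def surviving_copies(items, deleted_ids):
    """Map each deleted item to identical copies (same SHA-256) that will remain."""
    by_hash = {}
    for it in items:
        by_hash.setdefault(hashlib.sha256(it.content).hexdigest(), []).append(it.item_id)
    copies = {}
    for it in items:
        if it.item_id in deleted_ids:
            others = [i for i in by_hash[hashlib.sha256(it.content).hexdigest()]
                      if i not in deleted_ids]
            if others:
                copies[it.item_id] = others
    return copies


items = [
    Item("m1", "jdoe", b"Q3 forecast"),
    Item("m2", "jdoe", b"lunch?"),
    Item("m3", "jdoe", b"contract draft", on_hold=True),
    Item("m4", "asmith", b"Q3 forecast"),
]
doomed = deletion_candidates(items, {"jdoe": date(2017, 3, 1)}, today=date(2017, 5, 15))
print(doomed)                                # ['m1', 'm2']
print(surviving_copies(items, set(doomed)))  # {'m1': ['m4']}
```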

Highlights from the Northeast IG Retreat 2017

The 2017 Northeast Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  After round table discussions, the retreat featured two simultaneous sessions throughout the day. My notes below provide some highlights from the sessions I was able to attend.

Enhancing eDiscovery With Next Generation Litigation Management Software
I couldn’t attend this.

Legal Tech and AI – Inventing The Future
Machines are currently only good at routine tasks.  Interactions with machines should allow humans and machines to do what they do best.  Some areas where AI can aid lawyers: determining how long litigation will take, suggesting cases you should reference, telling how often the opposition has won in the past, determining appropriate prices for fixed fee arrangements, recruiting, or determining which industries to focus on.  AI promises to help with managing data (e.g., targeted deletion), not just e-discovery.  Facial recognition may replace plane tickets someday.

Zen & The Art Of Multi-Language Discovery: Risks, Review & Translation
I couldn’t attend this one.

NexLP Demo
The NexLP tool emphasizes feature extraction and the use of domain knowledge from external sources to figure out the story behind the data.  It can generate alerts based on changes in employee behavior over time.  A company should have a policy allowing the scanning of emails to detect bad behavior.  It was claimed that using AI on emails is better for privacy than having a human review random emails, since it keeps human eyes away from emails that are not relevant.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Are Managed Services Manageable?
I couldn’t attend this one.

Cyber And Data Security For The GC: How To Stay Out Of Headlines And Crosshairs
I couldn’t attend this one.

The Office Is Out: Preservation And Collection In The Merry Old Land Of Office 365
Enterprise 5 (E5) has advanced analytics from Equivio.  E3 and E1 can do legal hold but don’t have advanced analytics.  There are options available that are not on the website, and there are different builds, so people are not all using the same thing.  Search functionality works only on a limited set of file types (e.g., Microsoft products).  Email attachments are OK if they are from Microsoft products.  It will not OCR PDFs that lack embedded text.  What about emails attached to emails?  Previously, it only went one layer deep on attachments.  Latest versions say they are “relaxing” that, but it is unclear what that means (how deep?).  Users control sync, so are we really searching everything?  Make sure you involve IT, privacy, info governance, etc. if you are considering a transition to 365.  Be aware of data that is already on hold if you migrate to 365.  Start by migrating a small group of people who are not often subject to litigation.  Test each data type after conversion.
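The nesting-depth question can at least be checked on test data before a migration.  Here is a minimal sketch using Python’s standard email package that counts how many layers of emails-attached-to-emails a message contains, so a test set can deliberately include messages nested deeper than one level.  The function names nesting_depth and make_nested are hypothetical, and this says nothing about how Office 365 itself indexes attachments.

```python
from email import policy
from email.message import EmailMessage, Message
from email.parser import BytesParser


def nesting_depth(msg: Message) -> int:
    """Count how many levels of emails-attached-to-emails sit inside msg."""
    if msg.get_content_type() == "message/rfc822":
        # A message/rfc822 part carries exactly one embedded message as its payload.
        inner = msg.get_payload()[0]
        return 1 + nesting_depth(inner)
    if msg.is_multipart():
        # Follow the deepest branch among all sub-parts.
        return max((nesting_depth(part) for part in msg.get_payload()), default=0)
    return 0


def make_nested(levels: int) -> EmailMessage:
    """Build a hypothetical test email with `levels` layers of attached emails."""
    msg = EmailMessage()
    msg["Subject"] = "level 0"
    msg.set_content("innermost message")
    for level in range(1, levels + 1):
        outer = EmailMessage()
        outer["Subject"] = f"level {level}"
        outer.set_content("see attached email")
        outer.add_attachment(msg)  # stored as a message/rfc822 attachment
        msg = outer
    return msg


if __name__ == "__main__":
    sample = make_nested(3)
    # Round-trip through bytes, the way a collected .eml file would be parsed.
    parsed = BytesParser(policy=policy.default).parsebytes(sample.as_bytes())
    print(nesting_depth(parsed))  # prints 3
```

Running the same depth count on documents before and after conversion is one way to spot attachments that the new system only searched one layer deep.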

How To Make Sense Of Information Governance Rules For Contractors When The Government Itself Can’t?
I couldn’t attend this one.

Judges, The Law And Guidance: Does ‘Reasonableness’ Provide Clarity?
This was primarily about the impact of the new Federal Rules of Civil Procedure.  Clients are finally giving up on putting everything on hold.  Tie document retention to business needs, so you shouldn’t have to worry about sanctions.  Document everything (e.g., why you chose specific custodians to hold).  Accidentally missing one custodian out of a hundred is now OK.  Some judges acknowledge the new rules but then ignore them.  Boilerplate objections to discovery requests need to stop; keep notes on why you made each objection.

Beyond The Firewall: Cybersecurity & The Human Factor
I couldn’t attend this one.

The Theory of Relativity: Is There A Black Hole In Electronic Discovery?
The good about Relativity: everyone knows it, it has plug-ins, and moving from document to document is fast compared to previous tools.  The bad: TAR 1.0 (the federal judiciary prefers CAL).  An audience member expressed concern that as Relativity gets close to having a monopoly, we should expect high prices and a lack of innovation.  RelativityOne puts kCura in competition with service providers.

The day ended with a wine social.

Highlights from the South Central IG Retreat 2017

The 2017 South Central Information Governance Retreat was the first retreat in the Ing3nious series to be held in Texas; it took place at the La Cantera Resort & Spa.  The retreat featured two simultaneous sessions throughout the day.  My notes below provide some highlights from the sessions I was able to attend.

The day started with roundtable discussions that were kicked off by a speaker who talked about the early days of the Internet.  He made the point that new lawyers may know less about how computers actually work, even though they were born in an era when computers are more pervasive.  He mentioned that one of the first keyword searches he performs when he receives a production is for “f*ck.”  If a company was having problems with a product and there wasn’t a single email using that word, something was surely withheld from the production.  He also argued that expert systems intended to replace lawyers must be based on how the experts (lawyers) actually think.  How do you identify the 50 documents that will actually be used in trial?

Borrowing Agile Development Concepts To Jump-Start Your Information Governance Program
I couldn’t attend this one.

Your Duty To Preserve: Avoiding Traps In Troubled Times
When storing data in the cloud, what is actually retained?  How can you get the data out?  Google Vault only indexes newly added emails, not old ones.  The company may not have the right to access employee data in the cloud.  One panelist commented that collection is preferred to preservation in place.

Enhancing eDiscovery With Next Generation Litigation Management Software
I couldn’t attend this one.

Leveraging The Cloud & Technology To Accelerate Your eDiscovery Process
Cloud computing seems to have reached an inflection point.  A single company cannot put the resources into security and data protection that Amazon can.  The ability to scale up and down is good for litigation that comes and goes.  Employees can jump into cloud services without the preparation that was required for doing things on site.  Getting data out can be hard.  Office 365 download speed can be a problem (2-3 GB/hr), so reduce the data volume as much as possible.
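To put the 2-3 GB/hr figure in perspective, here is a back-of-the-envelope calculation in Python.  The 500 GB collection size is a hypothetical example, not a number from the session, and the calculation assumes the rate holds steady around the clock.

```python
def download_hours(volume_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to pull volume_gb out at a sustained rate_gb_per_hour."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return volume_gb / rate_gb_per_hour


if __name__ == "__main__":
    volume_gb = 500  # hypothetical collection size
    for rate in (2.0, 3.0):  # the 2-3 GB/hr range quoted by the panel
        hours = download_hours(volume_gb, rate)
        print(f"{rate} GB/hr: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

At those rates, a 500 GB collection would take roughly 167 to 250 hours (about 7 to 10 days) of continuous downloading, which is why culling before export matters.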

Strategies For Effectively Managing Your eDiscovery Spend
I couldn’t attend this one.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Achieving GDPR Compliance For Unstructured Content
I couldn’t attend this one.

Zen & The Art Of Multi-Language Discovery: Risks, Review & Translation
The translation company should be brought in when the team is formed (this often doesn’t happen until later).  Help from a translator or localization expert may be needed to come up with search terms; for example, there are 20 ways to say “CEO” in Korean.  Translation must be done by an expert to be certified.  When using TAR, do the review in the native language and translate the results before presenting them to the legal team.  Translation is much slower than review.  Machine translation has improved over the last 2 years, but it’s not good enough to rely on for anything important.  A translator leaked Toyota’s data to the press, so keep the risk in mind and make sure you are informed about the environment where the work is being done (screenshots should be prohibited).

Beyond The Firewall: Cybersecurity & The Human Factor
I couldn’t attend this one.

Ethical Obligations Relating To Metadata
Nineteen states have enacted ethical rules on metadata.  Sometimes metadata is enough to tell the whole story.  John McAfee was found and arrested because of GPS coordinates embedded in a photo of him.  Metadata showed that a terminated whistleblower’s employee review was written 3 months after termination.  Forensic collection is important to avoid spoiling the metadata.  Ethical obligations of attorneys are broader than attorney-client privilege.  Should attorneys be encrypting email?  Make the client aware of metadata and how it can be viewed.  The attorney must understand metadata and scrub it as necessary (e.g., tracked changes in Word).  In e-discovery, metadata is treated like other ESI.  Think about metadata when creating a protective order.  What are the ethical restrictions on viewing and mining metadata received through discovery?  Whether you need to disclose receipt of confidential or privileged metadata depends on the jurisdiction.

Legal Risks Associated With Failing To Have A Cyber Incident Response Plan
I couldn’t attend this one.

“Defensible Deletion” Is The Wrong Frame
Defensible deletion started with an IBM survey that found that, on average, 69% of corporate data has no value, 6% is subject to litigation hold, and 25% is useful.  IBM started offering to remove 45% of a company’s data without doing any harm (if it did cause harm, the company wouldn’t have to pay).  Purging requires effort, so make deletion the default.  Statistical sampling can be used to confirm that retention rules won’t cause harm.  After a company said that requested data wasn’t available because it had been deleted in accordance with its retention policy, an employee who was being deposed said he had copied everything to 35 CDs.  It can be hard to ensure that everything is gone even if you have the right policy.
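The session didn’t say which sampling method to use.  One common approach, sketched here as an assumption rather than as the panel’s recommendation, is to draw a simple random sample from the documents a retention rule would delete, have reviewers count how many actually have value, and compute an exact one-sided (Clopper-Pearson) upper confidence bound on the fraction of valuable documents in the whole deletion set.  The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(found: int, sample_size: int, confidence: float = 0.95) -> float:
    """One-sided exact (Clopper-Pearson) upper bound on the fraction of
    documents in the deletion set that actually have value, given that
    `found` of `sample_size` randomly sampled documents had value."""
    if not 0 <= found <= sample_size:
        raise ValueError("found must be between 0 and sample_size")
    if found == sample_size:
        return 1.0
    alpha = 1 - confidence
    lo, hi = found / sample_size, 1.0
    # binom_cdf decreases as p increases, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(found, sample_size, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical review: 400 documents sampled from the deletion set, none had value.
    print(f"Upper bound on valuable documents: {upper_bound(0, 400):.2%}")
```

If reviewers find nothing of value in a random sample of 400 documents, the bound works out to about 0.75% at 95% confidence (close to the “rule of three” approximation, 3/400).  The direct binomial sum is fine for the small counts typical of such samples; with very large counts you would want to work in log space to avoid float overflow.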