Highlights from IG3 West 2018

The IG3 West conference was held by Ing3nious at the Paséa Hotel & Spa in Huntington Beach, California.  This conference differed from other recent Ing3nious events in several ways.  It was two days of presentations instead of one.  There were three simultaneous panels instead of two.  Between panels there were sometimes three simultaneous vendor technology demos.  There was an exhibit hall with over forty vendor tables.  Due to the different format, I was only able to attend about a third of the presentations.  My notes are below.  You can find my full set of photos here.

Stop Chasing Horses, Start Building Fences: How Real-Time Technologies Change the Game of Compliance and Governance
Chris Surdak, the author of Jerk:  Twelve Steps to Rule the World, talked about changing technology and the value of information, claiming that information is the new wealth.  Facebook, Amazon, Apple, Netflix, and Google together are worth more than France [apparently he means the sum of their market capitalizations is greater than the GDP of France, though that is a rather apples-to-oranges comparison since GDP is an annualized number].  We are exposed to persistent ambient surveillance (Alexa, Siri, Progressive Snapshot, etc.).  It is possible to detect whether someone is lying by using video to detect blood flow to their face.  Car companies monetized data about passengers’ weight (measured due to air bags).  Sentiment analysis has a hard time with sarcasm.  You can’t find emails about fraud by searching for “fraud” because discussions about fraudulent activity may be disguised as weirdly specific conversations about lunch.  The problem with graph analysis is that a large volume of talk about something doesn’t mean that it’s important.  The most important thing may be what’s missing.  When RadioShack went bankrupt, its remaining value was in its customer data (remember them asking for your contact info when you bought batteries?).  A one-word change to FRCP 37(e) should have changed corporate retention policies, but nobody changed.  The EU’s right to be forgotten is virtually impossible to implement in reality (how to deal with backup tapes?) and almost nobody does it.  Campbell’s has people shipping their DNA to them so they can make diet recommendations to them.  With the GDPR, consent nullifies the protections, so it doesn’t really protect your privacy.

AI and the Corporate Law Department of the Future
Gartner says AI is at the peak of inflated expectations and a trough of disillusionment will follow.  Expect to be able to buy autonomous vehicles by 2023.  The economic downturn of 2008 caused law firms to start using metrics.  Legal will take a long time to adopt AI — managing partners still have assistants print stuff out.  Embracing AI puts a firm ahead of its competitors.  Ethical obligations are also an impediment to adoption of technology, since lawyers are concerned about understanding the result.

Advanced TAR Considerations: A 500 Level Crash Course
Continuous Active Learning (CAL), also called TAR 2.0, can adapt to shifts in the concept of relevance that may occur during the review.  There doesn’t seem to be much difference in the efficiency of SVM vs logistic regression when they are applied to the same task.  There can be a big efficiency difference between different tasks.  TAR 1.0 requires a subject-matter expert for training, but senior attorneys are not always readily available.  With TAR 1.0 you may be concerned that you will be required to disclose the training set (including non-responsive documents), but with TAR 2.0 there is case law that supports that being unnecessary [I’ve seen the argument that the production itself is the training set, but that neglects the non-responsive documents that were reviewed (and used for training) but not produced.  On the other hand, if you are talking about disclosing just the seed set that was used to start the process, that can be a single document and it has very little impact on the result].  Case law can be found at predictivecoding.com, which is updated at the end of each year.  TAR needs text, not image data.  Sometimes keywords are good enough.  When it comes to government investigations, many agencies (FTC, DOJ) use/accept TAR.  It really depends on the individual investigator, though, and you can’t fight their decision (the investigator is the judge).  Don’t use TAR for government investigations without disclosing that you are doing so.  TAR can have trouble if there are documents having high conceptual similarity where some are relevant and some aren’t.  Should you tell opposing counsel that you’re using TAR?  Usually, but it depends on the situation.  When the situation is symmetrical, both sides tend to be reasonable.
When it is asymmetrical, the side with very little data may try to make things expensive for the other side, so say something like “both sides may use advanced technology to produce documents” and don’t give more detail than that (e.g., how TAR will be trained, who will do the training, etc.) or you may invite problems.  Disclosing the use of TAR up front and getting agreement may avoid problems later.  Be careful about “untrainable documents” (documents containing too little text) — separate them out, and maybe use meta data or file type to help analyze them.  Elusion testing can be used to make sure too many relevant documents weren’t missed.  One panelist said 384 documents could be sampled from the elusion set, though that may sometimes not be enough.  [I have to eat some crow here.  I raised my hand and pointed out that the margin of error for the elusion has to be divided by the prevalence to get the margin of error for the recall, which is correct.  I went on to say that with a sample of 384 giving ±5% for the elusion you would have ±50% for the recall if prevalence was 10%, making the measurement worthless.  The mistake is that while a sample of 384 technically implies a worst case of ±5% for the margin of error for elusion, it’s not realistic for the margin of error to be that bad for elusion because ±5% would occur if elusion was near 50%, but elusion is typically very small (smaller than the prevalence), causing the margin of error for the elusion to be significantly less than ±5%.  The correct margin of error for the recall from an elusion sample of 384 documents would be ±13% if the prevalence is 10%, and ±40% if the prevalence is 1%.  So, if prevalence is around 10% an elusion sample of 384 isn’t completely worthless (though it is much worse than the ±5% we usually aim for), but if prevalence is much lower than that it would be].
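The arithmetic in the bracketed note above can be sketched in a few lines of Python.  This is only an illustration, assuming a normal-approximation interval at 95% confidence and a discard set that is nearly the whole population, so that the recall margin is roughly the elusion margin divided by the prevalence.  The function names are hypothetical, and the exact figures quoted above depend on the elusion rate and interval method assumed, so this sketch will not necessarily reproduce them.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def elusion_margin(elusion: float, sample_size: int) -> float:
    """Normal-approximation margin of error for an elusion estimate."""
    return Z_95 * math.sqrt(elusion * (1.0 - elusion) / sample_size)


def recall_margin(elusion: float, prevalence: float, sample_size: int) -> float:
    """Approximate margin for recall: elusion margin divided by prevalence."""
    return elusion_margin(elusion, sample_size) / prevalence


# Worst case (elusion near 50%) versus a small, more realistic elusion rate.
# The 2% elusion value is an assumed input chosen only for illustration.
for assumed_elusion in (0.5, 0.02):
    em = elusion_margin(assumed_elusion, 384)
    rm = recall_margin(assumed_elusion, 0.10, 384)
    print(f"elusion={assumed_elusion}: elusion margin={em:.3f}, recall margin={rm:.3f}")
```

The worst-case input shows where the ±5% and ±50% figures come from, and the smaller input shows why the realistic margin is much tighter.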

40 Years in 30 Minutes: The Background to Some of the Interesting Issues we Face
Steven Brower talked about the early days of the Internet and the current state of technology.  Early on, a user ID was used to tell who you were, not to keep you out.  Technology was elitist, and user-friendly was not a goal.  Now, so much is locked down for security reasons that things become unusable.  Law firms that prohibit access to social media force lawyers onto “secret” computers when a client needs something taken down from YouTube.  Emails about laws against certain things can be blocked due to keyword hits for the illegal things being described.  We don’t have real AI yet.  The next generation beyond predictive coding will be able to identify the 50 key documents for the case.  During e-discovery, try searching for obscenities to find things like: “I don’t give a f*** what the contract says.”  Autonomous vehicles won’t come as soon as people are predicting.  Snow is a problem for them.  We may get vehicles that drive autonomously from one parking lot to another, so the route is well known.  When there are a bunch of inebriated people in the car, who should it take commands from?  GDPR is silly since email bounces from computer to computer around the world.  The Starwood breach does not mean you need to get a new passport, since your passport number was already out there.  To improve your security, don’t try to educate everyone about cybersecurity.  You can eliminate half the risk by getting payroll to stop responding to emails asking for W2 data that appear to come from the CEO.  Scammers use the W2 data to file tax returns to get the refunds.  This is so common the IRS won’t even accept reports on it anymore.  You will still get your refund if it happens to you, but it’s a hassle.

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We did the TAR vs. Keyword Search Challenge again.  The results are available here.

After the Incident: Investigating and Responding to a Data Breach
Plan in advance, and remember that you may not have access to the laptop containing the plan when there is a breach. Get a PR firm that handles crises in advance.  You need to be ready for the negative comments on Twitter and Facebook.  Have the right SMEs for the incident on the team.  Assume that everything is discoverable — attorney-client privilege won’t save you if you ask the attorney for business (rather than legal) advice.  Notification laws vary from state to state.  An investigation by law enforcement may require not notifying the public for some period of time.  You should do an annual review of your cyber insurance since things are changing rapidly.  Such policies are industry specific.

Employing Technology/Next-Gen Tools to Reduce eDiscovery Spend
Have a process, but also think about what you are doing and the specifics of the case.  Restrict the date range if possible.  Reuse the results when you have overlapping cases (e.g., privilege review).  Don’t just look at docs/hour when monitoring the review.  Look at accuracy and get feedback about what they are finding.  CAL tends to result in doing too much document review (want to stop at 75% recall but end up hitting 89%).  Using a tool to do redactions will give false positives, so you need manual QC of the result.  When replacing a patient ID with a consistent anonymized identifier, you can’t just transform the ID because that could be inverted, resulting in a HIPAA violation.
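The point about consistent anonymized identifiers can be sketched as follows.  This is a minimal illustration, not something presented by the panel: it assumes a random lookup table, so the replacement value carries no information about the original ID and cannot be recovered by inverting a formula.  The class name, the "PT-" prefix, and the sample IDs are hypothetical, and the table itself would need to be protected, since anyone holding it can map identifiers back.

```python
import secrets


class Pseudonymizer:
    """Maps each patient ID to a random token, consistently within one table."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def pseudonym(self, patient_id: str) -> str:
        # Generate a fresh random token the first time an ID is seen,
        # then return the same token on every later lookup.
        if patient_id not in self._table:
            self._table[patient_id] = "PT-" + secrets.token_hex(8)
        return self._table[patient_id]


mapper = Pseudonymizer()
first = mapper.pseudonym("123-45-678")   # hypothetical patient ID
again = mapper.pseudonym("123-45-678")
other = mapper.pseudonym("987-65-432")   # hypothetical patient ID
print(first == again, first == other)
```

Because the token is random rather than derived from the ID, the same document set always shows the same identifier for a patient without exposing a transformation that could be reversed.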

eDiscovery for the Rest of us
What are ediscovery considerations for relatively small data sets?  During meet and confer, try to cooperate.  Judges hate ediscovery disputes.  Let the paralegals hash out the details — attorneys don’t really care about the details as long as it works.  Remote collection can avoid travel costs and hourly fees while keeping strangers out of the client’s office.  The biggest thing they look for from vendors is cost.  Need a certain volume of data for TAR to be practical.  Email threading can be used at any size.

Does Compliance Stifle or Spark Innovation?
Startups tend to be full of people fleeing big corporations to get away from compliance requirements.  If you do compliance well, that can be an advantage over competitors.  Look at it as protecting the longevity of the business (protecting reputation, etc.).  At the DoD, compliance stifles innovation, but it creates a barrier against bad guys.  They have thousands of attacks per day and are about 8 years behind normal innovation.  Gray crimes are an area for innovation; examples include manipulation (influencing elections) and tanking a stock IPO by faking a poisoning.  Hospitals and law firms tend to pay, so they are prime targets for ransomware.

Panels That I Couldn’t Attend:
California and EU Privacy Compliance
What it all Comes Down to – Enterprise Cybersecurity Governance
Selecting eDiscovery Platforms and Vendors
Defensible Disposition of Data
Biometrics and the Evolving Legal Landscape
Storytelling in the Age of eDiscovery
Technology Solution Update From Corporate, Law Firm and Service Provider Perspective
The Internet of Things and Everything as a Service – the Convergence of Security, Privacy and Product Liability
Similarities and Differences Between the GDPR and the New California Consumer Privacy Act – Similar Enough?
The Impact of the Internet of Things on eDiscovery
Escalating Cyber Risk From the IT Department to the Boardroom
So you Weren’t Quite Ready for GDPR?
Security vs. Compliance and Why Legal Frameworks Fall Short to Improve Information Security
How to Clean up Files for Governance and GDPR
Deception, Active Defense and Offensive Security…How to Fight Back Without Breaking the Law?
Information Governance – Separating the “Junk” from the “Jewels”
What are Big Law Firms Saying About Their LegalTech Adoption Opportunities and Challenges?
Cyber and Data Security for the GC: How to Stay out of Headlines and Crosshairs

Highlights from the Northeast eDiscovery & IG Retreat 2018

The 2018 Northeast eDiscovery and Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room. Attendees could attend talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.

Strategies For Data Minimization Of Legacy Data
Backup and archiving should be viewed as separate functions.  When it comes to spoliation (FRCP Rule 37), reasonableness of the company’s data retention plan is key.  Over preservation is expensive.  There are not many cases on Rule 37 relating to backup tapes.  People are changing their behavior due to the changes in the FRCP, especially in heavily regulated industries such as healthcare and financial services.  Studies find that typically 70% of data has no business value and is not subject to legal hold or retention requirements for compliance.  When using machine learning, you can focus on finding what to keep or what to get rid of.  It is often best to start with unsupervised machine learning.  Be mindful of destructive malware.  To mitigate security risks, it is important to know where your data (including backup tapes) is.  If a backup tape goes missing, do you need to notify customers (privacy)?  To get started, create a matrix showing what you need to keep, keeping in mind legal holds and privacy (GDPR).  Old backup tapes are subject to GDPR.  Does the right to be forgotten apply to backup tapes?  There is currently no answer.  It would be hard to selectively delete data from the tapes, so maybe have a process that deletes during the restore.  There can be conflicts between U.S. ediscovery and GDPR, so you must decide which is the bigger risk.

Preparing A Coordinated Response To Government Inquiries And Investigations
You might find out that you are being investigated when the FBI or another investigator approaches one of your employees.  If that happens, get an attorney.  Reach out to the investigator, take it seriously, and ask for a timeline.  You may receive a broad subpoena because the investigator wants to ensure they get everything important, but you can often get them to narrow it.  Be sure to retain outside counsel immediately.  In one case a CEO negotiated search terms with a prosecutor without discussing custodians, so they had to search all employees.  The prosecutor can’t handle a huge volume of data, so it should be possible to negotiate a reasonable production.  In addition to satisfying the subpoena, you need to simultaneously investigate whether there is an ongoing problem that needs to be addressed.  Is your IT group able to forensically preserve and produce the documents?  You don’t want to mess up a production in front of a regulator, so get expertise in place early.  Data privacy can be an issue.  When dealing with operations in Europe, it is helpful to get employee consent in advance, since nobody wants to consent during an investigation.  Beware of data residing in disparate systems in different languages.  Google Translate is not very good, e.g. you have to be careful about slang.  Employees may try to cover their tracks.  In one case an employee was using “chocolate” as an encoded way to refer to a payment.  In another case an employee took a hammer to a desktop computer, though the hard drive was still recoverable.  Look for gaps in email or anomalous email volume.  Note that employees may use WhatsApp or Signal to communicate.  The DOJ expects you to be systematic (e.g., use analytics) about compliance.  See what data is available, even if it wasn’t subpoenaed, since it may help your side (email usually doesn’t).
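The advice to look for gaps in email can be sketched as a simple check over message dates.  This is a minimal illustration under assumptions of my own: the function name, the seven-day threshold, and the sample dates are all hypothetical, and a real review would also need to account for vacations, weekends, and custodians who rarely send email.

```python
from datetime import date


def find_email_gaps(sent_dates, min_gap_days=7):
    """Return (last_seen, next_seen) pairs separated by more than min_gap_days."""
    ordered = sorted(set(sent_dates))
    gaps = []
    # Compare each date with the next one in chronological order.
    for earlier, later in zip(ordered, ordered[1:]):
        if (later - earlier).days > min_gap_days:
            gaps.append((earlier, later))
    return gaps


# Hypothetical custodian with nothing sent between early January and March.
sample = [date(2018, 1, 2), date(2018, 1, 3), date(2018, 3, 1), date(2018, 3, 2)]
for start, end in find_email_gaps(sample):
    print(f"No email between {start} and {end}")
```

A gap flagged this way is only a prompt for further questions, such as whether mail was deleted or the conversation moved to another app.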

Digging Into TAR
I moderated this panel, so I didn’t take notes. We challenged the audience to create a keyword search that would work better than technology-assisted review. Results are posted here.

Implementing Information Governance – Nightmare On Corporate America Street?
You need to weigh the value of the data against the risk of keeping it.  What is your business model?  That will dictate information governance.  Domino’s was described as a technology company that happens to distribute hot bread.  Unstructured data has the biggest footprint and the most rapid growth.  Did you follow your policies?  Your insurance company may be very picky about that when looking for a reason not to pay out.  They may pay out and then sue you over the loss.  Fear is a good motivator.  Threats from the OCC or FDIC over internal data management can motivate change.  You can quantify risk because the cost of having a data breach is now known. Info governance is utilization awareness, not just data management.  Know where your data is.  What about the employee that creates an unauthorized AWS account?  This is the “shadow ecosystem” or “shadow IT.”  One company discovered they had 50,000 collaborative SharePoint sites they didn’t know about.  For info governance standards see The Sedona Conference and EDRM.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Artificial intelligence (AI) should not merely analyze; it should present a result in a way that is actionable.  It might tell you how much two people talk, their sentiment, and whether there are any spikes in communication volume.  AI can be used by law firms for budgeting by analyzing prior matters.  There are concerns about privacy with AI.  Many clients are moving to the cloud.  Many are using private clouds for collaboration, not necessarily for utilizing large computing power.  Office 365 is of interest to many companies.  There was extensive discussion about the ediscovery analytics capabilities being added from the Equivio acquisition, and a demo by Marcel Katz of Microsoft.  The predictive coding (TAR) capability uses simple active learning (SAL) rather than continuous active learning (CAL).  It is 20 times slower in the cloud than running Equivio on premises.  There is currently no review tool in Office 365, so you have to export the predictions out and do the review elsewhere.  Mobile devices create additional challenges for ediscovery.  The time when a text message is sent may not match the time when it is received if the receiving device is off when the message is sent.  Technology needs to be able to handle emojis.  There are many different apps with many different data storage formats.

The ‘Team Of Teams’ Approach To Enterprise Security And Threat Management
Fast response is critical when you are attacked.  Response must be automated because a human response is not fast enough.  It can take 200 days to detect an adversary on the network, so assume someone is already inside.  What are the critical assets, and what threats should you look for?  What value does the data have to the attacker?  What is the impact on the business?  What is the impact on the people?  Know what is normal for your systems.  Is a large data transfer at 2:00am normal?  Simulate a phishing attack and see if your employees fall for it.  In one case a CEO was known to be in China for a deal, so someone impersonating the CEO emailed the CFO to send $50 million for the deal.  The money was never recovered.  Have processes in place, like requiring a signature for amounts greater than $10,000.  If a company is doing a lot of acquisitions, it can be hard to know what is on their network.  How should small companies get started?  Change passwords, hire an external auditor, and make use of open source tools.

From Data To GRC Insight
Governance, risk management, and compliance (GRC) needs to become centralized and standardized.  Practicing incident response as a team results in better responses when real incidents happen.  Growing data means growing risk.  Beware of storage of social security numbers and credit card numbers.  Use encryption and limit access based on role.  Detect emailing of spreadsheets full of data.  Know what the cost of HIPAA violations is and assign the risk of non-compliance to an individual.  Learn about the NIST Cybersecurity Framework.  Avoid fines and reputational risk, and improve the organization.  Transfer the risk by having data hosted by a company that provides security.  Cloud and mobile can have big security issues.  The company can’t see traffic on mobile devices to monitor for phishing.

 

Highlights from the South Central eDiscovery & IG Retreat 2018

The 2018 South Central eDiscovery and Information Governance Retreat was held at Lakeway Resort and Spa, outside of Austin.  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room.  Attendees could attend talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.

Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary.  Make requests broad and make the other side reveal what they actually have.  Ask for “all communications” rather than “all Office 365 emails” or you may miss something (for example, they may use Slack).  The collection may be 1% responsive.  How can it be culled defensibly?  Ask for broad search terms, get hit rates, and then adjust.  The hit rates don’t tell how many documents were actually relevant, so use sampling.  When searching for patents, search for “123 patent” instead of just “123” to avoid false positives (patent references often use just the last 3 digits).  This rarely happens, but you might get the producing party to disclose top matches for the queries and examine them to give feedback on desired adjustments.  You should have a standard specification for the production format you want, and you should get it to the producing party as soon as possible, or you might get 20,000 emails produced in one large PDF that you’ll have to waste time dissecting, and meta data may be lost.  If keyword search is used during collection, be aware that Office 365 currently doesn’t OCR non-searchable content, so it will be missed.  Demand that the producing party OCR before applying any search terms.  In one production there were a lot of “gibberish” emails returned because the search engine was matching “ING” to all words ending in “ing” rather than requiring the full word to match.  If ediscovery disputes make it to the judge, it’s usually not a good thing since the judge may not be very technical.
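The “ING” problem described above comes down to substring matching versus whole-word matching.  A minimal sketch in Python, using the standard `re` module with word boundaries, shows the difference.  The function names and sample sentences are hypothetical, and real search engines apply their own tokenization rules, so this only illustrates the idea.

```python
import re


def substring_hit(term: str, text: str) -> bool:
    """Case-insensitive substring match: 'ING' matches inside 'meeting'."""
    return term.lower() in text.lower()


def whole_word_hit(term: str, text: str) -> bool:
    """Case-insensitive match that requires the term to stand alone as a word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


noise = "We are meeting tomorrow about pricing."
relevant = "Wire the funds to ING today."

print(substring_hit("ING", noise), whole_word_hit("ING", noise))
print(substring_hit("ING", relevant), whole_word_hit("ING", relevant))
```

Asking the producing party which of these behaviors their search tool uses is one way to head off a production full of gibberish hits.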

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We challenged the audience to create a keyword search that would work better than TAR.  Results are posted here.

Beyond eDiscovery – Creating Context By Connecting Disparate Data
Beyond the custodian, who else had access to this file?  Who should have access, and who shouldn’t?  Forensics can determine who accessed or printed a confidential file.  The Windows registry tracks how users access files.  When you print, an image is stored.  Figure out what else you can do with the tech you have.  For example, use SharePoint workflows to help with ediscovery.  Predictive coding can be used with structured data.  Favorite quote: “Anyone who says they can solve all of my problems with one tool is a big fat liar.”

Improving Review Efficiency By Maximizing The Use Of Clustering Technology
Clustering can lead to more consistent review by ensuring the same person reviews similar documents and reviews them together.  The requesting party can use clustering to get an overview of what they’ve received.  Image clustering identifies glyphs to determine document similarity, so it can detect things like a Nike logo, or it can be sensitive to the location on the page where the text occurs.  It is important to get the noise (e.g., email footers) out of the data before clustering.  Text messages and spreadsheets may cause problems.  Clustering can be used for ECA or keyword generation, where it is not making final determinations for a document.  It can reveal abbreviations scientists are using for technical terms.  It can also be used to identify clusters that can be excluded from review (not relevant).  It can be used to prioritize review, with more promising clusters reviewed first.  Should you tell the other side you are using clustering to come up with keywords?  No, you are just inviting controversy.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Migration to Office 365 and other cloud offerings can cause problems.  Data can be dumped into the cloud without tracking where it went.  Figuring out how to collect from the cloud can be difficult.  Microsoft is always changing Office 365, making it difficult to stay on top of the changes.  Favorite quote: “I’m always running to keep up.  I should be skinnier, but I’m not.”  Office 365 is supposed to have OCR soon.  What if the cloud platform gets hacked?  There can be throttling issues when collecting from OneDrive by downloading everything (not using Microsoft’s tool).  Rollout of cloud services should be slow to make sure everyone knows what should be put in the cloud and what shouldn’t, and to ensure that you keep track of where everything is.  Be careful about emailing passwords since they may be recorded; use ephemeral communications instead of email for that.  Personal devices cause problems because custodians don’t like having their devices collected.  Policy is critical, but it is not a cure-all.  Policy must be surrounded by communication and re-certification to ensure it is followed.  Google mail is not a good solution for restricting data location since attachments are copied to the local disk when they are viewed.

Achieving GDPR Compliance For Unstructured Content
Some technology was built for GDPR while other tech was built for some other purpose like ediscovery and tweaked for GDPR, so be careful.  For example, you don’t want to have to collect the data before determining whether it contains PII.  The California privacy law taking effect in 2020 is similar to GDPR, so U.S. companies cannot ignore the issue.  Backup tapes should be deleted after 90 days.  They are for emergencies, not retention.  Older backups often don’t work (e.g., referenced network addresses are no longer valid).

Escalating Cyber Risk From The IT Department To The Boardroom
One very effective way to change a company’s culture with respect to security is to break people up into white vs. black teams and hold war games where one team attacks and the other tries to come up with the best way to defend against it.  You need to point out both the risk and how to fix it to get the board’s attention.  Show the board a graph with the expected value lost in a breach on the vertical axis and cost to eliminate the risk on the horizontal axis; points lying above the 45 degree line are risks that should be eliminated (doing so saves money).  On average, a server breach costs 28% of operating costs.  Investors may eventually care if someone on the board has a security certification.  It is OK to question directors, but don’t call out their b.s.  The Board cares most about what the CEO and CFO are saying.  Ethical problems tend to happen when things are too siloed.
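The 45 degree line rule from the board chart can be written as a one-line comparison: a risk sits above the line when its expected loss exceeds the cost to eliminate it.  The sketch below is only an illustration, and the risk names and dollar figures are made up for the example.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    name: str
    expected_loss: float     # vertical axis: expected value lost in a breach
    mitigation_cost: float   # horizontal axis: cost to eliminate the risk


def worth_eliminating(risks):
    """Return the risks that lie above the 45 degree line."""
    return [r for r in risks if r.expected_loss > r.mitigation_cost]


# Hypothetical figures chosen only to show one point on each side of the line.
portfolio = [
    Risk("unpatched file server", expected_loss=400_000, mitigation_cost=50_000),
    Risk("legacy fax gateway", expected_loss=10_000, mitigation_cost=80_000),
]

for risk in worth_eliminating(portfolio):
    print(f"{risk.name}: fixing it saves about {risk.expected_loss - risk.mitigation_cost:,.0f}")
```

Sorting the surviving risks by the size of the saving would give the board a natural order in which to fund fixes.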

Highlights from the NorCal eDiscovery & IG Retreat 2018

The 2018 NorCal eDiscovery & IG Retreat was held at the Carmel Valley Ranch, location of the first Ing3nious retreat in 2011 (though the company wasn’t called Ing3nious at the time).  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room.  Attendees could attend talks from either track.  Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions.  My full set of photos is available here.

Digging Into TAR
I moderated this panel, so I didn’t take notes.  We challenged the audience to create a keyword search that would work better than TAR.  Results are posted here.

Information Governance In The Age Of Encryption And Ephemeral Communications
Facebook messenger has an ephemeral mode, though it is currently only available to Facebook executives.  You can be forced to decrypt data (despite the 5th Amendment) if it can be proven that you have the password.  Ephemeral communication is a replacement for in-person communication, but it can look bad (like you have something to hide).  53% of email is read on mobile devices, but personal devices often aren’t collected.  Slack is useful for passing institutional knowledge along to new employees, but general counsel wants things deleted after 30 days.  Some ephemeral communication tools have archiving options.  You may want to record some conversations in email–you may need them as evidence in the future.  Are there unencrypted copies of encrypted data in some locations?

Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary.  The plaintiff should be skeptical about claims of burden–has appropriate culling been performed? Do a meet and confer as early as possible.  Examine data for a few custodians and see if more are needed. A data dump is when a lot of non-relevant docs are produced (e.g., due to a broad search or a search that matches an email signature).  Do sampling to test search terms.  Be explicit about what production formatting you want (e.g., searchable PDF, color, meta data).

Emerging Technology And The Impact On eDiscovery
There may be a lack of policy for new data sources.  Text messages and social media are becoming relevant for more cases.  Your Facebook info can be accessed through your friends.  Fitbit may show whether the person could have committed the murder. IP addresses can reveal whether email was sent from home or work. The change to the Twitter character limit may break some collection tools–QC early on to detect such problems.  Vendors should have multiple tools.  Communicate about what tech is involved and what you need to collect.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Cloud computing (infrastructure, storage, productivity, and web apps) will cause conflict between EU privacy law and US discovery.  AWS provides lots of security options, but it can be difficult to get right (must be configured correctly).  Startups aim to build fast and don’t think enough about how to get the data out.  Are law firm clients looking at cloud agreements and how to export data?  Free services (Facebook, Gmail, etc.) spy on users, which makes them inappropriate for corporate use where privacy is needed.  Slack output is one long conversation.  What about tools that provide a visualization?  You may need the data, not just a screenshot.  Understand the limit of repositories–Office 365 limits to 10GB of PST at a time.  What about versioning storage?  It is becoming more common as storage prices decline.  Do you need to collect all versions of a document?  “Computer ate my homework” excuses don’t fare well in court (e.g., production of privileged docs due to a bad mouse click, or missing docs matching a keyword search because they weren’t OCRed).  GDPR requires knowing where the users are (not where the data is stored).  Employees don’t want their private phones collected, so sandbox work stuff.

Employing Intelligence – Both Human And Artificial (AI) – To Reduce Overall eDiscovery Costs
You need to talk to custodians, because the org chart doesn’t really tell you what you need to know.  Search can show who communicates with whom about a topic.  To discover a custodian who is involved but not known to the attorney, look at the data and interview the ground troops.  Look for a period when there is a lack of communication.  Use sentiment analysis (including emojis).  Watch for strange bytes in the review tool, since they may be emojis that can only be viewed in the original app.  Automate legal holds as much as possible.  Escalate to a manager if the employee doesn’t respond to the hold in a timely manner.  Filter on metadata to reduce the amount that goes into the load file.  Sometimes things go wrong with the software (trained on biased data, not finding relevant spreadsheets, etc.).  QC to ensure the human element doesn’t fail.  Use phonetic search on audio files instead of transcribing before search.  Analyze data as it comes in, since you may spot months of missing email.  Do a proof of concept when selecting tools.
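Spotting months of missing email can be automated once the sent dates have been extracted.  The sketch below is only an illustration under that assumption; `missing_months` is a hypothetical helper name, and it reports calendar months with no messages between the first and last email seen.

```python
from datetime import date

def missing_months(email_dates):
    """Return (year, month) pairs with no email between the first and last message."""
    if not email_dates:
        return []
    seen = {(d.year, d.month) for d in email_dates}
    (year, month), last = min(seen), max(seen)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append((year, month))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return gaps

# Example: a custodian with nothing between February and June 2017.
dates = [date(2017, 1, 5), date(2017, 2, 20), date(2017, 6, 1)]
print(missing_months(dates))
```

A gap is not proof that anything was deleted (the custodian may have been on leave), but it tells you where to ask questions.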

Practical Discussion: eDiscovery Process With Law Firms, In-House And Vendor
Stick with a single vendor so you know it is done the same way every time.  Figure out what your data sources are.  Get social media data into the review platform in a usable form (e.g., Skype).  Finding the existence of cloud data stores requires effort.  How long is the cloud data being held (Twitter only holds the last 100 direct messages)?  The company needs to provide the needed apps so employees aren’t tempted to go outside to get what they need.

Highlights from the SoCal eDiscovery & IG Retreat 2017

The 2017 SoCal eDiscovery & IG Retreat was held at the Pelican Hill Resort in Newport Coast, California.   The format was somewhat different from other recent Ing3nious retreats, having a single session at a time instead of two sessions in parallel.  My notes below provide some highlights.  I’ve posted my full set of photos from the conference and nearby Crystal Cove here.

How Well Can Your Organization Protect Against Encrypted Traffic Threats?
Companies should be concerned about encrypted traffic, because they don’t know what is leaving their network.  Get encryption keys for cloud services the company uses so you can check outgoing content and block all other encrypted traffic; if something legitimate breaks, employees will let you know.  It is important to distinguish personal Dropbox use from corporate use.  Make sure you have a policy that says all corporate devices are subject to inspection and monitoring.  The CSO should report to the CEO rather than the CIO, or too much ends up being spent on technology with too little spent on risk reduction.  Security tech must be kept up to date.  Some security vendors are using artificial intelligence.  The board of directors needs to be educated about their fiduciary duty to provide oversight for security, as established in a 1996 case in Delaware (see this article).  In what country is the backup of your cloud data stored?  That could be important during litigation.  The amount of unstructured data companies have can be surprising, and represents additional risk.  When the CSO reports to the board, he/she should speak in terms of risk (don’t use tech speak).  Build in security from the beginning when starting new projects.  GDPR violations could bring penalties of up to 4% of revenue.  Guidance papers on GDPR are all over 40 pages long.  “Internet of Things” devices (e.g., refrigerators) are typically not secure.  Use DNS to detect attempts by IoT devices to call out.  IoT is collecting data about you to sell.  The book Future Crimes by Marc Goodman was recommended.

Using Technology To Reduce eDiscovery Spend
Artificial intelligence (AI) can be used before collection to reduce data volume.  Have a conversation about what’s really needed and use ECA to cull by date, topic, etc.  Process data from key players first.  It is important for project managers to know the data.  Parse out domain names, see who is talking to whom, see which folders people really have access to, and get rid of bad file types.  Image the machine of the person who will be leaving, then tell them you will be imaging the machine in the near future and see what they delete.  Use sentiment analysis and see if sentiment changes over time.  Use clustering to identify stuff that can be culled (e.g., stuff about the NFL).  Use clustering, rather than random sampling, to see what the data looks like.  Redaction of things like social security numbers can be automated.

It’s All Greek To Me: Multi-Language Document Review from Shakespeare To FCPA
Examples were given of Craigslist ads seeking temporary people for foreign language document review, showing that companies performing such reviews may not have capable people on staff.  Law firms are relying on external providers to manage reviews in languages in which they are not fluent. English in Singapore is not the same as English in the U.S. (different expressions) — cultural context is important.  There are 6,900 languages around the world.  Law firms must do diligence to ensure a language expert is trustworthy.  Law firms don’t like being beta testers for technologies like TAR and machine translation.  Communications in Asia are often not in text file format (e.g., chat applications) and can involve hundreds of thousands of non-standard emojis (how to even render them?).  Facebook got a Palestinian man arrested by mistranslating his “good morning” to “attack them” (see this article).  One speaker suggested Googling “fraudulent foreign language reviewers” (the top match is here).  There was skepticism about the ALTA language proficiency test.

Artificial Intelligence – Facial Expression Analytics As A Competitive Advantage In Risk Mitigation
Monitoring emotional response can provide an advantage at trial.  Universal emotions: joy, sadness, surprise, fear, anger, disgust, and contempt.  The lawyer should avoid causing sadness since that is detrimental to being liked — let the witness do it.  Emotional response can depend on demographics.  For example, the contempt response depends on age, and women tend to show a larger fear response.  Software can now detect emotion from facial photos very quickly.  One panelist warned against using the iPhone X’s authentication via face recognition because Apple has software for detecting emotion and could monitor your mood.  80% of what a jury picks up on is non-verbal.  Analyze video of depositions to look for ways to improve.  Senior people at companies refuse to believe they don’t come across well, but they often show signs of contempt at questions they deem to be petty.  There is no facial expression for deception — look for a shift in expression.  Realize that software may not be making decisions in the same way as a human would.  For example, a neural network that did a good job of distinguishing wolves from dogs was actually making the decision based on the presence or absence of snow in the background.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Bridging The Gap Between Inside And Outside Counsel: Next Generation Strategies For Collaborating On Complex Litigation Matters
Communicate about what you actually need or they may collect everything regardless of date or custodian, resulting in high costs for hosting.  Insourcing is a trend: the company keeps the data in house (reduce cost and risk) and provides outside counsel with access.  This means imposing technology on the outside counsel.  One benefit of insourcing is that in house counsel learns about the data, which may help with future cases.  Another trend is disaggregation, where legal tasks are split up among different law firms instead of using a single firm for everything.  It is important to ensure that technologies being used are understood by all parties from the start to avoid big problems later.  Paralegals can be good at keeping communication flowing between the outside attorney and the client.  Tech companies that want people to adopt their products need to help outside counsel explain the benefits to clients.

Cyber And Data Security For The GC: How To Stay Out Of Headlines And Crosshairs
I couldn’t attend this panel because I had to catch my flight.

 

Highlights from the NorCal IG Retreat 2017

The 2017 NorCal Information Governance Retreat was held by Ing3nious at the Quail Lodge & Golf Club in Carmel Valley, California.  After round table discussions, the retreat featured two simultaneous sessions throughout the day.  My notes below provide some highlights from the sessions I was able to attend.  I’ve posted additional photos here.

The intro to the round table discussions included some comments on the evolution of the Internet, the importance of searching for obscenities to find critical documents or to identify data that has been scrubbed (it is implausible that there are no emails containing obscenities for a failing project), the difficulty of searching for “IT” (meaning information technology rather than the pronoun), and the inability of many tools to search for emojis.
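The “IT” problem comes from search indexes that fold case, so “IT” and “it” become the same token.  Where you can run your own pattern over extracted text, a case-sensitive whole-word match is one partial workaround.  This is a minimal sketch using Python's standard `re` module; the function name is a hypothetical choice, and text written entirely in capitals will still produce false hits.

```python
import re

# Case-sensitive, whole-word match: finds "IT" but not "it", "It", or "ITEM".
IT_PATTERN = re.compile(r"\bIT\b")

def mentions_it_department(text):
    """Return True if the text contains the standalone, upper-case token IT."""
    return IT_PATTERN.search(text) is not None

print(mentions_it_department("Ask IT to reset it."))
print(mentions_it_department("I can't find it anywhere."))
```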

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

How Well Can Your Organization Protect Against Encrypted Traffic Threats?
I couldn’t attend this one.

IG Analytics And Infonomics: The Future Is Now
I couldn’t attend this one.

Breaches Happen. Going On The Cyber Offense With Deception
Breach stories that were mentioned included Equifax, Target, an employee that built their own (insecure) tunnel to get data out to their home, and an employee that carried data out on a microSD card.  In the RSA / Lockheed Martin breach, a Lockheed contractor was fooled by a phishing email, illustrating how hard it is to keep attackers out.  Email is a very common source of breaches.  A big mistake is not knowing that you’ve been breached.  People put honeypots outside the firewall to detect attacks. It’s better to use deception technology, which puts decoys inside the firewall.

Social Media And Website Information Governance
There has been some regulation of social media, especially for certain industries.  The SEC in 2012 required financial institutions to archive it.  The FTC has been enforcing paid endorsement disclosure guidelines (e.g., Kim Kardashian’s endorsement of a morning sickness drug).  Collecting evidence from social media is tricky.  A screenshot could be photoshopped, so how to prove it is legitimate?  Should collect a screenshot, source code, metadata, and a digital signature with time stamp.  Corporate policy on social media use will depend on the kind of company and the industry it is in.  There should also be a policy on monitoring employees’ social media use.  Companies using an internal social media system are asking for problems.  How will they police/discipline improper usage?  If an employee posts “Why haven’t I seen John lately?” and another replies that John has cancer, you have a problem.  Does a company social media system really improve productivity?  Can you find out who posted something anonymously on public social media?  If they posted from Starbucks or a library, probably not (finding the IP address won’t reveal the person’s identity).  This strategy worked for a bad review of a doctor that was thought to be from another doctor: 1) file in Federal court and get a court order to get the user’s IP address from the social media website, 2) go back to the judge and get a court order to get the ISP to give the identity of the person using that IP address at that time, 3) there is a motion to quash, which confirms that the right person was found (otherwise wouldn’t bother to fight it).
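To make the evidence-collection point concrete, here is a minimal Python sketch of fingerprinting a captured page at collection time.  It is only an assumption about how one might start: a SHA-256 hash with a locally recorded timestamp shows that the bytes have not changed since capture, but it is not a digital signature, and a real workflow would need a signing key or a trusted timestamping service on top of it.  The function and field names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def fingerprint_capture(content: bytes, source_url: str) -> dict:
    """Record a hash and capture time for collected content (screenshot, HTML source, etc.)."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(content).hexdigest(),  # changes if even one byte changes
        "size_bytes": len(content),
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),  # local clock, not a trusted timestamp
    }

record = fingerprint_capture(b"<html>example post</html>", "https://example.com/post/1")
print(record["sha256"])
```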

Bridging The Gap Between Inside And Outside Counsel: Next Generation Strategies For Collaborating On Complex Litigation Matters
I couldn’t attend this one.

Preventing Inadvertent Disclosure In A Multi-Language World
Start by identifying teams and process.  Be aware of cultural differences.  Be aware of technological issues — there are 2 or 3 alternatives to MS Word that you might encounter for documents in Korean.  Be aware of laws against removing certain documents from the country.  There was disagreement among panel members about whether review quality of foreign documents was better in the U.S. due to reviewers better understanding U.S. law.  Viewing a document in the U.S. that is stored on a server in the E.U. is not a valid work-around for restrictions on exporting the documents.  Review in the U.S. is much cheaper than reviewing overseas (about 1/5 to 1/10 of the cost).  Violation of GDPR puts 4% of revenue at risk, but a U.S. judge may not care.  Take only what you need out of the country.  Many tools work best when they are analyzing documents in a single language, so use language identification and separate documents by language before analysis.  TAR may not work as well for non-English documents, but it does work.

What’s Your Trust Point?
I couldn’t attend this one.

Legal Tech And AI – Inventing The Future
Humans are better than computers at handling low-probability outlier events, because there is a lack of training data to teach machines to handle such things.  It is important for the technology to be easy for the user to interact with.  Legal clients are very cost averse, so a free trial of new tech is attractive.

The Cloud, New Technologies And Other Developments In Trade Secret Theft
I couldn’t attend this one.

Are You Prepared For The Impact Of Changing EU Data Privacy On U.S. Litigation?
I couldn’t attend this one.

IG Policy Pain Points In E-Discovery
Deletion of data that is not on hold 60 days after an employee leaves the company may not get everything since other custodians may have copies.  You may find that employees have archived their emails on a local hard drive.  Be clear about data ownership: wiping the phone of an employee who left the company may hit their personal data.  The general counsel is often not involved in decisions like BYOD (treated as an IT decision), but they should be.  Realize that having more data about employee behavior (e.g., GPS tracking) makes the company more responsible.  You rarely need the employee’s phone since there is little data cached there (data is on mail servers, etc.).  You should do info governance compliance testing to ensure that employees are following the procedures.  Policies must be realistic, since there won’t be perfect separation of work and personal activity.  Flouted rules may be worse than no rules.  Keep personal data separate (personal folder, personal email address, use phone for accessing Facebook).  When doing an annual cleanup, what about the data from the employee who left the company?  A study showed that 85% of stored data is ROT (redundant, obsolete, or trivial).  Have a checklist that you follow when an employee leaves, and don’t wipe the computer without copying stuff you may need.

Highlights from the Northeast IG Retreat 2017

The 2017 Northeast Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  After round table discussions, the retreat featured two simultaneous sessions throughout the day.  My notes below provide some highlights from the sessions I was able to attend.

Enhancing eDiscovery With Next Generation Litigation Management Software
I couldn’t attend this one.

Legal Tech and AI – Inventing The Future
Machines are currently only good at routine tasks.  Interactions with machines should allow humans and machines to do what they do best.  Some areas where AI can aid lawyers: determining how long litigation will take, suggesting cases you should reference, telling how often the opposition has won in the past, determining appropriate prices for fixed fee arrangements, recruiting, or determining which industry to focus on.  AI promises to help with managing data (e.g., targeted deletion), not just e-discovery.  Facial recognition may replace plane tickets someday.

Zen & The Art Of Multi-Language Discovery: Risks, Review & Translation
I couldn’t attend this one.

NexLP Demo
The NexLP tool emphasizes feature extraction and use of domain knowledge from external sources to figure out the story behind the data.  It can generate alerts based on changes in employee behavior over time.  Company should have a policy allowing the scanning of emails to detect bad behavior.  It was claimed that using AI on emails is better for privacy than having a human review random emails since it keeps human eyes away from emails that are not relevant.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Are Managed Services Manageable?
I couldn’t attend this one.

Cyber And Data Security For The GC: How To Stay Out Of Headlines And Crosshairs
I couldn’t attend this one.

The Office Is Out: Preservation And Collection In The Merry Old Land Of Office 365
Enterprise 5 (E5) has advanced analytics from Equivio.  E3 and E1 can do legal hold but don’t have advanced analytics.  There are options available that are not on the website, and there are different builds — people are not all using the same thing.  Search functionality works on limited file types (e.g., Microsoft products).  Email attachments are OK if they are from Microsoft products.  It will not OCR PDFs that lack embedded text.  What about emails attached to emails?  Previously, it only went one layer deep on attachments.  Latest versions say they are “relaxing” that, but it is unclear what that means (how deep?).  User controls sync — are we really searching everything?  Make sure you involve IT, privacy, info governance, etc. if considering transition to 365.  Be aware of data that is already on hold if you migrate to 365.  Start by migrating a small group of people that are not often subject to litigation.  Test each data type after conversion.

How To Make Sense Of Information Governance Rules For Contractors When The Government Itself Can’t?
I couldn’t attend this one.

Judges, The Law And Guidance: Does ‘Reasonableness’ Provide Clarity?
This was primarily about the impact of the new Federal Rules of Civil Procedure.  Clients are finally giving up on putting everything on hold.  Tie document retention to business needs, and you shouldn’t have to worry about sanctions.  Document everything (e.g., why you chose specific custodians to hold).  Accidentally missing one custodian out of a hundred is now OK.  Some judges acknowledge the new rules but then ignore them.  Boilerplate objections to discovery requests need to stop; keep notes on why you made each objection.

Beyond The Firewall: Cybersecurity & The Human Factor
I couldn’t attend this one.

The Theory of Relativity: Is There A Black Hole In Electronic Discovery?
The good about Relativity: everyone knows it, it has plug-ins, and moving from document to document is fast compared to previous tools.  The bad: TAR 1.0 (federal judiciary prefers CAL).  An audience member expressed concern that as Relativity gets close to having a monopoly we should expect high prices and a lack of innovation.  Relativity One puts kCura in competition with service providers.

The day ended with a wine social.

Highlights from the South Central IG Retreat 2017

The 2017 South Central Information Governance Retreat was the first retreat in the Ing3nious series held in Texas at the La Cantera Resort & Spa.  The retreat featured two simultaneous sessions throughout the day.  My notes below provide some highlights from the sessions I was able to attend.

The day started with roundtable discussions that were kicked off by a speaker who talked about the early days of the Internet.  He made the point that new lawyers may know less about how computers actually work even though they were born in an era when they are more pervasive.  He mentioned that one of the first keyword searches he performs when he receives a production is for “f*ck.”  If a company was having problems with a product and there isn’t a single email using that word, something was surely withheld from the production.  He made the point that expert systems that are intended to replace lawyers must be based on how the experts (lawyers) actually think.  How do you identify the 50 documents that will actually be used in trial?

Borrowing Agile Development Concepts To Jump-Start Your Information Governance Program
I couldn’t attend this

Your Duty To Preserve: Avoiding Traps In Troubled Times
When storing data in the cloud, what is actually retained?  How can you get the data out?  Google Vault only indexes newly added emails, not old ones.  The company may not have the right to access employee data in the cloud.  One panelist commented that collection is preferred to preservation in place.

Enhancing eDiscovery With Next Generation Litigation Management Software
I couldn’t attend this one.

Leveraging The Cloud & Technology To Accelerate Your eDiscovery Process
Cloud computing seems to have reached an inflection point.  A company cannot put the resources into security and data protection that Amazon can.  The ability to scale up/down is good for litigation that comes and goes.  Employees can jump into cloud services without the preparation that was required for doing things on site.  Getting data out can be hard.  Office 365 download speed can be a problem (2-3 GB/hr) — reduce data as much as possible.
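The download-speed point is easy to quantify.  This is a back-of-the-envelope sketch using the 2-3 GB/hr range quoted above; the 500 GB collection size is a made-up example, and real throughput will vary.

```python
def export_hours(data_gb, rate_gb_per_hour):
    """Hours needed to pull data_gb out at a steady rate_gb_per_hour."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour

collection_gb = 500  # hypothetical collection size
for rate in (2, 3):
    hours = export_hours(collection_gb, rate)
    print(f"{collection_gb} GB at {rate} GB/hr: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At those rates a 500 GB export takes roughly a week or more of continuous transfer, which is why reducing the data before export matters so much.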

Strategies For Effectively Managing Your eDiscovery Spend
I couldn’t attend this one.

TAR: What Have We Learned?
I moderated this panel, so I didn’t take notes.

Achieving GDPR Compliance For Unstructured Content
I couldn’t attend this one.

Zen & The Art Of Multi-Language Discovery: Risks, Review & Translation
The translation company should be brought in when the team is formed (it often isn’t done until later).  Help may be needed from translator / localization expert to come up with search terms.  For example, there are 20 ways to say “CEO” in Korean.  Translation must be done by an expert to be certified.  When using TAR, do review in the native language and translate the result before presenting to the legal team.  Translation is much slower than review.  Machine translation has improved over the last 2 years, but it’s not good enough to rely on for anything important.  A translator leaked Toyota’s data to the press — keep the risk in mind and make sure you are informed about the environment where the work is being done (screenshots should be prohibited).

Beyond The Firewall: Cybersecurity & The Human Factor
I couldn’t attend this one.

Ethical Obligations Relating To Metadata
Nineteen states have enacted ethical rules on metadata.  Sometimes, metadata is enough to tell the whole story.  John McAfee was found and arrested because of GPS coordinates embedded in a photo of him.  Metadata showed that a terminated whistleblower’s employee review was written 3 months after termination.  Forensic collection is important to not spoil the metadata.  Ethical obligations of attorneys are broader than attorney-client privilege.  Should attorneys be encrypting email?  Make the client aware of metadata and how it can be viewed.  The attorney must understand metadata and scrub it as necessary (e.g., change tracking in Word).  In e-discovery, metadata is treated like other ESI.  Think about metadata when creating a protective order.  What are the ethical restrictions on viewing and mining metadata received through discovery?  Whether you need to disclose receipt of confidential or privileged metadata depends on the jurisdiction.

Legal Risks Associated With Failing To Have A Cyber Incident Response Plan
I couldn’t attend this one.

“Defensible Deletion” Is The Wrong Frame
Defensible deletion started with an IBM survey that found that on average 69% of corporate data has no value, 6% is subject to litigation hold, and 25% is useful.  IBM started offering to remove 45% of data without doing any harm to a company (otherwise, you don’t have to pay).  Purging requires effort, so make deletion the default.  Statistical sampling can be used to confirm that retention rules won’t cause harm.  After a company said that requested data wasn’t available because it had been deleted in accordance with the retention policy, an employee who was being deposed said he had copied everything to 35 CDs — it can be hard to ensure that everything is gone even if you have the right policy.
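One way sampling can support a retention rule: review a random sample of the documents the rule would delete, and if none of them turn out to be needed, compute an upper bound on how many needed documents could still be hiding in the full set.  The sketch below is an assumption about how that check might be done, not something presented at the session; it uses the exact binomial bound for a sample with zero hits, and it assumes the sample is truly random.

```python
def zero_hit_upper_bound(sample_size, confidence=0.95):
    """Upper bound on the true rate when a random sample contains zero hits.

    Solves (1 - p) ** sample_size = 1 - confidence for p.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)

for n in (100, 300, 1000):
    print(f"0 needed docs in a sample of {n}: rate below {zero_hit_upper_bound(n):.2%} at 95% confidence")
```

The familiar “rule of three” (3 divided by the sample size) is a slightly conservative approximation of the same bound.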

 

Highlights from the Northeast eDiscovery & IG Retreat 2016

The 2016 Northeast eDiscovery & IG Retreat was held at the Ocean Edge Resort & Golf Club.  It was the third annual Ing3nious retreat held in Cape Cod.  The retreat featured two simultaneous sessions throughout the day in a beautiful location.  My notes below provide some highlights from the sessions I was able to attend.  You can find additional photos here.

Peer-to-Peer Roundtables
The retreat started with peer-to-peer round tables where each table was tasked with answering the question: Why does e-discovery suck (gripes, pet peeves, issues, etc.) and how can it be improved?  Responses included:

  • How to drive innovation?  New technologies need to be intuitive and simple to get client adoption.
  • Why are e-discovery tools only for e-discovery?  Should be using predictive coding for records management.
  • Need alignment between legal and IT.  Need ongoing collaboration.
  • Handling costs.  Cost models and comparing service providers are complicated.
  • Info governance plans for defensible destruction.
  • Failure to plan and strategize e-discovery.
  • Communication and strategy.  It is important to get the right people together.
  • Why not more cooperation at meet-and-confer?  Attorneys that are not comfortable with technology are reluctant to talk about it.  Asymmetric knowledge about e-discovery causes problems–people that don’t know what they are doing ask for crazy things.

Catching Up on the Implementation of the Amended Federal Rules
I couldn’t attend this one.

Predictive Coding and Other Document Review Technologies–Where Are We Now?
It is important to validate the process as you go along, for any technology.  It is important to understand the client’s documents.  Pandora is more like TAR 2.0 than TAR 1.0, because it starts giving recommendations based on your feedback right away.  The 2012 Rand Study found this e-discovery cost breakdown: 73% document review, 8% collection, and 19% processing.  A question from the audience about pre-culling with keyword search before applying predictive coding spurred some debate.  Although it wasn’t mentioned during the panel, I’ll point out William Webber’s analysis of the Biomet case, which shows pre-culling discarded roughly 40% of the relevant documents before predictive coding was applied.  There are many different ways of charging for predictive coding: amount of data, number of users, hose (total data flowing through) or bucket (max amount of data allowed at one time).  Another barrier to use of predictive coding is lack of senior attorney time (e.g., to review documents for training).  Factors that will aid in overcoming barriers: improving technologies, Sherpas to guide lawyers through the process, court rulings, influence from general counsel.  Need to admit that predictive coding doesn’t work for everything, e.g., calendar entries.  New technologies include anonymization tools and technology to reduce the size of collections.  Existing technologies that are useful: entity extraction, email threading, facial recognition, and audio to text.  Predictive coding is used in maybe less than 1% of cases, but email threading is used in 99%.
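The pre-culling concern comes down to multiplication: relevant documents discarded by the keyword cull can never be found by predictive coding, so end-to-end recall is the product of the two stages.  A small sketch, using the roughly 40% loss mentioned above and a hypothetical 75% recall for the predictive coding stage (that second number is my assumption for illustration, not a figure from the case):

```python
def overall_recall(cull_recall, tar_recall):
    """End-to-end recall when predictive coding only sees what survives culling."""
    return cull_recall * tar_recall

cull_recall = 1.0 - 0.40  # keyword culling kept about 60% of the relevant documents
tar_recall = 0.75         # hypothetical recall of predictive coding on the culled set
print(f"End-to-end recall: {overall_recall(cull_recall, tar_recall):.0%}")
```

So a recall figure reported for the predictive coding stage alone can substantially overstate how much of the relevant material was actually produced.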

It’s All Greek To Me: Multi-Language Discovery Best Practices
Native speakers are important.  An understanding of relevant industry terminology is important, too.  The ALTA fluency test is poor–the test is written in English and then translated to other languages, so it’s not great for testing ability to comprehend text that originated in another language.  Hot documents may be translated for presentation.  This is done with a secure platform that prohibits the translator from downloading the documents.  Privacy laws make it best to review in-country if possible.  There are only 5 really good legal translation companies–check with large firms to see who they use.  Throughput can be an issue.  Most can do 20,000 words in 3 days.  What if you need to do 200,000 in 3 days?  Companies do share translators, but there’s no reason for good translators to work for low-tier companies–good translators are in high demand.  QC foreign review to identify bad reviewers (need proficient managers).  May need to use machine translation (MT) if there are millions of documents.  QC the MT result and make sure it is actually useful–in 85% of cases it is not good enough.  For CJK (Chinese, Japanese, Korean), MT is terrible.  The translation industry is $40 billion.  Google invested a lot in MT but it didn’t help much.  One technology that is useful is translation memory, where repeated chunks of text are translated just once.  People performing review in Japanese must understand the subtlety of the American legal system.
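Translation memory is essentially a cache keyed on source text.  The following is a toy sketch of the idea, not a description of any commercial product: the `translate` callable is a hypothetical stand-in for a human translator or MT engine, and real systems also handle near-matches rather than only exact repeats.

```python
class TranslationMemory:
    """Translate each distinct segment once and reuse the result for repeats."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical callable: source text -> translated text
        self._memory = {}
        self.calls = 0               # how many segments actually needed translating

    def translate_segment(self, segment):
        key = " ".join(segment.split())  # normalize whitespace so trivial variants match
        if key not in self._memory:
            self._memory[key] = self._translate(key)
            self.calls += 1
        return self._memory[key]

# Stand-in "translator" for demonstration only.
tm = TranslationMemory(str.upper)
for line in ["Confidential notice", "See attached.", "Confidential   notice"]:
    tm.translate_segment(line)
print(tm.calls)
```

Boilerplate such as email disclaimers repeats across thousands of documents, which is where this saves the most translator time.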

Top Trends in Discovery for 2016
I couldn’t attend this one.

Measure Twice, Discover Once
Why measure in e-discovery?  So you can explain what happened and why, for defensibility.  Also important for cost management.  The board of directors may want reports.  When asked for more custodians you can show the cost and expected number of relevant documents that will be added by analyzing the number of keyword search hits.  Everything gets an ID number for tracking and analysis (USB drives, batches of documents, etc.).  Types of metrics ordered from most helpful to most harmful: useful, no metric, not useful, and misleading.  A simple metric used often in document review is documents per hour per reviewer.  What about document complexity, content complexity, number and type of issue codes, review complexity, risk tolerance instructions, number of “defect opportunities,” and number coded correctly?  Many 6-sigma ideas from manufacturing are not applicable due to the subjectivity that is present in document review.

Information Governance and Data Privacy: A World of Risk
I couldn’t attend this one.

The Importance of a Litigation Hold Policy
I couldn’t attend this one.

Alone Together: Where Have All The Model TAR Protocols Gone?
If you are disclosing details, there are two types: inputs (search terms used to train, shared review of training docs) and outputs (target recall or disclosure of recall).  Don’t agree to a specific level of recall before looking at the data–if prevalence is low it may be hard.  Plaintiff might argue for TAR as a way to overcome cost objections from the defendant.  There is concern about lack of sophistication from judges–there is “stunning” variation in expertise among federal judges.  An attorney involved with the Rio Tinto case recommends against agreeing on seed sets because it is painful and focuses on the wrong thing.  Sometimes there isn’t time to put eyes on all documents that will be produced.  Does the TAR protocol need to address dupes, near-dupes, email threading, etc.?
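To see why low prevalence makes a recall commitment hard, consider how many documents must be reviewed just to measure recall.  The sketch below is my own illustration under simplifying assumptions: recall is estimated from relevant documents found in a simple random sample, and the number of relevant documents needed comes from the usual worst-case normal approximation for a proportion.

```python
import math

def relevant_docs_needed(margin=0.05, z=1.96):
    """Relevant documents needed to estimate recall within +/- margin (worst case, ~95% confidence)."""
    return math.ceil(z ** 2 * 0.25 / margin ** 2)

def expected_sample_size(prevalence, margin=0.05, z=1.96):
    """Random documents you expect to review to find that many relevant ones."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(relevant_docs_needed(margin, z) / prevalence)

for prevalence in (0.10, 0.01, 0.001):
    print(f"prevalence {prevalence:.1%}: review about {expected_sample_size(prevalence):,} random documents")
```

Each tenfold drop in prevalence means about ten times as much review to validate the same recall target, so it pays to look at the data before promising a number.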

Information Governance: Who Owns the Information, the Risk and the Responsibility?
I couldn’t attend this one.

Bringing eDiscovery In-House — Savings and Advantages
I was on this panel, so I didn’t take notes.

Highlights from the Southeast eDiscovery & IG Retreat 2016

This retreat was the first one held by Ing3nious in the Southeast.  It was at the Chateau Elan Winery & Resort in Braselton, Georgia.  Like all of the e-discovery retreats organized by Chris LaCour, it featured informative panels in a beautiful setting.  My notes below offer a few highlights from the sessions I attended.  There were often two sessions occurring simultaneously, so I couldn’t attend everything.

Peer-to-Peer Roundtables
My table discussed challenges people were facing.  These included NSF files (Lotus Notes), weird native file formats, and 40-year-old documents that had to be scanned and OCRed.  Companies having a “retain everything” culture are problematic (e.g., 25,000 backup tapes).  One company had a policy of giving each employee a DVD containing all of their emails when they left the company.  When they got sued they had to hunt down those DVDs to retrieve emails they no longer had.  If a problem (information governance) is too big, nothing will be done at all.  In Canada there are virtually never sanctions, so there is always a fight about handing anything over.

Proactive Steps to Cut E-Discovery Costs
I couldn’t attend this one.

The Intersection of Legal and Technical Issues in Litigation Readiness Planning
It is important to establish who you should go to.  Many companies don’t have a plan (figure it out as you go), but it is a growing trend to have one due to data security and litigation risk.  Having an IT / legal liaison is becoming more common.  For litigation readiness, have providers selected in advance.  To get people on board with IG, emphasize cost (dollars) vs. benefit (risk).  Should have an IG policy about mobile devices, but they are still challenging.  Worry about data disposition by a third party provider when the case is over.  Educate people about company policies.

Examining Your Tools & Leveraging Them for Proactive Information Governance Strategy
I couldn’t attend this one.

Got Data? Analytics to the Rescue
Only 56% of in-house counsel use analytics, but 93% think it would be useful.  Use foreign language identification at start to know what you are dealing with.  Be careful about coded language (e.g., language about fantasy sports that really means something else) — don’t cull it!  Graph who is speaking to whom.  Who are emails being forwarded to?  Use clustering to find themes.  Use assisted redaction of PII, but humans should validate the result (this approach gives a 33% reduction in time).  Re-OCR after redaction to make sure it is really gone.  Alex Ponce de Leon from Google said they apply predictive coding immediately as early-case assessment and know the situation and critical documents before hiring outside counsel (many corporate attorneys in the audience turned green with envy).  Predictive coding is also useful when you are the requesting party.  Use email threading to identify related emails.  The requesting party may agree to receive just the last email in the thread.  Use analytics and sampling to show the judge the burden of adding custodians and the number of relevant documents expected — this is much better than just throwing around cost numbers.  Use analytics for QC and reviewer analysis.  Is someone reviewing too slow/fast (keep in mind that document type matters, e.g. spreadsheets) or marking too many docs as privileged?
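
The reviewer analysis mentioned at the end can be sketched as a comparison of each reviewer against the group median.  This is a minimal illustration rather than a method the panel described: the function name, the data layout, and the factor-of-two thresholds are all assumptions, and the numbers are made up.

```python
from statistics import median


def flag_reviewers(stats, pace_factor=2.0, priv_factor=2.0):
    """Flag reviewers whose pace or privilege rate is far from the median.

    stats maps a reviewer name to a tuple of
    (docs_reviewed, hours, docs_marked_privileged).
    The factors are arbitrary illustrative thresholds, not recommendations.
    """
    pace = {r: docs / hours for r, (docs, hours, _) in stats.items()}
    priv = {r: p / docs for r, (docs, _, p) in stats.items()}
    med_pace = median(pace.values())
    med_priv = median(priv.values())
    flags = {}
    for reviewer in stats:
        reasons = []
        if pace[reviewer] > med_pace * pace_factor:
            reasons.append("fast")
        elif pace[reviewer] < med_pace / pace_factor:
            reasons.append("slow")
        if priv[reviewer] > med_priv * priv_factor:
            reasons.append("high privilege rate")
        if reasons:
            flags[reviewer] = reasons
    return flags


# Made-up numbers for a single document type; compare like with like,
# since spreadsheets and emails are reviewed at different speeds.
sample = {
    "reviewer_a": (400, 8.0, 12),
    "reviewer_b": (420, 8.0, 10),
    "reviewer_c": (1000, 8.0, 11),
    "reviewer_d": (380, 8.0, 60),
}
print(flag_reviewers(sample))
```

A flag here is only a prompt for a closer look at that reviewer's batch, not evidence of a problem in itself.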

The Power of Analytics: Strategies for Investigations and Beyond
Focus on the story (fact development), not just producing documents.  Context is very important for analyzing instant messages.  Keywords often don’t work for IMs due to misspellings.  Analytics can show patterns and help detect coded language.  Communicate about how emails are being handled: are you producing threads or everything, and are you logging threads or everything (producing and logging may be different)?  Regarding transparency, are the seed set and workflow work product?  When working with the DOJ, one panelist showed them results for different bands of predictive coding scores, and they were satisfied with that.  Nobody likes the idea of doing a clawback agreement and skipping privilege review.

Freedom of Speech Isn’t Free…of Consequences
The 1st Amendment prohibits Congress from passing laws restricting speech, but that doesn’t keep companies from putting restrictions on employees.  With social media, cameras everywhere, and the ability of things to go viral (the grape lady was mentioned), companies are concerned about how their reputations could be damaged by employees’ actions, even outside the workplace.  A doctor and a Taco Bell executive were fired due to videos of them attacking Uber drivers.  Employers creating policies curbing employee behavior must be careful about Sec. 8 of the National Labor Relations Act, which prohibits employers from interfering with employees’ Sec. 7 rights to self-organize or join/form a labor organization.  Taken broadly, employers cannot prohibit employees from complaining about working conditions since that could be seen as a step toward organizing.  Employers have to be careful about social media policies or prohibiting employees from talking to the media because of this.  Even a statement in the employee handbook saying employees should be respectful could be problematic because requiring them to be respectful toward their boss could be a violation.  The BYOD policy should not prohibit accessing Facebook (even during work) because Facebook could be used to organize.  On the other hand, employers could face charges of negligent retention/hiring if they don’t police social media.

Generating a Competitive Advantage Through Information Governance: Lessons from the Field
I couldn’t attend this one.

Destruction Zone
The government is getting more sophisticated in its investigations, so it is important to give them good productions and avoid losing important data.  Check to see if there is a legal hold before discarding old computer systems and when employees leave the company.  It is important to know who the experts are in the company and ensure communication across functions.  Information governance is about maximizing value of information while minimizing risks.  The government is starting to ask for text messages.  Things you might have to preserve in the future include text messages, social media, videos, and virtual reality.  It’s important to note the difference between preserving the text messages by conversation and by custodian (where things would have to be stitched back together to make any sense of the conversation).  Many companies don’t turn on recording of IMs, viewing them as conversational.
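
To make the conversation-versus-custodian distinction concrete, here is a rough sketch of the stitching step needed when messages were preserved by custodian.  The message fields ("sender", "recipients", "timestamp", "text") and the function name are hypothetical, and real exports would need far more careful matching than this.

```python
from collections import defaultdict


def stitch_conversations(custodian_exports):
    """Merge per-custodian message exports into per-conversation threads.

    custodian_exports maps a custodian to a list of message dicts with the
    hypothetical keys "sender", "recipients", "timestamp", and "text".
    """
    threads = defaultdict(dict)
    for messages in custodian_exports.values():
        for msg in messages:
            # A conversation is identified by its set of participants.
            participants = frozenset([msg["sender"], *msg["recipients"]])
            # The same message appears in each participant's export, so
            # key on its content to keep a single copy.
            key = (msg["timestamp"], msg["sender"], msg["text"])
            threads[participants][key] = msg
    # Sort each thread by timestamp (ISO strings sort chronologically).
    return {
        participants: [msgs[k] for k in sorted(msgs)]
        for participants, msgs in threads.items()
    }


# Made-up exports: each custodian holds a copy of the same two messages.
exports = {
    "alice": [
        {"sender": "alice", "recipients": ["bob"],
         "timestamp": "2016-05-01T09:00", "text": "Lunch?"},
        {"sender": "bob", "recipients": ["alice"],
         "timestamp": "2016-05-01T09:02", "text": "Sure."},
    ],
    "bob": [
        {"sender": "bob", "recipients": ["alice"],
         "timestamp": "2016-05-01T09:02", "text": "Sure."},
        {"sender": "alice", "recipients": ["bob"],
         "timestamp": "2016-05-01T09:00", "text": "Lunch?"},
    ],
}
for participants, messages in stitch_conversations(exports).items():
    print(sorted(participants), [m["text"] for m in messages])
```

If only one custodian's export had been preserved, any messages missing from that copy could not be recovered this way, which is part of why the choice of preservation method matters.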

Managing E-Discovery as a Small Firm or Solo Practitioner
I couldn’t attend this one.

Overcoming the Objections to Utilizing TAR
I was on this panel, so I didn’t take notes.

Max Schrems, Edward Snowden and the Apple iPhone: Cross-Border Discovery and Information Management Times Are A-Changing
I couldn’t attend this one.