Tag Archives: analytics

Highlights from the Northeast eDiscovery & IG Retreat 2018

The 2018 Northeast eDiscovery and Information Governance Retreat was held at the Salamander Resort & Spa in Middleburg, Virginia.  It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room. Attendees could attend talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.

Strategies For Data Minimization Of Legacy Data
Backup and archiving should be viewed as separate functions.  When it comes to spoliation (FRCP Rule 37), reasonableness of the company’s data retention plan is key.  Over preservation is expensive.  There are not many cases on Rule 37 relating to backup tapes.  People are changing their behavior due to the changes in the FRCP, especially in heavily regulated industries such as healthcare and financial services.  Studies find that typically 70% of data has no business value and is not subject to legal hold or retention requirements for compliance.  When using machine learning, you can focus on finding what to keep or what to get rid of.  It is often best to start with unsupervised machine learning.  Be mindful of destructive malware.  To mitigate security risks, it is important to know where your data (including backup tapes) is.  If a backup tape goes missing, do you need to notify customers (privacy)?  To get started, create a matrix showing what you need to keep, keeping in mind legal holds and privacy (GDPR).  Old backup tapes are subject to GDPR.  Does the right to be forgotten apply to backup tapes?  There is currently no answer.  It would be hard to selectively delete data from the tapes, so maybe have a process that deletes during the restore.  There can be conflicts between U.S. ediscovery and GDPR, so you must decide which is the bigger risk.

Preparing A Coordinated Response To Government Inquiries And Investigations
You might find out that you are being investigated when the FBI or another investigator approaches one of your employees.  If that happens, get an attorney.  Reach out to the investigator, take it seriously, and ask for a timeline.  You may receive a broad subpoena because the investigator wants to ensure they get everything important, but you can often get them to narrow it.  Be sure to retain outside counsel immediately.  In one case a CEO negotiated search terms with a prosecutor without discussing custodians, so they had to search all employees.  The prosecutor can’t handle a huge volume of data, so it should be possible to negotiate a reasonable production.  In addition to satisfying the subpoena, you need to simultaneously investigate whether there is an ongoing problem that needs to be addressed.  Is your IT group able to forensically preserve and produce the documents?  You don’t want to mess up a production in front of a regulator, so get expertise in place early.  Data privacy can be an issue.  When dealing with operations in Europe, it is helpful to get employee consent in advance, since nobody wants to consent during an investigation.  Beware of data residing in disparate systems in different languages.  Google Translate is not very good, e.g., you have to be careful about slang.  Employees may try to cover their tracks.  In one case an employee was using “chocolate” as an encoded way to refer to a payment.  In another case an employee took a hammer to a desktop computer, though the hard drive was still recoverable.  Look for gaps in email or anomalous email volume.  Note that employees may use WhatsApp or Signal to communicate.  The DOJ expects you to be systematic (e.g., use analytics) about compliance.  See what data is available, even if it wasn’t subpoenaed, since it may help your side (email usually doesn’t).
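
The advice to look for gaps in email or anomalous email volume can be illustrated with a small sketch. This is not a method described by the panel; the function name, the `spike_factor` threshold, and the use of the median as the "typical" volume are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def daily_volume_flags(sent_dates, spike_factor=3.0):
    """Return (gap_days, spike_days) for one custodian's email dates.

    sent_dates: iterable of datetime.date, one entry per email.
    spike_factor: hypothetical threshold; a day is a spike when its
    count exceeds spike_factor times the median daily count.
    """
    counts = Counter(sent_dates)
    first, last = min(counts), max(counts)
    # Enumerate every calendar day in the range, including empty ones.
    all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    typical = median(counts[d] for d in all_days)
    gaps = [d for d in all_days if counts[d] == 0]
    spikes = [d for d in all_days if typical > 0 and counts[d] > spike_factor * typical]
    return gaps, spikes


# Hypothetical data: steady traffic, one silent day, one very busy day.
dates = (
    [date(2018, 1, 1)] * 2
    + [date(2018, 1, 2)] * 2
    + [date(2018, 1, 4)] * 2
    + [date(2018, 1, 5)] * 10
)
gaps, spikes = daily_volume_flags(dates)
```

In practice weekends and holidays would show up as gaps, so a reviewer would want to compare against the custodian's normal pattern before reading anything into a flagged day.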

Digging Into TAR
I moderated this panel, so I didn’t take notes. We challenged the audience to create a keyword search that would work better than technology-assisted review. Results are posted here.

Implementing Information Governance – Nightmare On Corporate America Street?
You need to weigh the value of the data against the risk of keeping it.  What is your business model?  That will dictate information governance.  Domino’s was described as a technology company that happens to distribute hot bread.  Unstructured data has the biggest footprint and the most rapid growth.  Did you follow your policies?  Your insurance company may be very picky about that when looking for a reason not to pay out.  They may pay out and then sue you over the loss.  Fear is a good motivator.  Threats from the OCC or FDIC over internal data management can motivate change.  You can quantify risk because the cost of having a data breach is now known. Info governance is utilization awareness, not just data management.  Know where your data is.  What about the employee that creates an unauthorized AWS account?  This is the “shadow ecosystem” or “shadow IT.”  One company discovered they had 50,000 collaborative SharePoint sites they didn’t know about.  For info governance standards see The Sedona Conference and EDRM.

Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Artificial intelligence (AI) should not merely analyze; it should present a result in a way that is actionable.  It might tell you how much two people talk, their sentiment, and whether there are any spikes in communication volume.  AI can be used by law firms for budgeting by analyzing prior matters.  There are concerns about privacy with AI.  Many clients are moving to the cloud.  Many are using private clouds for collaboration, not necessarily for utilizing large computing power.  Office 365 is of interest to many companies.  There was extensive discussion about the ediscovery analytics capabilities being added from the Equivio acquisition, and a demo by Marcel Katz of Microsoft.  The predictive coding (TAR) capability uses simple active learning (SAL) rather than continuous active learning (CAL).  It is 20 times slower in the cloud than running Equivio on premises.  There is currently no review tool in Office 365, so you have to export the predictions out and do the review elsewhere.  Mobile devices create additional challenges for ediscovery.  The time when a text message is sent may not match the time when it is received if the receiving device is off when the message is sent.  Technology needs to be able to handle emojis.  There are many different apps with many different data storage formats.

The ‘Team Of Teams’ Approach To Enterprise Security And Threat Management
Fast response is critical when you are attacked.  Response must be automated because a human response is not fast enough.  It can take 200 days to detect an adversary on the network, so assume someone is already inside.  What are the critical assets, and what threats should you look for?  What value does the data have to the attacker?  What is the impact on the business?  What is the impact on the people?  Know what is normal for your systems.  Is a large data transfer at 2:00am normal?  Simulate a phishing attack and see if your employees fall for it.  In one case a CEO was known to be in China for a deal, so someone impersonating the CEO emailed the CFO to send $50 million for the deal.  The money was never recovered.  Have processes in place, like requiring a signature for amounts greater than $10,000.  If a company is doing a lot of acquisitions, it can be hard to know what is on their network.  How should small companies get started?  Change passwords, hire an external auditor, and make use of open source tools.

From Data To GRC Insight
Governance, risk management, and compliance (GRC) needs to become centralized and standardized.  Practicing incident response as a team results in better responses when real incidents happen.  Growing data means growing risk.  Beware of storage of social security numbers and credit card numbers.  Use encryption and limit access based on role.  Detect emailing of spreadsheets full of data.  Know what the cost of HIPAA violations is and assign the risk of non-compliance to an individual.  Learn about the NIST Cybersecurity Framework.  Avoid fines and reputational risk, and improve the organization.  Transfer the risk by having data hosted by a company that provides security.  Cloud and mobile can have big security issues.  The company can’t see traffic on mobile devices to monitor for phishing.

 

Highlights from Ipro Innovations 2018

The 17th annual Ipro Innovations conference was held at the Talking Stick Resort.  It was well-organized with two and a half days of informative talks and fun activities.  Early in the day everyone met in a large hall for the talks, whereas there were seven simultaneous breakout sessions later in the day.  There were many sessions in computer labs where attendees could gain first-hand experience with the Ipro software.  I could only attend the tail end of the conference because I was at the NorCal eDiscovery & IG Retreat earlier in the week.  I’ve included my notes below.  You can find my full set of photos here.

The keynote on the final day was delivered by Afterburner, a consulting firm promoting a “Flawless Execution” methodology based on military strategy.  Their six steps of mission planning are: 1) determine the mission objective, 2) identify the threats, 3) identify your available and required resources, 4) evaluate lessons learned, 5) develop a course of action, and 6) plan for contingencies.  The audience participated in exercises to illustrate how easily attention can be channelized, meaning that you focus on one thing at the expense of everything else.  Channelized attention was the cause of a commercial airliner crash.  To avoid being distracted by minor things (deadlines, cost, etc.), keep track of what is most important to pay attention to (customers).

Tiana Van Dyk described her firm’s 1.5-year transition from Summation to Ipro’s Eclipse, including moving 325 cases over.  Substantial time and preparation are needed to avoid problems and overcome resistance to change.  Staff should not be allowed to access the new system without undergoing training.  Case studies are useful to convince people to use new analytics tools.  Start small with new analytics tools (email threading and near-dupe), then use clustering to remove some junk (football and LinkedIn emails), and finally TAR.  Use sampling to demonstrate that things are working.  Learn everything you can about the technology you have.  Missteps can set you back terribly, causing bad rumors and fear.  Continuous communication is important to minimize panic when there is a problem.
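
As one illustration of using sampling to demonstrate that things are working, the sketch below draws a random sample from a set of documents (for example, a cluster that was removed as junk), has each sampled document checked, and estimates what fraction is actually relevant. The talk did not specify a sampling method; the function name, the `is_relevant` callback, and the normal-approximation interval are assumptions, and that approximation is rough when the rate is very close to 0 or 1.

```python
import math
import random


def estimate_rate(population, is_relevant, sample_size, seed=0):
    """Estimate the fraction of relevant documents from a random sample.

    Returns (rate, low, high), where low/high form an approximate 95%
    interval using the normal approximation to the binomial.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(population, sample_size)
    hits = sum(1 for doc in sample if is_relevant(doc))
    rate = hits / sample_size
    margin = 1.96 * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical population: document IDs, where every tenth one is relevant.
docs = list(range(1000))
rate, low, high = estimate_rate(docs, lambda d: d % 10 == 0, sample_size=200)
```

A low estimated rate with a narrow interval is the kind of evidence that can reassure skeptical staff that a culling step did not throw away much of value.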

There were also talks on new functionality in the Ipro software.  I gave a short presentation on how Ipro’s transition to the Clustify engine would improve TAR.  There were several opportunities for Ipro customers to give feedback about the functionality they would like to see.

Highlights from the East Coast eDiscovery & IG Retreat 2015

This was the second year that Ing3nious has held a retreat on the east coast, with other events organized by Chris LaCour held in California going back five years.  The event was held at the Wequassett Resort in Cape Cod.  As always, the event was well-organized and the location was beautiful.  Luckily, the weather was fantastic.  My notes below only capture a small amount of the information presented. There were often two simultaneous sessions, so I couldn’t attend everything.

Keynote: Away with Words: The Myths and Misnomers of Conventional Search Strategies

Thomas Barnett started the keynote by asking the audience to suggest keyword searches to find items discussing the meaning of existence.  He then said that he had in mind “to be, or not to be” and pointed out that it contains only stop words.  He then described unsupervised (clustering) and supervised (predictive coding) machine learning.  He talked about entity extraction, meaning the identification of dates and names of people and companies in a document.  He talked about sentiment analysis and how a person might change their language when they are doing something wrong.  He also pointed out that a product may have different names in different countries, which can make it easy to miss things with keyword search.
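
The stop-word point can be made concrete with a short sketch. The stop-word list below is a tiny illustrative one chosen for this example, not the list used by any particular search engine, and `indexable_terms` is a hypothetical helper name.

```python
import re

# Illustrative stop-word list (an assumption; real engines use their own lists).
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or", "the", "to"}


def indexable_terms(text):
    """Return the words that would survive stop-word removal."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


hamlet = indexable_terms("To be, or not to be")
obvious = indexable_terms("the meaning of existence")
```

With this list, the Hamlet phrase leaves nothing to search on, while the more obvious query keeps "meaning" and "existence", neither of which appears in the phrase the speaker had in mind.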

Advancing Discovery: What if Lawyers are the Problem?

I couldn’t attend this one.

Turbulent Sea in the Safe Harbor.  Is There a Lifeboat for Transfers of EU Data to the US?

Max Schrems complained to the Irish Data Protection Commissioner 22 times about the Safe Harbor Privacy Principles failing to protect the privacy of E.U. citizens’ data when companies move the data to the U.S.  After Snowden released information on NSA data collection, Schrems complained a 23rd time.  Ultimately, a judge found the Safe Harbor to be invalid.

Companies must certify to the Department of Commerce that they will adhere to the Safe Harbor Privacy Principles.  Many e-discovery service providers were pressured to certify so they could bring data to the U.S. for discovery even though e-discovery usage of the data would involve very bad privacy violations.

Some argue that there is no other legal mechanism that could work for bringing data to the U.S. because the U.S. government can pick up everything, so no guarantees about privacy can be made.  The best option would be to get consent from the person, but it must be done in a very clear manner specifying what data and who will see it.  An employer asking an employee for consent would be seen as coercive.  It will be hard to get consent from someone if you are investigating them for criminal activity.

There is really no way to move data from Europe to the U.S. for litigation without violating the law.  Consent would be required not just from the custodian but from everyone in the emails.  Some countries (France, Germany, and Switzerland) have blocking statutes that make taking the data a criminal offense.

Ethics: eDiscovery, Social Media, and the Internet of Things

I couldn’t attend this one.

Understanding the Data Visualization Trend in Legal

I was on this panel, so I didn’t take notes.  I did mention Vischeck, which allows you to see what your graphics would look like to a color-blind person.

Information Governance – How Do You Eat an Elephant?

I couldn’t attend this one.

Email Laws, IG Policies and the “Smoking Gun”

There has been confusion over what should be considered a record.  In the past, emails that were considered to be records were printed and stored.  Now email should be considered to be a record by default.  30-day retention policies are hard to defend.  Keep deleted emails for 60 days and use analytics to identify emails that employees should not have deleted so they can be saved.  Use automated logging to show compliance.

Protecting Enterprise Data Across Partners, Providers and the Planet

I couldn’t attend this one.

Defeating Analysis Paralysis – Strategies and Success Stories for Implementing IG Policies and Using TAR / Data Analytics

Berkeley Research Group finds that most companies are still keeping everything.  The longer data is kept, the less value it has to the company and the more risk it poses (ediscovery cost and privacy issues if there is a breach).  Different departments within the company may want different retention rules.  Breaches cost the company in lawsuits and in reputation.  The E.U. requires breach notification within 24 hours.

Having employees tag documents gives low-quality tags (they aren’t lawyers), but retention based on those tags is good enough to satisfy the court.  Need employees to follow the retention policy, so keep it simple.  Some speculate that insurance providers may end up driving info governance by forcing their clients to do it.

The Coalition of Technology Resources for Lawyers found that 56% of legal departments are reporting that they use analytics.  Clustering can help with investigation and determining search terms.  Look at email domain names (e.g., nytimes.com) to cull.  Note that email journaling keeps everything.  Analytics technology has improved, so if you were disappointed in the past you might want to try it again.

How Automated Digital Discovery is Changing eDiscovery as We Know It

I couldn’t attend this one.

Creating Order Out of Chaos: Framing and Taming Data Discovery Challenges in Expedited Matters

This panel started by walking through a (hypothetical?) investigation of a head of operations who left and joined a competitor in violation of a non-compete agreement that was determined to be unenforceable.  Did he transfer company data to the competitor?

Look for evidence that USB devices were used on the company laptop.  Unfortunately, you can’t tell what was copied onto them.  Look for attempts to hide what was done, such as removal of USB insertion data from the current registry (but failing to remove from the registry snapshot).  Look at the WiFi connection history for connections to the competitor’s network.  It is very important to explain the situation to the forensics person and communicate with him/her frequently about what you each have found in order to develop a picture of what actually happened.

If you hire someone from a competitor and there is suspicion that they took data from their previous employer, ambush them and take all their devices before they have a chance to destroy anything.  This will show the judge that you were not complicit.

When investigating someone who quit on bad terms, look for deals with “special terms” or side letter deals — they may be a sign of fraud.  Be careful about any applicable European laws.  Europe says you can’t move the data to the U.S., but the SEC doesn’t care.  Can you use a review tool in the U.S. with the data in Europe?  Officially, no, but it is less bad than moving the data.  Everyone says you can’t produce the data from Europe, but everyone does.

Make sure your agreements are up to date and are written by the attorney that will litigate them.

Just Patch and Pray?

A study by Verizon found that 90% of breaches are caused by employees.  Info governance can reduce risk.  Keeping everything is risky due to e-discovery, risk of breach, and having to explain loss of old data to customers.

Email problems include bad passwords, use of the same password on multiple websites so having one hacked can allow access to others, and getting inside the network (emailed malware).  2-factor authentication is recommended.  Don’t send an email to the SEC with BCC to the client or the client might hit reply-all and say something problematic — instead, email only the SEC and forward a copy to the client later.

Mobile technology can create discovery headaches, needs to be managed/updated/wiped remotely, and can easily be lost.  Encrypt, audit, and apply anti-malware.  BYOD should be limited to enterprise-ready devices.  Avoid insecure WiFi.  Control access to enterprise data.  Secure data in transit.  Ensure that devices get updated/upgraded.

Unaware or non-compliant employees need training.  When training to spot phishing emails, services can test the employees by sending phishing emails that report who clicked on them.

Vendors and third parties that handle enterprise data can be a problem.  Regulators require vendor oversight.  Limit access to necessary systems.  Segregate sensitive data.  Beware of payroll vendors and the possibility of identity theft from the data they hold.  Make sure cybersecurity insurance policy covers vendors.

Employees want data access from anywhere.  Encrypting email is hard — better to use collaborative workspaces.  Home networks should be protected.  Don’t use the neighbor’s Internet connection.

After having a breach, 39% of companies still don’t form a response plan.  There is no federal data breach notification law, but many states have such laws.  You may need to notify employees, customers, and the attorney general in some specific time frame.  Also notify your insurance company.

Mergers & Acquisitions: Strategy and Execution Concerns

I couldn’t attend this one.