The 2018 South Central eDiscovery and Information Governance Retreat was held at Lakeway Resort and Spa, outside of Austin. It was a full day of talks with a parallel set of talks on Cybersecurity, Privacy, and Data Protection in the adjacent room. Attendees could attend talks from either track. Below are my notes (certainly not exhaustive) from the eDiscovery and IG sessions. My full set of photos is available here.
Blowing The Whistle
eDiscovery can be used as a weapon to drive up costs for an adversary. Make requests broad and make the other side reveal what they actually have. Ask for “all communications” rather than “all Office 365 emails” or you may miss something (for example, they may use Slack). The collection may be only 1% responsive. How can it be culled defensibly? Ask for broad search terms, get hit rates, and then adjust. Hit rates don’t tell you how many of the matching documents were actually relevant, so use sampling. When searching for patents, search for “123 patent” instead of just “123” to avoid false positives (patent references often use just the last 3 digits). It rarely happens, but you might get the producing party to disclose the top matches for the queries so you can examine them and give feedback on desired adjustments. You should have a standard specification for the production format you want, and you should get it to the producing party as soon as possible, or you might get 20,000 emails produced in one large PDF that you’ll have to waste time dissecting, and metadata may be lost. If keyword search is used during collection, be aware that Office 365 currently doesn’t OCR non-searchable content, so it will be missed. Demand that the producing party OCR before applying any search terms. In one production a lot of “gibberish” emails were returned because the search engine was matching “ING” to all words ending in “ing” rather than requiring the full word to match. If eDiscovery disputes make it to the judge, it’s usually not a good thing since the judge may not be very technical.
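To make the “ING” problem and the sampling advice concrete, here is a minimal Python sketch. It is not how any particular review platform or Office 365 search works. The function names are hypothetical, and the 95% margin uses a simple normal approximation, which is my assumption and not something the speakers specified.

```python
import math
import re


def substring_hit(term: str, text: str) -> bool:
    """Naive matching: the term may sit inside a longer word."""
    return term.lower() in text.lower()


def whole_word_hit(term: str, text: str) -> bool:
    """Match the term only as a complete word or phrase."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def sample_precision(labels: list[bool]) -> tuple[float, float]:
    """Share of sampled hits a reviewer marked relevant, plus an
    approximate 95% margin of error (normal approximation)."""
    n = len(labels)
    p = sum(labels) / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, margin


# "ING" as a substring matches almost any email; as a whole word it does not.
print(substring_hit("ING", "We are meeting tomorrow"))
print(whole_word_hit("ING", "We are meeting tomorrow"))
print(whole_word_hit("123 patent", "This infringes the 123 patent"))

# Hypothetical review of 100 randomly sampled hits, 30 judged relevant.
print(sample_precision([True] * 30 + [False] * 70))
```

The point of the second function is that a hit rate only counts matches, while a reviewed random sample of those matches gives an estimate of how many are actually relevant.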
Digging Into TAR
I moderated this panel, so I didn’t take notes. We challenged the audience to create a keyword search that would work better than TAR. Results are posted here.
Beyond eDiscovery – Creating Context By Connecting Disparate Data
Beyond the custodian, who else had access to this file? Who should have access, and who shouldn’t? Forensics can determine who accessed or printed a confidential file. The Windows registry tracks how users access files. When you print, an image is stored. Figure out what else you can do with the tech you have. For example, use SharePoint workflows to help with eDiscovery. Predictive coding can be used with structured data. Favorite quote: “Anyone who says they can solve all of my problems with one tool is a big fat liar.”
Improving Review Efficiency By Maximizing The Use Of Clustering Technology
Clustering can lead to more consistent review by ensuring the same person reviews similar documents and reviews them together. The requesting party can use clustering to get an overview of what they’ve received. Image clustering identifies glyphs to determine document similarity, so it can detect things like a Nike logo, or it can be sensitive to the location on the page where the text occurs. It is important to get the noise (e.g., email footers) out of the data before clustering. Text messages and spreadsheets may cause problems. Clustering can be used for ECA or keyword generation, where it is not making final determinations for a document. It can reveal abbreviations scientists are using for technical terms. It can also be used to identify clusters that can be excluded from review (not relevant). It can be used to prioritize review, with more promising clusters reviewed first. Should you tell the other side you are using clustering to come up with keywords? No, you are just inviting controversy.
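As a rough illustration of removing noise such as email footers before clustering, the sketch below drops any line that repeats across a large share of the documents. This heuristic, the function name, and the 50% threshold are all my assumptions. The panel did not describe a specific method, and commercial tools likely do something more sophisticated.

```python
from collections import Counter


def strip_repeated_lines(documents: list[str], max_share: float = 0.5) -> list[str]:
    """Drop lines that appear in more than max_share of the documents.

    Blank lines are dropped as well.
    """
    line_counts: Counter[str] = Counter()
    for doc in documents:
        # Count each distinct non-blank line once per document.
        line_counts.update({line.strip() for line in doc.splitlines() if line.strip()})

    cutoff = max_share * len(documents)
    cleaned = []
    for doc in documents:
        kept = [
            line
            for line in doc.splitlines()
            if line.strip() and line_counts[line.strip()] <= cutoff
        ]
        cleaned.append("\n".join(kept))
    return cleaned


emails = [
    "Lunch at noon?\nThis email is confidential.",
    "Draft attached.\nThis email is confidential.",
    "See you Monday.\nThis email is confidential.",
]
print(strip_repeated_lines(emails))
```

Without a step like this, the shared footer text can dominate the similarity measure and pull unrelated emails into the same cluster.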
Technology Solution Update From Corporate, Law Firm And Service Provider Perspective
Migration to Office 365 and other cloud offerings can cause problems. Data can be dumped into the cloud without tracking where it went. Figuring out how to collect from the cloud can be difficult. Microsoft is always changing Office 365, making it difficult to stay on top of the changes. Favorite quote: “I’m always running to keep up. I should be skinnier, but I’m not.” Office 365 is supposed to have OCR soon. What if the cloud platform gets hacked? There can be throttling issues when collecting from OneDrive by downloading everything (not using Microsoft’s tool). Rollout of cloud services should be slow to make sure everyone knows what should be put in the cloud and what shouldn’t, and to ensure that you keep track of where everything is. Be careful about emailing passwords since they may be recorded; use ephemeral communications instead of email for that. Personal devices cause problems because custodians don’t like having their devices collected. Policy is critical, but it is not a cure-all. Policy must be surrounded by communication and re-certification to ensure it is followed. Google mail is not a good solution for restricting data location since attachments are copied to the local disk when they are viewed.
Achieving GDPR Compliance For Unstructured Content
Some technology was built for GDPR while other tech was built for some other purpose like eDiscovery and tweaked for GDPR, so be careful. For example, you don’t want to have to collect the data before determining whether it contains PII. The California privacy law taking effect in 2020 is similar to GDPR, so U.S. companies cannot ignore the issue. Backup tapes should be deleted after 90 days. They are for emergencies, not retention. Older backups often don’t work (e.g., referenced network addresses are no longer valid).
Escalating Cyber Risk From The IT Department To The Boardroom
One very effective way to change a company’s culture with respect to security is to break people up into white vs. black teams and hold war games where one team attacks and the other tries to come up with the best way to defend against it. You need to point out both the risk and how to fix it to get the board’s attention. Show the board a graph with the expected value lost in a breach on the vertical axis and cost to eliminate the risk on the horizontal axis: points lying above the 45-degree line are risks that should be eliminated (doing so saves money). On average, a server breach costs 28% of operating costs. Investors may eventually care if someone on the board has a security certification. It is OK to question directors, but don’t call out their b.s. The Board cares most about what the CEO and CFO are saying. Ethical problems tend to happen when things are too siloed.
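The 45-degree-line rule for the board graph can be sketched in a few lines of Python. The risk names and dollar figures are invented for illustration. Computing expected loss as breach probability times loss is my assumption about what “expected value lost” means here.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str                  # hypothetical label
    breach_probability: float  # assumed chance of a breach over the planning period
    loss_if_breached: float    # assumed dollar loss if the breach happens
    cost_to_eliminate: float   # dollar cost of the fix (horizontal axis)

    @property
    def expected_loss(self) -> float:
        """Expected value lost in a breach (vertical axis)."""
        return self.breach_probability * self.loss_if_breached


def worth_eliminating(risks: list[Risk]) -> list[Risk]:
    """Risks above the 45-degree line: expected loss exceeds the cost of the fix."""
    return [r for r in risks if r.expected_loss > r.cost_to_eliminate]


risks = [
    Risk("unpatched file server", 0.10, 2_000_000, 50_000),
    Risk("legacy kiosk", 0.01, 500_000, 40_000),
]
for r in worth_eliminating(risks):
    print(r.name, r.expected_loss, r.cost_to_eliminate)
```

In this made-up example only the first risk lies above the line, since its expected loss is well above the cost of the fix, while the second costs more to eliminate than it is expected to lose.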