DESI (Discovery of Electronically Stored Information) is a one-day workshop within ICAIL (International Conference on Artificial Intelligence and Law), which is held every other year. The conference was held in London last month. Rumor has it that the next ICAIL will be in North America, perhaps Montreal.
I’m not going to go into the DESI talks based on papers and slides that are posted.
Between presentations based on submitted papers there was a lunch where people separated into four groups to discuss specific topics. The first group focused on e-discovery users. Visualizations were deemed “nice to look at” but not always useful: does the visualization help you to answer a question faster? Another group talked about how to improve e-discovery, including attorney aversion to algorithms and whether a substantial number of documents could be missed by CAL (continuous active learning) after the gain curve had plateaued. Another group discussed dreams about future technologies, like better case assessment and redacting video. The fourth group talked about GDPR and speculated that the UK would comply with GDPR.
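To make the gain-curve concern concrete, here is a minimal sketch in Python. It is not taken from any DESI talk or from any particular TAR product; the document IDs, the review order, and the helper names `gain_curve` and `recall_at` are all hypothetical. It only shows how a curve can look flat at the point where review stops while relevant documents still sit further down the ranking.

```python
def gain_curve(review_order, relevant):
    """Cumulative number of relevant documents found after each document reviewed."""
    found = 0
    curve = []
    for doc_id in review_order:
        if doc_id in relevant:
            found += 1
        curve.append(found)
    return curve


def recall_at(curve, total_relevant, n_reviewed):
    """Fraction of all relevant documents found after reviewing n_reviewed documents."""
    return curve[n_reviewed - 1] / total_relevant


# Hypothetical toy collection: 20 documents, 8 of them relevant.
# Two relevant documents (17 and 18) are ranked far below the others.
relevant = {0, 1, 2, 3, 4, 5, 17, 18}
review_order = list(range(20))  # assumed ranking produced by the system

curve = gain_curve(review_order, relevant)

# Suppose review stops after 12 documents because the last six
# reviewed (documents 6 through 11) turned up nothing relevant.
stop = 12
missed = len(relevant) - curve[stop - 1]
print(f"recall at stop: {recall_at(curve, len(relevant), stop):.2f}, missed: {missed}")
```

In this toy case the curve is flat for six documents in a row before the stopping point, yet a quarter of the relevant documents have not been found. Whether that happens in practice depends on the collection and the system, which is exactly what the group was asking about.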
DESI ended with a panel discussion about future directions for e-discovery. It was suggested that a government or consumer group should evaluate TAR systems. Apparently, NIST doesn’t want to do it because it is too political. One person pointed out that consumers aren’t really demanding it. It’s not just a matter of optimizing recall and precision — process (quality control and workflow) matters, which makes comparisons hard. It was claimed that defense attorneys were motivated to lobby against the federal rules encouraging the use of TAR because they don’t want incriminating things to be found. People working in archiving are more enthusiastic about TAR.
Following DESI (and other workshops conducted in parallel on the first day), ICAIL had three more days of paper presentations followed by another day of workshops. You can find the schedule here. I only attended the first day of non-DESI presentations. There are two papers from that day that I want to point out. The first is Effectiveness Results for Popular e-Discovery Algorithms by Yang, David Grossman, Frieder, and Yurchak. They compared performance of the CAL (relevance feedback) approach to TAR for several different classification algorithms, feature types, feature weightings,
