text analytics | Clustify Blog – eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

Text Analytics Forum is part of KMWorld. It was held on November 7-8, 2018 at the JW Marriott in D.C. Attendees went to the large KMWorld keynotes in the morning and then chose between two parallel text analytics tracks, a technical track and an applications track, for the remainder of the day. Most of the slides are available here. My photos, including photos of some slides that caught my attention or were not available on the website, are available here. Since most slides are available online, I have only a few brief highlights below. Next year’s KMWorld will be November 5-7, 2019.

The Think Creatively & Make Better Decisions keynote contained various interesting facts about the things that distract us and make us unproductive. Distracted driving causes more deaths than drunk driving. Attention spans have dropped from 12 seconds to 8 seconds (goldfish have a 9-second attention span). Japan has texting lanes for walking. 71% of business meetings are unproductive, and 33% of employee time is spent in meetings. 281 billion emails were sent in 2018. Don’t leave ideas and creative thinking to the few. Mistakes shouldn’t be reprimanded. Break down silos between departments.

The Deep Text Look at Text Analytics keynote explained that text mining is only part of text analytics. Text mining treats words as things, whereas text analytics cares about meaning. Sentiment analysis is now learning to handle things like: “I would have loved your product except it gave me a headache.” It is hard for humans to pick good training documents for automatic categorization systems (what the e-discovery world calls predictive coding or technology-assisted review). Computer-generated taxonomies are incredibly bad. Deep learning is not like what humans do. Deep learning takes 100,000 examples to detect a pattern, whereas humans will generalize (perhaps wrongly) from 2 examples.

The Cognitive Computing keynote mentioned that sarcasm makes sentiment analysis difficult. For example: “I’m happy to spend a half hour of my lunch time in line at your bank.” There are products to measure tone from audio and video.
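To see why sarcasm trips up simple approaches, here is a minimal sketch of a word-counting sentiment scorer. The word lists and the `naive_sentiment` function are my own illustrative assumptions, not anything shown in the keynote, and real products are far more sophisticated. The point is only that counting positive and negative words rates the bank sentence as positive.

```python
import re

# Tiny hand-picked word lists, purely illustrative (not a real sentiment lexicon).
POSITIVE = {"happy", "loved", "great", "good"}
NEGATIVE = {"headache", "terrible", "bad", "awful"}


def naive_sentiment(text: str) -> int:
    """Return (# positive words) - (# negative words); above zero reads as positive."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


sarcastic = "I'm happy to spend a half hour of my lunch time in line at your bank."
# The scorer sees "happy" and nothing negative, so it misses the sarcasm.
print(naive_sentiment(sarcastic))
```

The score comes out positive because "happy" is the only word the scorer recognizes. Catching the sarcasm requires knowing that nobody enjoys waiting in line at a bank during lunch.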

The Don’t Stop at Stopwords: Function Words in Text Analytics session noted that function words, unlike content words, are added by the writer subconsciously. Use of words like “that” or “the” instead of “this” can indicate that the author is distancing himself or herself from the thing being described, possibly indicating deception. The presenters have used their techniques in about 20 different languages. They need at least 300 words of text to build a baseline of function word frequencies.
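As a rough illustration of what a function word baseline might look like, here is a minimal sketch that computes each function word's rate per 1,000 words and refuses to build a profile from fewer than 300 words. The word list, the per-1,000 normalization, and the names `FUNCTION_WORDS`, `function_word_profile`, and `shift` are my own assumptions. The session did not describe the presenters' actual method at this level of detail.

```python
import re
from collections import Counter

# Small illustrative subset of English function words (an assumption;
# a real system would use a much longer list).
FUNCTION_WORDS = {"the", "that", "this", "a", "an", "of", "to", "in", "and", "it"}
MIN_WORDS = 300  # minimum text length mentioned in the session


def function_word_profile(text: str, min_words: int = MIN_WORDS) -> dict[str, float]:
    """Return each function word's rate per 1,000 words of text.

    Raises ValueError if the text is too short to give a stable baseline.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if len(tokens) < min_words:
        raise ValueError(f"need at least {min_words} words, got {len(tokens)}")
    counts = Counter(token for token in tokens if token in FUNCTION_WORDS)
    return {word: 1000 * counts[word] / len(tokens) for word in sorted(FUNCTION_WORDS)}


def shift(baseline: dict[str, float], sample: dict[str, float], word: str) -> float:
    """Return how far a word's rate in the sample has moved from the baseline."""
    return sample[word] - baseline[word]
```

With a baseline built from an author's earlier writing, a drop in “this” alongside a rise in “that” or “the” in a new document would be the kind of distancing signal described above, though deciding how large a shift is meaningful is a separate question this sketch does not address.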

The Should We Consign All Taxonomies to the Dustbin? talk considered the possibility of using machine learning to go directly from problem to solution without having a taxonomy in between. The speaker said that 100k documents or 1 million words of text are needed to get going.


Highlights from Text Analytics Forum 2018