The 16th annual Ipro Innovations conference was held at the Talking Stick Resort.
The conference started with a summary of recent changes to the Ipro software line-up, how it enables a much smaller team to manage large projects, and stats on the growing customer base. They announced that Clustify will soon replace Content Analyst as their analytics engine. In the first phase, both engines will be available and will be implemented similarly, so the user can choose which one to use. Later phases will make more of Clustify’s unique functionality available. They announced an investment by ParkerGale Capital. Operations will largely remain unchanged, but there may be some acquisitions. The first evening ended with a party at Top Golf.
Ari Kaplan gave a presentation entitled “The Opportunity Maker,” where he told numerous entertaining stories about business problems and how to find opportunities. He explained that doing things that nobody else does can create opportunities. He contacts strangers from his law school on LinkedIn and asks them to meet for coffee when he travels to their town; many accept because “nobody does that.” He sends postcards to his clients when traveling, and they actually keep them. To illustrate the value of putting yourself into the path of opportunity, he described how he got to see the Mets in the World Series. He mentioned HelpAReporter.com as a way to get exposure for yourself as an expert.
One of the tracks during the breakout sessions was run by The Sedona Conference and offered CLE credits. One of the TSC presentations was “Understanding the Science & Math Behind TAR” by Maura Grossman. She covered the basics like TAR 1.0 vs. 2.0, human review achieving roughly 70% recall due to mistakes, and how TAR performs compared to keyword search. She mentioned that control sets can become stale because the reviewer’s concept of relevance may shift during the review. People tend to get pickier about relevance as the review progresses, so an estimate of the number of relevant documents based on a control set reviewed at the beginning may be too high. She also warned that making multiple measurements against the control set can give a biased estimate of when a certain level of performance is achieved (sidenote: this is because people watch for a measure like F1 to cross a threshold to determine training completeness, which is not the best way to use a control set). She mentioned that she and Cormack have a new paper coming out that compares human review to TAR using better-reviewed data (Tim Kaine’s emails) and addresses some criticisms of their earlier JOLT study.
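The warning about repeated measurements against a control set can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not anything shown in the talk: it tracks recall rather than F1 to keep the arithmetic simple, and the control set size (100 relevant documents), the 75% target, and the training curve (true recall rising from 50% by two points per round) are all hypothetical. Each round, recall is measured on the control set, and training stops the first time the measurement reaches the target.

```python
import random

def true_recall_at_stop(rng, target=0.75, n_relevant=100, rounds=21):
    """Return the TRUE recall at the first round where MEASURED recall hits the target.

    All parameters are hypothetical: the control set holds n_relevant relevant
    documents, and true recall rises from 0.50 by 0.02 per training round.
    """
    for k in range(rounds):
        true_recall = 0.50 + 0.02 * k
        # Each relevant control-set document is found with probability true_recall.
        found = sum(rng.random() < true_recall for _ in range(n_relevant))
        if found / n_relevant >= target:
            return true_recall  # stop at the first measurement that crosses the target
    return true_recall  # never crossed: report the final round

rng = random.Random(0)  # fixed seed so the run is repeatable
stops = [true_recall_at_stop(rng) for _ in range(1000)]
mean_true_recall = sum(stops) / len(stops)
share_below_target = sum(r < 0.75 for r in stops) / len(stops)

print(f"mean true recall when training stopped: {mean_true_recall:.3f}")
print(f"share of runs that stopped below the 75% target: {share_below_target:.1%}")
```

Because every round gives random noise another chance to push the measurement over the threshold, the simulated reviews tend to stop while true recall is still below the target. A single measurement taken at a point chosen in advance would not have this upward bias.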
There were also breakout sessions where attendees could use the Ipro software with guidance from the staff in a room full of computers. I attended a session on ECA/EDA. One interesting feature that was demonstrated was checking the number of documents matching a keyword search that did not match any of the other searches performed — if the number is large, it may not be a very good search query.
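The unique-hits check is easy to sketch with sets. This is not how the Ipro software implements it; the function name, the queries, and the document IDs below are all made up for illustration, and the sketch assumes each search's results are available as a set of document IDs.

```python
def unique_hit_counts(hits_by_query):
    """Map each query to the number of documents that only that query matched.

    hits_by_query is a hypothetical structure: query string -> set of document IDs.
    """
    counts = {}
    for query, docs in hits_by_query.items():
        # Union of the documents matched by every other query.
        others = set()
        for other_query, other_docs in hits_by_query.items():
            if other_query != query:
                others |= set(other_docs)
        counts[query] = len(set(docs) - others)
    return counts

# Made-up search results for illustration.
hits_by_query = {
    "merger": {1, 2, 3, 4},
    "acquisition": {3, 4, 5},
    "meeting": {4, 6, 7, 8, 9, 10},
}
unique = unique_hit_counts(hits_by_query)
for query, count in unique.items():
    print(f"{query}: {count} unique hits out of {len(hits_by_query[query])}")
```

Here "meeting" has five unique hits out of six, so most of what it retrieves is found by no other search, which is the pattern that flags a query as possibly too broad.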
