Thoughts on e-discovery, computers, and software development.
example4
Example 4: An email and its reply. Shows that short documents give artificially small MinHash similarity. Also shows why it is important to group near-dupes rather than discarding them.