Data cleansing problem statement: Data in a record are often duplicated. How do we estimate the probability that a given entry is a duplicate? [Work In Progress]
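
One possible starting point, offered only as a rough sketch and not as the approach this project settles on: compare two records field by field with a string similarity measure and average the results. The record layout (a dict of field name to value) and the helper names `normalize` and `duplicate_score` are assumptions made for illustration. The result is a heuristic score between 0 and 1, not a calibrated probability.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def duplicate_score(record_a, record_b):
    """Return a 0..1 similarity score averaged over the fields both records share.

    Hypothetical helper: this is a heuristic score, not a calibrated probability.
    """
    shared = sorted(set(record_a) & set(record_b))
    if not shared:
        # No common fields, so there is nothing to compare.
        return 0.0
    ratios = [
        SequenceMatcher(None, normalize(record_a[f]), normalize(record_b[f])).ratio()
        for f in shared
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    a = {"name": "John Smith", "city": "Boston"}
    b = {"name": "john  smith", "city": "boston"}
    print(duplicate_score(a, b))
```

Turning such a score into an actual probability would need labelled duplicate and non-duplicate pairs to calibrate against, which this sketch does not attempt.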