
Random text detector

This is an interactive version of the random-text detector described in this blog post. The algorithm is unsupervised and was trained on this list of the 2000 most common US names combined with 20 random words.

First, try entering common names like Ema, Collin or Jacqueline; the suspicious score should be below 25. Then try random strings like werpoiupoi, gdsfgsdfg or kjlkjllkj; the suspicious score should be above 25. Results are not always perfect, but the score works surprisingly well for such a simple algorithm. (The threshold of 25 was set empirically based on this training dataset.)

As described in the blog post, the algorithm can also be used to sort a list of words by suspicious score. An example of such a sorted list can be found here.
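
For readers who want a feel for how such a score can be computed and used for sorting, here is a minimal Python sketch. It is not the author's implementation: it assumes character bigrams, word-boundary padding with `^` and `$`, and a plain smoothed IDF summed over the n-grams of a word. The exact "adjusted-IDF" formula is not given on this page, and the function names and the tiny training list are made up for illustration.

```python
import math
from collections import Counter


def ngrams(word, n=2):
    """Return the character n-grams of a lowercased word padded with '^' and '$'."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_idf(words, n=2):
    """Map each n-gram to a smoothed IDF computed over the training words.

    Assumption: plain smoothed IDF, not the page's "adjusted-IDF".
    """
    doc_freq = Counter()
    for word in words:
        # Count each n-gram at most once per training word.
        doc_freq.update(set(ngrams(word, n)))
    total = len(words)
    idf = {g: math.log((total + 1) / (df + 1)) for g, df in doc_freq.items()}
    unseen = math.log(total + 1)  # value used for n-grams absent from training
    return idf, unseen


def suspicious_score(word, idf, unseen, n=2):
    """Sum the IDF of the word's n-grams; rarer n-grams give a higher score."""
    return sum(idf.get(g, unseen) for g in ngrams(word, n))


# Hypothetical toy training list; the real detector used about 2000 names.
training = ["emma", "collin", "jacqueline", "james", "mary", "john", "linda", "michael"]
idf, unseen = train_idf(training)

# Sort candidates from most to least suspicious.
candidates = ["collin", "kjlkjllkj", "mary", "gdsfgsdfg"]
ranked = sorted(
    candidates,
    key=lambda w: suspicious_score(w, idf, unseen),
    reverse=True,
)
for word in ranked:
    print(f"{word:12s} {suspicious_score(word, idf, unseen):.1f}")
```

With this toy data the two random strings sort ahead of the two names. The absolute scores depend on the training list and on the scoring formula, so a cut-off such as the 25 used on this page would have to be re-tuned for any other setup.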

Author: @mkrcah. Code available on GitHub.


The detector reports one of three verdicts together with the suspicious score: "Looks ok", "Looks suspicious" or "Looks very random".

Below the verdict, a table lists each n-gram with its score (adjusted-IDF), followed by the total.