
Unix tip: count uniques without sorting

It occurred to me that you don’t need to sort a file to count the occurrences of unique lines, because awk has associative arrays: you can keep one counter per distinct line and update it as each line goes by.

By way of example, I have a file with some IP addresses in it. Each line has a single IP address and nothing else.


$ wc -l some_ips.csv
222049

Here is how I previously would have counted how many times each IP occurs:


$ time cat some_ips.csv | sort | uniq -c > /dev/null

real 0m0.751s
user 0m0.750s
sys 0m0.006s

Here is how I would do it now:


$ alias just_count="awk '{c[\$0]++}END{for(x in c){print c[x], x}}'"
$ time cat some_ips.csv | just_count > /dev/null

real 0m0.110s
user 0m0.092s
sys 0m0.033s

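Note how much less time it takes the second way: about 0.11 seconds against 0.75 seconds, roughly seven times faster on this file. sort has to order all 222049 lines before uniq can count them, while awk makes a single pass and bumps a counter in a hash table for each line.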
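Aliases are not expanded in non-interactive scripts by default, so here is a minimal sketch of the same awk program wrapped in a shell function. The function name count_lines and the array name count are my own choices for illustration, not part of the original alias.

```shell
# Hypothetical function name; does the same job as the just_count alias.
count_lines() {
    awk '
        { count[$0]++ }            # one associative-array update per input line
        END {
            for (line in count)    # for-in iteration order is unspecified in awk
                print count[line], line
        }
    ' "$@"                         # with no file arguments, awk reads stdin
}
```

You can call it as "count_lines some_ips.csv" or at the end of a pipe, the same way as the alias.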

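The order of the output may be different, because awk does not promise any particular order when looping over an array, and uniq -c pads its counts with leading spaces. But if you “sort -n” the outputs of the two commands you will see that they contain the same counts.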
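If you want to check that claim rather than eyeball it, something like the following should work. It is only a sketch: same_counts is a hypothetical helper name, and it assumes that stripping the leading padding from uniq -c is the only normalisation needed before comparing.

```shell
# Hypothetical helper: succeeds when both approaches give the same counts for a file.
same_counts() {
    file=$1
    # uniq -c pads the count with leading spaces; strip them before comparing.
    by_sort=$(sort "$file" | uniq -c | sed 's/^ *//' | sort)
    # Same awk program as the alias, sorted so the order no longer matters.
    by_awk=$(awk '{c[$0]++} END {for (x in c) print c[x], x}' "$file" | sort)
    [ "$by_sort" = "$by_awk" ]
}
```

For example, "same_counts some_ips.csv && echo same" prints "same" when the two methods agree.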
