By Vasudev Ram
Python programs that count the frequencies of words in a string or a file are common examples, and they are often written using dicts. Here is a small program that counts the frequencies of lines in its input instead. This functionality has some practical uses; I will show those, and also compare and contrast this program with other tools, later.
The program uses an OrderedDict from the collections module of the Python standard library.
The program could also be written using either a regular dict or a defaultdict (also from the collections module), or a collections.Counter, with slightly different code in each of those cases.
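Here is a rough sketch of those alternatives. The function names are hypothetical, made up for this example, and each one takes any iterable of lines (such as an open file or sys.stdin). On Python 3.7 and later, plain dicts, defaultdicts and Counters all remember insertion order, so all four versions report lines in order of first appearance; on older versions only the OrderedDict version guarantees that.

```python
from collections import Counter, OrderedDict, defaultdict


def counts_with_ordereddict(lines):
    # The approach used in linefreq.py: dict.get() with a default of 0.
    counts = OrderedDict()
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    return counts


def counts_with_dict(lines):
    # Same idea with a plain dict (insertion-ordered since Python 3.7).
    counts = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    return counts


def counts_with_defaultdict(lines):
    # defaultdict(int) supplies 0 for a missing key, so no .get() is needed.
    counts = defaultdict(int)
    for line in lines:
        counts[line] += 1
    return counts


def counts_with_counter(lines):
    # Counter runs the counting loop itself.
    return Counter(lines)


if __name__ == "__main__":
    sample = ["line 2\n", "line 1\n", "line 2\n"]
    for func in (counts_with_ordereddict, counts_with_dict,
                 counts_with_defaultdict, counts_with_counter):
        print(func.__name__, list(func(sample).items()))
```

Counter also offers most_common(), which returns (item, count) pairs sorted by descending count, a natural fit if the report should list the most frequent lines first.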
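```python
"""
linefreq.py
A program to find the frequencies of input lines.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: http://gumroad.com/vasudevram
"""

from __future__ import print_function

import sys
from collections import OrderedDict

def linefreq(in_fil):
    # Count each distinct line; the OrderedDict keeps lines in the
    # order in which they were first seen.
    counts = OrderedDict()
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
    if not counts:
        # Empty input: nothing to report (and max() below would fail).
        return
    # Print each line with its frequency right-aligned in 8 columns.
    # Lines still end in '\n', so end="" avoids doubled newlines.
    print("Freq".rjust(8) + ": Line")
    for line, freq in counts.items():
        print(str(freq).rjust(8) + ": " + line, end="")
    # Separator row sized from the longest line.
    print('-' * (10 + max(map(len, counts))))
    # The same report, in reverse order of first appearance.
    for line, freq in reversed(counts.items()):
        print(str(freq).rjust(8) + ": " + line, end="")

def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa == 1:
        # No filename given: read from standard input.
        linefreq(sys.stdin)
    elif lsa == 2:
        # One filename given: read from that file.
        with open(sa[1], "r") as in_fil:
            linefreq(in_fil)
    else:
        print("Only one filename argument supported.")

if __name__ == '__main__':
    main()
```

I ran it on this input file: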
```
line 1
line 2
line 2
line 3
line 3
line 3
line 4
line 4
line 4
line 4
```

where "line 1" occurs once, "line 2" occurs twice, etc., with this command:

```
$ python linefreq.py infile1.txt
```

and got this output:
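```
    Freq: Line
       1: line 1
       2: line 2
       3: line 3
       4: line 4
-----------------
       4: line 4
       3: line 3
       2: line 2
       1: line 1
```

The reversed lines are output just to show that it is possible to use reversed() on an OrderedDict, unlike on a dict. (That held for the Python versions current in 2016; since Python 3.8, plain dicts and their views support reversed() as well.)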
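A small sketch to check the reversed() behavior. It assumes Python 3.8 or later for the last part, since that is when plain dicts became reversible; reversed() on an OrderedDict itself has long worked, and its keys/values/items views have supported it since Python 3.5. The variable names are just for illustration.

```python
from collections import OrderedDict

# Build counts the same way linefreq.py does.
counts = OrderedDict()
for line in ["line 1\n", "line 2\n", "line 2\n"]:
    counts[line] = counts.get(line, 0) + 1

# An OrderedDict and its views can be walked backwards.
print(list(reversed(counts)))          # keys, most recently inserted first
print(list(reversed(counts.items())))  # (line, freq) pairs, reversed

# On Python 3.8+, a plain dict built from it is reversible too.
plain = dict(counts)
print(list(reversed(plain.items())))
```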
I also got the same output, as expected, when I ran this form of the command:
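```
$ cat infile1.txt | python linefreq.py
```

This line:

```python
print('-' * (10 + max(map(len, counts))))
```

is used to print a row of dashes as long as the longest output line above it. The 10 is the 8-character frequency column plus the 2-character ": " separator. Because each line read from the file normally still ends with its newline, len() counts that character too, so the row comes out one character longer than the longest printed line.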
The length of the longest line can also be computed inline in the first for loop.
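As a minimal sketch of that idea, the hypothetical function linefreq_inline_max below keeps a running maximum while counting, so the separate max(map(len, counts)) pass is not needed. To keep the example short, it returns the counts and the width and leaves printing to an equally hypothetical print_report helper (which omits the reversed listing). A side benefit is that empty input simply gives a width of 0 instead of making max() raise ValueError.

```python
import io
from collections import OrderedDict


def linefreq_inline_max(in_fil):
    """Count line frequencies and track the longest line in one pass."""
    counts = OrderedDict()
    max_len = 0
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
        # Update the running maximum here instead of in a second pass.
        max_len = max(max_len, len(line))
    return counts, max_len


def print_report(counts, max_len):
    print("Freq".rjust(8) + ": Line")
    for line, freq in counts.items():
        print(str(freq).rjust(8) + ": " + line, end="")
    # Same width formula as linefreq.py: 8 + len(": ") + longest line.
    print("-" * (10 + max_len))


if __name__ == "__main__":
    sample = io.StringIO("line 1\nline 2\nline 2\n")
    print_report(*linefreq_inline_max(sample))
```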
- Vasudev Ram - Online Python training and consulting