What if we wrote a Python script to count all the words written by each author on marxists.org?
It should be easy for all the :LIB:eral techbros of hexbear.
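Here is a minimal sketch of just the counting step in Python. It assumes the pages have already been downloaded and paired with an author name. The (author, text) input format and the name `words_by_author` are hypothetical, and fetching or crawling marxists.org is not shown.

```python
from collections import Counter


def words_by_author(pages):
    """Sum word counts per author.

    `pages` is an iterable of (author, text) pairs. This input format is
    a hypothetical assumption; downloading the archive is out of scope.
    """
    totals = Counter()
    for author, text in pages:
        # str.split() with no argument splits on runs of whitespace,
        # which is roughly what `wc -w` counts as a word
        totals[author] += len(text.split())
    return totals


if __name__ == "__main__":
    # placeholder data, not real archive contents
    sample = [
        ("Author A", "one two three"),
        ("Author B", "four five"),
        ("Author A", "six"),
    ]
    for author, count in words_by_author(sample).most_common():
        print(author, count)
```

`Counter.most_common()` hands back the authors already sorted by total, which is the "who wrote the most" answer.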
I would never willingly write Python, but yeah, I could probably knock it out with just curl and wc, no scripting necessary.
How would you extract only the text?
If this was for funsies I'd just call the HTML tags an acceptable error margin, especially since they'll scale roughly in proportion to the amount of text and still get you the right answer to "who wrote the most". But you could use html2text, tidy, or another command-line utility if you cared about getting 1% closer to the actual number of words archived on that site.
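For the "extract only the text" question, here is a rough stdlib-only sketch. It swaps in Python's built-in `html.parser` in place of html2text or tidy, and the helper names (`TextExtractor`, `visible_text`, `word_count`) and the sample page are hypothetical. It compares a `wc -w` style count on the raw HTML with the count after dropping tags and `<style>`/`<script>` contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words in adjacent blocks apart, at the
    # cost of splitting a word that has an inline tag in the middle of it.
    return " ".join(parser.parts)


def word_count(text):
    # whitespace-separated tokens, roughly what `wc -w` reports
    return len(text.split())


# hypothetical sample page
PAGE = (
    '<html><head><style>p { color: red; }</style></head>'
    '<body><p class="x">Workers of the world, <b>unite!</b></p></body></html>'
)

if __name__ == "__main__":
    print("raw tokens:", word_count(PAGE))  # raw tokens: 10
    print("visible words:", word_count(visible_text(PAGE)))  # visible words: 5
```

On this toy page the markup doubles the count (10 raw tokens vs 5 words). Whether that inflation stays proportional enough across real pages to keep the ranking intact is the assumption behind the "acceptable error margin" shortcut.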