Hey all, happy new year apparently! A quick service update on the old wingolog. For some time the site has been drowning in spam comments, despite my best efforts to point a Bayesian classifier at the problem.
I don't keep logs of the number of attempts at posting comments that don't pass the classifier. But what I can say is that since I put in the classifier around 4 years ago, about 2500 comments a year made it through -- enough to turn the comment section into a bit of a dump. Icky, right??
At the same time, of course, that's too many comments to triage manually, so I never got around to fixing the problem. So in fact I had two problems: lots o' existing spam, and lots o' incoming spam.
With regard to the existing spam, I took a heavy-handed approach. I looked at all the emails and URLs that people had submitted with comments prior to 2017, assuming those comments had already been triaged and were good. Then I made a goodlist of comments since 2017 that used one of those emails or URLs. There were very few of those -- maybe 50 or 70 or so.
Then I looked at when comments were made relative to their posts. Turns out, 98.3% of comments were made more than 21 days after their parent post was published, sometimes years afterwards. I used that as a first filter: if a comment wasn't from a known poster and was made 3 weeks or more after the post, I just classified it as spam.
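For the curious, here's roughly what that filter amounts to, as a little Python sketch. The Comment record, its field names, the January 1, 2017 cutoff date, and the helper functions are all made up for illustration; nothing here says how the blog actually stores its comments. Each post-2017 comment lands in one of three buckets: keep (known poster), spam (unknown poster, 3 weeks or more after the post), or triage (unknown poster, recent enough to check by hand).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical comment record; the real storage format is an assumption.
@dataclass
class Comment:
    post_published: datetime  # when the parent post went up
    posted: datetime          # when the comment was submitted
    email: str
    url: str

CUTOFF = datetime(2017, 1, 1)  # assumed: earlier comments were already triaged
MAX_AGE = timedelta(days=21)   # the 3-week window after a post

def build_goodlist(comments):
    """Collect every non-empty email and URL used on a pre-2017 comment."""
    good = set()
    for c in comments:
        if c.posted < CUTOFF:
            if c.email:
                good.add(c.email)
            if c.url:
                good.add(c.url)
    return good

def classify(comment, goodlist):
    """Return 'keep', 'spam', or 'triage' for a comment made since 2017."""
    # Known posters are kept no matter how late they comment.
    if comment.email in goodlist or comment.url in goodlist:
        return "keep"
    # Unknown poster, 3 weeks or more after the post: call it spam.
    if comment.posted - comment.post_published >= MAX_AGE:
        return "spam"
    # Unknown poster, recent comment: leave it for manual triage.
    return "triage"
```

Checking the goodlist before the age test is what lets a known commenter through even years after a post; only the "triage" bucket needs a human.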
The list of comments made within 3 weeks of the parent post was small enough for me to triage manually, and there I was able to save a bit of wheat from the chaff. In the end, though, the result of winnowing was less than 1% of what went in.
As I don't really want to babysit this wobsite, I'll keep this policy in place in the future -- comments will be open only for a while after an article is posted. Hopefully that will keep the ol' wingolog in tidy shape going forward, while still permitting people to comment -- something I have really appreciated in the past.
So happy 2021 to everybody, may the vaccine gods shine upon your shoulders, and happy hacking :)