Azure Jane Lunatic (Azz - bolt of blue - infovore) ([personal profile] azurelunatic) wrote in [site community profile] dw_antispam, 2012-01-17 12:59 am

Yearly Stats: 2011

It's been an interesting year in [site community profile] dw_antispam!

Every week, more or less, I pull the spam statistics for all items reported as spam sitewide. (Eventually that part of my job may be replaced by a very small script.) In some weeks I am able to review all reports, and some weeks I only have a chance to look over the top few. When possible, I make a note of how many of the reports were actual spam, and how many of them were other things that made their way into the spam reporting system. (For example, anonymous insults are certainly unpleasant and deserve deletion, but are not actually the commercially-motivated, high-volume sort of thing that the antispam system is designed for, and thus not actionable by the antispam team. Some reports, while not spam as such, were forwarded to developers who were better able to address the specific problem, such as comments that "broke" the page for other readers.)

The numbers for each item here are, in order: valid reports, invalid reports, and total reports. (When exact numbers were unavailable because the old reports had already been cleared, I erred on the side of counting unknown or uncertain items as valid; where a week's breakdown was entirely unknown, I left the invalid count at 0.)

During some weeks, for one reason or another, I was not able to pull the reports as usual. To keep the numbers from being wildly out of whack, I reused the numbers from the previous or following week for those weeks. My source data notes which weeks were estimated, and I have made a note of this with each total.
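To illustrate the bookkeeping described above, here is a minimal Python sketch of the sort of very small script that might one day take over the tallying. It assumes each week is recorded as a (valid, invalid) pair, with None for a week whose reports couldn't be pulled, and it fills each gap from the previous week, or from the following week when there is no earlier data. The function names and data layout are hypothetical, not anything in the Dreamwidth codebase.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weekly record: (valid, invalid) report counts, or None for a
# week whose reports could not be pulled.
WeekTally = Optional[Tuple[int, int]]


def fill_missing_weeks(weeks: List[WeekTally]) -> List[Tuple[int, int]]:
    """Copy the previous week's numbers into each missing week, falling back
    to the following week when there is no earlier data (e.g. week 1)."""
    filled = list(weeks)
    # Forward pass: a gap takes the previous week's (possibly filled) numbers.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    # Backward pass: only leading gaps can remain; take the next week's numbers.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    if any(week is None for week in filled):
        raise ValueError("no week has data to copy from")
    return [week for week in filled if week is not None]


def yearly_summary(weeks: List[WeekTally]) -> Dict[str, int]:
    """Sum valid/invalid/total reports and compute rounded weekly averages."""
    filled = fill_missing_weeks(weeks)
    if not filled:
        raise ValueError("no weeks given")
    valid = sum(v for v, _ in filled)
    invalid = sum(inv for _, inv in filled)
    total = valid + invalid
    n = len(filled)
    # Each average is rounded on its own, so avg_valid + avg_invalid can
    # differ from avg_total by 1.
    return {
        "valid": valid,
        "invalid": invalid,
        "total": total,
        "avg_valid": round(valid / n),
        "avg_invalid": round(invalid / n),
        "avg_total": round(total / n),
    }


# Four made-up weeks; the second one could not be pulled.
print(yearly_summary([(80, 3), None, (100, 6), (92, 4)]))
# {'valid': 352, 'invalid': 16, 'total': 368, 'avg_valid': 88, 'avg_invalid': 4, 'avg_total': 92}
```

Carrying the previous week forward (rather than, say, averaging the neighbouring weeks) mirrors the approach described above; either choice keeps a missing week from dragging the yearly totals down.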

These numbers only take into account the spam that is deleted-and-reported, so the numbers for spam actually received across the service are assuredly higher, due to spam in abandoned journals, spam that is being deliberately saved, and spam that the journal owner either hasn't yet found the time/energy to delete or is unlikely to find the time/energy to remove at all.


TOTALS
Valid spam reports sitewide in 2011: ~4,800
Invalid (non-spam) reports in 2011: ~200
Total spam reports sitewide in 2011: ~5,000

Total registered user spammers in 2011: 16

WEEKLY AVERAGE FOR THE YEAR
Valid: 90
Invalid: 4
Total: 94
Maximum reported registered user spammers in any week: 4

In an average week, 10-20 of the spam reports come from a single user. This does mean that spammers are singling out some users to barrage more than others. A rise in your personal spam does not necessarily mean that spam is up for the whole site, just that you are the unlucky user who is getting a lot of it this week.


The vast majority of spam reports are of anonymous comments. The breakdown (weeks without data were excluded from this):

Anonymous comments: 3735
OpenID comments: 62
Registered user comments, entries, and private messages: 122, of which 71 were valid; that is, 58% of these reports were actual spam and 42% were not.
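The percentages for registered-user reports are simple ratios of valid reports to all reports in that category. As a quick check, here is a small Python helper (the name validity_split is made up for this example) that reproduces them; applying the same arithmetic to the approximate yearly totals gives the sitewide share of non-spam reports.

```python
from typing import Tuple


def validity_split(valid: int, total: int) -> Tuple[int, int]:
    """Return (percent valid, percent not spam), rounded to whole percents."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= valid <= total:
        raise ValueError("valid must be between 0 and total")
    pct_valid = round(100 * valid / total)
    # Derive the second share from the first so the pair always sums to 100.
    return pct_valid, 100 - pct_valid


print(validity_split(71, 122))     # (58, 42): registered-user reports
print(validity_split(4800, 5000))  # (96, 4): approximate yearly totals
```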

The vast majority of anonymous spammers are defeated by CAPTCHAs.
Most OpenID spammers originate from LiveJournal. Many of their spam comments are not left on Dreamwidth directly, but imported along with a journal.
A significant proportion of the registered-user spammers (most of whom signed up during open registration periods) were caught because of what I like to call "flagrantly notable" spamming: spam directed at official areas of the site, where it comes directly to the attention of the people who will issue the smackdown.


I've pulled the numbers from my weekly reports into a spreadsheet, for the curious, with some commentary:
https://docs.google.com/spreadsheet/ccc?key=0AhtWr7PvrMa4dEpFOTlRNDFtaV8xRGx0WkZmSGdwSkE

[personal profile] pne 2012-01-18 11:48 am (UTC)
> These numbers only take into account the spam that is deleted-and-reported, so the numbers for spam actually received across the service are assuredly higher, due to spam in abandoned journals, spam that is being deliberately saved, and spam that the journal owner either hasn't yet found the time/energy to delete or is unlikely to find the time/energy to remove at all.

Also, spam that is deleted the "classic" way (without reporting it as spam).

[personal profile] jeshyr 2012-01-22 11:07 am (UTC)
That's a nice low level of false reports - good to see!

Yay antispam team :)