New adventures in blog spam

So I was just talking about the dark appreciation I have for new developments in spam tactics, and then, bam, innovation on my very doorstep:

A heavy “tramadol” spam assault that uses what appears to be discussion of the attack itself as the content of the spam.

Some background: my general (hobbyist) understanding of Bayesian filtering is that such filtering depends on the identification of words more likely to appear in spam. So a spam can be identified not merely by fixed strings of spamlike phrases, but as gestalts that contain a probabilistically unusual number of spammy tokens. Simply rearranging the words in your spam missive—or synonymizing the words therein—is no longer an effective feint against filters.

So how do you set up a Bayesian filter? You create a corpus of spam, and analyze the contents thereof, and compare that to a corpus of non-spam. Then you test each message you receive for proportional presence of spammy tokens or phrases: too high a spammishness level, and the message gets nuked.
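
To make that concrete, here is a rough sketch of the idea in Python, with the same hobbyist caveat as above. It is a toy naive Bayes scorer, not how any particular real filter works: the function names, the add-one smoothing, the equal priors, and the tiny corpora are all my own assumptions, invented purely for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(messages):
    """Count how many messages in a corpus contain each token."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence into one spam probability.

    Assumptions: naive Bayes with equal priors, plus add-one smoothing
    so a token never seen in one corpus does not zero everything out.
    """
    log_odds = 0.0
    for token in set(tokenize(text)):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam) - math.log(p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Toy corpora, made up for this example only.
spam_corpus = [
    "buy cheap tramadol online now",
    "cheap tramadol no prescription",
    "order tramadol online cheap pills",
]
ham_corpus = [
    "where should I put the php line in my comments loop",
    "thanks for the plugin it works great",
    "I can never find the right spot in the template",
]

spam_counts = train(spam_corpus)
ham_counts = train(ham_corpus)

def score(text):
    """Spam probability of `text` against the toy corpora above."""
    return spam_probability(
        text, spam_counts, ham_counts, len(spam_corpus), len(ham_corpus)
    )

bare = "cheap tramadol online"
padded = bare + " where should I put the php line in my comments loop"
print(score(bare), score(padded))
```

The last two lines show the weakness that matters here: the same spammy phrase scores lower once it is wrapped in text that looks like ordinary comments, because every non-spammy token pulls the total back toward "legitimate".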

So how do you then fight a Bayesian filter, as a motivated spammer? You deliver novel, non-spammish text. But then the Bayesian filterers will start working your new “novel” text into their databases—it’s an ongoing struggle.

So the savvy spammer needs sources of constantly novel text. I noted previously my conflicted delight at the appearance of Markov chain-generated pseudo-text, where spammers took the text of classic novels, for example, and chopped it up using Markovian methods. (I’m fond of such stochastic methods in general, and used Markov-based deconstruction of Mozart compositions as part of a college project.)
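
For anyone who hasn't played with these, a word-level Markov text generator is only a few lines. This is my own minimal sketch, not anything recovered from actual spam tooling: `build_chain` and `generate` are hypothetical names, and the source text is just the opening line of Pride and Prejudice standing in for "a classic novel".

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)

def generate(chain, length=20, seed=None):
    """Random-walk the chain, emitting up to `length` words of pseudo-text."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a given seed is repeatable
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state only occurs at the end of the source
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Stand-in source text (an assumption for the example, not from any spam sample).
source = (
    "it is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife"
)
chain = build_chain(source, order=1)
print(generate(chain, length=12, seed=1))
```

Every adjacent pair of words in the output also appears somewhere in the source, which is why the result reads as locally plausible and globally nonsensical. It is also why this tactic has a shelf life: the vocabulary still comes from a fixed, well-known text that a filter can eventually learn.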

But this latest thing is possibly more clever yet than even the Markov tactic: use inherently fresh content for your spam content. Use the reaction to your own current spam campaign to seed your spam with fresh, uncompensated-for plausible text, and thereby eliminate even the vulnerability of a probabilistic profile of commonly-bespammed classic literature.

The evidence so far:

Two days ago, I saw this show up as spam in the comment moderation queues of a number of Meficomp posts:

I always have terrible trouble with comment-related plugins that require me to put some line in the comment loop; I can never seem to find the right spot. Can anyone tell me where I should put the php line in my comments loop? I haven not modified anything much, and I would be very grateful. Thanks!

But then, just a couple days later, a new round of spam, apparently quoting a genuine bloggish reaction to the previous goddam round of spam:

I got the same tramadol attack… well, not the same, because it was only about 20 comments instead of 90, and i t have any filtering set up, and I just deleted them one at a time… hmm.. the only thing really in common was that it was about tramadol… what filter do you have set up that caught them all?

Clever little bastards, you have to give them that.

[post-script: have I mentioned how much I hate the behavior of the WordPress WYSIWYG-ish editor? It is worse than shit. It is Hitler shit.]
