Collaborative filters
Anti-spam technique: Collaborative filters | |
---|---|
Date of first use: | ???? |
Effectiveness: | Varies |
Popularity: | High |
Difficulty of implementation: | Easy |
Where implemented: | MTA/MUA |
Harm: | Varies |
Collaborative filters rely on users marking traffic as either spam or not spam. Aggregate scores summarizing the judgments of many users are then used as input to simple content filters.
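The aggregation step can be sketched as follows. This is a minimal illustration, not any particular product's method: the `VoteAggregator` class, the exact-match fingerprint, and the smoothing parameters `prior_votes` and `prior_spam_rate` are all assumptions made for the example. Real systems would normally use a fuzzier fingerprint so that trivially altered copies of a message share votes.

```python
import hashlib


class VoteAggregator:
    """Hypothetical tally of user spam/not-spam votes per message."""

    def __init__(self, prior_votes=2.0, prior_spam_rate=0.5):
        # Assumed smoothing: behave as if `prior_votes` imaginary votes
        # were already cast, `prior_spam_rate` of them for "spam".
        self.prior_votes = prior_votes
        self.prior_spam_rate = prior_spam_rate
        self.spam = {}
        self.ham = {}

    @staticmethod
    def fingerprint(body):
        # Naive fingerprint: lowercase, collapse whitespace, then hash.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record(self, body, is_spam):
        key = self.fingerprint(body)
        counts = self.spam if is_spam else self.ham
        counts[key] = counts.get(key, 0) + 1

    def score(self, body):
        # Smoothed fraction of "spam" votes, between 0 and 1.
        key = self.fingerprint(body)
        s = self.spam.get(key, 0)
        h = self.ham.get(key, 0)
        prior_spam = self.prior_votes * self.prior_spam_rate
        return (s + prior_spam) / (s + h + self.prior_votes)


agg = VoteAggregator()
for _ in range(8):
    agg.record("Cheap  pills, BUY now", is_spam=True)
spam_score = agg.score("cheap pills, buy now")
```

The resulting score would then be one input to a content filter, for example by comparing it with a threshold. The smoothing keeps a message with only one or two votes from being scored as certain spam or certain non-spam.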
Advantages
Actual human intelligence is harnessed in deciding whether something is spam or not, so many simple spammer tricks are ineffective. Examples include sending spam as graphical images and using variant spellings.
Disadvantages
If anyone can become a user of the system, spammers will sign up in order to game it. Using web scripting tools and anonymous proxies, they will try to create large numbers of dummy accounts (millions, if necessary) with which to upvote their spam. Some techniques try to adjust for this by rating the raters (example: Slashdot meta-moderation), but they are only partially successful, because spammers can easily vote legitimately part of the time or rate their other accounts as accurate.
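One way to picture "rating the raters" is to weight each vote by the rater's track record. The sketch below is illustrative only and is not Slashdot's meta-moderation algorithm: `RaterReputation`, `weighted_spam_score`, and the `damping` constant are hypothetical names, and it assumes some trusted signal exists that says whether a past vote turned out to be correct.

```python
class RaterReputation:
    """Hypothetical per-rater track record used to weight votes."""

    def __init__(self, damping=10.0):
        # Assumed damping constant: accounts with little or no history
        # get a weight near zero until they build a record.
        self.damping = damping
        self.agree = {}
        self.total = {}

    def record_outcome(self, rater, agreed):
        # `agreed` says whether the rater's past vote matched the
        # trusted outcome for that message.
        self.total[rater] = self.total.get(rater, 0) + 1
        if agreed:
            self.agree[rater] = self.agree.get(rater, 0) + 1

    def weight(self, rater):
        agreed = self.agree.get(rater, 0)
        seen = self.total.get(rater, 0)
        return agreed / (seen + self.damping)


def weighted_spam_score(votes, reputation):
    """votes: iterable of (rater, is_spam). Returns None if no weight."""
    spam_w = 0.0
    ham_w = 0.0
    for rater, is_spam in votes:
        w = reputation.weight(rater)
        if is_spam:
            spam_w += w
        else:
            ham_w += w
    total = spam_w + ham_w
    return None if total == 0 else spam_w / total


rep = RaterReputation()
for rater in ("alice", "bob", "carol"):
    for i in range(100):
        rep.record_outcome(rater, agreed=(i < 90))

# Three established raters say spam; 1000 brand-new accounts say not spam.
votes = [(r, True) for r in ("alice", "bob", "carol")]
votes += [("sock%d" % i, False) for i in range(1000)]
score = weighted_spam_score(votes, rep)
```

Under these assumptions the brand-new dummy accounts carry no weight. The weakness described above still applies: a dummy account that votes accurately for a while earns weight like any other rater, so this raises the cost of gaming the system without preventing it.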
Collaborative filters can also easily become systems for collaborative censorship. For example, organized political censorship by teams of collaborators has been observed on Digg.
Another disadvantage is that large numbers of participating users are required for the system to work. Perhaps the most successful example is Google's Gmail, which achieves its accuracy because of its extremely large user population.
In addition, collaborative filtering suffers from a bootstrapping problem, which is why collaborative filters alone are generally insufficient. The initial volume of spam is so high that it is not worth wading through it all to upvote or downvote anything, so users give up. Without their votes the collaborative filtering does not work, and the spam volume remains high.