Collaborative filters

From ASRG
Latest revision as of 13:51, 2 October 2010

Anti-spam technique: Collaborative filters
Date of first use: 2001?
Effectiveness: Varies
Popularity: High
Difficulty of implementation: Easy
Where implemented: MTA/MUA
Harm: Varies


Collaborative filters rely on users marking traffic as either spam or not-spam. Aggregate scores summarizing the judgments of many users are then used as input to simple content filters. Some versions, such as Vipul's Razor (http://razor.sourceforge.net/), also rate the reporters themselves, so that reliable reporters have more influence and the effect of sloppy or dishonest reporters is limited.
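The aggregation step described above can be sketched as a weighted vote. Everything here is an illustrative assumption (function names, the default weight for unknown reporters, and the 0.7 threshold are invented for the example), not the behavior of any particular filter:

```python
from collections import defaultdict

def aggregate_reports(reports, reliability, threshold=0.7):
    """Weighted collaborative spam vote (illustrative sketch).

    reports: list of (message_id, reporter_id, is_spam) tuples.
    reliability: dict mapping reporter_id -> weight in [0, 1].
    Returns {message_id: True if the weighted vote judges it spam}.
    """
    spam_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for message_id, reporter_id, is_spam in reports:
        # Unknown reporters get a small default weight, limiting their influence.
        w = reliability.get(reporter_id, 0.1)
        total_weight[message_id] += w
        if is_spam:
            spam_weight[message_id] += w
    # Judge a message spam when the weighted fraction of spam reports
    # crosses the threshold.
    return {m: spam_weight[m] / total_weight[m] >= threshold
            for m in total_weight}

reports = [("m1", "alice", True), ("m1", "bob", True), ("m1", "mallory", False)]
reliability = {"alice": 0.9, "bob": 0.8, "mallory": 0.2}
print(aggregate_reports(reports, reliability))  # {'m1': True}
```

Here two trusted reporters outweigh one low-reliability dissenter: the weighted spam fraction is (0.9 + 0.8) / 1.9 ≈ 0.89, above the threshold.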

Advantages

Actual human intelligence is harnessed in deciding whether something is spam, so many simple spammer tricks, such as sending spam as graphical images or using variant spellings, are ineffective.

Disadvantages

If anyone can become a user of the system, spammers will sign up in order to game it. Using web scripting tools and anonymous proxies, they will create large numbers of dummy accounts with which to upvote their spam, millions if necessary (http://www.mediapost.com/publications/?fa=Articles.showArticle&art_aid=130320). There are techniques that attempt to adjust for this by rating the raters (for example, Slashdot's meta-moderation), but they are only partially successful: spammers can easily vote legitimately part of the time, or rate their other accounts as accurate.
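The weakness of rating-the-raters schemes can be illustrated with a toy reputation update (the update rule, learning rate, and starting weight are assumptions for the sake of the example, not any deployed system's formula). A spammer who agrees with the consensus most of the time keeps a healthy weight despite occasional dishonest votes:

```python
def update_reliability(reliability, reporter, reported_spam, consensus_spam, rate=0.1):
    """Nudge a reporter's weight toward 1 when their report matched the
    eventual consensus, and toward 0 when it did not (illustrative sketch)."""
    old = reliability.get(reporter, 0.5)       # new reporters start neutral
    target = 1.0 if reported_spam == consensus_spam else 0.0
    reliability[reporter] = old + rate * (target - old)
    return reliability

rel = {}
for _ in range(9):                              # nine honest reports...
    update_reliability(rel, "spammer", True, True)
update_reliability(rel, "spammer", False, True)  # ...then one dishonest one
print(round(rel["spammer"], 3))                  # prints 0.726
```

After one dishonest vote the spammer's weight drops only slightly, from about 0.81 to about 0.73, so their occasional upvotes for their own spam still carry real influence.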

Collaborative filters can also easily become systems for collaborative censorship. For example, organized political censorship by teams of collaborators has been observed on Digg (http://blogs.alternet.org/oleoleolson/2010/08/05/massive-censorship-of-digg-uncovered/).

Another disadvantage is that large numbers of participating users are required for the system to work. Perhaps the most successful example is Google Gmail, which achieves accuracy because of the extremely large user population.

In addition, collaborative filtering suffers from a bootstrapping problem, so collaborative filters alone are generally insufficient: at first the volume of spam is so high that it is not worth wading through it all to upvote or downvote anything, so users give up, the collaborative filtering fails, and the spam volume remains high.

Examples

* Vipul's Razor: http://razor.sourceforge.net/
* Cloudmark Authority: http://www.cloudmark.com/en/serviceproviders/authority-spamassassin.html/
* Report-spam buttons at AOL, Hotmail, Yahoo! Mail, Google Gmail, and other mail systems