Collaborative filters
| Anti-spam technique | Collaborative filters |
|---|---|
| Date of first use | 2001? |
| Effectiveness | Varies |
| Popularity | High |
| Difficulty of implementation | Easy |
| Where implemented | MTA/MUA |
| Harm | Varies |
Collaborative filters rely on users marking traffic as either spam or not-spam. Aggregate scores summarizing the judgments of many users are then used as input to simple content filters. Some versions, such as Vipul's Razor, also rate the reporters themselves, so that reliable reporters have more influence and the effect of sloppy or dishonest reporters is limited.
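As a rough illustration, the aggregation step can be thought of as a weighted vote. The sketch below is a hypothetical minimal model, not the actual protocol of Vipul's Razor or any other system; the trust values, default weight, and threshold are all assumptions.

```python
# Hypothetical sketch of collaborative spam scoring: each report on a message
# is weighted by how much the system trusts that reporter, and the resulting
# score is compared against a threshold by a simple content filter.

SPAM_THRESHOLD = 0.8   # assumed cut-off; a real system would tune this
DEFAULT_TRUST = 0.1    # assumed low starting weight for unknown reporters

def spam_score(reports, reporter_trust):
    """reports: list of (reporter_id, marked_as_spam) pairs for one message.
    reporter_trust: dict mapping reporter_id to a weight in [0, 1]."""
    spam_weight = sum(reporter_trust.get(r, DEFAULT_TRUST)
                      for r, marked in reports if marked)
    total_weight = sum(reporter_trust.get(r, DEFAULT_TRUST) for r, _ in reports)
    return spam_weight / total_weight if total_weight else 0.0

def looks_like_spam(reports, reporter_trust):
    return spam_score(reports, reporter_trust) >= SPAM_THRESHOLD

# Example: two trusted reporters flag a message, one low-trust account objects.
reports = [("alice", True), ("bob", True), ("mallory", False)]
trust = {"alice": 0.9, "bob": 0.8, "mallory": 0.2}
print(looks_like_spam(reports, trust))  # True: score is about 0.89
```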
Advantages
Actual human intelligence is harnessed in deciding whether something is spam, so many simple spammer tricks that defeat automated content analysis, such as rendering spam messages as graphical images or using variant spellings, are ineffective.
Disadvantages
If anyone can become a user of the system, spammers will sign up in order to game it. Using web scripting tools and anonymous proxies, they will create large numbers of dummy accounts with which to upvote their spam, millions if necessary. There are techniques to adjust for this by rating the raters (for example, Slashdot meta-moderation), but they are only partially successful, as spammers can easily vote legitimately part of the time, or rate their other accounts as accurate.
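One way the "rate the raters" idea might be implemented is to adjust each reporter's trust according to how often their reports agree with the eventual consensus. The following sketch is only an assumed illustration (the learning rate, bounds, and default trust are invented), and it inherits the weakness described above: a patient spammer can still build up trust by reporting honestly most of the time.

```python
# Hypothetical trust update for the weighted-vote model sketched earlier:
# once a message's verdict is settled, reporters who agreed gain a little
# trust and reporters who disagreed lose some. New accounts start near zero,
# so freshly created dummy accounts carry little weight.

LEARNING_RATE = 0.05          # assumed step size per settled message
MIN_TRUST, MAX_TRUST = 0.01, 1.0
DEFAULT_TRUST = 0.1           # assumed starting trust for new reporters

def update_trust(reporter_trust, reports, final_verdict):
    """reports: list of (reporter_id, marked_as_spam) pairs.
    final_verdict: the settled spam/not-spam decision for the message."""
    for reporter, marked in reports:
        current = reporter_trust.get(reporter, DEFAULT_TRUST)
        step = LEARNING_RATE if marked == final_verdict else -LEARNING_RATE
        reporter_trust[reporter] = min(MAX_TRUST, max(MIN_TRUST, current + step))
    return reporter_trust
```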
Collaborative filters can also easily become systems for collaborative censorship. For example, organized political censorship by teams of collaborators has been observed on Digg.
Another disadvantage is that large numbers of participating users are required for the system to work. Perhaps the most successful example is Google Gmail, which achieves high accuracy because of its extremely large user population.
In addition, collaborative filtering suffers from a bootstrapping problem. Collaborative filters alone are generally insufficient: the initial volume of spam is so high that it is not worth wading through it all to vote on anything, so users give up, the filter never accumulates enough judgments to work, and the spam volume stays high.
Examples
- Vipul's Razor
- Cloudmark Authority
- "Report spam" buttons at AOL, Hotmail, Yahoo! Mail, Google Gmail, and other mail systems