# SARE HTML Ruleset for SpamAssassin - ruleset 0
# Version:  01.03.10
# Created:  2004-03-31
# Modified: 2006-06-03
# Usage instructions, documentation, and change history in 70_sare_html0.cf

#@@# Revision History: Full Revision History stored in 70_sare_html.log
#@@# 01.03.09: May 31 2006
#@@#           Minor score tweaks based on recent mass-checks
#@@#           Moved file 0 to file 2: SARE_HTML_EHTML_OBFU
#@@#           Moved file 0 to file 2: SARE_HTML_HEAD_AFFIL
#@@#           Moved file 0 to file 2: SARE_HTML_LEAKTHRU1
#@@#           Moved file 0 to file 2: SARE_HTML_LEAKTHRU2
#@@#           Moved file 0 to file 2: SARE_HTML_ONE_LINE3
#@@#           Moved file 0 to file 2: SARE_HTML_POB1200
#@@#           Moved file 0 to file 2: SARE_HTML_URI_HIDADD
#@@#           Moved file 0 to file 2: SARE_HTML_URI_LOGOGEN
#@@#           Moved file 0 to file 2: SARE_HTML_URI_OFF
#@@#           Moved file 0 to file 2: SARE_HTML_USL_B7
#@@#           Moved file 0 to file 2: SARE_HTML_USL_B9
#@@#           Moved file 0 to file 2: SARE_PHISH_HTML_01
#@@#           Added file 0: SARE_HTML_FLOAT1
#@@# 01.03.10: June 3 2006
#@@#           Minor score tweaks based on recent mass-checks
#@@#           Added file 0 SARE_HTML_LINKWARN
#@@#           Added file 0 SARE_HTML_SPANNER

# License: Artistic - see http://www.rulesemporium.com/license.txt
# Current Maintainer: Bob Menschel - RMSA@Menschel.net
# Current Home: http://www.rulesemporium.com/rules/70_sare_html0.cf
#
# Usage:  This family of files, 70_sare_html*.cf, contains rules that test HTML strings within emails
#         (except URIs, which are handled in the 70_sare_uri*.cf family of files).
#
# File 0: 70_sare_html0.cf -- These are html rules that hit at least 10 spam and no ham.
#         While SARE cannot guarantee they never will hit ham, they have not hit ham in any SARE mass-check,
#         against tens of thousands of ham.
#         This is a rules file we expect any/all email systems using SpamAssassin to benefit from.
#
# File 1: 70_sare_html1.cf -- These are html rules that meet one of the following criteria:
#         a) Rules that do, or in the past did, hit ham during SARE mass-check tests
#         b) Rules that hit no ham and currently do not hit more than 10 spam in any single mass-check run.
#         If the rules hit ham, they hit at least 10 spam to each 1 ham.
#         If the rules hit ham, they hit fewer than 100 ham.
#         With few exceptions these rules score significantly less than the rules in file 0.
#         Systems which are very sensitive to false positives and/or need to be very careful about resource use
#         may want to exclude this ruleset, pick and choose among its rules, or lower their scores.
#         Systems that use this file 1 should ALSO use file 0.
#
# File 2: 70_sare_html2.cf -- These html rules hit no spam at this time, but they are considered "safe" rules that should never hit ham.
#         These are primarily rules that test for specific html seen only in spam, or similar types of "pretty darn sure" rules.
#         Systems which are very sensitive to SpamAssassin overhead may want to exclude this ruleset file to avoid its overhead,
#         but systems with plenty of resources that want to be aggressive against spam may benefit from this ruleset file.
#
# File 3: 70_sare_html3.cf -- These are html rules that hit a significant amount of ham during SARE mass-check tests.
#         Systems which are very sensitive to false positives or to SA resource usage should NOT install this ruleset.
#
# File 4: 70_sare_html4.cf -- These are html rules that meet one of the following criteria:
#         a) They hit over 100 ham during SARE mass-check tests, but still hit enough spam to be worthwhile to aggressively anti-spam systems.
#         b) They hit no emails at this time, but have been recommended by anti-spam sources.
#         Again, systems which are very sensitive to false positives or to SA resource usage should NOT install this ruleset.
#
# eng:    70_sare_html_eng.cf -- These are html rules which work well within the English language, but are liable to cause false
#         positives in other languages. They include rules which test for letter combinations. Systems that
#         receive ham in languages other than English should NOT use this file.
#
# x30:    70_sare_html_x30.cf -- These are html rules which have been incorporated into SpamAssassin 3.0.x,
#         or which duplicate or greatly overlap 3.0.x rules.
#         Systems which have installed SpamAssassin 3.0.x should therefore NOT use this file.
#
# arc:    70_sare_html_arc.cf -- These are html rules that once were published in other files, but which have since lost all value.
#         They either hit too much ham (without hitting enough spam to make it worth while), or they don't hit any spam.
#         SARE regularly runs mass-checks on these rules to see if any of them are worth reviving, but
#         we expect that nobody will be running these rules in any production system.
#
######## ###################### ##################################################

######## ###################### ##################################################
#        Rules renamed or moved
######## ###################### ##################################################

meta     SARE_HTML_ALT_WAIT2     __SARE_HEAD_FALSE
meta     SARE_HTML_BADOPEN       __SARE_HEAD_FALSE
meta     SARE_HTML_BAD_FG_CLR    __SARE_HEAD_FALSE
meta     SARE_HTML_COLOR_B       __SARE_HEAD_FALSE
meta     SARE_HTML_COLOR_NWHT3   __SARE_HEAD_FALSE
meta     SARE_HTML_FONT_INVIS2   __SARE_HEAD_FALSE
meta     SARE_HTML_FSIZE_1ALL    __SARE_HEAD_FALSE
meta     SARE_HTML_GIF_DIM       __SARE_HEAD_FALSE
meta     SARE_HTML_HTML_AFTER    __SARE_HEAD_FALSE
meta     SARE_HTML_HTML_DBL      __SARE_HEAD_FALSE
meta     SARE_HTML_HTML_TBL      __SARE_HEAD_FALSE
meta     SARE_HTML_IMG_ONLY      __SARE_HEAD_FALSE
meta     SARE_HTML_JVS_HREF      __SARE_HEAD_FALSE
meta     SARE_HTML_MANY_BR10     __SARE_HEAD_FALSE
meta     SARE_HTML_NO_BODY       __SARE_HEAD_FALSE
meta     SARE_HTML_NO_HTML1      __SARE_HEAD_FALSE
meta     SARE_HTML_P_JUSTIFY     __SARE_HEAD_FALSE
meta     SARE_HTML_TITLE_SEX     __SARE_HEAD_FALSE
meta     SARE_HTML_URI_2SLASH    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_AXEL      __SARE_HEAD_FALSE
meta     SARE_HTML_URI_BADQRY    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_FORMPHP   __SARE_HEAD_FALSE
meta     SARE_HTML_URI_HREF      __SARE_HEAD_FALSE
meta     SARE_HTML_URI_MANYP2    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_MANYP3    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_NUMPHP3   __SARE_HEAD_FALSE
meta     SARE_HTML_URI_OBFU4     __SARE_HEAD_FALSE
meta     SARE_HTML_URI_OBFU4a    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_PARTID    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_RID       __SARE_HEAD_FALSE
meta     SARE_HTML_USL_MULT      __SARE_HEAD_FALSE
meta     SARE_HTML_FONT_EBEF     __SARE_HEAD_FALSE
meta     SARE_HTML_URI_DEFASP    __SARE_HEAD_FALSE
meta     SARE_HTML_INV_TAGA      __SARE_HEAD_FALSE
meta     SARE_HTML_EHTML_OBFU    __SARE_HEAD_FALSE
meta     SARE_HTML_HEAD_AFFIL    __SARE_HEAD_FALSE
meta     SARE_HTML_LEAKTHRU1     __SARE_HEAD_FALSE
meta     SARE_HTML_LEAKTHRU2     __SARE_HEAD_FALSE
meta     SARE_HTML_ONE_LINE3     __SARE_HEAD_FALSE
meta     SARE_HTML_POB1200       __SARE_HEAD_FALSE
meta     SARE_HTML_URI_HIDADD    __SARE_HEAD_FALSE
meta     SARE_HTML_URI_LOGOGEN   __SARE_HEAD_FALSE
meta     SARE_HTML_URI_OFF       __SARE_HEAD_FALSE
meta     SARE_HTML_USL_B7        __SARE_HEAD_FALSE
meta     SARE_HTML_USL_B9        __SARE_HEAD_FALSE
meta     SARE_PHISH_HTML_01      __SARE_HEAD_FALSE

######## ###################### ##################################################

rawbody  __SARE_HTML_HAS_A       eval:html_tag_exists('a')
rawbody  __SARE_HTML_HAS_BR      eval:html_tag_exists('br')
rawbody  __SARE_HTML_HAS_DIV     eval:html_tag_exists('div')
rawbody  __SARE_HTML_HAS_FONT    eval:html_tag_exists('font')
rawbody  __SARE_HTML_HAS_IMG     eval:html_tag_exists('img')
rawbody  __SARE_HTML_HAS_P       eval:html_tag_exists('p')
rawbody  __SARE_HTML_HAS_PRE     eval:html_tag_exists('pre')
rawbody  __SARE_HTML_HAS_TITLE   eval:html_tag_exists('title')

rawbody  __SARE_HTML_HBODY       m''i
rawbody  __SARE_HTML_BEHTML      m''i
rawbody  __SARE_HTML_BEHTML2     m'^'i
rawbody  __SARE_HTML_EFONT       m'^'i
rawbody  __SARE_HTML_EHEB        m'^'i
rawbody  __SARE_HTML_CMT_CNTR    /
/i
describe SARE_HTML_CMT_MONEY     HTML Comment seems to mention money
score    SARE_HTML_CMT_MONEY     0.100
#counts  SARE_HTML_CMT_MONEY     0s/0h of 98542 corpus (76935s/21607h RM) 05/12/04
#counts  SARE_HTML_CMT_MONEY     0s/0h of 29365 corpus (5882s/23483h JH) 08/14/04 TM2 SA3.0-pre2

######## ###################### ##################################################
#        Image tag tests
######## ###################### ##################################################

rawbody  SARE_HTML_GIF_NUM       /\.gif\d{2,}/i
describe SARE_HTML_GIF_NUM       HTML contains tracking numbers after .gif
score    SARE_HTML_GIF_NUM       0.100
#counts  SARE_HTML_GIF_NUM       0s/0h of 98542 corpus (76935s/21607h RM) 05/12/04
#counts  SARE_HTML_GIF_NUM       0s/0h of 29365 corpus (5882s/23483h JH) 08/14/04 TM2 SA3.0-pre2

######## ###################### ##################################################
#        Paragraphs, breaks, and spacings
######## ###################### ##################################################

rawbody  SARE_HTML_BR_MANY       /
{5}/i
describe SARE_HTML_BR_MANY       Too many sequential identical HTML tags
score    SARE_HTML_BR_MANY       0.555
#stype   SARE_HTML_BR_MANY       spamp
#counts  SARE_HTML_BR_MANY       0s/0h of 689155 corpus (348140s/341015h RM) 09/18/05
#max     SARE_HTML_BR_MANY       2s/0h of 258858 corpus (114246s/144612h RM) 05/27/05
#counts  SARE_HTML_BR_MANY       0s/0h of 29365 corpus (5882s/23483h JH) 08/14/04 TM2 SA3.0-pre2
#counts  SARE_HTML_BR_MANY       0s/0h of 54067 corpus (16890s/37177h JH-3.01) 06/18/05
#counts  SARE_HTML_BR_MANY       0s/0h of 47221 corpus (42968s/4253h MY) 06/18/05

rawbody  __SARE_HTML_MANY_BR05   /<br>\s*<br>\s*<br>\s*<br>\s*<br>\s*<br>/i
meta     SARE_HTML_MANY_BR05     __SARE_HTML_MANY_BR05 && HTML_MESSAGE
describe SARE_HTML_MANY_BR05     Too many <br>'s!
score    SARE_HTML_MANY_BR05     0.500
#hist    SARE_HTML_MANY_BR05     Contrib by Matt Keller June 7 2004
#note    SARE_HTML_MANY_BR05     Remove HTML_MESSAGE test increases spam 4% but doubles ham
#hist    SARE_HTML_MANY_BR05     this and SARE_HTML_MANY_BR10 obsolete SARE_HTML_TD_BR4 = FR_WICKED_SPAM_??
#counts  SARE_HTML_MANY_BR05     0s/0h of 114422 corpus (81069s/33353h RM) 01/16/05
#alone   SARE_HTML_MANY_BR05     2051s/43h of 66351 corpus (40971s/25380h RM) 08/21/04
#counts  SARE_HTML_MANY_BR05     0s/0h of 54283 corpus (17106s/37177h JH-3.01) 02/13/05
#max     SARE_HTML_MANY_BR05     755s/2h of 38858 corpus (15368s/23490h JH-SA3.0rc1) 08/22/04
#counts  SARE_HTML_MANY_BR05     0s/0h of 26326 corpus (22886s/3440h MY) 02/15/05

######## ###################### ##################################################
#        Javascript and object tests
######## ###################### ##################################################

rawbody  SARE_HTML_JVS_POPUP     /'i
rawbody  __SARE_HTML_BEHTML      m''i
rawbody  __SARE_HTML_BEHTML2     m'^'i
rawbody  __SARE_HTML_EFONT       m'^'i
rawbody  __SARE_HTML_EHEB        m'^'i
rawbody  __SARE_HTML_CMT_CNTR    /