Spam – 내 맘대로 보는 세상

spam filtering with transport/router of exim

I used to classify spam with procmail, but procmail is executed only for local users. So when an address is redirected through a .forward file, spam is forwarded to the other address unfiltered.
It's not a big deal, but I don't like this behavior, so I changed the exim configuration to run bogofilter through an exim transport and router.
First, configure the transport like this.
bogofilter:
  driver = pipe
  command = /usr/sbin/exim -oMr bogodone -bS
  use_bsmtp = true
  transport_filter = /usr/bin/bogofilter -d /etc/bogofilter/ -e -p
  log_output = true
  return_path_add = false
  temp_errors = *
  home_directory = "/tmp"
  current_directory = "/tmp"
  message_prefix = ""
  message_suffix = ""
After that, declare the router that uses this transport. Router order matters; I put it right after the system_alias router.
bogofilter:
  domains = +local_domains
  no_verify
  condition = ${if !eq {$received_protocol}{bogodone} {1}{0}}
  driver = accept
  transport = bogofilter
With this configuration bogofilter is executed, but it only adds a header like “X-Bogosity: …”, so we need another transport/router pair to actually sort the spam.
Here's the transport that does that.
spam_delivery:
  driver = appendfile
  directory = /home/$local_part/.maildir/.Spam
  maildir_format
  delivery_date_add
  envelope_to_add
  return_path_add
With this transport, spam is delivered to the .Spam folder within $HOME/.maildir. If you use IMAP, you can review the spam simply by opening the Spam folder. POP3 gives you no way to reach that folder, so if you use POP3, rely on the header filtering rules of your mail client instead.
We also need a router that uses this transport. Order matters here too; I put it below the bogofilter router.
removingspam:
  driver = accept
  check_local_user
  condition = ${if match {$h_X-Bogosity:} {Spam, tests=bogofilter} {1}{0}}
  transport = spam_delivery
Spam carries the header X-Bogosity: Spam, tests=bogofilter …, so it is easy to pick out.
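As a rough illustration of the same test outside exim, the sketch below checks a message for the header value that the removingspam router matches on. It uses Python's standard email module; the function name and the sample messages (including the spamicity value) are made up for illustration and are not output from a real bogofilter run.

```python
import email

# The substring the removingspam router looks for in X-Bogosity.
SPAM_MARKER = "Spam, tests=bogofilter"

def is_bogofilter_spam(raw_message):
    """Return True if the X-Bogosity header marks the message as spam."""
    msg = email.message_from_string(raw_message)
    return SPAM_MARKER in (msg.get("X-Bogosity") or "")

# Illustrative messages; header values are assumptions.
spam = "X-Bogosity: Spam, tests=bogofilter, spamicity=0.999\n\nbody\n"
ham = "X-Bogosity: Ham, tests=bogofilter\n\nbody\n"
print(is_bogofilter_spam(spam), is_bogofilter_spam(ham))
```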
To confirm that it works, check the mail log. I checked /var/log/mail/current because I use metalog, but most Linux distributions ship syslogd, in which case check /var/log/messages instead.

# tail -f /var/log/mail/current | grep R=
Feb 26 16:48:31 [exim] 2008-02-26 16:48:31 1JRkT5-0001pz-Nx => mailaddr R=procmail T=procmail
Feb 26 16:48:35 [exim] 2008-02-26 16:48:35 1JTuY0-00081q-7u => mailaddr R=removingspam T=spam_delivery
Feb 26 16:48:44 [exim] 2008-02-26 16:48:44 1JQWAs-0002po-CR => mailaddr R=bogofilter T=bogofilter
Feb 26 16:48:45 [exim] 2008-02-26 16:48:45 1JTuY5-000826-EK => mailaddr R=removingspam T=spam_delivery

The router name appears after R= and the transport name after T=. According to this log, after bogofilter has run, each message is handled by either the removingspam router or the procmail router.
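To tally which routers handled mail without reading the log by eye, something like the following Python sketch could be used. The regular expression is an assumption based only on the four log lines shown above; real exim log lines can carry additional fields, so treat it as a starting point.

```python
import re
from collections import Counter

# Matches the "=> address R=router T=transport" part of a delivery line.
DELIVERY = re.compile(r"=> (?P<address>\S+) R=(?P<router>\S+) T=(?P<transport>\S+)")

def deliveries(lines):
    """Yield (address, router, transport) for every delivery line."""
    for line in lines:
        match = DELIVERY.search(line)
        if match:
            yield match["address"], match["router"], match["transport"]

# The log lines from the post.
SAMPLE_LOG = [
    "Feb 26 16:48:31 [exim] 2008-02-26 16:48:31 1JRkT5-0001pz-Nx => mailaddr R=procmail T=procmail",
    "Feb 26 16:48:35 [exim] 2008-02-26 16:48:35 1JTuY0-00081q-7u => mailaddr R=removingspam T=spam_delivery",
    "Feb 26 16:48:44 [exim] 2008-02-26 16:48:44 1JQWAs-0002po-CR => mailaddr R=bogofilter T=bogofilter",
    "Feb 26 16:48:45 [exim] 2008-02-26 16:48:45 1JTuY5-000826-EK => mailaddr R=removingspam T=spam_delivery",
]

router_counts = Counter(router for _, router, _ in deliveries(SAMPLE_LOG))
print(router_counts)
```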

Spam filtering using exim's transport and router

Out of habit I had been running bogofilter from procmail, but procmail is executed only for local users, so mail that goes through an alias or a .forward file ends up not being spam-filtered at all.
Anyway, this bothered me a bit, so I set things up to run bogofilter through a transport and a router instead.
First, let's configure the transport as follows.
bogofilter:
  driver = pipe
  command = /usr/sbin/exim -oMr bogodone -bS
  use_bsmtp = true
  transport_filter = /usr/bin/bogofilter -d /etc/bogofilter/ -e -p
  log_output = true
  return_path_add = false
  temp_errors = *
  home_directory = "/tmp"
  current_directory = "/tmp"
  message_prefix = ""
  message_suffix = ""
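Then create a router that uses this transport. Note that routers are order-sensitive, so choose the insertion point carefully. In my case I declared it right after system_alias. (The only aliased addresses that are not delivered to local users are the mailman ones, and those accept mail only from authenticated senders anyway, so I figured it doesn't matter… -_-;; )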
bogofilter:
  domains = +local_domains
  no_verify
  condition = ${if !eq {$received_protocol}{bogodone} {1}{0}}
  driver = accept
  transport = bogofilter
With only this much, bogofilter runs, but nothing moves the mail anywhere based on the result. So let's add another transport and router to do that.
Again, the transport first…
spam_delivery:
  driver = appendfile
  directory = /home/$local_part/.maildir/.Spam
  maildir_format
  delivery_date_add
  envelope_to_add
  return_path_add
This creates a .Spam directory under .maildir in the user's home directory and stores spam there. I did it this way because the spam can then be checked over IMAP; if you only use POP3, having a [Spam] prefix added to the subject might not be a bad alternative…
Next, the router that uses this transport! Again, think carefully about where to place it. In my case I put it right below the bogofilter router.
removingspam:
  driver = accept
  check_local_user
  condition = ${if match {$h_X-Bogosity:} {Spam, tests=bogofilter} {1}{0}}
  transport = spam_delivery
bogofilter adds a header of the form X-Bogosity: Spam, tests=bogofilter … to spam, so this makes spam easy to classify.
To check that it works, look at the mail log. I use metalog, so I had to look at /var/log/mail/current, but most people use syslogd, so /var/log/messages should be the place to look.

# tail -f /var/log/mail/current | grep R=
Feb 26 16:48:31 [exim] 2008-02-26 16:48:31 1JRkT5-0001pz-Nx => mailaddr R=procmail T=procmail
Feb 26 16:48:35 [exim] 2008-02-26 16:48:35 1JTuY0-00081q-7u => mailaddr R=removingspam T=spam_delivery
Feb 26 16:48:44 [exim] 2008-02-26 16:48:44 1JQWAs-0002po-CR => mailaddr R=bogofilter T=bogofilter
Feb 26 16:48:45 [exim] 2008-02-26 16:48:45 1JTuY5-000826-EK => mailaddr R=removingspam T=spam_delivery

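The things to watch are what follows R= and T=. What follows R= is the router that was used, and what follows T= is the transport. The log above shows that after bogofilter runs, a message is either delivered by the spam_delivery transport via the removingspam router, or passes through and is delivered by the procmail transport.
There is no procmail transport by default, so you would normally see local_delivery instead; in any case router and transport names are whatever you choose, so they will differ from setup to setup…
Until the day the world is free of spam T_T, the tinkering continues…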

A few days of trial and error -_-! with Exim

Somehow the number of servers I have to manage has grown lately. (I'd be happy with just unfix T_T) Anyway!! Over the past few days I tried gentoo + exim + procmail + spf + srs + clamav + bogofilter + dovecot.

This isn't a server I'll be looking after for long, a year at most, so I considered distributions other than Gentoo, but there were few combinations that support spf and srs while still relying on stock packages. In exim, spf and srs are marked experimental, so binary distributions don't enable them by default; the postfix + milter combination can't provide srs; that leaves only sendmail + milter, and I didn't want to use sendmail.

Anyway! Here is how to attach procmail to exim.

Sender Rewriting Scheme

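As I explained at length in an earlier post, spf is a technique that filters spam using the domain of a message and the host it was sent from. For example, if the From address (strictly, the envelope sender) is nospam@mytears.org, the receiver reads the TXT record of the mytears.org domain and treats only mail coming from 61.109.245.78, the address listed there, as legitimate. When spf is in effect like this, forwarding mail through .forward files or aliases can run into problems.
Consider the case where nospam@mytears.org is redirected to nospam@gmail.com through a .forward file.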

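sender@somedomain sends mail to nospam@mytears.org

The mytears mail server reads the .forward file of the nospam@mytears.org user and forwards the mail to nospam@gmail.com, the address listed there

The gmail mail server reads the spf information from the TXT record of somedomain, the domain of the from address

The IP of the forwarding server is not among the IPs designated by spf, so the spf check fails

It depends on the spf policy, but in most cases the mail is judged to be spam

In the end the mail is not delivered properly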
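The failure in the steps above can be modelled with a toy check. This Python sketch understands only the ip4: mechanism and is nowhere near a full spf evaluator; the record for somedomain and the address 192.0.2.10 are hypothetical, while 61.109.245.78 is the mytears.org address mentioned earlier.

```python
import ipaddress

def spf_allows(record, client_ip):
    """Return True if client_ip matches an ip4: mechanism in the record."""
    ip = ipaddress.ip_address(client_ip)
    for term in record.split():
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[len("ip4:"):], strict=False)
            if ip in network:
                return True
    return False

SOMEDOMAIN_SPF = "v=spf1 ip4:192.0.2.10 -all"  # hypothetical TXT record
ORIGIN_IP = "192.0.2.10"                       # hypothetical somedomain server
FORWARDER_IP = "61.109.245.78"                 # mytears.org, the forwarder

print(spf_allows(SOMEDOMAIN_SPF, ORIGIN_IP))     # direct delivery
print(spf_allows(SOMEDOMAIN_SPF, FORWARDER_IP))  # delivery after forwarding
```

The second call is the situation gmail sees: the sender's domain is still somedomain, but the connecting host is the forwarder.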

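The fix for this is something called srs. Google is one example of a site that has adopted it actively.
exim supports spf and srs out of the box, so they are easy to enable as long as you set the right options at build time. With sendmail, srs can be applied through a milter. With postfix, however, there is no way to use it. (Old versions of postfix could use it through a patch, but that patch is no longer maintained.)
References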
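To give a feel for what srs does to the envelope sender, here is a simplified Python sketch that loosely follows the SRS0 address layout (hash, timestamp, original domain, original local part). The secret, the function names, and the hash length are assumptions for illustration; the output is not guaranteed to interoperate with exim or any other real srs implementation, and timestamp expiry and SRS1 chaining are left out.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"example-secret"   # hypothetical shared secret
FORWARDER = "mytears.org"    # the forwarding domain from the example
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def srs_timestamp(now=None):
    """Encode the day number (mod 1024) as two base32 characters."""
    days = int((time.time() if now is None else now) // 86400) % 1024
    return BASE32[days >> 5] + BASE32[days & 31]

def srs_hash(*parts):
    """Short keyed hash so that only the forwarder can mint valid addresses."""
    mac = hmac.new(SECRET, "".join(parts).lower().encode(), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode()[:4]

def srs_forward(sender, now=None):
    """Rewrite sender so the envelope domain becomes the forwarder's."""
    local, domain = sender.rsplit("@", 1)
    ts = srs_timestamp(now)
    return f"SRS0={srs_hash(ts, domain, local)}={ts}={domain}={local}@{FORWARDER}"

def srs_reverse(address):
    """Recover the original sender, e.g. when a bounce comes back."""
    local_part, _ = address.rsplit("@", 1)
    tag, digest, ts, domain, local = local_part.split("=", 4)
    expected = srs_hash(ts, domain, local)
    if tag != "SRS0" or not hmac.compare_digest(digest.encode(), expected.encode()):
        raise ValueError("not a valid SRS0 address")
    return f"{local}@{domain}"

rewritten = srs_forward("sender@somedomain", now=0)
print(rewritten)
print(srs_reverse(rewritten))
```

Because the rewritten address belongs to mytears.org, the receiving server checks the forwarder's own spf record instead of somedomain's.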
spf: http://www.openspf.org/
srs: http://www.openspf.org/SRS
p.s) By the way, even though spf has been around for quite a long time, few university mail servers support it.

Spam filter performance revealed -_-v

Strictly speaking it's not a full 24 hours; these are the results from 22:04:17 to 19:21:04. 🙂

Domain key

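Domain key is a technology similar to Microsoft's Sender ID, developed by Yahoo to prevent forgery and tampering of mail. It is currently used by dreamwiz, yahoo, gmail and others.
http://kr.antispam.yahoo.com/domainkeys
It works with a public/private key pair: the public key is published in a DNS TXT record, and the message body and headers are digitally signed with the private key, so both the DNS and the smtp server have to support it.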

Using spf with postfix

There are broadly three ways to use spf with postfix, as listed below. For what spf is, please see the post below.
http://b.mytears.org/2005/07/226

Block Image only spam mail…

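Class starts at 2 today, but I somehow woke up at 8 in the morning… so to mark the occasion I added one more spam rule.
I assume everyone has seen mail that contains nothing but a single image. Few legitimate, meaningful messages are that lazy, so I wrote a simple script to take advantage of this. The script first removes <!--…--> style comments from the message, then removes the not-very-meaningful strings inside script, style, title and similar elements, and finally strips the tags. (I'm not saying title content is meaningless in general, but I think it's safe to treat it as meaningless in mail.)
Once the script has removed all the html, I check whether any string remains other than white space. Mail made up solely of images has nothing that bogofilter and the like can work with, which is why I added this step, though I don't know how effective it will be.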
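The idea can be sketched in a few lines of Python. This is not the PHP script linked from this post, just a minimal regex-based approximation of the three steps described above; the function names are made up, and details such as &nbsp; entities are not handled.

```python
import re

def visible_text(html):
    """Strip comments, script/style/title contents, then all tags."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    html = re.sub(r"<(script|style|title)\b[^>]*>.*?</\1\s*>", "", html,
                  flags=re.S | re.I)
    return re.sub(r"<[^>]+>", "", html)

def is_image_only(html):
    """True when nothing but white space is left after stripping."""
    return visible_text(html).strip() == ""

image_mail = ('<html><head><title>hi</title><style>p{}</style></head>'
              '<body><!-- tracking --><img src="a.gif"></body></html>')
print(is_image_only(image_mail))
print(is_image_only("<p>Hello, see you at 2.</p>"))
```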
Anyway, the tag-stripping script is available at the url below. It is done with regular expressions, so it should look very simple. (Come to think of it, I once wrote one as an automaton -_-)
http://mytears.org/resources/mysrc/php/removehtml/strip_html.phps
p.s) It's more effective than I expected. -_- It catches a lot.

bogofilter

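Just as the recent growth in spam was adding wrinkles to my forehead one by one, I dropped by bsdforum and came across a post about bogofilter by 방준영. From the usage it looked like it doesn't train itself on spam automatically, which for some reason appealed to me. 🙂
http://bogofilter.sourceforge.net/
For reference, bogofilter is a Bayesian spam filter.
To use it, first collect spam in .Spam and non-spam in the .Ham folder, then train it with commands like the following.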

cd .Spam
for x in *; do bogofilter -s < "$x"; done
cd ..
cd .Ham
for x in *; do bogofilter -n < "$x"; done

The -s option registers the message as spam, and -n registers it as non-spam. (The uppercase forms -S and -N undo such a registration.)
If a spam message A was mistakenly trained as ham, the result can be corrected with the command below. (A is the mail file name)

bogofilter -Ns < A

Conversely, if a ham message A was mistakenly trained as spam, use the command below.

bogofilter -Sn < A

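I updated the spam/ham probabilities from the mail piled up in my mailbox (ham: about 300 messages, spam: about 1,000) and am now on the third day of testing. Of the 30 messages that arrived over the three days, it correctly filtered out the 26 that were spam.
Below is a first draft of the script that, once this is applied system-wide, will tidy up Spam folders automatically even if users never empty them~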

#!/bin/sh
# Walk every home directory listed in /etc/passwd and report Spam maildirs.
awk -F: '{ print $6 }' /etc/passwd | while read -r x; do
	case "$x" in
	/home/*)
		if [ -d "$x/.maildir/.Spam" ]; then
			# this will be replaced with tmpwatch
			echo "$x"
		fi
		;;
	esac
done
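Until tmpwatch is wired in, the clean-up step itself might look something like this Python sketch, which only lists messages older than a cut-off and deletes nothing. The 30-day limit and the function name are assumptions; the cur and new subdirectories are the standard maildir layout.

```python
import time
from pathlib import Path

def expired_spam(spam_dir, max_age_days=30, now=None):
    """Return message files in a Spam maildir older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for sub in ("cur", "new"):
        folder = Path(spam_dir) / sub
        if not folder.is_dir():
            continue
        for message in folder.iterdir():
            # Compare the file's modification time against the cut-off.
            if message.is_file() and message.stat().st_mtime < cutoff:
                old.append(message)
    return sorted(old)
```

A caller could loop over the directories printed by the script above and pass each "$x/.maildir/.Spam" path to expired_spam before deciding what to remove.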


The post by 방준영 on bsdforum that I referred to can be found at the link below 😉
http://bsdforum.or.kr/viewtopic.php?t=33
And the results of filtering spam without bogofilter are as follows.

# total number of mails received
unfix aqua # cat /var/log/procmail | grep '^[\[F]' | wc -l
5077
# mails caught by Spf
unfix aqua # cat /var/log/procmail | grep '^\[Spf' | wc -l
2482
# Broken Multipart
unfix aqua # cat /var/log/procmail | grep '^\[Fake' | wc -l
1499
# Bad-mailer
unfix aqua # cat /var/log/procmail | grep '^\[BadMailer' | wc -l
315
# mails containing a Spam-word
unfix aqua # cat /var/log/procmail | grep '^\[Spam' | wc -l
141
# Bad-library
unfix aqua # cat /var/log/procmail | grep '^\[BadLibrary' | wc -l
21
# mails with neither body nor subject
unfix aqua # cat /var/log/procmail | grep '^\[EmptyMail' | wc -l
# virus mails
unfix aqua # cat /var/log/procmail | grep '^\[Virus' | wc -l
2

Of the 5077 mails received from the 6th through the 12th, 2482 were caught by Spf, 1499 by Fake/NoHTML + Fake/NoPlain, 315 by BadMailer, 141 by SpamWord, 21 by BadLibrary, and 2 were Virus.
In total, 4460 of 5077, or 87.8%, were judged to be spam.
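The total can be re-derived from the per-rule counts listed above with a few lines of Python (the dictionary keys simply mirror the log tags):

```python
# Per-rule counts taken from the procmail log summary above.
counts = {
    "Spf": 2482,
    "Fake": 1499,
    "BadMailer": 315,
    "Spam": 141,
    "BadLibrary": 21,
    "Virus": 2,
}
total_received = 5077

flagged = sum(counts.values())
rate = flagged / total_received * 100
print(f"{flagged}/{total_received} = {rate:.1f}%")  # 4460/5077 = 87.8%
```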
Also, checking with bogoutil suggests that it handles encoding-related processing fairly well on its own, so I'm thinking of letting bogofilter take over what BadMailer, SpamWord and BadLibrary used to do, and I have commented out the related rules.

Spam filter #2

For once I felt like coding… so I actually implemented the spam filter I had sketched out a few days ago. Thanks to php and pecl-mailparse it was very simple to implement.

Extract text/html, text/plain (base64_decode, qprint_decode happen automatically)

urldecode

convert to utf-8

decode html entities
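The four steps above might look roughly like this in Python, using the standard email module in place of php and pecl-mailparse. It is a sketch under that substitution, not a port of Parsemail.phps; the function name and the sample message are made up. Python's get_content() already undoes base64/quoted-printable and the declared charset, so the result is a Unicode string that can be encoded to utf-8 wherever bytes are needed.

```python
import base64
import email
import html
from email import policy
from urllib.parse import unquote

def extract_text(raw_bytes):
    """Return the decoded text of all text/plain and text/html parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            text = part.get_content()   # transfer encoding + charset decoded
            text = unquote(text)        # urldecode: %20 -> space
            text = html.unescape(text)  # html entities: &amp; -> &
            chunks.append(text)
    return "\n".join(chunks)

# Made-up sample message with a base64-encoded html body.
SAMPLE = (
    b"Subject: example\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(b"<p>buy%20now &amp; save</p>") + b"\r\n"
)
print(extract_text(SAMPLE))
```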

I implemented exactly what is listed above. For how it is actually applied, see the links below…
http://mytears.org/resources/mysrc/php/Parsemail.phps
http://mytears.org/procmailrc
Right now the message content is fully buffered, so handling mails of tens of megabytes might use an enormous amount of memory. I should gradually switch to creating a temporary file with tempnam or the like, heh heh
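As one possible shape for that change, the Python sketch below copies a message in chunks into a spooled temporary file, which stays in memory while small and moves to disk once it exceeds a threshold. This uses Python's tempfile module rather than php's tempnam, and the function name and size limits are arbitrary assumptions.

```python
import io
import tempfile

def spool_message(stream, max_in_memory=1024 * 1024, chunk_size=64 * 1024):
    """Copy stream into a temp file that spills to disk past max_in_memory."""
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)  # rewind so the caller can read from the start
    return spool

# Tiny limits here only to exercise the chunked copy.
spooled = spool_message(io.BytesIO(b"x" * 10), max_in_memory=4, chunk_size=3)
data = spooled.read()
spooled.close()
print(len(data))
```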