Overview
When an MMS message is sent from a mobile phone to an email address it passes through the network provider’s MMS gateway and is converted to a MIME encoded email. At this stage the provider usually adds some form of advertising/signature message and optionally encodes the message with HTML so it appears nicely formatted when it arrives in your inbox. Unfortunately, this isn’t particularly useful when automated processing of MMS messages is required, especially when each provider modifies the message in its own way.
The MMS mail parser approaches the problem with a two stage parse. The first pass processes the email into a standard MMS::Mail::Message object, which contains all the message attachments, mail headers and any body text. This holds all the MMS information required, but not necessarily in the most accessible form. The MMS mail parser then attempts to determine the provider the message was sent through (usually via the ‘From’ header) and passes the MMS::Mail::Message to the MMS::Mail::Provider class for that provider. In the second pass the Provider class removes any signatures/advertising added by the network provider and populates the text fields with their correct values (e.g. UKVodafone sets the email subject to ‘You have received a message’ and encodes the actual subject of the MMS as part of an HTML attachment).
Installation
The MMS mail parser suite has three prerequisites for installation:
MIME::Parser
HTML::TableExtract (optional - only required for some parsers)
MIME::Entity
Class::Accessor
The easiest way to install all the required modules is to use the CPAN bundle:
perl -MCPAN -e 'install Bundle::MMS::Mail::Parser'
NB: This will install ALL the provider mail parsers available, which may or may not be what you require (if you just want to parse messages from your own phone, you only need one provider mail parser). The base required class set is:
MMS::Mail::Message
MMS::Mail::Message::Parsed
MMS::Mail::Parser
MMS::Mail::Provider
Creating your own parser
Assuming you have everything installed, the next step is to write a parser. For a small installation the most common recipe for passing email to a Perl application is to use procmail. A typical .procmailrc entry would be:
:0:
*^TO.*secret@yourdomain.com*
|/some/dir/mmsparser.pl
A basic parser
This example can be used with the procmail recipe to retrieve the subject and bodytext of an MMS mail message and print it to STDOUT. It provides a simple recipe for parser usage:
use MMS::Mail::Parser;
my $mms = MMS::Mail::Parser->new();
my $message = $mms->parse(\*STDIN);
if (defined($message)) {
    my $parsed = $mms->provider_parse;
    print $parsed->header_subject."\n";
    print $parsed->body_text."\n";
} else {
    print STDERR "Invalid message\n";
}
This parser receives input from STDIN and parses the message to produce an MMS::Mail::Message object if the message is valid; otherwise undef is returned. The MMS can then be accessed via the newly created MMS::Mail::Message object. The second stage parse is invoked via the provider_parse method, which passes the MMS::Mail::Message object from the first stage to an MMS::Mail::Provider static class method that ‘knows’ the structure of the mail and can interpret it appropriately. This results in an MMS::Mail::Message::Parsed object that has any provider signature/advertising removed.
What to do when it goes wrong
The MMS mail parser class maintains an error stack that stores all the errors that occur during a parse. The strategy of the class is never to die/carp but to return an undef value if an error occurs. The parser object can then be interrogated via the error method to return a complete list of errors as an array reference. The parser object also has a last_error method to pop the last reported error off the error stack:
my $message = $mmsparser->parse(\*STDIN);
unless (defined($message)) {
    print STDERR $mmsparser->last_error."\n";
    exit(0);
}
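To make the "never die, return undef and record the error" strategy concrete, here is a minimal, self-contained sketch of the pattern. It is not the MMS::Mail::Parser source: the package name My::ErrorStack, the _fail helper and the toy validation rules are all made up for illustration.

```perl
use strict;
use warnings;

package My::ErrorStack;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    return bless { errors => [] }, $class;
}

# Record an error and return undef (scalar context) instead of dying
sub _fail {
    my ($self, $msg) = @_;
    push @{ $self->{errors} }, $msg;
    return;
}

# Toy "parse": succeeds only if the input has a From header
sub parse {
    my ($self, $input) = @_;
    return $self->_fail('No input supplied') unless defined $input;
    return $self->_fail('Input has no From header')
        unless $input =~ /^From:/m;
    return $input;
}

# Complete error list as an array reference
sub errors { return $_[0]{errors} }

# Pop the most recent error off the stack
sub last_error { return pop @{ $_[0]{errors} } }

package main;

my $p = My::ErrorStack->new;
my $result = $p->parse("Subject: hello\n");
print STDERR $p->last_error, "\n" unless defined $result;
```

The caller only ever has to test the return value with defined, which suits a procmail pipe where an uncaught die would be awkward to handle.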
For more detailed information on MMS mail parsing, the debug property can be set either via the class new constructor or the object debug property. A value of 0 turns off debug output (the default) and a value of 1 enables it. The debug output is passed to STDERR. It provides output similar to that illustrated below:
Created MIME::Parser
Created MMS::Mail::Message
Parsing MIME Message
Parsed Headers
Recursing through message parts
Message contains text/plain
Message contains image/jpeg
Adding attachment to stack
Preparing to loop through multipart stack
Parsed message
Parsed message is valid
Created Provider Parser
Returning MMS::Mail::Message::Parsed
Using your own MIME::Parser
The first stage parse uses MIME::Parser to parse the email. MIME::Parser has a number of configuration options and a MIME parser object can be passed to the MMS mail parser either via the new constructor or the mime_parser method:
use MIME::Parser;
use MMS::Mail::Parser;
my $parser = MIME::Parser->new;
$parser->output_to_core(1);
my $mmsparser = MMS::Mail::Parser->new(mimeparser=>$parser);
This would set the MIME parser used by the MMS mail parser to store all messages in memory rather than the default (keep all messages on disk) which should provide faster parsing but has a much heavier memory footprint.
Stripping unwanted characters from the MMS::Mail::Message
In many cases a single line of text is required for a photo description (without any new line or carriage return characters). This can be achieved by using the strip_characters method provided by the MMS::Mail::Parser class. The strip_characters property is passed to any MMS::Mail::Message and MMS::Mail::Message::Parsed object created by the parser. These objects then use it in a character class regular expression /[]/m to strip any of the supplied characters from the header fields and the body text of the message. A typical strip expression is shown below:
$mmsparser->strip_characters("\r\n");
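The effect of stripping via a character class can be seen in isolation with plain Perl. This is only a sketch of the technique, not the module's own code; the strip_chars helper is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MMS::Mail::Parser: removes every
# character listed in $chars from $text using a character class.
sub strip_chars {
    my ($text, $chars) = @_;
    return $text unless defined $chars && length $chars;
    my $class = quotemeta $chars;    # escape ], ^, - and similar
    $text =~ s/[$class]//g;
    return $text;
}

my $subject = strip_chars("Beach\r\nphoto\n", "\r\n");
print "$subject\n";    # prints "Beachphoto"
```

Note that the characters are removed rather than replaced with a space, so words on either side of a line break are joined together.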
For a more granular approach to instance property modification the cleanse_map method can be used to set regular expressions or subroutine references to be run against the header fields or body text properties. An example using both a regular expression and a subroutine reference is shown below:
sub header_parse {
    my $data = shift;
    $data =~ s/\n//g;
    return $data;
}
my $cm = {
    header_subject => \&header_parse,
    body_text      => 's/\'//g',
};
$mmsparser->cleanse_map($cm);
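One plausible way such a map could be applied to a field is sketched below. This is an assumption about the general approach, not the module's actual implementation; apply_cleanse_rule is a made-up helper, and treating a string rule as code to run against $_ is my guess at the intent.

```perl
use strict;
use warnings;

# Hypothetical helper: applies one cleanse_map rule to a field value.
# A rule is either a code reference or a string holding a substitution.
sub apply_cleanse_rule {
    my ($map, $field, $value) = @_;
    my $rule = $map->{$field};
    return $value unless defined $rule;

    # Subroutine reference: call it and use whatever it returns
    return $rule->($value) if ref $rule eq 'CODE';

    # String rule such as "s/'//g": run it against a local copy in $_.
    # String eval executes arbitrary code, so rules must be trusted.
    local $_ = $value;
    eval $rule;
    return $@ ? $value : $_;    # fall back to the input on a bad rule
}

my %cleanse_map = (
    header_subject => sub { my $data = shift; $data =~ s/\n//g; return $data },
    body_text      => 's/\'//g',
);

print apply_cleanse_rule(\%cleanse_map, 'header_subject', "Beach\nphoto"), "\n";
print apply_cleanse_rule(\%cleanse_map, 'body_text', "Don't panic"), "\n";
```

A subroutine reference is the safer choice whenever the rule is more than a one-line substitution, since it avoids evaluating a string as code.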
A more complete example
The simple parser recipe above illustrates how to retrieve some of the MMS text and pass it to STDOUT. What about all the pictures, I hear you cry?!
In this instance the example application receives an MMS email message via procmail, parses it and posts the results to a Movable Type Photogallery image upload CGI so the images can be displayed in an MT gallery. The application can also be used with any email that has image attachments and as such is a convenient posting interface for creating MT based image galleries.
The application can be downloaded here. The README.txt included provides install, usage and disclaimer information.
Once the application has gathered the command line and configuration options it performs a two stage parse of the MMS, using the output_dir method to specify that all decoded message attachments are stored in /tmp rather than the directory the application was invoked from. As long as /tmp has enough disk space, this avoids failures to create temporary files due to disk resource constraints. It also specifies a cleanse_map to clean the header subject and body text fields of any unwanted characters.
At this point we have a parsed MMS mail message and it can now be operated on. The message is confirmed to be valid and checked to confirm that at least one image is attached. If images are attached then WWW::Mechanize is used to log in to Movable Type and each image is uploaded to a Movable Type CGI. The option to generate unique filenames is provided and demonstrates how to access the parsed image files.
Writing your own provider class
There are many mobile phone network operators around the world and only one of me (as far as I know at least). The format of the MMS messages sent via the MMS gateways is subject to the whims of the mobile operators and their marketing people, so there is always the possibility that the provider mail parser classes may become out of date.
If you see a new message format or a provider that isn’t supported then send an example to mmssample (at) monkeyhelper dot com and I’ll endeavour to get a provider class up really quickly. I will include the message as part of a test suite (phone number removed), so please make the attachments viewer friendly! Alternatively you could write your own provider class and save me the trouble :)
A provider class has an OO interface and is always a subclass of MMS::Mail::Provider. The main workhorse of the class is the parse method, which is typically the only method that provider class authors should have to implement.
The parse method receives an MMS::Mail::Message as input and transforms it into an MMS::Mail::Message::Parsed object, which it returns. Its other responsibilities are to populate the images and videos properties of the MMS::Mail::Message::Parsed object (using the addvideo and addimage methods, or setting the properties in one go using the images and videos methods), to populate the phone_number property using the superclass retrieve_phone_number method, and to populate the header_subject and body_text properties. The retrieve_phone_number method splits the From header on the @ symbol and returns the first element, replacing any leading + with 00. The example below removes ‘Marketing Junkie’ lines from the message body text but otherwise just populates the images, videos, phone_number and header_subject properties of the MMS::Mail::Message::Parsed object.
sub parse {
    my $self = shift;
    my $message = shift;
    my $parsed = MMS::Mail::Message::Parsed->new($message);
    # Populate header_subject
    $parsed->header_subject($message->header_subject);
    # Remove 'Marketing Junkie' lines from the body text
    my @lines = split /\n/, $message->body_text;
    my $text = '';
    foreach my $line (@lines) {
        $text .= $line unless $line =~ /Marketing Junkie/;
    }
    $parsed->body_text($text);
    # Populate video and image quick access properties
    $parsed->images($parsed->retrieve_attachments('^image'));
    $parsed->videos($parsed->retrieve_attachments('^video'));
    # Populate the phone number from the From header
    $parsed->phone_number($self->retrieve_phone_number($message->header_from));
    return $parsed;
}
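The phone number extraction described above (split the From header on @, take the first element, replace a leading + with 00) can be sketched in a few lines of plain Perl. This is a stand-in written from that description, not the superclass method itself; phone_from_header is a made-up name and the address is a dummy value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the behaviour described in the text:
# split on '@', keep the first element, turn a leading '+' into '00'.
sub phone_from_header {
    my ($from) = @_;
    return unless defined $from && length $from;
    my ($number) = split /\@/, $from;
    $number =~ s/^\+/00/;
    return $number;
}

print phone_from_header('+447700900123@mms.example.net'), "\n";
# prints "00447700900123"
```

This simple split assumes the From header is a bare address; a header with a display name or angle brackets would need extra handling.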
When a new MMS::Mail::Message::Parsed object is created and passed an MMS::Mail::Message it automagically clones the data in the object passed to it. This means the new MMS::Mail::Message::Parsed object already has all its header properties populated and has all the attachments the original object had accessible via its attachments method. The exceptions to this rule are the header_subject and body_text properties. These are excluded because they are the fields typically manipulated by the MMS network providers, so automatically ‘inheriting’ the values from the parent MMS::Mail::Message object could result in unwanted text in these fields.
If you have written a provider class for a specific purpose and don’t want to submit it for inclusion in the Bundle::MMS::Mail::Parser suite, then you can still utilise it by using the provider method of the MMS::Mail::Parser class. This method allows you to force the second stage provider_parse method to use the provider class of your choice. After the first stage parse you have an MMS::Mail::Message object which can be queried, so if you were receiving MMS messages from a small/new provider that wasn’t supported by the Bundle::MMS::Mail::Parser suite, you could easily write your own parser and enable it to be used when appropriate. Imagine your new MMS mail message type was sent from mms.newprovider.co.uk, then you could write a provider detection routine similar to that shown below:
use MMS::Mail::Parser;
use YOURPACKAGE::NewProvider;
my $mms = MMS::Mail::Parser->new();
my $message = $mms->parse(\*STDIN);
if (defined($message)) {
    # Force our own provider class for messages from the new gateway
    if ($message->header_from =~ /mms\.newprovider\.co\.uk/) {
        my $newprovider = YOURPACKAGE::NewProvider->new;
        $mms->provider($newprovider);
    }
    my $parsed = $mms->provider_parse;
    print $parsed->header_subject."\n";
    print $parsed->body_text."\n";
} else {
    print STDERR "Invalid message\n";
}
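If you end up with more than one home-grown provider class, the detection step can be kept tidy with a small lookup table. The sketch below is only one way to organise it; the second rule, the YOURPACKAGE::Example class name and the provider_class_for helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical rules: first pattern that matches the From header wins
my @rules = (
    [ qr/\@mms\.newprovider\.co\.uk\b/i => 'YOURPACKAGE::NewProvider' ],
    [ qr/\@mms\.example\.net\b/i        => 'YOURPACKAGE::Example' ],
);

# Returns the provider class name for a From header, or undef if none match
sub provider_class_for {
    my ($from) = @_;
    for my $rule (@rules) {
        my ($pattern, $class) = @$rule;
        return $class if $from =~ $pattern;
    }
    return;
}

my $class = provider_class_for('+447700900123@mms.newprovider.co.uk');
print defined $class ? "$class\n" : "no custom provider\n";
```

When a class name is returned it can be instantiated and handed to the provider method before calling provider_parse, as in the example above; when nothing matches, the parser is left to pick a provider itself.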
Summary
I hope this gives a good overview to help you in creating your own MMS mail based web services, and please let me know how you use the software. Finally, that plea again: if you are on a network that doesn’t have a provider mail parser class then please send me an example MMS (with pictures/video) and I’ll write one and add it to the suite. I’m also working on compiling a complete set of example messages for all networks so there is a reference set for everyone. Have fun!