April 3, 2011, 10:24 p.m.
IT

Regex and POSIX utilities simplify life

I had a small problem just now. To convert email from Outlook 2011 on the Mac to mbox format, I dragged and dropped all the email to the Finder, which gave me a directory of eml files, and then ran the brilliant eml2mbox Ruby script on that directory. I did this because otherwise MailSteward imports the messages with a billion different mailbox names, as it uses the subject line as the mailbox name.

However, several of the emails had been sent from the iPhone, and their headers were wrong. Here is an example:

waldo@waldopcm fixed $ head Zipped.eml
From: Waldo Nell
    <IMCEAEX-_O=FHB_OU=FIRST+20ADMINISTRATIVE+20GROUP_CN=RECIPIENTS_CN=hhh@xxx.xx>
To: Some One
    <IMCEAEX-_O=FHB_OU=FIRST+20ADMINISTRATIVE+20GROUP_CN=RECIPIENTS_CN=RRRR@ddd.yy>
Content-Class: urn:content-classes:message
Date: Wed, 26 Nov 2008 10:44:58 -0600
Subject: Zipped
Thread-Topic: Zipped
Thread-Index: AclP5lEwDZMQ7G1VMUiyvRhJDoqWmw==
Message-ID: <A552D92C.2D3%hhh@xxx.xx>
...

As you can see, the From and To headers are each split across two lines. The first four lines should actually be two, like this:

waldo@waldopcm fixed $ head Zipped.eml
From: Waldo Nell  <IMCEAEX-_O=FHB_OU=FIRST+20ADMINISTRATIVE+20GROUP_CN=RECIPIENTS_CN=hhh@xxx.xx>
To: Some One <IMCEAEX-_O=FHB_OU=FIRST+20ADMINISTRATIVE+20GROUP_CN=RECIPIENTS_CN=RRRR@ddd.yy>
Content-Class: urn:content-classes:message
...

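Here is the substitution I applied in vi to a file listing the names of the incorrect files, one per line. It turns each file name into a chain of shell commands that writes a repaired copy into the fixed directory (it is entered as a single line; it is wrapped here for readability):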

%s/^\(.*\)$/head -n 2 "\1" | tr  -d '\\r\\n' > "fixed\/\1"; echo "" >> 
    "fixed\/\1"; head -n 4 "\1" | tail -n 2 | tr  -d '\\r\\n' >> 
    "fixed\/\1"; echo "" >> "fixed\/\1"; awk 'FNR>4' "\1" >> "fixed\/\1"/g

Cool, hey?
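
For anyone who would rather skip the vi step, the commands that the substitution generates for each file can be wrapped in a small shell function. This is only a sketch of the same idea, not what I actually ran: the function name fix_headers is made up for illustration, and it assumes, as above, that the From and To headers each span exactly two lines at the very top of the file.

```shell
#!/bin/sh
# fix_headers is a hypothetical helper name, not part of the original post.
# It assumes lines 1-2 hold the From header and lines 3-4 hold the To header.
fix_headers() {
    src=$1
    mkdir -p fixed                       # output directory used in the post
    dst="fixed/$(basename "$src")"
    {
        # Join lines 1-2 by deleting CR and LF, then end the line again.
        head -n 2 "$src" | tr -d '\r\n'; echo ""
        # Same for lines 3-4.
        head -n 4 "$src" | tail -n 2 | tr -d '\r\n'; echo ""
        # Copy everything from line 5 onward unchanged.
        awk 'FNR>4' "$src"
    } > "$dst"
}
```

With the list of bad file names in a file (say badfiles.txt, also a made-up name), something like: while IFS= read -r f; do fix_headers "$f"; done < badfiles.txt would process them all. Note that the two joined lines end in a plain newline, even if the rest of the file uses CRLF line endings.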