A number of people have reported problems receiving mail from the "Grahamstown Parents' Network" mailing list.

We suspect that the reason that Rhodes seems to be singled out is simply that we have the largest number of users in Grahamstown, and consequently we receive more email directed at the GPN than other organisations. However, the problem is not actually specific to Rhodes, nor is it caused by the configuration of Rhodes' mail servers. The root cause of these problems is that the mailing list software Sue Powers is using to implement this mailing list does not correctly comply with Internet standards, and behaves in a way that is typical of mass-marketing or SPAM-generation software.

The following is some of the correspondence between ourselves and Sue Powers with respect to delivery of mail to the GPN "mailing list".

QUOTE(Rhodes @ Mar 26 2012, 3:49:PM)
Our logs show a definite change in configuration between 14 February and 21 February:

2012-02-14 12:41:22 [78132] 1RxFoj-000KKC-KV <= bounce_232575@directreach.co.za H=mail.bizsuite.co.za []:62048 I=[]:25 P=smtp S=1358937 id=dd1fab9eedf94196b103a5cc0763b31d@smokesignal.co.za T="Grahamstown Parents' Network - Newsletter - Tuesday, 14 February 2012" from <bounce_232575@directreach.co.za> for y.irwin@ru.ac.za

2012-02-21 14:04:42 [91880] 1RzoSG-000Ntw-Er <= bounce_234656@directreach.co.za H=mail.itbsoftware.net (FIREHAWK) []:52655 I=[]:25 P=smtp S=188288 id=cb8a971f1e53445aae706afe214a335d@smokesignal.co.za T="Grahamstown Parents' Network - Newsletter - Tuesday, 21 February 2012" from <bounce_234656@directreach.co.za> for y.irwin@ru.ac.za

Notice that the remote server's IP address and name have changed from mail.bizsuite.co.za to mail.itbsoftware.net. There are other significant changes between the two that might partially explain the problem (most importantly, that the remote host no longer provides a standards-compliant HELO message - viz. FIREHAWK above).

So the question becomes what did Sue Powers' ISP do between 14 February and 21 February? She'll need to take this up with them.
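The comparison of the two log lines above can be sketched in a few lines of Python. This is only an illustration: it assumes the usual Exim convention that the H= field carries the host name, optionally followed by the HELO name in parentheses, and it treats "contains a dot" as a rough stand-in for "looks like a fully qualified name". The function names are made up for this example.

```python
import re

# Matches the H= field of a message-arrival log line. The optional
# parenthesised part is assumed to be the name the remote host gave in HELO.
HOST_FIELD = re.compile(r"\bH=(?P<host>\S+)(?: \((?P<helo>[^)]*)\))?")


def host_and_helo(log_line):
    """Return (host, helo) from a log line; helo is None if none was logged."""
    match = HOST_FIELD.search(log_line)
    if match is None:
        return None
    return match.group("host"), match.group("helo")


def helo_looks_qualified(helo):
    """Rough heuristic only: a fully qualified name contains at least one dot."""
    return helo is None or "." in helo


# Shortened copies of the two log lines quoted above.
before = ("2012-02-14 12:41:22 [78132] 1RxFoj-000KKC-KV <= "
          "bounce_232575@directreach.co.za H=mail.bizsuite.co.za []:62048 "
          "I=[]:25 P=smtp")
after = ("2012-02-21 14:04:42 [91880] 1RzoSG-000Ntw-Er <= "
         "bounce_234656@directreach.co.za H=mail.itbsoftware.net (FIREHAWK) "
         "[]:52655 I=[]:25 P=smtp")

for line in (before, after):
    host, helo = host_and_helo(line)
    print(host, helo, helo_looks_qualified(helo))
```

Run against the two lines, this reports the change of host name and flags FIREHAWK as a HELO name that is not fully qualified.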

QUOTE(ITB Software @ 26 March 2012, 05:34 PM)
Thanks for this info. The retry interval that guy has suggested we increase is, by default, intentionally set to retry every 30 minutes, and fail after 2 hours (i.e. 4 attempts) as our configuration is not a “standard mail server”. This is a bulk delivery server, intended to deliver marketing emails as rapidly as possible. So, if we retry messages for more than a few hours, then there is potential that other users' mail gets slowed down, which could have negative impacts for time-sensitive emails being sent from the system. I have reset your setup to retry every 30 minutes for 12 hours (24 attempts).

The other issue (HELO) was a temporary issue that occurred when we initially upgraded our servers; this should now be resolved (as of early March).

QUOTE(Rhodes @ Mar 29 2012, 5:03:PM)
On Thu Mar 29 09:07:21 2012, Powers Family wrote:
> I’m afraid the issue hasn’t been resolved though. There are a number
> of Rhodes staff who are telling me they didn’t receive Tuesday’s
> Newsletter. Is there anything else that can be done please?

I'm guessing the problem relates to the way they're using VERP.

For each message that's sent to your list, their server would appear to be trying to open several hundred concurrent connections to our mail server. By the looks of things, they're using one connection per recipient. They'll be doing this because they're sending every message with a VERP envelope (see http://en.wikipedia.org/wiki/Variable_enve...#Disadvantages).

Thus this mailing list is effectively contributing to a denial of service attack against our incoming mail exchanger.
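To illustrate why VERP forces one message per recipient, here is a minimal Python sketch. It assumes the common "list+user=domain@listhost" encoding; the sender in the logs above uses a numeric scheme (bounce_232575@...), so the exact format their software uses is not known. The list name, list domain and second recipient below are hypothetical.

```python
def verp_sender(list_local, list_domain, recipient):
    """Build a per-recipient envelope sender in the common VERP style."""
    local, _, domain = recipient.rpartition("@")
    return f"{list_local}+{local}={domain}@{list_domain}"


# The second address is a made-up example.
recipients = ["y.irwin@ru.ac.za", "a.nother@ru.ac.za"]
senders = [verp_sender("gpn-bounces", "lists.example.org", r)
           for r in recipients]

for recipient, sender in zip(recipients, senders):
    print(recipient, "<-", sender)
```

An SMTP transaction carries a single envelope sender, so once every recipient has a distinct sender, every recipient also needs a separate message. That is not a problem in itself; it only becomes one when all of those messages are delivered over simultaneous connections.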

To mitigate the risk this causes, we limit the maximum number of parallel connections we will process to 100. By opening more connections than this, their mail server is triggering a temporary error:

2012-03-27 14:59:45 [1225] temporarily refused connection from mail.itbsoftware.net []:60368 I=[]:25: connected=102 max=120 reserve=20

This error will continue until such time as the total connection count drops. The effect of this is that mail will trickle through (for each connection that closes, we'll allow another to open).

The limit of 100 has historically proven a very reasonable compromise -- we have thousands of users subscribed to thousands of different mailing lists, and this is the only one we're aware of that behaves this way.
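The admission rule described above can be sketched roughly as follows. This is one plausible reading of the max=120 and reserve=20 fields in the log line (a pool of 120 connections, of which 20 are held back for specially designated hosts, leaving 100 for everyone else); it is not a copy of the mail server's actual logic, and the function and parameter names are invented for the example.

```python
def accepts_connection(connected, accept_max=120, reserve=20,
                       reserved_host=False):
    """Decide whether one more SMTP connection would be accepted.

    connected      -- number of connections currently open
    accept_max     -- absolute ceiling on concurrent connections
    reserve        -- slots held back for designated (reserved) hosts
    reserved_host  -- whether the caller is one of those hosts
    """
    limit = accept_max if reserved_host else accept_max - reserve
    return connected < limit


# Ordinary hosts are turned away once 100 connections are open...
print(accepts_connection(99))    # still room
print(accepts_connection(102))   # the situation in the log line above
# ...and are let back in as soon as the count drops again.
print(accepts_connection(98))
```

A refusal here is temporary: the sender is expected to try again later, which is why mail trickles through rather than being lost.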

QUOTE(Rhodes @ Apr 03 2012, 10:30:PM)
Against [our] better judgement, [we] made a change to our mail configuration mid-way through last week that might have affected these deliveries. What [we] did was tell our mail server to continue to accept & queue mail when it was overloaded, rather than try to defer the mail.

This will inevitably increase the amount of SPAM that Rhodes users receive, since it is now possible to defeat our SPAM detection by simply trying to deliver enough SPAM. However, given the way the mail server you're using behaves, this is about the only way to make things work.

I looked through our logs today, and notice that we did not defer a single delivery attempt for your newsletter. However I also noticed that the remote server you're using does not correctly complete SMTP transactions:

2012-04-03 16:04:12 [7059] 1SF4HH-0001pr-9K SMTP connection lost after final dot H=mail.itbsoftware.net (mail.bizsuite.co.za) []:58414 I=[]:25 P=esmtp

If it correctly complied with Internet standards, we'd expect it to issue a QUIT command to indicate it had finished sending mail rather than simply closing the connection. Again, looking back through our logs, this is a fairly recent phenomenon.
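For comparison, a well-behaved client ends its session explicitly. The sketch below uses Python's standard smtplib and is only an illustration of the expected behaviour, not a description of what the remote software does; the helper name and the addresses in the usage comment are hypothetical.

```python
import smtplib


def send_and_quit(smtp, sender, recipients, message):
    """Send one message over an open SMTP session, then end it with QUIT.

    `smtp` is expected to be a connected smtplib.SMTP instance.
    """
    try:
        return smtp.sendmail(sender, recipients, message)
    finally:
        try:
            # QUIT tells the server we are done, so it can close cleanly
            # instead of logging a lost connection.
            smtp.quit()
        except smtplib.SMTPServerDisconnected:
            # The server has already gone away; there is nothing to close.
            pass


# Usage (hypothetical host and addresses):
#   session = smtplib.SMTP("mail.example.org", 25)
#   send_and_quit(session, "list@example.org", ["user@example.net"],
#                 "Subject: test\r\n\r\nHello\r\n")
```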

On 26 March 2012 05:34 PM, Dave Long [mailto:david@itbsoftware.co.za] wrote:
> configuration is not a “standard mail server”. This is a bulk
> delivery server, intended to deliver marketing emails as rapidly as

I really think the above says it all. This isn't a mailing list; it is a SPAM engine... Or at least, that's how it behaves. Its modus operandi is very typical of spammers, and thus triggers the same heuristic checks that are used to detect incoming SPAM.

Note that your message to the GPN itself about what is happening was factually incorrect. We'll accept mail to tens or hundreds of thousands of people subscribed to a single mailing list -- our largest internal mailing list has some twelve thousand subscribers, and usually delivers successfully to all subscribers within half an hour.

What we will not do is accept more than a hundred separate email messages from a single source concurrently. (The concurrently is the important point.)

When you send a message to a mailing list, the mailing list software usually does a combination of two things:

The first is that it batches recipients: rather than sending ten thousand separate email messages to ten thousand subscribers, it'll try and batch all the subscribers who share a common domain (@ru.ac.za in this case) and send a single email with multiple recipients (think BCC). The standards only oblige a mail server to accept a hundred recipients per message, so most software will usually create one email per hundred subscribers from a given domain. (i.e. to send the same message to ten thousand subscribers at Rhodes, you need only generate a hundred emails each with a hundred recipients.)

The second is that it sends multiple messages to the same mail server sequentially rather than in parallel. So rather than try and deliver the hundred messages above in a single go, it'll send them one after another. At worst it'll try a handful of parallel connections.

These are both optimisations that allow a large amount of mail to be delivered to a large number of people in a relatively short time. However, even if the mailing list doesn't do the first, so long as it does the second things will usually work correctly.
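The first optimisation, batching by domain, might look something like the following in Python. It is a simplified sketch using the hundred-recipient figure quoted above; the function name and the generated subscriber addresses are made up for illustration.

```python
from collections import defaultdict


def batch_by_domain(recipients, max_per_message=100):
    """Group addresses by domain, then split each group into batches."""
    by_domain = defaultdict(list)
    for address in recipients:
        domain = address.rpartition("@")[2].lower()
        by_domain[domain].append(address)

    batches = []
    for addresses in by_domain.values():
        for start in range(0, len(addresses), max_per_message):
            batches.append(addresses[start:start + max_per_message])
    return batches


# Ten thousand hypothetical subscribers at one domain.
subscribers = [f"user{i}@ru.ac.za" for i in range(10_000)]
batches = batch_by_domain(subscribers)
print(len(batches), "messages instead of", len(subscribers))
```

Each batch becomes one message with up to a hundred envelope recipients, so the receiving server sees a hundred deliveries rather than ten thousand.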

Mailing list software does this stuff behind your back, so you don't need to worry about it. It's why people use mailing lists, rather than simply sending mail to a large distribution list. Simple distribution lists start to break when you have anything more than about a hundred recipients.

The "mailing list" software that you're using appears to do neither of these things, which is what's causing these problems.

Instead, to distribute a single message to ten thousand subscribers it creates and sends ten thousand separate emails. Worse than that, it appears to try and deliver those messages by opening as many concurrent connections as it can. (i.e. it tries to deliver all ten thousand emails at the same time, rather than one after another.)

When it does this to Rhodes' mail servers, we'll allow it to open up to a hundred concurrent connections. Given that most mailing list software opens fewer than ten simultaneous connections (even for hundreds of thousands of deliveries), this is a generous limit.
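On the sending side, keeping to a handful of simultaneous connections is not difficult. The Python sketch below caps concurrency with a small worker pool; the delivery function is a stand-in that merely records how many deliveries were in flight at once, and all names in it are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def deliver_all(messages, deliver, max_connections=5):
    """Deliver every message, never running more than max_connections at once."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(deliver, messages))


class PeakTracker:
    """Stand-in for a real delivery function; records peak concurrency."""

    def __init__(self):
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, message):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.001)  # pretend to talk to the remote server
        with self._lock:
            self.active -= 1
        return message


tracker = PeakTracker()
results = deliver_all(range(200), tracker, max_connections=5)
print("delivered", len(results), "messages; peak concurrency", tracker.peak)
```

All two hundred pretend deliveries complete, yet no more than five are ever open at the same moment.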

When the software tries to open the hundred and first *concurrent* connection, we issue a message that indicates that the remote mail server should back off and try again later. This isn't because we're saying that they should only send mail to a hundred people; what we're saying is don't try and send more than a hundred separate mail messages *at once*. Standards-compliant mail servers will simply retry the deliveries at a later time, and those deliveries will usually succeed.
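The "back off and try again later" behaviour amounts to something like the following Python sketch. It assumes the usual SMTP convention that 2xx replies mean success, 4xx replies (such as 421) mean a temporary failure and 5xx replies mean a permanent one. The defaults mirror the 30-minute, 24-attempt schedule mentioned earlier in this correspondence; the function and parameter names are invented for the example.

```python
import time


def deliver_with_retry(attempt, max_attempts=24, wait_seconds=1800,
                       sleep=time.sleep):
    """Call attempt() until it succeeds, fails permanently, or we give up.

    attempt() should return the SMTP reply code of one delivery attempt.
    Returns (delivered, attempts_made).
    """
    for n in range(1, max_attempts + 1):
        code = attempt()
        if 200 <= code < 300:
            return True, n       # accepted
        if 500 <= code < 600:
            return False, n      # permanent failure: do not retry
        if n < max_attempts:
            sleep(wait_seconds)  # temporary failure: wait, then try again
    return False, max_attempts


# A pretend server that defers twice and then accepts the message.
replies = iter([421, 421, 250])
outcome = deliver_with_retry(lambda: next(replies), sleep=lambda s: None)
print(outcome)
```

In this pretend run the message is deferred twice and accepted on the third attempt, which is the normal outcome when a sender honours a temporary refusal.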

Rhodes processes hundreds of thousands of email messages a day, some coming from lists that are much larger than the GPN. Yesterday alone we accepted, processed and delivered 56581 messages, and rejected another 201613 as SPAM. Our own internal student mailing list delivered several messages to each of 4560 subscribers, whilst being subject to *exactly* the same restrictions as your mailing list.

So the idea of a small local mailing list really doesn't worry us -- I suspect facebook.com generates more mail to more recipients in a single day than your mailing list generates in a month. The problem here is that whatever software your ISP is using doesn't behave like a typical mailing list; it behaves a lot more like a typical spammer.