BTW, have you tried "kregexpeditor"?
Nik
On Wednesday, 23 March 2016, E. Liddell wrote:
On Wed, 23 Mar 2016 15:58:39 +0900
Michele Calgaro <michele.calgaro(a)yahoo.it> wrote:
On 2016/03/23 02:19 PM, Gene Heskett wrote:
> On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:
>
>> On 2016/03/23 12:44 PM, Gene Heskett wrote:
>>> Greetings;
>>>
>>> I use mailfilter as a prefilter in front of fetchmail to nuke some
>>> spam while it's still on the server.
>>>
>>> But it's missing hits on what I suspect is the From: or Return-Path:
>>> strings that have quotation marks in the string because the string
>>> is being spec'd by being surrounded by "show this name" bs.
>>>
>>> I've added the character < as part of the string it's to search for,
>>> so the search string now looks like "From:.*<*\.unwanted-tld". Does
>>> this stand that famous snowball's chance in hell of working well
>>> with or without a quoted "some funkity name" in front of the real
>>> url with the <> around it?
>>>
>>> I just love the lack of documentation on how this string comparison
>>> stuff works as shown by the man pages for grep and regex. All sorts
>>> of control options are well covered, but figuring out how to write
>>> a search expression must be one of the world's better guarded
>>> secrets.
>>>
>>> So if someone could show me, or give a url that actually has the
>>> full docs, I'd be grateful.
>>>
>>> Thanks.
>>>
>>> Cheers, Gene Heskett
>>
>> Hi Gene,
>> "From:.*<*\.unwanted-tld" will match a string like this (I have put
>> one section per line to be clearer):
>> From:
>> any characters (zero or more)
>> 0 or more <
>> .unwanted-tld
>>
> I thought I wanted 1 only, but the way these lowlifes change addresses
> and names hourly, they may remove the <> surrounding the real source
> address and screw me up. But the fact that they often put double quotes
> around the throwaway part of the url is, I think, screwing me regardless.
>
> What we need is the ability to specify the quote character by the first
> non-space character after the DENY =, which is currently a "^ or a <>
> which apparently inverts the logic. So a typical line would be
>
> DENY = "^From:.*<*\.bid"
>
> Substitute any of the new tld's for bid that gets obnoxious. Like xyz,
> or .pro, heck that new list is several dozen tld's.
>
> But AFAIK, we're stuck with the dblquote wrapper around the string to
> match. Grrrr.
>
>> It is greedy, so it will scan until the last < if there is more than
>> one. Not sure if this is what you need or not. If you can post an
>> example of what you need to match, I can work out another regex if
>> required.
>>
> Try this:
>
> "-Bed Bugs-" <-BedBugs-(a)agma69.top>
>
> with Return-Path.* or From.* in front of it. Or does that - sign, 4 of
> them, need escaping with a \ ? IDK.
Hyphens should only need an escape if within a character class, denoted by
square brackets.
I converted about 3 lines of the filterdata file that way, and I'm now
waiting for the next blast of spam to serve as test data. mailfilter is
a picky twit, but that hasn't given it a tummy ache either, so I am
hopeful.
PS: by the way, the internet is full of excellent documentation about
regex ;-) For example "http://www.regular-expressions.info/"
Cheers, Gene Heskett
Hi Gene,
so if I understand correctly, you already had a set of rules like
DENY = "^From:.*\.bid" (bid stands for any tld of your choice)
but it was missing some entries because of the "..." entry before the domain.
So you put the < in the string as well.
Right?
Assuming so, it surprises me that the original version missed some entries, since the
additional "..." field would already have been matched by the .* part of the pattern.
I think there is a different reason for the missing entries. Perhaps a blank character
before "From:"? Could it be?
You could try this other version:
DENY = "^\s*From:.*\.bid" which ignores any whitespace before From:
That would also sweep up, say, fred(a)mail.bidders.com, or
"I.bid" <ibid(a)nowhere.org>
or
DENY = "^\s*From:.*\.bid>" which also makes explicit that the tld is
followed by a >.
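To see how the variants discussed so far behave, here is a small sketch. It assumes Python's re module as a stand-in for mailfilter's regex engine (which may treat \s differently), and every sample header is made up for illustration.

```python
import re

# Python's re module stands in for mailfilter's regex engine (an assumption).
original = re.compile(r'^From:.*\.bid')
loose = re.compile(r'^\s*From:.*\.bid')
strict = re.compile(r'^\s*From:.*\.bid>')

# Hypothetical sample headers, made up for illustration.
headers = [
    'From: "-Bed Bugs-" <-BedBugs-@example.bid>',  # quoted name, then address
    ' From: <spam@example.bid>',                   # leading blank before From:
    'From: fred@mail.bidders.com',                 # ".bid" inside a hostname
    'From: "I.bid" <ibid@nowhere.org>',            # ".bid" inside the quoted name
]

def hits(pattern):
    """Return one True/False per sample header."""
    return [bool(pattern.search(h)) for h in headers]

original_hits = hits(original)
loose_hits = hits(loose)
strict_hits = hits(strict)

for header, orig, lo, st in zip(headers, original_hits, loose_hits, strict_hits):
    print(f'{header!r}: original={orig} loose={lo} strict={st}')
```

In this sketch the quoted display name does not stop the original pattern from matching, a leading blank does, and only the version ending in > rejects the two false positives.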
I'd cover the example as
^\W*((From:)|(Return-Path:)).*\.bid\W*$
which works out to zero or more non-word characters at the beginning of the string,
followed by "From:" or "Return-Path:", followed by zero or more unknowns, followed
by ".bid", followed by zero or more non-word characters, followed by the end of the
string. "Word" characters are alphanumerics, the underscore, and possibly some
non-ASCII characters depending on the implementation, so "non-word" covers stuff
like punctuation (hyphens included) and whitespace. Marking the end of the string
makes it more likely you're getting the TLD and not some random bit in the middle
that was designed as a parser torture-test.
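A quick way to check the end-anchored pattern is to run it over a few headers. This is only a sketch: it assumes Python's re module behaves closely enough to whatever engine mailfilter uses, and the sample headers are hypothetical.

```python
import re

# The end-anchored pattern from above, tried with Python's re module
# (an assumption; mailfilter's own engine may differ in details).
pattern = re.compile(r'^\W*((From:)|(Return-Path:)).*\.bid\W*$')

def flagged(header):
    """Return True if the header line would be caught by the pattern."""
    return bool(pattern.search(header))

# Hypothetical sample headers, made up for illustration.
samples = [
    'From: "-Bed Bugs-" <-BedBugs-@example.bid>',
    'Return-Path: <bounce@example.bid>',
    'From: fred@mail.bidders.com',
    'From: "I.bid" <ibid@nowhere.org>',
]

results = [flagged(h) for h in samples]
for header, hit in zip(samples, results):
    print(f'{header!r}: {hit}')
```

The first two samples end in ".bid" followed only by ">", so they are caught; the last two contain ".bid" in the middle of the line and are left alone.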
If you want to get really silly,
^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$
ought to catch the majority of TLDs with a 3+ ASCII character extension
that isn't .com, .org, or .net, but without a larger sample of "good" and
"bad" addresses, I can't guarantee no false positives.
I write a lot of regexes in my day job (which is not to say that I get them right the
first time, every time!) Assuming a Perl-compatible implementation (which most
of them are, more or less), "man perlre" is a decent reference for the complicated
bits. Just scroll past the section on modifiers.
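The "really silly" pattern above can be probed the same way. Again this is a sketch under assumptions: Python's re module instead of mailfilter's engine, and invented addresses.

```python
import re

# The broad "3+ letter TLD that does not start with c, o or n" pattern,
# tried with Python's re module (an assumption about engine compatibility).
pattern = re.compile(
    r'^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$'
)

def flagged(header):
    """Return True if the header line would be caught by the pattern."""
    return bool(pattern.search(header))

# Hypothetical addresses, made up for illustration.
tlds = ['bid', 'top', 'com', 'org', 'net', 'club', 'de']
results = {tld: flagged(f'From: <someone@example.{tld}>') for tld in tlds}

for tld, hit in results.items():
    print(f'.{tld}: {hit}')
```

Two limits show up here: the class [^cCoOnN] skips every TLD that starts with c, o, or n (so .club slips through along with .com), and two-letter country codes such as .de are never matched because the pattern requires at least three characters.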
E. Liddell