Hoping I can find a regex expert here

Gene Heskett

23 Mar 2016

3:44 a.m.

Greetings;

I use mailfilter as a prefilter in front of fetchmail to nuke some spam while it's still on the server.

But it's missing hits on what I suspect are From: or Return-Path: lines containing quotation marks, where the real address is preceded by a quoted "show this name" bs display name.

I've added the character < to the search string, so it now looks like "From:.*<*.unwanted-tld". Does this stand that famous snowball's chance in hell of working well, with or without a quoted "some funkity name" in front of the real address with the <> around it?

I just love the lack of documentation on how this string matching works, as shown by the man pages for grep and regex. All sorts of control options are well covered, but figuring out how to write a search expression must be one of the world's better-guarded secrets.

So if someone could show me, or give me a URL that actually has the full docs, I'd be grateful.

Thanks.

Cheers, Gene Heskett

-- "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Genes Web page http://geneslinuxbox.net:6309/gene

Michele Calgaro

23 Mar 2016

4:32 a.m.

New subject: [trinity-users] Hoping I can find a regex expert here

On 2016/03/23 12:44 PM, Gene Heskett wrote:

...

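Hi Gene, "From:.*<*.unwanted-tld" will match a string made of these pieces (I have put one section per line to be clearer):

From:
any sequence of characters, including none (.*)
zero or more < characters (<*)
.unwanted-tld (where the leading . matches any single character, not just a dot)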

The .* part is greedy, so it will scan until the last < if there is more than one. Not sure if this is what you need or not. If you can post an example of what you need to match, I can work out another regex if required.
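A quick way to answer the "with or without a quoted name" question is to try the pattern on some sample lines. The sketch below uses Python's re module. That engine is Perl-style and may differ in details from the one mailfilter uses, so treat the results as an approximation. The header lines are made up, and .bid stands in for the unwanted TLD.

```python
import re

# Gene's pattern, with ".unwanted-tld" replaced by ".bid" for illustration.
pattern = re.compile(r"From:.*<*.bid")

# Made-up header lines for testing.
headers = [
    'From: "Some Funkity Name" <promo@deals.bid>',  # quoted display name
    "From: promo@deals.bid",                         # bare address
    "From: friend@example.org",                      # should not match
]

for h in headers:
    print(f"{h:45} -> {'match' if pattern.search(h) else 'no match'}")

# Greedy .* runs to the LAST "<"; the lazy .*? stops at the FIRST one.
tricky = 'From: "a <b>" <x@y.bid>'
greedy = re.search(r"From:(.*)<", tricky).group(1)
lazy = re.search(r"From:(.*?)<", tricky).group(1)
print(repr(greedy), repr(lazy))
```

Both header styles match, because .* happily absorbs the quoted name.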

Cheers Michele

PS: by the way, the internet is full of excellent documentation about regex ;-) For example "http://www.regular-expressions.info/"

Gene Heskett

5:19 a.m.

On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:

...

I thought I wanted exactly one <, but the way these lowlifes change addresses and names hourly, they may remove the <> surrounding the real source address and screw me up. But the fact that they often put double quotes around the throwaway display-name part of the address is, I think, screwing me regardless.

What we need is the ability to specify the quote character by the first non-space character after the DENY =, which is currently either a "^ or a <>, the latter apparently inverting the logic. So a typical line would be

DENY = "^From:.*<*.bid"

Substitute any of the obnoxious new TLDs for bid, like xyz or pro; heck, that new list is several dozen TLDs.

But AFAIK, we're stuck with the double-quote wrapper around the string to match. Grrrr.
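The only thing that changes between those DENY lines is the TLD, so one option is to fold them into a single alternation such as \.(bid|xyz|pro). Below is a minimal Python sketch that generates such a line. It rests on a few assumptions:

- The TLD list is hypothetical.
- mailfilter compiles extended regexes, so ( and | work.
- mailfilter passes backslashes through unchanged inside the double quotes.
- The \W*$ end anchor is an addition of mine, so that the TLD must end the line.

```python
import re

# Hypothetical list of unwanted TLDs; edit to taste.
unwanted_tlds = ["bid", "top", "xyz", "pro"]

# One alternation instead of one DENY line per TLD. re.escape() guards
# against any regex metacharacters in the list entries.
tld_group = "|".join(re.escape(t) for t in unwanted_tlds)

# \. is a literal dot; \W*$ lets only trailing punctuation such as ">" follow.
combined = rf"^From:.*\.({tld_group})\W*$"

print(f'DENY = "{combined}"')
```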

...

Try this:

"-Bed Bugs-" -BedBugs-@agma69.top

with Return-Path.* or From.* in front of it. Or do those - signs, 4 of them, need escaping with a \? IDK.

Thanks, Michele.

...

I converted about 3 lines of the filterdata file to that <* form, and I'm now waiting for the next blast of spam to serve as test data. mailfilter is a picky twit, but the change hasn't given it a tummy ache either, so I am hopeful.

...

Cheers, Gene Heskett

Michele Calgaro

6:58 a.m.

On 2016/03/23 02:19 PM, Gene Heskett wrote:

...

Hi Gene, so if I understand correctly, you already had a set of rules like DENY = "^From:.*.bid" (bid stands for any TLD of your choice), but they were missing some messages because of the quoted "..." display name before the address. So you put the < in the string as well. Right?

Assuming so, it surprises me that the original version missed some entries, since the additional "..." field would already have been matched by the .* part of the pattern. I think there is a different reason for the missing entries. Perhaps a blank character before "From:"? Could it be? You could try this other version: DENY = "^\s*From:.*.bid", which ignores any whitespace before From:. Or try DENY = "^\s*From:.*.bid>", which also makes explicit that the TLD is followed by a >.

By the way, by "missing some entries" you mean that it is not filtering all the spam or that it is filtering some good emails as well?

Final note: your current modified version is no different from the original. The <* (0 or more <) is preceded by .* (any sequence of characters), so it can always match zero < characters. Perhaps you meant <.*; the only difference that would make is being more restrictive (i.e. there must be a < somewhere before the forbidden TLD).
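Michele's point can be checked directly. This Python sketch runs the original rule, the <* variant and the <.* variant side by side. The headers are made up, .bid stands in for the unwanted TLD, and Python's re is only a stand-in for mailfilter's engine.

```python
import re

rules = {
    "original": re.compile(r"^From:.*.bid"),
    "<*":       re.compile(r"^From:.*<*.bid"),   # <* may match zero characters
    "<.*":      re.compile(r"^From:.*<.*.bid"),  # needs at least one "<"
}

samples = [
    'From: "Funky Name" <spam@x.bid>',  # display name plus <address>
    "From: spam@x.bid",                  # bare address, no "<" at all
]

results = {name: [bool(rx.search(s)) for s in samples] for name, rx in rules.items()}
for name, hits in results.items():
    print(f"{name:9} {hits}")
```

The first two rules agree on every line. The <.* rule only differs by refusing the bare address.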

Cheers Michele

E. Liddell

11:22 a.m.

On Wed, 23 Mar 2016 15:58:39 +0900 Michele Calgaro michele.calgaro@yahoo.it wrote:

...

Hyphens should only need an escape within a character class, denoted by square brackets, where a - between two characters denotes a range. Elsewhere a - is just a literal character.
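To illustrate, here is a small Python check using Gene's sample line. It runs in Python's re. In POSIX bracket expressions a backslash is literal, so putting the - first or last is the portable way to get a literal hyphen inside brackets, as shown below.

```python
import re

sample = 'From: "-Bed Bugs-" -BedBugs-@agma69.top'

# Outside brackets, "-" is an ordinary character: no escaping needed.
plain = bool(re.search(r"-BedBugs-@agma69\.top", sample))

# Inside brackets, a "-" between two characters is a range: [a-c] is a, b or c.
as_range = re.findall(r"[a-c]", "a-b-c")

# Placed last (or first) inside the brackets, "-" is a literal hyphen.
as_literal = re.findall(r"[ac-]", "a-b-c")

print(plain, as_range, as_literal)
```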

...

DENY = "^\s*From:.*.bid" would also sweep up, say, fred@mail.bidders.com, or "I.bid" ibid@nowhere.org, since nothing ties the .bid to the end of the address.
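Running those two addresses through the unanchored rule in Python shows the problem. Python's re is a stand-in for mailfilter's engine here, and the From: lines are built from the examples above.

```python
import re

loose = re.compile(r"^\s*From:.*.bid")

lines = [
    "From: fred@mail.bidders.com",     # .bid appears mid-hostname
    'From: "I.bid" ibid@nowhere.org',  # .bid appears in the display name
]

hits = [bool(loose.search(line)) for line in lines]
print(hits)  # both are (wrongly) treated as .bid senders
```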

...

I'd cover the example as

^\W*((From:)|(Return-Path:)).*\.bid\W*$

This works out to:

- zero or more non-word characters at the beginning of the string,
- followed by "From:" or "Return-Path:",
- followed by zero or more unknowns,
- followed by ".bid" (the backslash makes the dot literal; a bare . would match any character),
- followed by zero or more non-word characters,
- followed by the end of the string.

"Word" characters are alphanumerics plus the underscore, and possibly some non-ASCII depending on the implementation. So "non-word" covers stuff like punctuation (including - and >) and whitespace. Marking the end of the string makes it more likely you're getting the TLD and not some random bit in the middle that was designed as a parser torture-test.
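A small Python harness for the anchored pattern might look like this. The header lines are invented: two spam-like lines, the two false-positive examples above, and one address with "bid" but no dot before it. The bare-dot variant is included to show why the \. matters. Python's \W and $ behave like the Perl-style engine assumed above, which may not exactly match mailfilter.

```python
import re

anchored = re.compile(r"^\W*((From:)|(Return-Path:)).*\.bid\W*$")
bare_dot = re.compile(r"^\W*((From:)|(Return-Path:)).*.bid\W*$")

# Invented test lines mapped to whether the rule should fire.
cases = {
    'From: "Some Name" <promo@deals.bid>': True,
    "Return-Path: <promo@deals.bid>": True,
    "From: fred@mail.bidders.com": False,
    'From: "I.bid" ibid@nowhere.org': False,
    "From: someone@rabid": False,
}

for line, expected in cases.items():
    got = bool(anchored.search(line))
    print(f"{'ok ' if got == expected else 'BAD'} {line}")

# Without the backslash, "." matches the "a" in "rabid", so this one slips through.
print(bool(bare_dot.search("From: someone@rabid")))
```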

If you want to get really silly,

^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$

ought to catch any TLD of 3+ ASCII letters that doesn't start with c, o, or n. That rules out .com, .org, and .net, but also the likes of .club and .online. And without a larger sample of "good" and "bad" addresses, I can't guarantee no false positives.
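The same kind of harness can probe the "silly" version. The sample lines are invented apart from Gene's agma69.top address. The last check uses the pattern without the backslash before the dot. That version matches an ordinary .com address, because . can consume the "e" and [^cCoOnN] can consume the dot.

```python
import re

silly = re.compile(r"^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$")
silly_bare = re.compile(r"^\W*((From:)|(Return-Path:)).*.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$")

lines = {
    'From: "-Bed Bugs-" -BedBugs-@agma69.top': True,  # caught
    "From: promo@spam.xyz": True,                      # caught
    "From: friend@example.com": False,                 # .com spared
    "From: list@lists.example.org": False,             # .org spared
    "From: promo@fun.club": False,                     # .club spared too (starts with c)
}

for line, expected in lines.items():
    got = bool(silly.search(line))
    print(f"{'ok ' if got == expected else 'BAD'} {line}")

# Without the backslash, "example.com" is caught: . eats "e", [^cCoOnN] eats ".".
print(bool(silly_bare.search("From: friend@example.com")))
```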

I write a lot of regexes in my day job (which is not to say that I get them right the first time, every time!). Assuming a Perl-compatible implementation (which most of them are, more or less), "man perlre" is a decent reference for the complicated bits. Just scroll past the section on modifiers.

E. Liddell

Dr. Nikolaus Klepp

11:49 a.m.

BTW, have you tried "kregexpeditor"?

Nik

On Wednesday, 23 March 2016, E. Liddell wrote:

...

To unsubscribe, e-mail: trinity-users-unsubscribe@lists.pearsoncomputing.net For additional commands, e-mail: trinity-users-help@lists.pearsoncomputing.net Read list messages on the web archive: http://trinity-users.pearsoncomputing.net/ Please remember not to top-post: http://trinity.pearsoncomputing.net/mailing_lists/#top-posting

-- Please do not email me anything that you are not comfortable also sharing with the NSA.

Gene Heskett

1:35 p.m.

On Wednesday 23 March 2016 07:22:03 E. Liddell wrote:

...

Now that ("man perlre") looks like the regex bible. Thanks a bunch. That needs to be printed and placed in the little room in the middle of the house. :)

...

Cheers, Gene Heskett

Gene Heskett

2:03 p.m.

On Wednesday 23 March 2016 09:35:31 Gene Heskett wrote:

...

I put E. Liddell's combined From:/Return-Path: pattern in double quotes and will check on the next run, in about a minute. There were no messages to test it on. For about the last hour, the log had also been showing mailfilter exiting after reporting that it could not compile a regular expression.

Ah, on the next run it accepted that pattern with no complaints, processing and passing 2 messages.

That's great, as I can remove about half of the rules by combining them that way.

Thank you, Michele.
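One pattern that won't compile can leave mailfilter bailing out, as the log showed here. So it may be worth syntax-checking new patterns before they go into the filterdata file. Below is a minimal sketch using Python's re.compile, with a hypothetical candidate list. Python's engine is not the one mailfilter uses, so a pattern that passes here is only likely, not guaranteed, to compile there.

```python
import re

# Hypothetical candidate DENY patterns; the last one has an unbalanced parenthesis.
candidates = [
    r"^\W*((From:)|(Return-Path:)).*\.bid\W*$",
    r"^\s*From:.*\.bid",
    r"^From:.*(\.bid",
]

def check(patterns):
    """Return (pattern, error message) for every pattern that fails to compile."""
    bad = []
    for p in patterns:
        try:
            re.compile(p)
        except re.error as exc:
            bad.append((p, str(exc)))
    return bad

for pattern, err in check(candidates):
    print(f"won't compile: {pattern!r}: {err}")
```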

...

...
which works out to zero or more non-word characters at the beginning of the string, followed by "From:" or "Return-Path:" followed by zero or more unknowns, followed by ".bid", followed by zero or more non-word characters, followed by the end of the string. "Word" characters are alphanumerics, some connectors like _-, and possibly some non-ASCII depending on the implementation, so "non-word" covers stuff like punctuation and whitespace. Marking the end of the string makes it more likely you're getting the TLD and not some random bit in the middle that was designed as a parser torture-test.

If you want to get really silly,

^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$

ought to catch the majority of TLDs with a 3+ ASCII character extension that isn't .com, .org, or .net, but without a larger sample of "good" and "bad" addresses, I can't guarantee no false positives.
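Trying the "silly" idea in Python's `re` (an assumption; mailfilter's engine may differ) shows why the dot in front of [^cCoOnN] needs a backslash. Left unescaped, that dot can match the last letter of the domain name, and then [^cCoOnN] matches the literal '.', so ordinary .com and .org addresses are hit too. The addresses below are invented.

```python
import re

# Dot before the class left unescaped: [^cCoOnN] is free to match '.' itself.
unescaped = re.compile(r"^\W*((From:)|(Return-Path:)).*.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$")
# Dot escaped: it must be a literal '.', and the TLD's first letter is tested.
escaped = re.compile(r"^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$")

lines = [
    "From: friend@example.com",        # .com: meant to be left alone
    "From: friend@example.org",        # .org: meant to be left alone
    "From: spammer@example.download",  # new gTLD: meant to be caught
    "From: someone@example.club",      # starts with c, so the escaped form skips it
]

for line in lines:
    print(f"{line:35} unescaped={bool(unescaped.search(line))} escaped={bool(escaped.search(line))}")
```

The unescaped form hits all four lines; the escaped form hits only .download. Even escaped, every TLD whose first letter is c, o or n (.club, .online, .ninja and so on) slips past, so an explicit list of unwanted TLDs may be easier to reason about.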

I write a lot of regexes in my day job (which is not to say that I get them right the first time, every time!) Assuming a Perl-compatible implementation (which most of them are, more or less), "man perlre" is a decent reference for the complicated bits. Just scroll past the section on modifiers.

E. Liddell

Now that looks like the regex bible. Thanks a bunch. That needs printed and placed in the little room in the middle of the house. :)

...

To unsubscribe, e-mail: trinity-users-unsubscribe@lists.pearsoncomputing.net
For additional commands, e-mail: trinity-users-help@lists.pearsoncomputing.net
Read list messages on the web archive: http://trinity-users.pearsoncomputing.net/
Please remember not to top-post: http://trinity.pearsoncomputing.net/mailing_lists/#top-posting

Cheers, Gene Heskett

Michele Calgaro

2:11 p.m.

New subject: [trinity-users] Hopeing I can find a regex expert here

On 03/23/2016 11:03 PM, Gene Heskett wrote:

...

Thats great as I can remove about 1/2 of the rules by combining them so.

Thank you Michelle.

Well, you should thank E. Liddell for this one ;-) Cheers Michele

Gene Heskett

9:59 p.m.

New subject: [trinity-users] Hopeing I can find a regex expert here

On Wednesday 23 March 2016 10:11:39 Michele Calgaro wrote:

...

On 03/23/2016 11:03 PM, Gene Heskett wrote:

...
Thats great as I can remove about 1/2 of the rules by combining them so.

Thank you Michelle.

Well, you should thank E. Liddell for this one ;-) Cheers Michele

Ohhhkaay, thanks Mr. E. Liddell. :)

Cheers, Gene Heskett

Gene Heskett

24 Mar

3:15 p.m.

New subject: [trinity-users] Hopeing I can find a regex expert here

On Wednesday 23 March 2016 17:59:07 Gene Heskett wrote:

...

On Wednesday 23 March 2016 10:11:39 Michele Calgaro wrote:

...
On 03/23/2016 11:03 PM, Gene Heskett wrote:

...
Thats great as I can remove about 1/2 of the rules by combining them so.

Thank you Michelle.

Well, you should thank E. Liddell for this one ;-) Cheers Michele

Ohhhkaay, thanks Mr. E. Liddell. :)

I am getting a little schmardter, but not enough. One thing that stands out is that the spams it misses have had an extra header line inserted as the first line:

=================================
From gene Thu Mar 24 09:11:22 2016
Received: from localhost by coyote.coyote.den with SpamAssassin (version 3.4.0); Thu, 24 Mar 2016 09:11:23 -0400
From: "Alliance Security" AllianceSecurity@wmthompson.download
To: gheskett@wdtv.com
Subject: Alliance security Solution
Date: Thu, 24 Mar 2016 06:10:52 -0700
==================================

It should have triggered on the _real_ "From:" line, but didn't. Yet it did trigger on several others from that same TLD.

And that's the whole thing; next is the SpamAssassin stuff. And except for the real From: line, it is totally bogus, unless some A.H. has figured out how to compromise a Linux email system that is NOT built like the usual Linux email chain.
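The thread doesn't say how mailfilter hands headers to the regex engine, but one thing worth ruling out is anchoring. In Python's `re`, ^ and $ only match at the start and end of the whole text unless re.MULTILINE is set, so a rule anchored with ^ and tested against a whole header block that begins with the extra "From gene ..." line would never reach the real From: line. The sketch uses a shortened copy of the headers above and a hypothetical rule for the .download TLD; whether mailfilter behaves this way is an assumption, not something the thread establishes.

```python
import re

# Shortened copy of the headers posted above (extra first line included).
headers = "\n".join([
    "From gene Thu Mar 24 09:11:22 2016",
    'From: "Alliance Security" AllianceSecurity@wmthompson.download',
    "To: gheskett@wdtv.com",
    "Subject: Alliance security Solution",
])

# Hypothetical rule in the style of the thread, for the .download TLD.
pattern = r"^\W*((From:)|(Return-Path:)).*\.download\W*$"

whole_text = re.search(pattern, headers)                    # ^/$ = start/end of text
per_line = re.search(pattern, headers, flags=re.MULTILINE)  # ^/$ = each line

print("without MULTILINE:", whole_text)
print("with MULTILINE:   ", per_line.group(0) if per_line else None)
```

If mailfilter already tests each header line on its own, anchoring is not the culprit and the cause lies elsewhere.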

I'll do some more system snooping, but the two rootkit finders we have haven't been updated in years, as far as I'm aware.

Thanks folks.

Cheers, Gene Heskett

Gene Heskett

23 Mar

1:28 p.m.

New subject: [trinity-users] Hopeing I can find a regex expert here

On Wednesday 23 March 2016 02:58:39 Michele Calgaro wrote:

...

On 2016/03/23 02:19 PM, Gene Heskett wrote:

...
On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:

...
On 2016/03/23 12:44 PM, Gene Heskett wrote:

...
Greetings;

I use mailfilter as a prefilter in front of fetchmail to nuke some spam while it's still on the server.

But it's missing hits on what I suspect are the From: or Return-Path: strings that have quotation marks in them, because the string is being spec'd by being surrounded by "show this name" bs.

I've added the character < as part of the string it's to search for, so the search string now looks like "From:.*<*.unwanted-tld". Does this stand that famous snowball's chance in hell of working well with or without a quoted "some funkity name" in front of the real url with the <> around it?

I just love the lack of documentation on how this string comparison stuff works, as shown by the man pages for grep and regex. All sorts of control options are well covered, but figuring out how to write a search expression must be one of the world's better-guarded secrets.

So if someone could show me, or give a URL that actually has the full docs, I'd be grateful.

Thanks.

Cheers, Gene Heskett

Hi Gene, "From:.*<*.unwanted-tld" will match a string like this (I have put one section per line to be clearer):

From:
whatever character, 0 or more times
<, 0 or more times
.unwanted-tld
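Michele's one-section-per-line reading can be written out with Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows a comment per section. This is only a readability aid for experimenting in Python; the one-line form is what would go into the filter file. The test addresses are made up.

```python
import re

# Gene's pattern, one section per line (re.VERBOSE ignores spaces and # comments).
pattern = re.compile(r"""
    From:           # the literal text From:
    .*              # any character, 0 or more times
    <*              # the character <, 0 or more times
    .               # any single character (an unescaped dot is a wildcard)
    unwanted-tld    # the literal text unwanted-tld
""", re.VERBOSE)

print(bool(pattern.search('From: "Some Name" <x@example.unwanted-tld>')))  # True
print(bool(pattern.search("From: x@example.unwanted-tld")))              # True
```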

I thought I wanted 1 only, but the way these lowlifes change addresses and names hourly, they may remove the <> surrounding the real source address and screw me up. But the fact that they often put double quotes around the throwaway part of the url is, I think, screwing me regardless.

What we need is the ability to specify the quote character as the first non-space character after the DENY = (where it is currently always "^), or after a <> (which apparently inverts the logic). So a typical line would be

DENY = "^From:.*<*.bid"

Substitute for bid any of the new TLDs that gets obnoxious, like xyz or pro; heck, that new list is several dozen TLDs.

But AFAIK, we're stuck with the dblquote wrapper around the string to match. Grrrr.
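Rather than one DENY line per TLD, several TLDs can share one rule through alternation. A sketch in Python with an invented TLD list, escaping the dot so it only matches a literal '.'; whether mailfilter's engine accepts exactly this syntax inside the dblquote wrapper is an assumption to check against its log.

```python
import re

# Hypothetical list of unwanted TLDs; edit to taste.
unwanted = ["bid", "xyz", "pro", "top", "download"]

# One rule instead of five: \.(bid|xyz|...) with a literal dot, anchored at the end.
rule = r"^\W*((From:)|(Return-Path:)).*\.(" + "|".join(map(re.escape, unwanted)) + r")\W*$"
deny = re.compile(rule)

print(rule)  # the single pattern string that would sit inside the quotes
for line in ["From: a@b.xyz", "Return-Path: <c@d.top>", "From: e@f.com"]:
    print(line, "->", "DENY" if deny.search(line) else "pass")
```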

...
It is greedy, so it will scan until the last < if there is more than one. Not sure if this is what you need or not. If you can post an example of what you need to match, I can work out another regex if required.
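Michele's point about greediness shows up clearly when you capture what .* swallows. In this made-up line the display name itself contains a <, so the greedy .* runs on to the last < while the lazy .*? stops at the first. The lazy form is a Perl/Python extension and may not be available in mailfilter's engine.

```python
import re

line = 'From: "Deals <today>" <spammer@example.bid>'

greedy = re.search(r"From:(.*)<", line)  # .* grabs as much as possible
lazy = re.search(r"From:(.*?)<", line)   # .*? grabs as little as possible

print(repr(greedy.group(1)))  # ' "Deals <today>" '
print(repr(lazy.group(1)))    # ' "Deals '
```

For a yes/no DENY rule this does not change whether a line matches, only which part of it is matched.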

Try this:

"-Bed Bugs-" -BedBugs-@agma69.top

with Return-Path.* or From.* in front of it. Or do those - signs, 4 of them, need escaping with a \? IDK.

Thanks Michelle.
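On the hyphen question: outside a bracket expression such as [a-z], - has no special meaning, so it needs no backslash. A quick Python check against the sample Gene posted; the .top rule is a hypothetical example, and the bracket lines only illustrate where - does become special.

```python
import re

sample = '"-Bed Bugs-" -BedBugs-@agma69.top'

# Hyphens written as-is: outside [...] a '-' is an ordinary character.
plain = re.compile(r"^(From|Return-Path):.*-BedBugs-@.*\.top\W*$")

for header in ("From: ", "Return-Path: "):
    line = header + sample
    print(line, "->", bool(plain.search(line)))  # True for both

# Inside brackets '-' forms a range, unless it comes first or last.
print(bool(re.fullmatch(r"[a-z]", "m")))  # True: m is in the range a..z
print(bool(re.fullmatch(r"[-az]", "m")))  # False: only '-', 'a' or 'z'
```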

...
Cheers Michele

I converted about 3 lines of the filterdata file that way, and I'm now waiting for the next blast of spam to serve as test data. mailfilter is a picky twit, but that hasn't given it a tummy ache either, so I am hopeful.

...
PS: by the way, the internet is full of excellent documentation about regex ;-) For example "http://www.regular-expressions.info/"

Cheers, Gene Heskett

Hi Gene, so if I understand correctly, you already had a set of rules like DENY = "^From:.*.bid" (bid stands for any tld of your choice) but it was missing some entries because of the "..." entry before the domain. So you put the < in the string as well. Right?

Assuming so, it surprises me that the original version missed some entries, since the additional "..." field would have already been matched by the .* part of the pattern. I think there is a different reason for missing entries. Perhaps a blank character before "From:"? Could it be? You could try this other version: DENY = "^\s*From:.*.bid" which ignores any separator before From: or DENY = "^\s*From:.*.bid>" which also makes explicit that the tld is followed by a >.

I'll do that for the top 4 or 5 entries to see what effect it has.

...

By the way, by "missing some entries" you mean that it is not filtering all the spam or that it is filtering some good emails as well?

Given two consecutive spams ending in the desired hit, it nukes one and passes the other, so I was looking for what the diff was. KMail shows the raw message with a tap on the v key, and I can't see any trash characters in front of the From etc. lines.

But I see in the logs that I am nuking posts from a valuable contributor. It seems Dr. Klepp is coming in from a .biz address, so I'll have to remove that filter line, and my apologies, Nick, if I seemed to have ignored you. I'd druther have the spam than miss your helpful msgs.

...

Final note: your current modified version is no different from the original, since <* (0 or more <) is preceded by .* (any sequence of characters). Perhaps you meant <.*, but that would not change much either, except for being more restrictive (i.e. there must be a < somewhere before the forbidden TLD).
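Michele's observation can be checked mechanically in Python's `re`: because <* may match nothing, From:.*<*.bid accepts exactly the lines that From:.*.bid accepts, while From:.*<.*.bid additionally insists on a < somewhere before the TLD. The test lines are invented.

```python
import re

original = re.compile(r"From:.*.bid")
modified = re.compile(r"From:.*<*.bid")   # <* can match zero '<', so it adds nothing
stricter = re.compile(r"From:.*<.*.bid")  # requires at least one '<'

lines = [
    'From: "Name" <x@example.bid>',
    "From: x@example.bid",
    "From: x@example.com",
]

for line in lines:
    results = [bool(p.search(line)) for p in (original, modified, stricter)]
    print(f"{line:30} original/modified/stricter = {results}")
```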

Cheers Michele

Cheers, Gene Heskett


users@trinitydesktop.org

11 comments

4 participants

participants (4)

Dr. Nikolaus Klepp
E. Liddell
Gene Heskett
Michele Calgaro