On 2021-10-14 06:41:30 E. Liddell wrote:
On Wed, 13 Oct 2021 16:02:14 -0500
J Leslie Turriff <jlturriff(a)mail.com> wrote:
On 2021-10-13 13:07:13 E. Liddell wrote:
On Wed, 13 Oct 2021 16:46:20 +0000
That being said, test 9 is a raw grep being performed on an XML file.
This means that it could easily be latching onto something in a
comment, because following the full XML spec for determining whether a
given line is inside a comment or not using a simple text-matching tool
is . . . well, let's say it isn't something I'd want to try, and I deal
in regexes a fair amount in my day job. It really needs to be run
through a full parser that constructs a DOM tree.
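To illustrate the point above, here is a minimal sketch in Python. The document and the `autologin` element are invented for the example and are not the actual file or pattern from test 9. It only shows how a raw text match and a parsed view can disagree when the value sits in a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: the "true" setting appears only inside a comment.
doc = """<config>
  <!-- <autologin>true</autologin> -->
  <autologin>false</autologin>
</config>"""

# Raw line-by-line text match, roughly what a plain grep does:
# it latches onto the commented-out line.
grep_hit = any("<autologin>true</autologin>" in line for line in doc.splitlines())

# Parsed view: ElementTree's default parser does not keep comments,
# so only the live element is visible.
root = ET.fromstring(doc)
parsed_values = [el.text for el in root.iter("autologin")]

print(grep_hit)       # True
print(parsed_values)  # ['false']
```

The text match reports a hit even though the only live value is `false`.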
Filter to throw away comments first, then filter for what it should look
for.
Correctly throwing away comments isn't as simple as tossing away everything
between a start marker and an end marker, though, because if the comment
marker is inside a CDATA section, it doesn't actually affect whether or not
the text is a comment. I suspect a comment marker found between quotes in
a text-format attribute value doesn't count either, but I'd have to check
the spec to be sure. And there may be more quirks that I've forgotten.
(Oh, and you could *easily* embed the value the grep expression is looking
for in the file without triggering the grep by using CDATA, now that I
think about it.)
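Both CDATA problems can be sketched in a few lines of Python. The element names and values are made up for illustration, and the regex is only an assumed stand-in for a "start marker to end marker" filter, not anything taken from the real test.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document. Inside CDATA, comment markers are plain character
# data, and CDATA can also split a value so that a text match never sees it.
doc = (
    "<root>"
    "<note><![CDATA[a <!-- b --> c]]></note>"
    "<user><![CDATA[ro]]>ot</user>"
    "</root>"
)

# Naive comment filter: drop everything between <!-- and -->.
naive = re.sub(r"<!--.*?-->", "", doc, flags=re.DOTALL)

root = ET.fromstring(doc)
note_text = root.find("note").text  # the text as a parser sees it
user_text = root.find("user").text

print(note_text)                   # a <!-- b --> c
print("a  c" in naive)             # True: the filter deleted real character data
print("<user>root</user>" in doc)  # False: a grep-style match misses the value
print(user_text)                   # root
```

The filter removes text that was never a comment, and the grep-style match misses a value that a parser reads without trouble.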
There's a reason that man perlfaq6 contains the following:
    How do I match XML, HTML, or other nasty, ugly things with a regex?

    Do not use regexes. Use a module and forget about the regular
    expressions.
E. Liddell
Yeah. I know little about it, but IIRC, XML was supposed to make everything so much
easier... :-D
Leslie
--
Operating System: Linux
Distribution: openSUSE Leap 15.3 x86_64
Desktop Environment: Trinity
Qt: 3.5.0
TDE: R14.0.10
tde-config: 1.0