Ticket #1162 (new defect)

Opened 19 months ago

[PATCH] get_tag() regex bug fix

Reported by: snarfed
Owned by: caugb
Priority: normal
Component: import-wodpress-1x
Severity: normal
Keywords: wordpress-importer import wxr
Cc: briancolinger, ryan, nbachiyski

Description

hi all! i don't see a wordpress-importer component, or a way for normal users to make new components, so i picked the closest one i could find.

the tag regex in WP_Import::get_tag() has a bug that makes it overly loose, which can result in incorrect imported data. for example, this snippet of a comment in a WXR file:

<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>
<wp:comment_author_email>a@…</wp:comment_author_email>
<wp:comment_author>ryan</wp:comment_author>

results in this imported data:

mysql> select comment_author_IP, comment_author_email, comment_author from wp_comments where comment_post_id=22;
+-------------------+----------------------+--------------------+
| comment_author_IP | comment_author_email | comment_author     |
+-------------------+----------------------+--------------------+
| 1.2.3.4           | a@…                  | 1.2.3.4 a@… ryan   |
+-------------------+----------------------+--------------------+

comment_author should be just 'ryan', but it's actually '1.2.3.4 a@… ryan'.

this happens because in the first part of the tag regex on wordpress_importer.php:72:

"|<$tag.*?>(.*?)</$tag>|is"

the .*? in the initial <$tag.*?> is there to skip tag attributes, but it can also consume the rest of a longer tag name. in the example above, if you call get_tag() for wp:comment_author, the regex actually matches everything from <wp:comment_author_IP> through </wp:comment_author>. the first .*? matches '_IP', and then the inner (.*?) matches everything through the first literal </wp:comment_author>, because </wp:comment_author_IP> and </wp:comment_author_email> don't match the closing </$tag> exactly.
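
to make the over-matching concrete, here is a minimal sketch in python rather than php. the function name get_tag_loose is a hypothetical stand-in, not the importer's actual code; it only reuses the original pattern (minus the php | delimiters) with the same i and s flags, and assumes the tag passed in is wp:comment_author.

```python
import re

# Snippet from the ticket; the elided email is kept exactly as reported.
WXR = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_loose(text, tag):
    """Hypothetical stand-in that applies the original, overly loose pattern."""
    # re.I and re.S correspond to the 'i' and 's' modifiers in the PHP pattern.
    match = re.search(rf"<{tag}.*?>(.*?)</{tag}>", text, re.I | re.S)
    return match.group(1) if match else ""


loose = get_tag_loose(WXR, "wp:comment_author")
# The capture starts inside <wp:comment_author_IP>, not <wp:comment_author>.
print(repr(loose))
```

the captured string begins with the IP address and only ends at 'ryan', which is consistent with the bad comment_author value in the query output above once the leftover tags are removed.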

the patch fixes this by changing the regex to:

"|<$tag( +.*)?>(.*?)</$tag>|is"

which still handles tag attributes, if any, but requires that the opening tag is actually the requested tag string.
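
a rough python sketch of the patched pattern, under the same assumptions as before: get_tag_strict is a hypothetical name, and the id="1" attribute is made up purely to exercise the attribute case. since the attribute group now comes first, this sketch reads the element content from group 2.

```python
import re

# Snippet from the ticket; the elided email is kept exactly as reported.
WXR = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_strict(text, tag):
    """Hypothetical stand-in that applies the patched pattern."""
    # After the tag name, only '>' or spaces followed by attributes are
    # allowed, so a longer name such as wp:comment_author_IP no longer matches.
    match = re.search(rf"<{tag}( +.*)?>(.*?)</{tag}>", text, re.I | re.S)
    # Group 1 holds the optional attributes; group 2 holds the content.
    return match.group(2) if match else ""


print(get_tag_strict(WXR, "wp:comment_author"))     # ryan
print(get_tag_strict(WXR, "wp:comment_author_IP"))  # 1.2.3.4

# Made-up attribute, only to check that attributes are still accepted.
with_attr = '<wp:comment_author id="1">ryan</wp:comment_author>'
print(get_tag_strict(with_attr, "wp:comment_author"))  # ryan
```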

along with the patch, i've attached example WXR files that demonstrate this.

the patch is against svn r265279.

Attachments

wordpress_importer_get_tag_fix.patch (694 bytes) - added by snarfed 19 months ago.
bad.xml (771 bytes) - added by snarfed 19 months ago.
    importing this reproduces the bug
ok.xml (771 bytes) - added by snarfed 19 months ago.
    this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug

