WordPress.org

Plugin Directory

Opened 8 years ago

Last modified 6 years ago

#1162 reopened defect

[PATCH] get_tag() regex bug fix

Reported by: snarfed
Owned by: caugb
Priority: normal
Severity: normal
Plugin: import-wodpress-1x
Keywords: wordpress-importer import wxr
Cc: briancolinger, ryan, nbachiyski

Description

hi all! i don't see a wordpress-importer component, or a way for normal users to make new components, so i picked the closest one i could find.

the tag regex in WP_Import::get_tag() has a bug that makes it overly loose, which can result in incorrect imported data. for example, this snippet of a comment in a WXR file:

<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>
<wp:comment_author_email>a@…</wp:comment_author_email>
<wp:comment_author>ryan</wp:comment_author>

results in this imported data:

mysql> select comment_author_IP, comment_author_email, comment_author from wp_comments where comment_post_id=22;
+-------------------+----------------------+--------------------+
| comment_author_IP | comment_author_email | comment_author     |
+-------------------+----------------------+--------------------+
| 1.2.3.4           | a@…                  | 1.2.3.4 a@… ryan   |
+-------------------+----------------------+--------------------+

comment_author should be just 'ryan', but it's actually '1.2.3.4 a@… ryan'.

this happens because in the first part of the tag regex on wordpress_importer.php:72:

"|<$tag.*?>(.*?)</$tag>|is"

the .*? in the initial <$tag.*?> is there to skip over attributes, but it also matches the rest of any longer tag name that merely starts with $tag. in the example above, if you call get_tag('comment_author'), the regex actually matches everything from <wp:comment_author_IP> through </wp:comment_author>: the first .*? matches '_IP', and because </$tag> only matches the exact closing tag, the inner (.*?) then runs past the other two elements and stops at the final </wp:comment_author>.
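to make this easy to poke at outside of wordpress, here is a rough sketch of the same match in python. it is a port, not the importer code: get_tag_loose is a hypothetical stand-in for WP_Import::get_tag(), PHP's |...|is modifiers are approximated with re.IGNORECASE | re.DOTALL, and i'm assuming the namespaced name 'wp:comment_author' is what gets passed in as the tag.

```python
import re

# the three lines from the WXR snippet above (email left elided, as in the ticket)
WXR = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_loose(text, tag):
    """Hypothetical stand-in for get_tag() using the current, loose pattern."""
    # the tag is interpolated as-is, mirroring "|<$tag.*?>(.*?)</$tag>|is"
    pattern = "<" + tag + ".*?>(.*?)</" + tag + ">"
    match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else ""


loose = get_tag_loose(WXR, "wp:comment_author")
# the capture starts inside <wp:comment_author_IP> and swallows the other tags
print(repr(loose))
```

the captured string begins at '1.2.3.4' and only ends at 'ryan', with the intermediate tags still embedded in it, which would explain the combined value that ends up in comment_author.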

the patch fixes this by changing the regex to:

"|<$tag( +.*)?>(.*?)</$tag>|is"

which still handles tag attributes, if any, but requires that the opening tag is actually the requested tag string.
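a quick way to sanity-check the patched pattern, again as a python port rather than the actual PHP: get_tag_patched is a hypothetical helper, the modifiers are approximated with re.IGNORECASE | re.DOTALL, the tag name 'wp:comment_author' is assumed, and the id="1" attribute is made up purely to exercise the attribute case.

```python
import re

# same WXR snippet as in the description (email left elided, as in the ticket)
WXR = (
    "<wp:comment_author_IP>1.2.3.4</wp:comment_author_IP>\n"
    "<wp:comment_author_email>a@…</wp:comment_author_email>\n"
    "<wp:comment_author>ryan</wp:comment_author>\n"
)


def get_tag_patched(text, tag):
    """Hypothetical stand-in for get_tag() using the patched pattern."""
    # mirrors "|<$tag( +.*)?>(.*?)</$tag>|is": after the tag name there must be
    # either '>' right away or at least one space followed by attributes
    pattern = "<" + tag + "( +.*)?>(.*?)</" + tag + ">"
    match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
    # the element contents are now the second capture group, not the first
    return match.group(2) if match else ""


patched = get_tag_patched(WXR, "wp:comment_author")

# made-up element with an attribute, to check that attributes are still accepted
with_attr = get_tag_patched(
    '<wp:comment_author id="1">ryan</wp:comment_author>', "wp:comment_author"
)
print(repr(patched), repr(with_attr))
```

one thing this makes visible: the new ( +.*)? group shifts the contents from capture group 1 to capture group 2, so any caller reading the match by index has to follow along.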

along with the patch, i've attached example WXR files that demonstrate this.

the patch is against svn r265279.

Attachments (3)

wordpress_importer_get_tag_fix.patch (694 bytes) - added by snarfed 8 years ago.
bad.xml (771 bytes) - added by snarfed 8 years ago.
importing this reproduces the bug
ok.xml (771 bytes) - added by snarfed 8 years ago.
this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug


Change History (5)

@snarfed 8 years ago

importing this reproduces the bug

@snarfed 8 years ago

this is almost identical to bad.xml, but the wp:comment_author element appears first, so it doesn't reproduce the bug

comment:1 @garyc40 6 years ago

  • Cc changed from briancolinger,ryan,nbachiyski to briancolinger, ryan, nbachiyski
  • Resolution set to fixed
  • Status changed from new to closed

In [586518]:


comment:2 @snarfed 6 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

reopening. that fix is unrelated. garyc40, i'm guessing you meant a different bug.
