We've been using a spare WordPress installation (via Postie) to trac job comments that aren't in Trac, and a Lilina installation to correlate and aggregate all the RSS feeds together. The mashed tools together work really, really well - Mummy, is this Web2.0? - but a misconfigured mail filter resulted in about 400 duplicate posts on the WordPress install, which didn't help.
So I needed to remove the duplicate posts from that WordPress install - they all had identical content and titles, but different timestamps and IDs (of course). Here's a quick HOWTO for anyone else who has aggregated or automated themselves into having lots of duplicate WordPress posts.
MySQL allows you to create a table from a SELECT query, and is polite enough to maintain the table structure. So, assuming your duplicate information is like mine, a query that will produce a table with a single instance of each unique post_title and post_content will look something like this:
CREATE TABLE wp_posts_NODUPLICATES
SELECT DISTINCT
MIN(ID) as ID, post_author, min(post_date) as post_date,
MIN(post_date_gmt) as post_date_gmt,
post_content, post_title, post_category, post_excerpt,
post_status, comment_status, ping_status,
post_password, post_name, to_ping, pinged,
MIN(post_modified) as post_modified, MIN(post_modified_gmt) as post_modified_gmt,
post_content_filtered, post_parent, guid, menu_order
FROM wp_posts
GROUP BY post_content, post_title ;
Once that's done, you should have a table with all of your posts, but only one of each. If your duplication differs, you'll need to tweak the above to suit your case ...
From there, you need to swap your new cleaned table in:
ALTER TABLE wp_posts RENAME wp_posts_DUPLICATES ;
ALTER TABLE wp_posts_NODUPLICATES RENAME wp_posts ;
Bingo!
Comments
Thanks!
Man, you saved me a TON of time, and really really appreciate it!
I apparently had 20k duplicate out of 52k posts.
additional columns
In my install, Wordpress 2.9.2 the wp_posts table has additional 3 columns:
| post_type | varchar(20) | NO | MUL | post | || post_mime_type | varchar(100) | NO | | | |
| comment_count | bigint(20) | NO | | 0 | |
How would you setup the CREATE statement now?
That depends.
Post new comment