Last active
August 29, 2015 14:01
A very simple script which transforms wordpress dump json into ghost-import friendly format
"""A very simple script which transforms a wordpress dump into a ghost-import
friendly format. This is just a quick script for migration.

This script works around postgres-related import problems like:
- RejectionError: current transaction is aborted, commands ignored until end of
  transaction block
- js console: POST http://haku.io/ghost/api/v0.1/db/ 503 (Service Unavailable)
  http://devdala.files.wordpress.com/2014/05/screen-shot-2014-05-26-at-16-43-08.png

Detailed explanation:
https://ghost.org/forum/using-ghost/8433-wordpress-migration-json-file-import-failing-with-unknown-error/

Don't forget to modify file_path and the subset_posts limit (currently 200).
The limit is there because ghost complains about the upload when the file is
large, so the posts are imported in chunks. When you chunk the posts you need
to ship the related tags with each chunk; the intermediate table posts_tags
links the two.

The script skips draft posts because they have no published_at attribute and
ghost cannot import them.

http://hakanu.co
"""
import json

print('-----Started-----')
file_path = 'wp2ghost_export_1400838534.json'
# Read the whole export; the context manager closes the file afterwards.
with open(file_path, 'r') as f:
    j = json.load(f)
print('len: %d' % len(j))
print('len[data]: %d' % len(j['data']))
print('len[meta]: %d' % len(j['meta']))
print('len[posts]: %d' % len(j['data']['posts']))
print('len[tags]: %d' % len(j['data']['tags']))
print('len[posts_tags]: %d' % len(j['data']['posts_tags']))
meta = j['meta']
posts = j['data']['posts']
tags = j['data']['tags']
posts_tags = j['data']['posts_tags']
posts_dict = {}  # keyed by post id.
tags_dict = {}  # keyed by tag id.
posts_tags_dict = {}  # keyed by post id; value is a list of posts_tags rows.
for post in posts:
    posts_dict[post['id']] = post
for post_tag in posts_tags:
    # A post can carry several tags, so collect every row instead of
    # overwriting the previous one.
    posts_tags_dict.setdefault(post_tag['post_id'], []).append(post_tag)
for tag in tags:
    tags_dict[tag['id']] = tag

subset_posts = posts[0:200]  # Get first 200.
new_posts = []
new_posts_tags = []
new_tags = []
seen_slugs = set()  # Put seen slugs here to eliminate duplications.
for post in subset_posts:
    if post['status'] == 'draft':  # Skip draft ones.
        continue
    new_posts.append(post)
    post_id = post['id']
    if post_id in posts_tags_dict:
        for post_tag in posts_tags_dict[post_id]:
            tag_id = post_tag['tag_id']
            new_posts_tags.append(post_tag)
            # Slugs must be unique for ghost.
            if tags_dict[tag_id]['slug'] not in seen_slugs:
                new_tags.append(tags_dict[tag_id])
                seen_slugs.add(tags_dict[tag_id]['slug'])
    else:
        print('no tag for post_id: %s' % post_id)

new_master_dict = {}
new_master_dict['meta'] = meta
new_master_dict['data'] = {}
new_master_dict['data']['posts'] = new_posts
new_master_dict['data']['tags'] = new_tags
new_master_dict['data']['posts_tags'] = new_posts_tags
# Write next to the input file; the context manager flushes and closes it.
with open(file_path + '_new.json', 'w') as f:
    f.write(json.dumps(new_master_dict, indent=4))
print('-----Finished----')
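
The script only writes the first 200 posts, so a larger blog needs the slice edited and the script rerun for every chunk. A minimal sketch of doing all chunks in one pass could look like the following. The names `chunk_export` and `chunk_size` are hypothetical and not part of the script. It assumes the same export layout (`meta`, `data.posts`, `data.tags`, `data.posts_tags`) and filters drafts before slicing, so every chunk holds up to `chunk_size` published posts. Whether ghost accepts a tag that an earlier chunk already imported is not verified here.

```python
def chunk_export(export, chunk_size=200):
    """Split an export dict into a list of smaller export dicts.

    Hypothetical helper: each chunk carries its own posts_tags rows and
    the tags those rows point to, so it can be imported on its own.
    """
    data = export['data']
    tags_by_id = {tag['id']: tag for tag in data['tags']}

    # Group the posts_tags rows by post, keeping every tag of a post.
    rows_by_post = {}
    for row in data['posts_tags']:
        rows_by_post.setdefault(row['post_id'], []).append(row)

    # Drafts are dropped up front, for the reason given in the docstring.
    published = [p for p in data['posts'] if p['status'] != 'draft']

    chunks = []
    for start in range(0, len(published), chunk_size):
        posts = published[start:start + chunk_size]
        posts_tags, tags, seen_slugs = [], [], set()
        for post in posts:
            for row in rows_by_post.get(post['id'], []):
                posts_tags.append(row)
                tag = tags_by_id[row['tag_id']]
                # Tag slugs must be unique within one import file.
                if tag['slug'] not in seen_slugs:
                    seen_slugs.add(tag['slug'])
                    tags.append(tag)
        chunks.append({
            'meta': export['meta'],
            'data': {
                'posts': posts,
                'tags': tags,
                'posts_tags': posts_tags,
            },
        })
    return chunks
```

Each returned dict can then be dumped with `json.dumps(chunk, indent=4)` into its own file, for example by appending the chunk index to the output file name.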