Migrating data in the JCR
Extracting the media files sounded simple: just write a program to iterate through the nodes with media items, save the content to the filesystem and refactor the node structure. This worked fine for the current version of each node, but our workspace is versioned and our application needs to work with historical versions of the nodes. Older versions of nodes are frozen and can’t be changed in Jackrabbit.

Changing the node definitions and persistence managers required exporting the content and adding it to a new repository. However, as far as I can tell, the Jackrabbit import/export features do not allow you to restore old versions. This means that we would not be able to migrate our version history.

We tried using the exportsysview command to export our repository as XML, running the result through an XML transformation to remove any versionHistory and baseVersion properties, switching our node type definition and re-importing the data into a new repository. We excluded the binary data from our XML export and ended up with a 22 MB file. When we tried to import this through Jackrabbit’s importxml we kept getting OutOfMemory exceptions from the JVM. While we eventually got the import to work on a Win64 machine using 8 GB of memory, this isn't a practical long-term solution. 8 GB to import 22 MB of data just doesn't cut it.

What we ended up doing was writing a program that iterates through each of the nodes in the repository in the following fashion: run an exportxml on the node but not its children, apply any filters to the XML and then run importxml to bring the data into a second repository; saving the node at this point is important. Repeat for all children. The result: success. This worked because we don’t use a lot of references, and they are structured in a way that let us export all of the reference targets first. In any event, Jackrabbit's import/export mechanism needs a longer look.
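For illustration, here is a minimal sketch of the filtering step in Java, using only the standard XML APIs. It is not our actual migration code: the class name `SysViewFilter` and the method `stripVersionProperties` are made up for this example, and it assumes the exported XML is in the JCR system view format, where every property is an `sv:property` element. In a per-node migration, the input would presumably come from something like `Session.exportSystemView(path, out, true, true)` (skip binaries, no recursion) and the output would be passed to `Session.importXML(...)` on the target repository, followed by a save.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: removes version-related properties from a
// system view export of a single node.
final class SysViewFilter {
    // Namespace used by the JCR system view XML format.
    static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    // Properties to drop before import (assumed names; adjust as needed).
    static final List<String> DROPPED =
            Arrays.asList("jcr:versionHistory", "jcr:baseVersion");

    static String stripVersionProperties(String sysViewXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sysViewXml)));

            // Collect matches first: the NodeList is live, so removing
            // elements while iterating over it would skip entries.
            NodeList props = doc.getElementsByTagNameNS(SV_NS, "property");
            List<Element> toRemove = new ArrayList<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (DROPPED.contains(prop.getAttributeNS(SV_NS, "name"))) {
                    toRemove.add(prop);
                }
            }
            for (Element prop : toRemove) {
                prop.getParentNode().removeChild(prop);
            }

            // Serialize the filtered document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not filter system view XML", e);
        }
    }
}
```

Because each call only ever sees the XML for one node, the DOM stays tiny, which is the whole point of going node by node instead of loading a single 22 MB document.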