  1. DSpace (LEGACY)
  2. DS-1120

AIP Backup & Restore: SITE AIP has a different checksum every time it is generated when orphaned Collection/Community groups exist



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1
    • Fix Version/s: 1.8.2, 3.0
    • Component/s: DSpace API
    • Labels:
    • Attachments:
    • Comments:
    • Documentation Status:
      Complete or Committed


      When a DSpace instance contains one or more "orphaned" Community/Collection groups, this causes issues with the SITE AIP generation of the AIP Backup & Restore (METS) tools.

      By "orphaned" Community/Collection groups, I mean a group of the form "COMMUNITY_<ID>_ADMIN" or "COLLECTION_<ID>_SUBMIT" where the associated Community or Collection no longer exists in the system. Unfortunately, DSpace currently does a bad job of making sure all associated Groups are also cleaned up when a Community or Collection is deleted. This sometimes leaves several "orphaned" groups that are likely no longer in use (unless a DSpace Admin still uses one as a sub-group of a larger group).

      When exporting a SITE AIP, the AIP Backup & Restore tool needs to translate all Community/Collection groups into a format like "COLLECTION_<handle>_ADMIN" (as the internal IDs have no meaning once the AIP is outside of DSpace, and they cannot be preserved between DSpace instances).
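
      As a rough illustration of that translation step, here is a minimal sketch. It is not the actual DSpace code: the class name, the method name, and the lookup map standing in for handle resolution are all hypothetical, and the exact handle formatting used by the real tool may differ.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of the DSpace API).
class GroupNameTranslator {
    // Matches ID-based group names such as COLLECTION_42_ADMIN or COMMUNITY_7_ADMIN.
    private static final Pattern DEFAULT_GROUP =
            Pattern.compile("^(COLLECTION|COMMUNITY)_(\\d+)_(.+)$");

    /**
     * Translates an ID-based group name into a handle-based one.
     * handlesByObject is an assumed stand-in for handle resolution,
     * keyed like "COLLECTION_42" and holding the parent object's handle.
     * Returns Optional.empty() when the parent cannot be resolved,
     * which is the "orphaned" case described in this issue.
     */
    static Optional<String> toExportName(String groupName,
                                         Map<String, String> handlesByObject) {
        Matcher m = DEFAULT_GROUP.matcher(groupName);
        if (!m.matches()) {
            // Not an ID-based name, so nothing needs translating.
            return Optional.of(groupName);
        }
        String handle = handlesByObject.get(m.group(1) + "_" + m.group(2));
        if (handle == null) {
            return Optional.empty(); // parent is gone: orphaned group
        }
        return Optional.of(m.group(1) + "_" + handle + "_" + m.group(3));
    }
}
```

      With a map containing "COLLECTION_42" mapped to the handle "123456789/10", the name "COLLECTION_42_ADMIN" would come out as "COLLECTION_123456789/10_ADMIN", while a name whose parent is missing from the map yields an empty result.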

      When the AIP Backup & Restore tool encounters an orphaned group, it renames it to a random name like: "GROUP_<random-key>_COLLECTION_ADMIN" (because the group is orphaned, it cannot be translated into a Handle).

      Unfortunately, this random naming scheme backfires as it causes the MD5 Checksum of the SITE AIP to be different every time it is generated. This is extremely problematic as this means that the SITE AIP appears to always be different from a remote backup copy (even if the only difference is that a different <random-key> was generated for these groups).
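
      The effect can be reproduced in isolation with a small sketch. Everything here is an assumption made for illustration: the one-line "manifest" is a stand-in for the real METS content, and UUID is used as the random key only because the actual key generation is not described in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical demo class, for illustration only.
class ChecksumDrift {
    /** Returns the MD5 digest of the given text as lowercase hex. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Assumed stand-in for a randomly named orphaned group. */
    static String randomGroupName() {
        return "GROUP_" + UUID.randomUUID() + "_COLLECTION_ADMIN";
    }

    /** Assumed stand-in for the part of the AIP that records a group name. */
    static String manifestFor(String groupName) {
        return "<group name=\"" + groupName + "\"/>";
    }
}
```

      Hashing manifestFor(randomGroupName()) twice gives two different digests even though nothing in the repository changed, whereas hashing a manifest built from any fixed name gives the same digest on every run.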

      In essence, this is a long-winded way of saying that the AIP Backup & Restore tool needs to avoid generating random Group names on export. Rather, the exported group names need to be repeatable in every manner.

      Instead, when exporting to an AIP, I suggest renaming orphaned groups into a standard format like: "ORPHANED_COLLECTION_GROUP_<id>_ADMIN". This naming format lets Admins know that it was determined to be an orphaned group (so it likely can/should be cleaned up if it isn't being used as a sub-group elsewhere). It also ensures the new group name is still unique (at least in the AIP) and repeatable, by using the old internal Object ID of its orphaned parent.
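
      A minimal sketch of that renaming rule follows. It is not the contents of the attached patch: the class and method names are hypothetical, and extending the pattern to Community groups and to suffixes other than ADMIN is my assumption based on the Collection example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not the attached patch).
class OrphanedGroupNamer {
    // Matches ID-based group names such as COLLECTION_42_ADMIN.
    private static final Pattern DEFAULT_GROUP =
            Pattern.compile("^(COLLECTION|COMMUNITY)_(\\d+)_(.+)$");

    /**
     * Builds a repeatable export name for an orphaned group by reusing
     * the old internal ID of its missing parent instead of a random key.
     * Names that do not follow the ID-based pattern are returned unchanged.
     */
    static String orphanedExportName(String groupName) {
        Matcher m = DEFAULT_GROUP.matcher(groupName);
        if (!m.matches()) {
            return groupName;
        }
        return "ORPHANED_" + m.group(1) + "_GROUP_" + m.group(2) + "_" + m.group(3);
    }
}
```

      Because the output depends only on the existing group name, "COLLECTION_42_ADMIN" always becomes "ORPHANED_COLLECTION_GROUP_42_ADMIN", so repeated exports of an unchanged site would record identical group names.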

      I've attached a proposed patch to fix this issue as described above (see PackageUtils.patch). This issue (although small) is extremely problematic for folks using AIP Backup & Restore. I'd suggest we may need to do a 1.8.2 release to push this fix out sooner rather than later.




            tdonohue Tim Donohue