  1. DSpace
  2. DS-1382

AIP Backup & Restore functionality should not duplicate unchanged files across Item Versions

    Details

    • Type: Bug
    • Status: Volunteer Needed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: DSpace API
    • Labels: None
    • Attachments: 0
    • Comments: 2
    • Documentation Status: Needed

      Description

      The DSpace 3.0 model for storing Item Versions in AIPs is to generate a separate AIP for each version of the Item.

      Suppose you have an Item "123/45" with old versions "123/45.1" and "123/45.2". To export all versions, you'd need to export a total of 3 AIPs (123-45.zip, 123-45.1.zip and 123-45.2.zip), one for each version.
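
      The export set in this example can be sketched in a few lines. This is only an illustration: the helper names `aip_filename` and `aips_to_export` are hypothetical, and the naming rule (replace "/" with "-" and append ".zip") is inferred from the file names quoted above, not taken from the DSpace code.

```python
def aip_filename(handle: str) -> str:
    """Map a handle to an AIP file name, following the example in this issue.

    Assumption: the rule is simply "/" -> "-" plus a ".zip" suffix.
    """
    return handle.replace("/", "-") + ".zip"


def aips_to_export(handle: str, old_versions: int) -> list[str]:
    """List the AIPs needed to export an Item and all of its old versions."""
    handles = [handle] + [f"{handle}.{n}" for n in range(1, old_versions + 1)]
    return [aip_filename(h) for h in handles]


if __name__ == "__main__":
    for name in aips_to_export("123/45", 2):
        print(name)
```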

      Although this may sound reasonable, it can lead to "ballooning storage costs" as you version Items. Because a separate AIP is generated for each version (3 in the above example), each AIP must contain its own full copy of every content file, even files that did not change between versions. So, if the initial AIP is 100KB, after 10 versions you may be storing around 10 x 100KB = ~1MB of content, much of it duplicated. A few ways around this issue would be to either:
      (a) store AIPs as "unzipped" (so they could link to the same content files & avoid some content duplication), OR
      (b) generate a single AIP zip package which describes all versions of the Item (again that way you could avoid content file duplication). This single AIP zip package could either describe all versions in a single METS file, or potentially include a separate METS file for each version.
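
      To make the potential saving concrete, here is a rough sketch comparing the two storage models. It is not DSpace code: the function names are hypothetical, each version is modelled as a plain dict of file name to bytes, and "unchanged" is assumed to mean "same SHA-256 checksum". Both options (a) and (b) amount to storing each distinct file once, which is what `deduplicated_bytes` estimates.

```python
import hashlib


def per_version_bytes(versions):
    """Total bytes when every version's AIP carries its own copy of each file."""
    return sum(len(data) for files in versions for data in files.values())


def deduplicated_bytes(versions):
    """Total bytes when files with identical SHA-256 checksums are stored once."""
    unique = {}
    for files in versions:
        for data in files.values():
            unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(unique.values())


if __name__ == "__main__":
    # Ten versions: one 100 KB file that never changes, plus a tiny note per version.
    unchanged = b"x" * 100_000
    versions = [
        {"content.pdf": unchanged, "notes.txt": f"v{n}".encode()}
        for n in range(1, 11)
    ]
    print("one AIP per version:", per_version_bytes(versions), "bytes")
    print("deduplicated:       ", deduplicated_bytes(versions), "bytes")
```

      In this toy case the per-version model stores roughly ten times as much as the deduplicated one, matching the 10 x 100KB estimate above. Real AIPs also contain METS metadata, which this sketch ignores.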

      Whichever option we take, it will require some (likely major) rework of the AIP format. We would also need to keep it backwards compatible with past AIP formats.
      https://wiki.duraspace.org/display/DSDOC3x/DSpace+AIP+Format

              People

              Assignee:
              Unassigned
              Reporter:
              Tim Donohue (tdonohue)
              Votes:
              0
              Watchers:
              2
