  1. DSpace
  2. DS-87

XMLUI file download links break in Google search results if file 'sequence' number changes.



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: 1.5.2
    • Component/s: XMLUI
    • Labels:
    • Environment:
      Any environment
    • Attachments:
    • Comments:


      In the XMLUI, file download links have the following structure:


      If a DSpace admin changes a file (e.g. replaces it with a new file of the same name), the sequence number associated with that file name may change. The problem is that when a sequence number is provided in the URL, the XMLUI looks the file up by that sequence number only. So even if the newly added file has the same name (but a different sequence number), every URL that points to the old file is broken.

      This is very problematic for search engines such as Google, because DSpace returns a 404 error to the search engine for the old URL.

      Here's a step by step example to clarify:
      (1) A file (myfile.pdf) has an initial sequence number of 1, so its URL ends with "myfile.pdf?sequence=1".
      (2) Google indexes the DSpace site, so the file URL in Google search results ends with "myfile.pdf?sequence=1".
      (3) The file is then replaced with an updated version of the same name (myfile.pdf), which is assigned a new sequence number of 2. The URL in DSpace now ends with "myfile.pdf?sequence=2".
      (4) All Google links still point to "myfile.pdf?sequence=1", and the XMLUI returns a 404 at that URL. The new URL, "myfile.pdf?sequence=2", works fine, but Google does not know that the file has moved, at least not until it has reindexed the site.

      I believe the XMLUI should first check the sequence number. If no corresponding file is found, it should then look for any file with the same name. If it finds a file with the same name and a different sequence number, DSpace should redirect to that new location. That way, Google (and similar search engines) would be informed that the location of the file has changed from "myfile.pdf?sequence=1" to "myfile.pdf?sequence=2" (based on the above example).
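
The proposed lookup order can be sketched roughly as follows. This is only an illustration of the idea, not the XMLUI code: `BitstreamLookupSketch`, `FileEntry`, `Resolution` and `resolve` are hypothetical names, the file list is an in-memory stand-in for an item's files, and the choice of a permanent (301) redirect is an assumption, since the description above only asks for "a redirect".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed fallback; none of these names are DSpace APIs. */
public class BitstreamLookupSketch {

    /** Minimal stand-in for one file attached to an item. */
    static final class FileEntry {
        final String name;
        final int sequence;

        FileEntry(String name, int sequence) {
            this.name = name;
            this.sequence = sequence;
        }
    }

    /** Outcome of resolving a download URL: an HTTP status and, for redirects, the target. */
    static final class Resolution {
        final int status;
        final String location; // null unless status is a redirect

        Resolution(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Resolves a request for ".../<name>?sequence=<sequence>" against the item's files. */
    static Resolution resolve(List<FileEntry> files, String name, int sequence) {
        // Step 1: look the file up by sequence number, as is done today.
        for (FileEntry f : files) {
            if (f.sequence == sequence) {
                return new Resolution(200, null);
            }
        }
        // Step 2 (the proposed addition): fall back to a file with the same name
        // and redirect to its current URL. The first match wins in this sketch.
        for (FileEntry f : files) {
            if (f.name.equals(name)) {
                // Assumption: 301 (permanent), so crawlers update their index.
                return new Resolution(301, name + "?sequence=" + f.sequence);
            }
        }
        // Step 3: nothing matches by sequence number or by name.
        return new Resolution(404, null);
    }

    public static void main(String[] args) {
        // State after step (3) of the example: myfile.pdf now has sequence number 2.
        List<FileEntry> files = new ArrayList<>();
        files.add(new FileEntry("myfile.pdf", 2));

        // The stale link indexed by Google in step (2).
        Resolution stale = resolve(files, "myfile.pdf", 1);
        System.out.println(stale.status + " -> " + stale.location);
    }
}
```

With the state from the step-by-step example, the stale link "myfile.pdf?sequence=1" resolves to a redirect to "myfile.pdf?sequence=2" instead of a 404, while a request for a name that matches no file still yields a 404.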




            tdonohue Tim Donohue