DSpace / DS-3331

Multi-shard solr statistics queries suppress key information


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 6.1
    • Component/s: Solr
    • Labels: None
    • Attachments: 0
    • Comments: 1
    • Documentation Status: Needed

      Description

      I have some custom statistics reporting tools that no longer work after the upgrade to DSpace 6. While investigating the issue, I discovered the following behavior.

      Run the following query on a DSpace 5 instance.

      https://<your-server>/solr/statistics/select?q=statistics_type:view&rows=1&shards=localhost/solr/statistics
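A reporting tool can assemble the same query programmatically. This is a minimal Python sketch using only the standard library. `your-server` is a placeholder host, and `stats_query_url` is a hypothetical helper name, not part of DSpace or Solr:

```python
from typing import Optional
from urllib.parse import urlencode


def stats_query_url(server: str, shards: Optional[str] = None, rows: int = 1) -> str:
    """Build the statistics_type:view query used in this report (hypothetical helper).

    `server` is a placeholder host name. Pass shards=None to omit the shards
    parameter and produce the non-sharded variant of the query.
    """
    params = {"q": "statistics_type:view", "rows": rows}
    if shards is not None:
        params["shards"] = shards
    # urlencode percent-encodes ':' and '/'; Solr decodes query parameters as usual.
    return f"https://{server}/solr/statistics/select?{urlencode(params)}"


if __name__ == "__main__":
    print(stats_query_url("your-server", shards="localhost/solr/statistics"))
    print(stats_query_url("your-server"))
```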

      The response includes the id, owningComm, and owningColl fields. In DSpace 5, the "id" field holds the integer ID of the DSpace object.

      <response>
      <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">21</int>
      <lst name="params">
      <str name="shards">localhost/solr/statistics</str>
      <str name="q">statistics_type:view</str>
      <str name="rows">1</str>
      </lst>
      </lst>
      <result name="response" numFound="14368" start="0" maxScore="1.0188893">
      <doc>
      <str name="ip">10.212.143.233</str>
      <str name="referrer">
      https://repository-dev.library.georgetown.edu/handle/10822/761365
      </str>
      <str name="dns">10.212.143.233</str>
      <str name="userAgent">
      Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
      </str>
      <bool name="isBot">false</bool>
      <int name="id">979</int>
      <int name="type">2</int>
      <arr name="owningColl">
      <int>2</int>
      </arr>
      <arr name="owningComm">
      <int>2</int>
      </arr>
      <date name="time">2016-04-20T14:07:27.558Z</date>
      <int name="epersonid">0</int>
      <str name="statistics_type">view</str>
      <str name="uid">f9bd7777-c6a9-45bc-b6e4-8aeff76468b0</str>
      <long name="_version_">1532153715815874560</long>
      </doc>
      </result>
      </response>
      

      Running the same query on a DSpace 6 instance returns a document without the id, owningComm, owningColl, and epersonid fields.

      <response>
      <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">15</int>
      <lst name="params">
      <str name="shards">localhost/solr/statistics</str>
      <str name="q">statistics_type:view</str>
      <str name="rows">1</str>
      </lst>
      </lst>
      <result name="response" numFound="1823" start="0" maxScore="1.0146941">
      <doc>
      <str name="ip">10.212.133.132</str>
      <str name="referrer">
      https://repository-aux.library.georgetown.edu/discover
      </str>
      <str name="dns">10.212.133.132</str>
      <str name="userAgent">
      Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
      </str>
      <bool name="isBot">false</bool>
      <int name="type">2</int>
      <date name="time">2016-08-16T19:34:34.662Z</date>
      <str name="statistics_type">view</str>
      <str name="uid">05c2e3da-9631-4c04-b7ee-281758aa809c</str>
      <long name="_version_">1542864738365472768</long>
      </doc>
      </result>
      </response>
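A custom reporting tool could detect this regression by parsing the response and listing the expected fields that each document lacks. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The field list reflects only the fields compared in this report, and `missing_fields` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Fields present in the DSpace 5 response above but absent from the sharded DSpace 6 one.
EXPECTED_FIELDS = ("id", "owningComm", "owningColl", "epersonid")


def missing_fields(xml_text: str, expected=EXPECTED_FIELDS) -> Dict[str, List[str]]:
    """Map each result document's uid to the expected fields it lacks."""
    root = ET.fromstring(xml_text)
    report = {}
    for doc in root.iter("doc"):
        # Each stored field is a direct child element whose "name" attribute is the field name.
        names = {child.get("name") for child in doc}
        uid = doc.findtext("str[@name='uid']", default="?")
        report[uid] = [field for field in expected if field not in names]
    return report


if __name__ == "__main__":
    sharded = """<response><result name="response" numFound="1823" start="0">
      <doc>
        <int name="type">2</int>
        <str name="statistics_type">view</str>
        <str name="uid">05c2e3da-9631-4c04-b7ee-281758aa809c</str>
      </doc>
    </result></response>"""
    print(missing_fields(sharded))
```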
      

      If the shards parameter is removed, those fields are present, and the "id" field contains the UUID of the DSpace object.

      https://<your-server>/solr/statistics/select?q=statistics_type:view&rows=1

      <response>
      <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
      <str name="q">statistics_type:view</str>
      <str name="rows">1</str>
      </lst>
      </lst>
      <result name="response" numFound="1823" start="0">
      <doc>
      <str name="ip">10.212.133.132</str>
      <str name="referrer">
      https://repository-aux.library.georgetown.edu/discover
      </str>
      <str name="dns">10.212.133.132</str>
      <str name="userAgent">
      Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
      </str>
      <bool name="isBot">false</bool>
      <str name="id">75dfaf79-246c-445a-970c-6b8534414838</str>
      <int name="type">2</int>
      <arr name="owningColl">
      <str>2cca3a8f-f4f4-41c2-be14-bcd916654a35</str>
      </arr>
      <arr name="owningComm">
      <str>953646a5-0aa4-42a6-af13-7a3fa7233e30</str>
      <str>bd2ca6a7-2aa0-4d87-9cd3-ea03f530faa0</str>
      <str>9160cfb8-a0fc-4ed8-aafd-b95b1c17c1ec</str>
      <str>6ab9d8fb-3f6c-4fed-80b1-361622412498</str>
      <str>05c57498-c7c3-48ec-9416-d7e119de1b44</str>
      </arr>
      <date name="time">2016-08-16T19:34:34.662Z</date>
      <str name="epersonid">08686050-51f2-407e-a328-ab7fdb3c118b</str>
      <str name="statistics_type">view</str>
      <str name="uid">05c2e3da-9631-4c04-b7ee-281758aa809c</str>
      <long name="_version_">1542864738365472768</long>
      </doc>
      </result>
      </response>
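The "id" field changed from an integer in DSpace 5 to a UUID string in DSpace 6, so reporting code that reads these values may need to accept both forms. The same change applies to the owningColl, owningComm, and epersonid values. This is a minimal sketch that assumes the tool receives each value as a string. `classify_dso_id` is a hypothetical name:

```python
import re
import uuid


def classify_dso_id(value: str) -> str:
    """Return 'legacy-int' for a DSpace 5 style integer ID, 'uuid' for a
    DSpace 6 style UUID, or 'unknown' otherwise (hypothetical helper)."""
    text = value.strip()
    # Plain ASCII digits: the integer IDs seen in the DSpace 5 response.
    if re.fullmatch(r"[0-9]+", text):
        return "legacy-int"
    try:
        uuid.UUID(text)  # raises ValueError for malformed strings
    except ValueError:
        return "unknown"
    return "uuid"


if __name__ == "__main__":
    for sample in ("979", "75dfaf79-246c-445a-970c-6b8534414838", "not-an-id"):
        print(sample, "->", classify_dso_id(sample))
```

Checking for digits before attempting UUID parsing keeps the legacy path explicit. A tool could route each category to the matching DSpace 5 or DSpace 6 lookup.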
      

      The "Limitations to Distributed Search" section of the Solr reference guide may explain why the "id" field is missing when the shards parameter is provided: https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding#DistributedSearchwithIndexSharding-LimitationstoDistributedSearch


              People

              Assignee:
              Unassigned
              Reporter:
              terrywbrady Terrence W Brady
              Votes:
              0
              Watchers:
              2

                Dates

                Created:
                Updated:
                Resolved: