DSpace / DS-2462

query.filter.spiderIp is redundant, incomplete, and scales poorly


    Details

    • Type: Bug
    • Status: Volunteer Needed (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DSpace API
    • Labels: None
    • Attachments: 0
    • Comments: 1
    • Documentation Status: Needed

      Description

      This configuration setting is found in config/modules/solr-statistics.cfg and is implemented in org.dspace.statistics.SolrLogger#query.

      1. The code that implements this setting has not kept up with improvements in org.dspace.statistics.util.SpiderDetector, which now also detects spiders by DNS name patterns and user-agent patterns.

      2. If usage records are being marked isBot when they are created, then filtering records by IP is redundant.

      3. The list of spider IPs is long and will continue to grow. Enabling this feature adds every spider IP to the filter query of many Solr queries. This greatly increases the length and complexity of each query and might eventually make the query string too long to process.
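
A minimal sketch of the difference between points 2 and 3, assuming the IP filter is built as one clause per known spider address on a field named `ip`, and that flagged records carry a boolean field named `isBot`. The field names and the exact query syntax are assumptions for illustration; this is not the string that SolrLogger#query actually generates. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SpiderFilterSketch {

    // Hypothetical IP-based filter: one clause per known spider IP, so the
    // query string grows linearly with the spider list. Field "ip" is assumed.
    static String ipFilterQuery(List<String> spiderIps) {
        if (spiderIps.isEmpty()) {
            return "";
        }
        return spiderIps.stream()
                .map(ip -> "ip:\"" + ip + "\"")
                .collect(Collectors.joining(" OR ", "-(", ")"));
    }

    // Hypothetical flag-based filter: if records are marked when stored, one
    // clause suffices no matter how many spiders are known. Field "isBot" is assumed.
    static String isBotFilterQuery() {
        return "-isBot:true";
    }

    public static void main(String[] args) {
        List<String> ips = new ArrayList<>();
        for (int n : new int[] {10, 1_000, 10_000}) {
            ips.clear();
            for (int i = 0; i < n; i++) {
                // Synthetic addresses in the 10.0.0.0/8 range, for illustration only.
                ips.add("10." + (i / 65536) + "." + (i / 256 % 256) + "." + (i % 256));
            }
            System.out.printf("%6d IPs: ip filter = %7d chars, isBot filter = %d chars%n",
                    n, ipFilterQuery(ips).length(), isBotFilterQuery().length());
        }
    }
}
```

Running it shows the IP-based filter growing with the size of the spider list while the flag-based filter stays constant, which is the core of the scaling argument.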

      We should remove this feature and encourage consistent use of isBot flagging. We should also provide a tool for (re)marking existing records with isBot in the background using SpiderDetector#isSpider, should a site need to do that.
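
A rough sketch of what such a background (re)marking pass could do, assuming the same three kinds of checks that point 1 attributes to SpiderDetector (IP, DNS name pattern, user-agent pattern). `UsageRecord`, `Detector` and `remark` are hypothetical names; they do not reproduce SpiderDetector's real API, and a real tool would page through the Solr statistics core and write the flag back rather than working on an in-memory list.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RemarkSketch {

    // Hypothetical stand-in for a stored usage-statistics record.
    static final class UsageRecord {
        final String ip;
        final String dns;
        final String userAgent;
        boolean isBot;

        UsageRecord(String ip, String dns, String userAgent, boolean isBot) {
            this.ip = ip;
            this.dns = dns;
            this.userAgent = userAgent;
            this.isBot = isBot;
        }
    }

    // Hypothetical detector combining IP, DNS-name and user-agent checks.
    static final class Detector {
        private final Set<String> spiderIps;
        private final List<Pattern> dnsPatterns;
        private final List<Pattern> agentPatterns;

        Detector(Set<String> spiderIps, List<Pattern> dnsPatterns, List<Pattern> agentPatterns) {
            this.spiderIps = spiderIps;
            this.dnsPatterns = dnsPatterns;
            this.agentPatterns = agentPatterns;
        }

        boolean isSpider(UsageRecord r) {
            if (r.ip != null && spiderIps.contains(r.ip)) return true;
            if (r.dns != null && matchesAny(dnsPatterns, r.dns)) return true;
            return r.userAgent != null && matchesAny(agentPatterns, r.userAgent);
        }

        private static boolean matchesAny(List<Pattern> patterns, String value) {
            for (Pattern p : patterns) {
                if (p.matcher(value).find()) return true;
            }
            return false;
        }
    }

    // Flags records the detector now recognizes; returns how many changed.
    // Already-flagged records are skipped, so the pass can safely be rerun.
    static int remark(List<UsageRecord> records, Detector detector) {
        int changed = 0;
        for (UsageRecord r : records) {
            if (!r.isBot && detector.isSpider(r)) {
                r.isBot = true; // a real tool would persist this update to Solr
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Detector d = new Detector(
                Set.of("192.0.2.1"),
                List.of(Pattern.compile("\\.crawl\\.example\\.com$")),
                List.of(Pattern.compile("(?i)bot|spider|crawler")));
        List<UsageRecord> records = List.of(
                new UsageRecord("192.0.2.1", null, "Mozilla/5.0", false),                          // spider by IP
                new UsageRecord("198.51.100.7", "host1.crawl.example.com", "Mozilla/5.0", false), // by DNS name
                new UsageRecord("198.51.100.8", null, "ExampleBot/1.0", false),                   // by user agent
                new UsageRecord("203.0.113.5", "pc.example.org", "Mozilla/5.0", false));          // human
        System.out.println("newly flagged: " + remark(records, d)); // prints "newly flagged: 3"
    }
}
```

This sketch only ever adds the flag; whether a re-marking tool should also clear isBot on records that no longer match updated spider lists is a policy choice a site would need to make.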


    People

    • Assignee: Unassigned
    • Reporter: Mark H. Wood (mwood)
    • Votes: 1
    • Watchers: 3
