Spiders are currently detected by matching their IP address against the lists in /dspace/config/spiders/ip-list-X.txt. When a spider changes IP addresses, or the IP list goes unmaintained, many spiders slip through. They will usually keep their user agent or hostname intact, however.
I've noticed a sore point in my Solr data: msnbot is completely unfiltered. An additional IP list exists for it at http://www.iplists.com/nw/msn.txt, but it is very old, and with additional bingbots on the horizon it would be easier to detect them and filter them out of the logs by user agent than to maintain all of the IP address ranges. The code to do this in Solr is unimplemented; this ticket is a placeholder to encourage finishing the work to filter based on user agent / DNS hostname.
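To make the request concrete, here is a rough sketch of what user-agent / hostname matching could look like. It is written in Python for brevity and is not DSpace code. The pattern lists and the name is_spider are hypothetical, and the sample patterns only cover the msnbot and bingbot cases mentioned above.

```python
import re

# Hypothetical pattern lists (assumptions for illustration only);
# these are not an existing DSpace configuration file or API.
AGENT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"msnbot", r"bingbot")]
HOSTNAME_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"^msnbot",)]


def is_spider(user_agent, dns_hostname):
    """Return True if the user agent or the DNS hostname matches a known pattern."""
    # A missing user agent or hostname simply does not match anything.
    if user_agent and any(p.search(user_agent) for p in AGENT_PATTERNS):
        return True
    if dns_hostname and any(p.search(dns_hostname) for p in HOSTNAME_PATTERNS):
        return True
    return False
```

A check like this would not depend on the IP ranges at all, so a spider that moves to a new address would still be caught as long as it keeps its user agent or hostname.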
To see all of the hits from msnbot that are unfiltered, look at: http://localhost:8080/solr/statistics/select?q=dns:msnbot*&facet=true&facet.field=dns&facet.mincount=1&facet.limit=5000
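The same facet query can be generated for other hostname prefixes. The snippet below is only a convenience sketch. The function name build_stats_query is made up, and the base URL is the local one from this ticket, so adjust it for your install.

```python
from urllib.parse import urlencode


def build_stats_query(dns_prefix,
                      base="http://localhost:8080/solr/statistics/select",
                      limit=5000):
    """Build a Solr facet query URL listing hits whose dns field starts with dns_prefix."""
    params = [
        ("q", "dns:%s*" % dns_prefix),   # prefix match on the dns field
        ("facet", "true"),
        ("facet.field", "dns"),
        ("facet.mincount", "1"),
        ("facet.limit", str(limit)),
    ]
    # Keep ':' and '*' unescaped so the URL reads like the one in this ticket.
    return base + "?" + urlencode(params, safe=":*")


print(build_stats_query("msnbot"))
```

Calling it with "msnbot" reproduces the URL above; passing another prefix gives the equivalent query for a different spider.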