Details
Description
During filter-media (in my example, on a txt file), the file's metadata is also indexed in the Solr field that is used for Discovery search.
This results in useless values being shown in the item result list.
This is caused by a setting in the Solr search core's configuration that does not capture the attributes of the file:
https://github.com/DSpace/DSpace/blob/master/dspace/solr/search/conf/solrconfig.xml#L1051
Because this setting does not capture the file's attributes (when set to true, it would normally put them in separate fields), their values are simply injected into the fulltext field (schema.xml), which produces the useless values described above.
One proposed solution would be to index each of these attributes (title, encoding, etc.) in its own field, but this could take up a lot of unnecessary space.
Another solution could be to index all of these attributes in one separate field, or not to index them at all.
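As a rough sketch of the "not index them at all" option: this assumes the setting linked above is Solr Cell's `captureAttr` parameter and that the extract handler follows the stock Solr Cell example layout. The handler name, the `fulltext` mapping and the `ignored_` prefix are illustrative assumptions, not copied from DSpace's actual configuration.

```xml
<!-- solrconfig.xml (sketch only; layout follows the stock Solr Cell example) -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- Send the extracted body text to the full-text field (field name assumed) -->
    <str name="fmap.content">fulltext</str>
    <str name="lowernames">true</str>
    <!-- Capture element attributes into fields named after the element,
         instead of appending their values to the extracted content -->
    <str name="captureAttr">true</str>
    <!-- Prefix every field not defined in schema.xml so that it matches
         the ignored_* dynamic field below -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml (sketch only): fields matching ignored_* are neither indexed nor stored -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```

With this combination, the captured attributes would land in fields the schema does not define and be dropped by the `ignored_*` rule. Any captured field whose name does exist in schema.xml would still be indexed, so this would need checking against the real schema.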
(Note: I don't know which DSpace versions this bug occurs in. I've tested it in 5.4 and 6 so far; it could also just be a Solr problem.)
Issue Links
- is duplicated by
  - DS-4007 PDF Text Extractor can cause strings like "content-type" to show up in search snippets (Closed)
- is related to
  - DS-3090 Discovery search results contain char-set-related errors from reading the fulltext bitstream (Closed)
  - DS-2843 Streamline & add comments to our Solr configurations (Volunteer Needed)