While preparing to test a PR related to full-text indexing, I discovered behavior that surprises me.
1. File anonymous.pdf contains the text "Anonymous Access Zzzip". This file will not be embargoed.
2. File limited.pdf contains the text "Limited Access Zzzip". This file will be embargoed.
I created 2 items in the following collection: http://demo.dspace.org/xmlui/handle/10673/65
- http://demo.dspace.org/xmlui/handle/10673/68 - contains "anonymous.pdf" without an embargo
- http://demo.dspace.org/xmlui/handle/10673/69 - contains "limited.pdf" with an embargo
I ran filter-media on the collection.
Note that searching for "ZZzip" returns a snippet extracted from the document under embargo.
UPDATE FROM TIM: After discussion in DevMtg today (http://irclogs.duraspace.org/index.php?date=2017-02-22), we've determined this is a very minor security issue with little to no risk involved.
- This ONLY occurs when Items are publicly available, but one of their bitstreams is embargoed. It does not occur if the entire item (including metadata) is embargoed.
- In the "worst" case scenario, users may be able to search within the embargoed bitstream. However, the most information they'd get back is a "snippet" of where their search matched. (These snippets can be turned off if they are a concern; see the comment below.)
Longer term, however, we do want to resolve this bug. It's not ideal behavior by any means, but we need a volunteer to investigate how to filter these matches from Solr search results. This may require either storing bitstream permissions in Solr, or perhaps some sort of post-filter on restricted or embargoed bitstreams of public items.
This bug does not exist in the JSPUI, as search snippets are currently XMLUI specific.
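
To make the "post-filter" idea above a bit more concrete, here is a minimal sketch in plain Java. It is not DSpace code: SnippetPostFilter, Snippet and the canRead check are hypothetical names invented for illustration. It also assumes each snippet can be traced back to the bitstream it was extracted from, which the current index may not record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical post-filter sketch; none of these types are DSpace APIs. */
final class SnippetPostFilter {

    /** One highlighted match: the source bitstream (assumed known) and the snippet text. */
    static final class Snippet {
        final String bitstreamName;
        final String text;

        Snippet(String bitstreamName, String text) {
            this.bitstreamName = bitstreamName;
            this.text = text;
        }
    }

    /**
     * Keeps only the snippets whose source bitstream the current user may read.
     * canRead stands in for a real authorization check (an assumption here).
     */
    static List<Snippet> visibleSnippets(List<Snippet> snippets, Predicate<String> canRead) {
        List<Snippet> visible = new ArrayList<>();
        for (Snippet snippet : snippets) {
            if (canRead.test(snippet.bitstreamName)) {
                visible.add(snippet);
            }
        }
        return visible;
    }
}
```

With the two test files from this report, an anonymous user's canRead would reject "limited.pdf", so only the snippet from "anonymous.pdf" would be kept. Whether the item itself should still be listed as a match is a separate question this sketch does not address.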