We want to share a problem related to the way users enter metadata in the input forms. In our experience it is mainly caused by copying and pasting from PDF files (usually the abstract of an article), which brings some hidden characters into the text of the metadata.
On the user interface (JSPUI) this causes no problem, except when the characters are visible (see attachment). However, it sometimes causes problems in SOLR or on the OAI-PMH interface, as the XML structure is no longer correct. This breaks harvesting of the repository for the affected item and for the other items that come after it.
In the many integrations we have developed with DSpace, this problem is very common and it hinders good interoperability. We suggest that the content be "cleaned" right after the user finishes depositing the item, to avoid these problems. Could this be improved? Or is there some configuration we can define to correct it?
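To make the suggestion concrete, here is a rough sketch of the kind of "cleaning" we have in mind. It is not existing DSpace code: the class name `MetadataCleaner` and its methods are hypothetical, and where such a step would be called in the submission workflow is left open. It only assumes that the broken output comes from characters that XML 1.0 does not allow (anything outside tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF).

```java
// Hypothetical helper, not part of DSpace: removes characters that are
// not allowed in an XML 1.0 document from a metadata value.
final class MetadataCleaner {

    private MetadataCleaner() {
    }

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the value without XML-illegal characters; null stays null.
    static String stripInvalidXmlChars(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(value.length());
        int i = 0;
        while (i < value.length()) {
            // Work on code points so valid surrogate pairs are kept whole;
            // an unpaired surrogate is returned as itself and is dropped.
            int cp = value.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, `MetadataCleaner.stripInvalidXmlChars(abstractText)` could be applied to each metadata value before it is stored. Note that this only removes characters that are illegal in XML; characters that are legal but unwanted (for instance ligatures copied from a PDF) would need a separate rule.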