I'm running into an issue where the search index grows at an unwieldy rate and fills up my disk, even with modest numbers of triples in the database. I have finally tracked down the source of my woes.
Imagine this situation (which matches what VIVO produces). Edit: I'm not sure this is the case for the triples below, but it applies whenever vivo:relatedBy is used for anything not explicitly defined with a faux property.
Publication1 vivo:relatedBy Organization
Publication2 vivo:relatedBy Organization
The search indexer populates the ALLTEXT field of each entity with the relevant data property values and object property labels. But apparently it goes one step further along the graph when adding labels for object properties: it also picks up the labels of whatever the related entity is itself related to. That is, given the pseudo-triples above, the ALLTEXT field for Publication1 will include its own label, the label for Organization, and the label for Publication2.
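To make the behavior concrete, here is a rough toy model of what I think is happening. This is not VIVO's indexer code; the function names are made up for illustration, entity names stand in for labels, and I'm assuming the relation is followed in both directions (which it would have to be for Publication2 to show up in Publication1's document).

```python
from collections import defaultdict

# Toy pseudo-triples from above; entity names stand in for their labels.
triples = [
    ("Publication1", "vivo:relatedBy", "Organization"),
    ("Publication2", "vivo:relatedBy", "Organization"),
]

def neighbors(graph_triples):
    """Build an undirected adjacency map (assumption: both directions are followed)."""
    adj = defaultdict(set)
    for subject, _predicate, obj in graph_triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return adj

def alltext_labels(entity, graph_triples):
    """Hypothetical model of ALLTEXT: own label, plus labels one and two hops away."""
    adj = neighbors(graph_triples)
    labels = {entity}
    for first_hop in adj[entity]:
        labels.add(first_hop)            # label of the directly related entity
        labels.update(adj[first_hop])    # labels of everything related to that entity
    return sorted(labels)

print(alltext_labels("Publication1", triples))
# ['Organization', 'Publication1', 'Publication2']
```

With only the first hop, Publication1's document would hold two labels; the second hop is what drags in Publication2.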
Imagine a situation where Organization is related to thousands of publications. The search document for Publication1 will include the labels of thousands of publications. Now imagine Publication1 is related to multiple organizations, which are all related to thousands of publications. This is my situation.
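A back-of-the-envelope estimate shows why this eats disk so quickly. The numbers below are invented for illustration, not measurements from my instance, and the formulas assume the two-hop behavior described above with no overlap between the organizations' publication lists.

```python
def two_hop_label_count(num_orgs, pubs_per_org):
    """Labels in one publication's search document under the assumed two-hop model:
    itself, each organization, and every other publication of each organization."""
    return 1 + num_orgs + num_orgs * (pubs_per_org - 1)

def total_labels_single_org(num_pubs):
    """Labels summed over all publication documents when every publication
    is related to the same single organization."""
    return num_pubs * two_hop_label_count(1, num_pubs)

# Hypothetical sizes, chosen only to show the growth.
print(two_hop_label_count(1, 1000))    # 1001
print(two_hop_label_count(3, 5000))    # 15001
print(total_labels_single_org(1000))   # 1001000
```

So the triple count grows linearly with the number of publications, but the indexed text grows roughly with its square, which would explain a large index on top of a modest number of triples.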
I get why this extra step in indexing makes sense in some situations, since VIVO uses context nodes to define important relationships (like authorship), but it certainly doesn't make sense in this situation and probably others like it.