Details
- Type: Bug
- Status: Volunteer Needed
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: RDF
- Labels: None
- Attachments: None
- Comments: 0
- Documentation Status: Needed
Description
The RDF webapp will sometimes get stuck responding to a query, after which it never responds to any query again; the only remedy is to restart Tomcat. If nobody queries the RDF endpoint heavily, DSpace/RDF stays stable indefinitely. However, too many queries, or too hard a query, will cause RDF to get stuck.
To benchmark, I set up JMeter to hit:
dspace.example.com/rdf/resource/123456789/7908
Header:
Accept: application/rdf+xml, application/xhtml+xml;q=0.3, text/xml;q=0.2, application/xml;q=0.2, text/html;q=0.3, text/plain;q=0.1, text/n3, text/rdf+n3;q=0.5, application/x-turtle;q=0.2, text/turtle;q=1
5 threads, loop count forever.
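For anyone reproducing this without JMeter, the sketch below is a rough stand-in for that test plan (the class name and approach are mine, not part of the original setup): 5 threads hitting the same URL with the same Accept header in an endless loop, printing each response time.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RdfLoadTest {
    static final String TARGET = "https://dspace.example.com/rdf/resource/123456789/7908";
    static final String ACCEPT = "application/rdf+xml, application/xhtml+xml;q=0.3, "
            + "text/xml;q=0.2, application/xml;q=0.2, text/html;q=0.3, text/plain;q=0.1, "
            + "text/n3, text/rdf+n3;q=0.5, application/x-turtle;q=0.2, text/turtle;q=1";

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {            // 5 threads
            new Thread(RdfLoadTest::loop).start();
        }
    }

    static void loop() {
        while (true) {                           // loop count: forever
            long start = System.currentTimeMillis();
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(TARGET).openConnection();
                conn.setRequestProperty("Accept", ACCEPT);
                int status = conn.getResponseCode();
                // Drain the body (or the error body for 4xx/5xx) so connections get reused.
                InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
                if (body != null) { body.readAllBytes(); body.close(); }
                System.out.printf("%d in %.1f s%n", status,
                        (System.currentTimeMillis() - start) / 1000.0);
            } catch (Exception e) {
                System.out.printf("fail after %.1f s: %s%n",
                        (System.currentTimeMillis() - start) / 1000.0, e);
            }
        }
    }
}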
It seemed to be fine for a long while, and then failed catastrophically.
Sample times (s): 2.4, 2.5, 2.4, 832 (fail, 503), 833 (fail, 503), 834 (fail, 503), ..., 857 (success), 858 (success), 858 (success), 859 (success), 1800 (Apache timeout), 2100 (Apache timeout), 2100 (Apache timeout), ...
I also ran /dspace/bin/dspace rdfizer -c -o -v concurrently.
To detect stuck threads, I altered my Tomcat context for the RDF webapp to add the StuckThreadDetectionValve:
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/dspace/webapps/rdf"
         privileged="true" antiResourceLocking="false" antiJARLocking="false">
    <!-- Warn about requests running longer than 70 s; interrupt threads stuck longer than 80 s. -->
    <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
           threshold="70" interruptThreadThreshold="80"/>
</Context>
The valve then starts logging these stuck threads like crazy:
22-Aug-2016 14:10:47.688 WARNING [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.valves.StuckThreadDetectionValve$MonitoredThread.interruptIfStuck Thread "ajp-nio-8009-exec-24" (id=101) has been interrupted because it was active for 2,308,500 milliseconds (since 8/22/16 1:32 PM) to serve the same request for https://dspace.example.com/rdf/data/123456789/7908/turtle and was probably stuck (configured interruption threshold for this StuckThreadDetectionValve is 80 seconds).
java.lang.Throwable
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.jena.riot.web.HttpOp.exec(HttpOp.java:1108)
at org.apache.jena.riot.web.HttpOp.execHttpGet(HttpOp.java:385)
at org.apache.jena.riot.web.HttpOp.execHttpGet(HttpOp.java:354)
at org.apache.jena.web.DatasetGraphAccessorHTTP.doGet(DatasetGraphAccessorHTTP.java:134)
at org.apache.jena.web.DatasetGraphAccessorHTTP.httpGet(DatasetGraphAccessorHTTP.java:128)
at org.dspace.rdf.storage.RDFStorageImpl.load(RDFStorageImpl.java:141)
at org.dspace.rdf.RDFUtil.loadModel(RDFUtil.java:44)
at org.dspace.rdf.providing.DataProviderServlet.serveNamedGraph(DataProviderServlet.java:148)
at org.dspace.rdf.providing.DataProviderServlet.processRequest(DataProviderServlet.java:140)
at org.dspace.rdf.providing.DataProviderServlet.doGet(DataProviderServlet.java:216)
Running DSpace 5.x, using the built-in Sesame triple store. I also happened to note that Sesame has been renamed to RDF4J, and it appears they just had their FIRST release of RDF4J (since the change from OpenRDF Sesame) this past week: http://rdf4j.org/
The stack trace shows the request thread blocked in SocketInputStream.socketRead0, waiting indefinitely on the triplestore's response. I think the solution we need is to set a timeout (a maximum request time) here:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/rdf/storage/RDFStorageImpl.java#L61-L65
Otherwise, if the triplestore gets hung up, DSpace will wait on it forever.
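A minimal sketch of such a timeout, assuming an Apache HttpClient 4.3+ style RequestConfig; how the client gets wired into Jena's HttpOp (which DatasetGraphAccessorHTTP in the stack trace above goes through) depends on the Jena version, so treat the HttpOp.setDefaultHttpClient call as an assumption to verify:

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.jena.riot.web.HttpOp;

public class TriplestoreTimeouts {
    public static void install() {
        // Fail fast instead of blocking forever in SocketInputStream.socketRead0.
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(5_000)            // ms to establish the TCP connection
                .setConnectionRequestTimeout(5_000)  // ms to lease a connection from the pool
                .setSocketTimeout(60_000)            // ms of read inactivity before giving up
                .build();
        CloseableHttpClient client = HttpClients.custom()
                .setDefaultRequestConfig(config)
                .build();
        // Assumption: point Jena's HTTP operations at the timeout-configured client.
        HttpOp.setDefaultHttpClient(client);
    }
}

With a socket timeout in place, a hung triplestore turns into a SocketTimeoutException that RDFStorageImpl.load() can surface as an error response, instead of a permanently stuck thread.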