Avoid expensive ServiceLoader call for every XML message parsed
We observed a significant performance hotspot during XML unmarshalling that is caused by ServiceLoader lookups.
The reason is:
- SDCri's JaxbMarshalling creates a new Unmarshaller for every XML message to parse.
- When Unmarshaller.unmarshal() is called, JAXB creates a new instance of SAXParserFactory (see AbstractUnmarshallerImpl.getXMLReader()).
- During creation of the SAXParserFactory, the ServiceLoader API is used to determine the implementing class. This is slow. It is hard to debug why exactly, but both JProfiler and numerous articles on the web discussing ServiceLoader performance confirm it.
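To get a rough feeling for the cost of the repeated lookup, a small timing loop like the following can be used. This is only a sketch, not a proper benchmark (no warm-up, no JMH); the class name FactoryLookupTiming and the iteration count are made up for illustration, and the numbers will vary with JDK and classpath.

```java
import javax.xml.parsers.SAXParserFactory;

// Rough measurement of repeated SAXParserFactory lookups.
// Hypothetical helper for illustration, not part of SDCri or JAXB.
class FactoryLookupTiming {

    /** Calls SAXParserFactory.newInstance() repeatedly and returns the elapsed nanoseconds. */
    static long timeLookups(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Each call runs the full JAXP lookup procedure again.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            if (factory == null) {
                throw new IllegalStateException("no SAXParserFactory found");
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 10_000; // arbitrary value for illustration
        long nanos = timeLookups(iterations);
        System.out.printf("%d lookups took %.1f ms (%.1f microseconds each)%n",
                iterations, nanos / 1e6, nanos / 1e3 / iterations);
    }
}
```

Running it once with and once without the javax.xml.parsers.SAXParserFactory system property set should show whether the lookup is the dominant cost in a given environment.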
See also this blog post which discusses this exact issue: https://davidbuccola.blogspot.com/2009/09/cache-jersey-jaxb-marshaller-and.html
You can also verify that JAXB takes the ServiceLoader route far too often by counting breakpoint visits.
Also note this hint in the official JAXB documentation:
If you really care about the performance, and/or your application is going to read a lot of small documents, then creating Unmarshaller could be relatively an expensive operation. In that case, consider pooling Unmarshaller objects. Different threads may reuse one Unmarshaller instance, as long as you don't use one instance from two threads at the same time.
1st possible solution: quickfix
You could simply set a Java system property to the class that should be used; the ServiceLoader lookup is then skipped (cf. FactoryFinder.find()):
System.setProperty("javax.xml.parsers.SAXParserFactory", "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");
If you choose this approach, this should become a configurable setting, as users may want to replace Xerces by a different XML library.
(Alternative: as users of SDCri could also set the system property themselves, SDCri could just decide to do nothing here, apart from possibly documenting it.)
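A configurable variant of the quickfix could look roughly like this. It is a sketch under assumptions: the class SaxFactoryConfig and the method pinSaxParserFactory are hypothetical names, and how the configured value reaches this method (for example via an SDCri configuration key) is left open. A value that the user has already set always wins.

```java
// Hypothetical helper for illustration, not an existing SDCri class.
class SaxFactoryConfig {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // JDK-internal Xerces implementation, as used in the quickfix above.
    static final String DEFAULT_IMPL =
            "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";

    /**
     * Pins the SAXParserFactory implementation unless the system property
     * is already set. Returns the value that is in effect afterwards.
     */
    static String pinSaxParserFactory(String configuredImpl) {
        String existing = System.getProperty(PROPERTY);
        if (existing != null && !existing.isEmpty()) {
            return existing; // the user's own choice is left untouched
        }
        String impl = (configuredImpl == null || configuredImpl.isEmpty())
                ? DEFAULT_IMPL
                : configuredImpl;
        System.setProperty(PROPERTY, impl);
        return impl;
    }
}
```

Calling SaxFactoryConfig.pinSaxParserFactory(null) once during startup, before the first message is parsed, would apply the default shown above.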
2nd possible solution: solving the root cause
We could also convince the JAXB folks that it is pointless to create a factory with the same parameters over and over again, and get AbstractUnmarshallerImpl fixed accordingly. I have created a ticket on their GitHub for this purpose.
3rd possible solution: re-use Unmarshaller
We also experimented with re-using the complete Unmarshaller. This is potentially the fastest solution, as instantiating the Unmarshaller has some more (not so expensive) steps than just figuring out which SAX parser to use.
Unfortunately, Unmarshaller is not thread-safe, so access to the Unmarshaller instances needs to be controlled.
Variant a) Object pool
You could choose to create an "object pool" of Unmarshallers and access them from JaxbMarshalling, as suggested by the JAXB documentation. Unfortunately, there is currently no object pool library like Commons Pool on the classpath. Also, it may be difficult to find the right size for the pool, as it should probably grow and shrink with the number of connected devices.
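Without Commons Pool, a fixed-size pool can be sketched with a BlockingQueue from the JDK. The sketch below is generic so that it does not depend on JAXB being on the classpath; SimplePool and PooledWork are hypothetical names, and the pool neither grows nor shrinks, so it does not address the sizing question.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Work item that may throw checked exceptions (unmarshal() throws JAXBException).
interface PooledWork<T, R> {
    R apply(T instance) throws Exception;
}

// Minimal fixed-size object pool; hypothetical helper for illustration.
class SimplePool<T> {

    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // all instances are created up front
        }
    }

    /** Borrows an instance, runs the work, and always returns the instance. */
    <R> R withInstance(PooledWork<T, R> work) throws Exception {
        T instance = idle.take(); // blocks while all instances are in use
        try {
            return work.apply(instance);
        } finally {
            idle.add(instance); // there is room, since one was taken above
        }
    }
}
```

With JAXB available, the pool would be created with a supplier that wraps JAXBContext.createUnmarshaller() (which throws a checked JAXBException), and each parse would run inside withInstance(u -> u.unmarshal(source)). Because an instance is only ever held by one caller at a time, the thread-safety requirement is met.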
Variant b) ThreadLocals
You could also put the Unmarshaller in a ThreadLocal and reuse it from there if present. Unfortunately, there are quite a lot of threads that may trigger unmarshalling: potentially hundreds of Jetty threads (qtp...), "Consumer-thread-..." (cf. DefaultGlueModule.getConsumerExecutor()), "NetworkJobThreadPool-thread-..." (cf. DefaultDpwsModule.getNetworkJobThreadPool())...
I don't know whether it might be a problem to have potentially hundreds of Unmarshallers in RAM. Maybe you have your own good ideas for this approach?
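For reference, the ThreadLocal variant could be sketched as follows. Again the sketch is generic so that it runs without JAXB on the classpath; PerThreadInstance is a hypothetical name. Each thread lazily creates its own instance on first use and keeps it until the thread ends or release() is called, which is exactly where the memory concern comes from.

```java
import java.util.function.Supplier;

// One lazily created instance per thread; hypothetical helper for illustration.
class PerThreadInstance<T> {

    private final ThreadLocal<T> holder;

    PerThreadInstance(Supplier<T> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        holder = ThreadLocal.withInitial(factory);
    }

    /** Returns the calling thread's instance, creating it if necessary. */
    T get() {
        return holder.get();
    }

    /** Drops the calling thread's instance so it can be garbage collected. */
    void release() {
        holder.remove();
    }
}
```

With JAXB available, the supplier would wrap JAXBContext.createUnmarshaller(), and JaxbMarshalling would call get() before each unmarshal(). No locking is needed, since an instance is never shared between threads.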