I ran into a performance issue with PipedInputStream/PipedOutputStream, and here is how I finally solved it.
I had a compressed XML file that I wanted to process (with StAX/JAXB), and it looked like a good opportunity to put the decompression on one thread/core and the XML processing on another. Fortunately, Java provides a nice pair of classes for passing data from one thread to another: PipedInputStream and PipedOutputStream.
So I would have two threads:
- the first one would read the file, decompress it and write to the PipedOutputStream;
- the other one would parse/unmarshal the XML from an InputStream, which would happen to be a PipedInputStream.
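This layout can be sketched as follows. To keep the example self-contained, gzip stands in for the actual decompression and a plain byte copy stands in for the StAX/JAXB parsing; all names and buffer sizes here are mine, not the original code:

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

public class PipedDecompression {
    /**
     * Decompresses gzip data on a producer thread and hands the plain
     * bytes to the consumer (here, the calling thread) through a pipe.
     * In the real program the consumer would be the XML parser reading
     * from the PipedInputStream.
     */
    public static byte[] decompressViaPipe(byte[] gzipped) throws Exception {
        PipedInputStream pipeIn = new PipedInputStream(64 * 1024); // pipe buffer size (arbitrary)
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        Thread producer = new Thread(() -> {
            try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped));
                 OutputStream out = pipeOut) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = gz.read(buf)) != -1) {
                    out.write(buf, 0, n); // write decompressed bytes into the pipe
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer side: read everything coming out of the pipe.
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = pipeIn) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        producer.join();
        return result.toByteArray();
    }
}
```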
The result did not look great: it was slower than the single-threaded solution! I tried increasing the buffer size of the pipe, with limited success (only sometimes could I match the single-threaded performance, and only with a 16 MB buffer).
After some profiling, I noticed that the PipedInputStream was spending a good amount of time in a wait() with a 1-second timeout! Then I finally got it: the reader sometimes polls while waiting for data. This does not prevent the program from working; it just degrades performance silently.
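For reference, the polling seen in the profiler comes from the read loop inside the JDK's PipedInputStream. Simplified from the OpenJDK source, it looks roughly like this (a fragment for illustration, not runnable on its own):

```java
// Simplified sketch of OpenJDK's PipedInputStream.read():
// when the circular buffer is empty (in < 0), the reader does not
// block until notified -- it wakes up at least once per second.
while (in < 0) {
    if (closedByWriter) {
        return -1;                  // writer closed the pipe: EOF
    }
    if (writeSide != null && !writeSide.isAlive() && --trials < 0) {
        throw new IOException("Pipe broken");
    }
    notifyAll();                    // a writer might be waiting for space
    try {
        wait(1000);                 // <-- the 1-second delay from the profile
    } catch (InterruptedException ex) {
        throw new java.io.InterruptedIOException();
    }
}
```

A flush() on the writing side calls notifyAll() on the pipe, which is what cuts these waits short.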
To prevent that, the solution was to explicitly call flush() on the PipedOutputStream from time to time: it notifies the reader thread that data is available.
In my case, my decompression thread used IOUtils.copy() to pass data from the compressed stream to the pipe stream, and once in a while, when the reader had consumed all the available data, I was hit by the 1-second delay. So I replaced it with a manual copy loop, with a flush() after each write.
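A minimal sketch of such a manual copy loop (the helper name and the buffer size are mine, not the original code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FlushingCopy {
    /**
     * Copies all bytes from in to out, flushing after each chunk so a
     * reader blocked in PipedInputStream.read() is woken up as soon as
     * data is available, instead of sleeping up to one second.
     */
    public static long copyWithFlush(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            out.flush(); // notifies the reader thread waiting on the pipe
            total += n;
        }
        return total;
    }
}
```

The extra flush() calls are cheap on a pipe (a notifyAll() on its monitor), so doing one per chunk costs little compared to the avoided waits.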
Now the performance is very good: slightly better than decompressing with “unxz” and passing the data through a pipe.