Java: PipedInputStream contention

I ran into a performance issue with PipedInputStream/PipedOutputStream, and here is how I finally solved it.

I had a compressed XML file that I wanted to process (with StAX/JAXB), and it looked like a good opportunity to put the decompression on one thread/core and the XML processing on another. Fortunately, Java provides a nice pair of classes for passing data from one thread to another: PipedInputStream and PipedOutputStream.

So I would have two threads:

  • the first one would read the file, decompress it and write to the PipedOutputStream.
  • the other one would parse/unmarshal the XML from an InputStream, which would happen to be a PipedInputStream.
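
As a rough sketch of that two-thread setup (not my actual code): the class and method names below are made up for illustration, GZIP from the JDK stands in for the real decompressor, and the consumer simply collects the bytes where the StAX/JAXB parsing would normally go.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PipeSetupSketch {

    /** Decompresses on a producer thread and consumes the result on the calling thread. */
    static byte[] decompressThroughPipe(byte[] gzipData) throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        // 64 KiB pipe buffer: an arbitrary size chosen for this sketch.
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, 64 * 1024);

        // Producer: read the compressed data, decompress it, write it to the pipe.
        Thread producer = new Thread(() -> {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipData));
                 PipedOutputStream out = pipeOut) {
                in.transferTo(out); // naive copy, no explicit flush()
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "decompressor");
        producer.start();

        // Consumer: this is where the XML parser would read from pipeIn.
        ByteArrayOutputStream consumed = new ByteArrayOutputStream();
        try (PipedInputStream in = pipeIn) {
            in.transferTo(consumed);
        }
        producer.join();
        return consumed.toByteArray();
    }

    /** Helper that builds gzip-compressed input for the sketch. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        }
        return bytes.toByteArray();
    }
}
```

Closing the PipedOutputStream when the producer is done is what lets the consumer see end-of-stream.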

The result did not look great: it was slower than the single-threaded solution! I tried increasing the buffer size of the pipe, with limited success (with a 16 MB buffer, I could only sometimes match the single-threaded performance).

After some profiling, I noticed that the PipedInputStream was spending a good amount of time in a wait() with a timeout of 1 second! Then I finally got it: when the pipe is empty, the reader polls while waiting for data, and a plain write to the pipe does not necessarily wake it up. This does not prevent the program from working; it just silently degrades the performance.

To prevent that, the solution was to explicitly call flush() on the PipedOutputStream from time to time, which notifies the reader thread that data is available.
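
Here is a small, hypothetical experiment to observe this (the class and method names are mine, and the exact timings depend on the JDK implementation, so treat it as a sketch): a reader blocks on an empty pipe, a writer puts one byte in about 100 ms later, and we measure how long the read() took.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeWakeupDemo {

    /** Returns how long (in ms) a blocked read() takes to see a byte written ~100 ms later. */
    static long timeBlockedRead(boolean flushAfterWrite) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(100);      // let the reader block on the empty pipe first
                out.write(42);
                if (flushAfterWrite) {
                    out.flush();        // notifies threads waiting on the pipe
                }
                Thread.sleep(1500);     // stay alive so the reader never sees a dead writer
                out.close();
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }, "writer");
        writer.setDaemon(true);         // do not keep the JVM alive for the final sleep
        writer.start();

        long start = System.nanoTime();
        int b = in.read();              // blocks until the byte becomes visible
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (b != 42) {
            throw new IllegalStateException("unexpected byte " + b);
        }
        in.close();
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without flush: " + timeBlockedRead(false) + " ms");
        System.out.println("with flush:    " + timeBlockedRead(true) + " ms");
    }
}
```

On an OpenJDK-based runtime I would expect the run without flush() to take noticeably longer than the one with it, but this is an implementation detail and not something the documented API promises.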

In my case, the decompression thread used IOUtils.copy() to pass data from the compressed stream to the pipe. Once in a while, when the reader had consumed all the available data, I was hit by the 1 s delay. So I replaced it with a manual copy loop that calls flush() after each iteration.
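
The manual copy could look roughly like this. It is a sketch, not my exact code: the class name, the method name and the 8 KiB buffer size are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FlushingCopy {

    /** Copies in to out, flushing after every chunk; returns the number of bytes copied. */
    static long copyWithFlush(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            // When out is a PipedOutputStream, flush() notifies a reader
            // that may be waiting in PipedInputStream.read().
            out.flush();
            total += n;
        }
        return total;
    }
}
```

In the decompression thread, the call to IOUtils.copy(in, out) is simply replaced by copyWithFlush(in, out), with out being the PipedOutputStream.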

Now the performance is very good: slightly better than decompressing with "unxz" and passing the data through a pipe.

2 thoughts on "Java: PipedInputStream contention"

  1. Damien

    This is an interesting article.

    I was looking for a way to process sensitive data stored in a GPG-encrypted file without creating an intermediate plaintext file on disk. The Java pipe approach, using the PipedInputStream and PipedOutputStream pair, looks like a good solution. In any case, thanks for the feedback about calling flush() on the output stream.

    See you later.
