Go XML sax-like parsing is slow

I wanted to write an OpenStreetMap XML processor in Go, hoping for a performance boost over my Python implementation. It ended up being slower. Python uses Expat (written in C), and the Go module «encoding/xml» may simply not be heavily optimized.

I wrote simple programs that handle only the «start element» event. In 10 seconds, each could parse the following amount of XML data (on an Athlon II X4 620):

  • PyPy: did not run because of a bug (no progressive parsing)
  • Go: 70 MB
  • Python 2.7: 210 MB
  • Python 3.2: 215 MB
  • Java 7: 460 MB
  • C++ / libxml: 675 MB

I tried to use Expat or libxml from Go, but for the moment it is just too complicated. In Go code, it’s easy to call C functions located in shared libraries, but if you need to pass callback functions written in Go to a library written in C, you have to do dirty things (create wrappers in a Go module containing C code).

That’s a pity, because the Go compiler automatically generates C wrappers for your exported Go functions, but you cannot get a raw pointer to these wrappers (with one, I could have passed my callbacks to libxml or Expat)… See you later, Go.

4 thoughts on « Go XML sax-like parsing is slow »

  1. Pingback: Golang, Openstreetmap, threads | Fabsk.eu

    1. Fab (post author)

      Sorry, I don’t have useful code, only this snippet. If you plan to read huge OSM data, I suggest that you use the PBF (protocol buffers) format instead, it’s much faster for a program to read. I can’t remember exactly how, but there is a program that will generate all the Go structures for you for the PBF specifications. If you want to go with PBF, I can publish my code (which is 2 years old, so maybe outdated).

