Take a look at SAX, which reports each part of an XML document (start tags, end tags, text) as an event while the parser reads through it. You can then act on the text nodes and perform the necessary manipulations without ever holding the whole document in memory.
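Here is a minimal Python sketch of that approach, using the standard library's `xml.sax` module. A filter passes every event through unchanged and only rewrites character data. Upper-casing is a hypothetical stand-in for whatever manipulation you actually need, and `TextTransformFilter` and `transform_xml` are illustrative names. Because each event is written out as soon as it arrives, memory use stays roughly constant no matter how large the input is.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class TextTransformFilter(XMLFilterBase):
    """Forwards SAX events unchanged, except character data, which is
    passed through `transform` (hypothetical example: str.upper)."""

    def __init__(self, parent, transform):
        super().__init__(parent)
        self._transform = transform

    def characters(self, content):
        # SAX may split one text node into several chunks, so a per-chunk
        # transform is only safe for operations (like upper-casing) that
        # do not need to see the whole text node at once.
        super().characters(self._transform(content))


def transform_xml(source, out, transform=str.upper):
    """Stream XML from file-like `source` to text stream `out`,
    transforming only the text content."""
    reader = xml.sax.make_parser()
    xml_filter = TextTransformFilter(reader, transform)
    # XMLGenerator re-serializes the (filtered) events as XML text.
    xml_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    xml_filter.parse(source)


if __name__ == "__main__":
    result = io.StringIO()
    transform_xml(io.BytesIO(b"<root><item>hello</item></root>"), result)
    print(result.getvalue())
```

For a real file you would pass open file objects instead, e.g. a source opened in `"rb"` mode and an output opened in `"w"` mode with UTF-8 encoding. Note that comments and the DOCTYPE are not part of the SAX `ContentHandler` interface, so this sketch does not forward them to the output.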
The problem with XSLT is that most implementations need the entire input tree in memory, and that tree is typically about 10 times the size of the file on disk. The only XSLT processor I know of that can do XSLT streaming is the commercial edition of Saxon (Saxon-EE), which would be ideal for your needs.