You need either a centralized server that receives the packets and sends them back out via multicast, or a decentralized approach in which every client connects to every other client and every client accepts multicast. What you want to avoid is forcing each machine to forward its data to every other machine individually, because each message then costs the sender O(n) separate transmissions, on top of slow I/O.
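As a rough sketch of the centralized option, assuming Python and UDP over IPv4: the server opens one datagram socket and sends each packet once to a multicast group, instead of once per client. The group address, port, and function names here are made up for illustration, and the network must actually route multicast for this to work.

```python
import socket
import struct

# Hypothetical values for illustration only.
MCAST_GROUP = "239.1.2.3"   # an address in the administratively scoped range
MCAST_PORT = 5004

def make_multicast_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast packets live for `ttl` hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the packets on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock

def send_to_group(sock: socket.socket, payload: bytes) -> int:
    """Send one datagram to the group; every subscribed client receives it."""
    return sock.sendto(payload, (MCAST_GROUP, MCAST_PORT))
```

The point of the design is that the sender's cost per packet stays constant no matter how many clients have joined the group.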
In either case you have the same problem: how to combine the audio streams. One simple mechanism is to add the signals together sample by sample, clamping the result to the valid sample range, before sending them back out (either over a network port or to your speakers). This assumes that you have access to uncompressed and reasonably synchronized streams.
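A minimal sketch of the combining step, assuming the streams are already decoded to 16-bit signed PCM at the same sample rate and are roughly aligned in time. The function name is hypothetical. Note that it adds sample values arithmetically and clamps them to the 16-bit range; it does not combine them bit by bit, which would distort the audio.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767

def mix_pcm16(streams):
    """Mix 16-bit signed PCM streams by summing samples and clamping.

    `streams` is a list of array('h') objects. The output is as long as
    the shortest input, so trailing samples of longer inputs are dropped.
    """
    if not streams:
        return array("h")
    length = min(len(s) for s in streams)
    return array(
        "h",
        (
            # Clamp so that loud passages saturate instead of wrapping around.
            max(INT16_MIN, min(INT16_MAX, sum(s[i] for s in streams)))
            for i in range(length)
        ),
    )
```

Clamping is the crudest way to handle overflow. With many simultaneous speakers you would probably scale the sum down instead, but that choice is outside what this sketch covers.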