While separating individual speakers is a rather complex problem, you can automatically split the audio wherever there are pauses. This creates a series of files that are likely to be easier to manage, since speakers often alternate at pauses.
This approach requires Julius, an open source speech recognition decoder. It is available in many Linux package repositories; I installed it from the Ubuntu multiverse repository.
Here is the site: http://julius.sourceforge.jp/en_index.php
Step 0: Install Julius
sudo apt-get install julius
Step 1: Segment Sound
adintool -in file -out file -filename myRecording.wav -startid 0 -freq 44100 -lv 2048 -zc 30 -headmargin 600 -tailmargin 600
-startid - number of the first segment, which is added to the output file name
-freq - sample rate of the original audio file, in Hz
-lv - level threshold above which voice detection is triggered
-zc - zero-crossing threshold above which voice detection is triggered
-headmargin and -tailmargin - amount of silence kept before and after each audio segment, in milliseconds
Note that -lv and -zc should be tuned to the characteristics of your specific recording, while -headmargin and -tailmargin should be tuned to the speaking style of your speakers. That said, the values above have worked well on my voice recordings in the past.
Here is the documentation: http://julius.sourceforge.jp/juliusbook/en/adintool.html
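Since -lv and -zc usually need a few rounds of tuning, it can help to keep the values in one place. The following is only a rough sketch of that idea, not something from the Julius documentation: the variable names and the run and segment helpers are hypothetical, and the adintool flags are exactly the ones from Step 1. By default it only prints the command so you can check it first.

```shell
#!/bin/sh
# Tuning values gathered in one place; override them from the environment.
FREQ=${FREQ:-44100}          # -freq: sample rate of the recording
LEVEL=${LEVEL:-2048}         # -lv: level threshold
ZEROCROSS=${ZEROCROSS:-30}   # -zc: zero-crossing threshold
HEAD_MS=${HEAD_MS:-600}      # -headmargin
TAIL_MS=${TAIL_MS:-600}      # -tailmargin
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the command, 0 = run it

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 is the value passed to -filename.
segment() {
    run adintool -in file -out file -filename "$1" -startid 0 \
        -freq "$FREQ" -lv "$LEVEL" -zc "$ZEROCROSS" \
        -headmargin "$HEAD_MS" -tailmargin "$TAIL_MS"
}

segment myRecording.wav
```

For example, saving this as segment.sh (a made-up name) and running LEVEL=1500 DRY_RUN=0 sh segment.sh would execute adintool with a lower level threshold.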
In my experience, pre-processing the audio with compression and normalization gives better results and requires less tuning of the Julius arguments. These preliminary steps are recommended but not required.
The pre-processing requires SoX, an open source audio toolkit. It is also available in many Linux package repositories; I installed it from the Ubuntu universe repository.
Here is the site: http://sox.sourceforge.net
Step -2: Install SoX
sudo apt-get install sox
Step -1: Preprocess Sound
sox myOriginalRecording.wav myRecording.wav gain -b -n -8 compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 gain -b -n -2
Please note that it may take some time to fully understand the options for compand. That said, the values above have worked well on my voice recordings in the past.
Here is the documentation: http://sox.sourceforge.net/sox.html
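To make the sox command from Step -1 easier to read and reuse, here is a minimal sketch that splits it into its three stages. The run and preprocess helpers are hypothetical names of mine, and the comments reflect my reading of the SoX manual, so check them against the documentation. The effect arguments themselves are unchanged from the command above. By default it only prints the command.

```shell
#!/bin/sh
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = run it

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 = input file, $2 = output file.
preprocess() {
    # Stage 1: gain -b -n -8   normalize to -8 dB, balancing the channels
    # Stage 2: compand ...     compress with attack 0.2 s, decay 0.6 s and a
    #                          4 dB soft knee (see the manual for the rest)
    # Stage 3: gain -b -n -2   normalize the compressed result to -2 dB
    run sox "$1" "$2" \
        gain -b -n -8 \
        compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 \
        gain -b -n -2
}

preprocess myOriginalRecording.wav myRecording.wav
```

Setting DRY_RUN=0 runs sox for real; calling preprocess in a loop over several recordings would be a natural extension.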
Although this will not identify each speaker for you, it will greatly simplify the task of doing so by ear, which may be the only option for now. Still, I hope you find a practical automated solution if one is already available.