First, you need to define the boundaries of the problem you're asking about. If you're really talking about dynamic horizontal scaling, where you spin servers up and down based on total load, then that's an even more complicated problem than just figuring out where to route each new incoming socket connection.
To solve that problem, you need a way to "move" a socket from one host to another so you can drain the connections off a host you want to spin down (I'm assuming true dynamic scaling goes both up and down). The usual way I've seen this done is with a cooperating client, where you tell the client to reconnect, and when it reconnects it gets load balanced onto a different server, letting you empty out the one you want to retire. If your client already has auto-reconnect logic (socket.io, for example), you can just shut the server down and the client will reconnect on its own.
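The cooperating-client drain can be sketched like this, assuming each connection object exposes `send()` and `close()` methods (these names, and the `reconnect` message shape, are hypothetical, not any real socket library's API):

```javascript
// Drain a host by asking every cooperating client to reconnect.
// The load balancer will place each reconnect on another host,
// leaving this one empty so it can be spun down safely.
function drainServer(connections) {
  for (const conn of connections) {
    conn.send(JSON.stringify({ type: 'reconnect' })); // cooperating client acts on this
    conn.close(); // clients with auto-reconnect (e.g. socket.io) reconnect on close alone
  }
  connections.length = 0; // host is now clear
}
```

With auto-reconnecting clients, the `send()` hint is optional; closing the connection is enough to trigger the rebalance.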
For load balancing the incoming client connections, you have to decide which load metric you want to use. Ultimately, you need a score for each server process that tells you how busy it thinks it is, so you can route new connections to the least busy server. A rudimentary score is just the current connection count. If you have a large number of connections per server process (tens of thousands), and there's no particular reason in your application why some connections would be much busier than others, then the law of large numbers probably averages out the load, and you can get away with just counting connections on each server. If connection usage is not that fair or even, you may also need to factor in some kind of time-weighted moving average of CPU load along with the total connection count.
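One way to build such a score, assuming connection count alone isn't representative, is to blend the connection count with an exponential moving average of sampled CPU load. A minimal sketch (the smoothing factor and CPU weight here are illustrative, not tuned values):

```javascript
// Returns a scoring function that keeps a running EMA of CPU load.
// alpha controls smoothing; cpuWeight scales CPU (0..1) against
// raw connection counts. Lower score = less busy.
function makeScorer(alpha = 0.2, cpuWeight = 100) {
  let cpuAvg = 0;
  return function score(connectionCount, cpuSample) {
    // EMA smooths out momentary CPU spikes between samples
    cpuAvg = alpha * cpuSample + (1 - alpha) * cpuAvg;
    return connectionCount + cpuWeight * cpuAvg;
  };
}
```

Each server process would report this score to the balancer periodically; how you weight CPU against connections depends entirely on your workload.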
If you're going to load balance across several physical servers, you'll need a load balancer or proxy service that everyone connects to initially; that proxy can look at the metrics for all the currently running servers in the pool and assign the connection to the one with the lowest current score. This can be done with a proxy scheme or (more scalably) with a redirect scheme, so the proxy gets out of the way after the initial assignment.
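The assignment step itself is simple once the scores exist. A sketch, assuming the gateway holds a list of pool members with `host` and `score` fields (hypothetical field names):

```javascript
// Pick the pool member with the lowest current load score.
function pickLeastLoaded(pool) {
  return pool.reduce((best, s) => (s.score < best.score ? s : best));
}

// Redirect variant: answer the initial connect with the chosen host,
// then get out of the way; the client reconnects directly to it.
function redirectTarget(pool) {
  const chosen = pickLeastLoaded(pool);
  return { type: 'redirect', host: chosen.host };
}
```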
You can then also have a process that regularly checks the load score (however you decide to calculate it) on every server in the cluster and decides when to spin a new server up, when to spin one down, or when a given server is too far out of balance and needs to be told to drop a few connections, forcing them to rebalance.
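That periodic check might look like the following sketch, where the thresholds (`high`, `low`, `skew`) are purely illustrative knobs you'd tune for your workload:

```javascript
// Given the current scores of all servers, decide on a scaling action.
// high/low bound the acceptable mean load; skew flags one server
// carrying far more than its share.
function rebalanceDecision(scores, { high = 100, low = 20, skew = 2 } = {}) {
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  if (mean > high) return { action: 'spin-up' };
  if (mean < low && scores.length > 1) return { action: 'spin-down' };
  const worst = Math.max(...scores);
  if (worst > skew * mean) return { action: 'shed', from: scores.indexOf(worst) };
  return { action: 'none' };
}
```

The `shed` action is where the cooperating-client drain comes in: the overloaded server tells some of its clients to reconnect elsewhere.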
I don't see how to efficiently route new socket connections to a server of your choosing that currently has a low socket count.
As described above, you'd use either a proxy scheme or a redirect scheme. At a slightly higher cost at connection time, I prefer the redirect scheme because it's more scalable in operation and creates fewer points of failure for an existing connection. All clients connect to your incoming gateway server, which is responsible for knowing the current load score for each server in the farm; based on that, it assigns the incoming connection to the host with the lowest score, and the new connection is then redirected to reconnect to one of the specific servers in your farm.
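From the client's side, the redirect scheme is just "connect, and if told to go elsewhere, reconnect there." A sketch, assuming a hypothetical `connect(host)` function that returns the server's first reply as either `{ type: 'redirect', host }` or `{ type: 'accept' }` (both message shapes are made up for illustration):

```javascript
// Follow gateway redirects until some host accepts, with a hop
// limit so a misconfigured farm can't bounce the client forever.
function resolveHost(connect, gatewayHost, maxHops = 3) {
  let host = gatewayHost;
  for (let i = 0; i < maxHops; i++) {
    const msg = connect(host);
    if (msg.type !== 'redirect') return host; // this host accepted us
    host = msg.host; // try the host the gateway pointed at
  }
  return host; // give up following redirects; use the last target
}
```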
I've also seen load balancing done entirely with a custom DNS implementation. The client asks for the IP address of farm.somedomain.com, and the custom DNS server hands back the IP address of the host it wants that client assigned to. Each client looking up farm.somedomain.com may get a different IP address. You bring hosts into or out of rotation by adding or removing them from the custom DNS server, and it's that DNS server that has to contain the load-balancing logic and know the current load scores of all the running hosts.
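The core of that custom DNS approach reduces to "answer the farm name with the least-loaded live host's IP." A sketch of just that decision (the host list and its `up`/`score` fields are illustrative; a real implementation would live inside an actual DNS server and also worry about TTLs and caching resolvers):

```javascript
// Answer a lookup for the farm name with the IP of the least-loaded
// host that is currently in rotation; return null for names we don't
// serve or when no host is up.
function resolveFarm(name, hosts) {
  if (name !== 'farm.somedomain.com') return null;
  let best = null;
  for (const h of hosts) {
    if (!h.up) continue; // removed hosts simply stop being answered
    if (best === null || h.score < best.score) best = h;
  }
  return best ? best.ip : null;
}
```

One caveat with DNS-based balancing is that intermediate resolvers cache answers, so you only get coarse control unless TTLs are kept very short.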