I just listed and will talk about several possible solutions that I hope will help you solve. A join table contains two or three columns. A foreign key to the left table, say a , and a foreign key to the right table, say b . An optional column is the row identifier for the "join table", for example id .
Solution 1: Columns a,b . No clustered index (heap), indices on (a,b) and (b,a)
Both columns are stored in three places. It supports queries on both a and b , and search b does not require bookmark searches, since a part of the index (b,a) . A decent choice, but triple storage seems like a waste. The heap is not needed, but must be supported during insert and update requests.
Solution 2: Columns a, b . Cluster index on (a,b) , index on (b,a)
All data is stored twice. Can search on a and b without searching for bookmarks. This will be the best approach. He trades disk storage for speed.
Solution 3: Columns a, b . Cluster index on (a,b)
All data is saved only once. It can serve as a search on a , but not on b . To move from the table from right to left, you will need to scan the table. It rates for disk space. (A hash join is mentioned in your question. A hash join always performs a full scan.)
Solution 4: Columns id, a, b . Cluster index (id) , index on (a) and (b)
Searching a or b requires a bookmark search. Both a and b are stored twice on disk, once in their own index and once in a cluster key. This is the worst decision I could think of.
This list is by no means exhaustive. Solution 2 would be a good default choice. I would go for it if another solution turned out to be much better in tests.