In both versions of pop:
T *pop() { T *p = root; root = root->next; return p; }
and
T *pop() { return __sync_lock_test_and_set(&root, root->next); }
You already have an error: neither version checks that the list / stack is non-empty before dereferencing root, so popping from an empty stack reads through a null pointer.
This ties into the issue you mentioned: root has to be dereferenced to find the next node before test_and_set even happens. It essentially becomes a test_and_then_test_and_set operation, where and_then means more than one step is required, so the pop as a whole is not atomic.
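To make those separate steps visible, the one-line version can be unrolled roughly as below. This is only a sketch: `Node` and `pop_unrolled` are hypothetical names introduced for illustration, and it assumes a GCC or Clang compiler that provides the `__sync` builtins.

```cpp
#include <cassert>

// Hypothetical node type, standing in for T in the question.
struct Node {
    int value;
    Node *next;
};

Node *root = nullptr;

// Unrolled form of: return __sync_lock_test_and_set(&root, root->next);
Node *pop_unrolled() {
    Node *seen = root;        // step 1: plain (non-atomic) read of root
    Node *next = seen->next;  // step 2: dereference; undefined if the stack is empty
    // Another thread may change root at this point, leaving 'next' stale.
    return __sync_lock_test_and_set(&root, next);  // step 3: the only atomic step
}
```

Only the last step is atomic; the two reads before it are ordinary loads that another thread can invalidate.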
Your first version of pop should be:
T *pop() { T *p = root; if (root) { root = root->next; } return p; }
and, as I am sure you can see, this adds even more steps to the mix.
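One common way to fold the empty check and the update into a single retry loop is compare-and-swap rather than test_and_set. The following is a minimal sketch using `std::atomic` from C++11, not the code from the question; `Node` and `pop_cas` are hypothetical names, and the sketch deliberately ignores the ABA problem and safe reclamation of popped nodes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type, standing in for T in the question.
struct Node {
    int value;
    Node *next;
};

std::atomic<Node *> root{nullptr};

// Pops the top node, or returns nullptr if the stack is empty.
Node *pop_cas() {
    Node *p = root.load();
    // On failure compare_exchange_weak stores the current root into p,
    // so the empty check (p != nullptr) is repeated on every retry.
    while (p && !root.compare_exchange_weak(p, p->next)) {
        // retry with the refreshed value of p
    }
    return p;
}
```

The swap only succeeds if root still holds the value that was read, which closes the gap between reading root->next and updating root, as long as nodes are not freed and reused concurrently.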