This is just marketing fluff.
I found no reference to "TSX-NI" on the Internet, in the Intel manuals, or in the Intel ISA extensions guide.
Quoting Intel [1]:
Intel Transactional Synchronization Extensions (Intel TSX) come in two flavors: HLE and RTM.
By their implementation, these two aspects are separate from each other (or can be supported independently of each other), and only RTM introduces New Instructions.
So the "NI" most likely refers to RTM.
I believe that HLE was introduced first, so there should be processors that support HLE but not RTM (the opposite, while perhaps not plausible, is possible).
So perhaps this is just the marketing way of saying "This processor supports our latest TSX features!"
For reference, here is a short introduction to the two parts of Intel TSX, assuming that "TSX-NI" refers to "TSX RTM".
The complete reference can be found in the Intel manual, volume 1, chapter 15.
HLE
The HLE (Hardware Lock Elision) part is backward compatible.
You can still check its availability with CPUID.07H.EBX.HLE [bit 4], but it is implemented by changing the semantics of the repne/repe prefixes on certain instructions.
This feature consists of two "new" prefixes: xacquire and xrelease.
The CPU can now enter a transactional state in which every read is added to the transaction's read-set and every write is added to the transaction's write-set instead of being performed to memory.
The granularity is the size of a cache line.
If another thread reads from the transaction's write-set, or writes to either its read-set or its write-set, the transaction is aborted. The CPU restores the architectural state as it was at the beginning of the transaction and re-executes the instructions non-transactionally.
If the transaction completes successfully, all written memory is committed atomically.
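As a rough software model (not a description of how the hardware is built), the conflict rule above is just set intersection over cache-line indices. The 64-byte line size, the fixed-size sets and every name below (tx_footprint, tx_read, tx_write, other_access_conflicts) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* assumed cache-line size; the real value is CPU-specific */
#define MAX_LINES 16  /* hypothetical tracking capacity */

/* Hypothetical model of one transaction's footprint, stored as line indices */
typedef struct {
    uintptr_t read_set[MAX_LINES];
    size_t n_read;
    uintptr_t write_set[MAX_LINES];
    size_t n_write;
} tx_footprint;

static uintptr_t line_of(uintptr_t addr) { return addr / LINE_SIZE; }

static bool contains(const uintptr_t *set, size_t n, uintptr_t line) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == line) return true;
    return false;
}

/* Record an access at line granularity; returns false when the set is full,
   which real hardware would treat as a capacity abort */
static bool add_line(uintptr_t *set, size_t *n, uintptr_t addr) {
    uintptr_t l = line_of(addr);
    if (contains(set, *n, l)) return true;
    if (*n == MAX_LINES) return false;
    set[(*n)++] = l;
    return true;
}

static bool tx_read(tx_footprint *tx, uintptr_t addr) {
    return add_line(tx->read_set, &tx->n_read, addr);
}

static bool tx_write(tx_footprint *tx, uintptr_t addr) {
    return add_line(tx->write_set, &tx->n_write, addr);
}

/* Would an access by another thread abort transaction tx? */
static bool other_access_conflicts(const tx_footprint *tx, uintptr_t addr, bool is_write) {
    uintptr_t l = line_of(addr);
    if (contains(tx->write_set, tx->n_write, l))
        return true; /* any access to a line we wrote */
    return is_write && contains(tx->read_set, tx->n_read, l); /* a write to a line we read */
}
```

Because tracking is per line, two threads touching different variables that happen to share a cache line still conflict (false sharing).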
Transactions are delimited by xacquire and xrelease.
They can be nested, but there is a limit on the nesting depth (beyond which the transaction is aborted) and on the number of distinct locks that can be elided (beyond which the CPU stops eliding new locks but does not abort the transaction).
When a nested transaction is aborted, the CPU restarts execution from the outermost transaction.
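The two limits behave differently, which a tiny state machine makes visible. This is only a model: MAX_NEST and MAX_ELIDED are hypothetical placeholder values (the real limits are implementation-specific), and model_acquire/model_release are invented names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NEST   4 /* hypothetical nesting-depth limit */
#define MAX_ELIDED 2 /* hypothetical limit on distinct elided locks */

typedef enum { ELIDED, NOT_ELIDED, ABORT } hle_result;

typedef struct {
    int depth;                      /* current XACQUIRE nesting depth */
    const void *elided[MAX_ELIDED]; /* locks currently being elided */
    size_t n_elided;
} hle_state;

static bool is_elided(const hle_state *s, const void *lock) {
    for (size_t i = 0; i < s->n_elided; i++)
        if (s->elided[i] == lock) return true;
    return false;
}

/* Model of an XACQUIRE-prefixed acquisition of `lock` */
static hle_result model_acquire(hle_state *s, const void *lock) {
    if (s->depth + 1 > MAX_NEST) {
        /* Too deep: the whole region aborts and restarts from the outermost
           XACQUIRE without elision; the model just resets its state */
        s->depth = 0;
        s->n_elided = 0;
        return ABORT;
    }
    s->depth++;
    if (is_elided(s, lock)) return ELIDED;
    if (s->n_elided < MAX_ELIDED) {
        s->elided[s->n_elided++] = lock;
        return ELIDED;
    }
    /* Elision limit reached: this lock is not elided, but nothing aborts */
    return NOT_ELIDED;
}

/* Model of the matching XRELEASE */
static void model_release(hle_state *s) {
    if (s->depth > 0 && --s->depth == 0)
        s->n_elided = 0; /* outermost release: the region commits */
}
```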
xacquire (opcode F2, the same as repne) is used in front of an instruction that takes a lock (i.e. writes to the lock) and marks the beginning of the transaction.
This write is not added to the write-set (otherwise there could be no concurrency: every thread writes to the lock, and that would immediately abort any other transaction).
Instead, the lock is added to the read-set.
xrelease (opcode F3, the same as repe) is used in front of the instruction that releases the lock and marks the end of the transaction.
xrelease must be used on the same lock that was used with xacquire, so that the two pair up and complete the transaction.
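In C you rarely write these prefixes by hand. A minimal spinlock sketch, assuming GCC on x86, which predefines the __ATOMIC_HLE_ACQUIRE and __ATOMIC_HLE_RELEASE memory-order flags for its __atomic builtins; on other compilers or targets the sketch falls back to plain atomics. The hle_spinlock type and function names are hypothetical.

```c
/* GCC on x86 predefines the HLE flags; elsewhere use plain atomics so the
   code still compiles and runs (just without elision) */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQ __ATOMIC_HLE_ACQUIRE
#define HLE_REL __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQ 0
#define HLE_REL 0
#endif

typedef struct { int locked; } hle_spinlock; /* 0 = free, 1 = held */

static void hle_lock(hle_spinlock *l) {
    /* With the HLE flag this becomes XACQUIRE XCHG: the write to the lock is
       elided and the lock's cache line joins the read-set */
    while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE | HLE_ACQ)) {
        /* Lock really held (or elision aborted): spin with plain loads */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static void hle_unlock(hle_spinlock *l) {
    /* XRELEASE MOV on the same lock, restoring its original value */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | HLE_REL);
}
```

Since older CPUs ignore the F2/F3 prefixes on these instructions, the same binary still behaves as an ordinary spinlock there.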
xacquire can only be used with the locked versions of these instructions: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, XCHG.
xrelease can be used with the same instructions, plus MOV mem, reg and MOV mem, imm without the lock prefix.
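The rule above can be written as a small predicate. This sketch assumes a memory destination operand throughout and treats XCHG as locked even without an explicit lock prefix, because XCHG with a memory operand is implicitly locked. The hle_prefix_honored name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { PFX_XACQUIRE, PFX_XRELEASE } hle_prefix;

/* Instructions whose locked (memory-destination) forms accept the HLE prefixes */
static const char *const lockable[] = {
    "ADD", "ADC", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "CMPXCHG8B", "DEC",
    "INC", "NEG", "NOT", "OR", "SBB", "SUB", "XOR", "XADD", "XCHG",
};

static bool is_lockable(const char *mnemonic) {
    for (size_t i = 0; i < sizeof lockable / sizeof lockable[0]; i++)
        if (strcmp(mnemonic, lockable[i]) == 0) return true;
    return false;
}

/* Is prefix p honored as an HLE hint on `mnemonic` with a memory destination? */
static bool hle_prefix_honored(hle_prefix p, const char *mnemonic, bool has_lock) {
    /* XCHG with a memory operand is locked even without the lock prefix */
    bool locked = has_lock || strcmp(mnemonic, "XCHG") == 0;
    if (locked && is_lockable(mnemonic)) return true;
    /* xrelease additionally applies to MOV mem, reg / MOV mem, imm without lock */
    return p == PFX_XRELEASE && !has_lock && strcmp(mnemonic, "MOV") == 0;
}
```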
The new xtest instruction is available if HLE (or RTM) is present; it sets ZF if the processor is not inside a transaction.
RTM
RTM (Restricted Transactional Memory) is not backward compatible.
It can be tested with CPUID.07H.EBX.RTM [bit 11].
It introduces three new instructions: xbegin, xend and xabort.
This is simply a new interface to the already defined, general transactional execution capability.
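Both feature bits live in EBX of CPUID leaf 07H, sub-leaf 0. A sketch that separates the bit decoding (portable and testable) from the actual query, assuming GCC or Clang's <cpuid.h> and its __get_cpuid_count helper on x86; on other targets it simply reports no support. The tsx_support type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

typedef struct { bool hle; bool rtm; } tsx_support;

/* Decode the TSX feature bits from CPUID.07H (sub-leaf 0) EBX */
static tsx_support tsx_from_ebx(uint32_t ebx) {
    tsx_support t = { (ebx >> 4) & 1u,    /* HLE: bit 4 */
                      (ebx >> 11) & 1u }; /* RTM: bit 11 */
    return t;
}

/* Query the running CPU; returns "no support" if leaf 7 is unavailable or
   the target is not x86 */
static tsx_support tsx_query(void) {
    tsx_support none = { false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return tsx_from_ebx(ebx);
#endif
    return none;
}
```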
xbegin must point to fallback code, given as a relative offset.
This code is executed whenever the transaction aborts (or cannot be started).
In that case, eax contains the reason for the abort.
xend ends the transaction and instructs the CPU to commit it.
xabort lets the programmer explicitly abort the transaction with a custom error code.
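The abort reason in eax is a bit field. A decoding sketch; the bit positions below mirror the _XABORT_* constants that GCC and Clang expose in <immintrin.h>, but the ABORT_* names, abort_info and decode_abort are hypothetical and kept portable so the decoding can be tested without TSX hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bits in eax after an RTM abort (same layout as _XABORT_*) */
#define ABORT_EXPLICIT (1u << 0) /* caused by xabort */
#define ABORT_RETRY    (1u << 1) /* the transaction may succeed on retry */
#define ABORT_CONFLICT (1u << 2) /* another logical processor hit our data */
#define ABORT_CAPACITY (1u << 3) /* an internal buffer overflowed */
#define ABORT_DEBUG    (1u << 4) /* a debug breakpoint was hit */
#define ABORT_NESTED   (1u << 5) /* the abort occurred in a nested transaction */

typedef struct {
    bool explicit_abort;
    bool may_retry;
    bool conflict;
    bool capacity;
    uint8_t code; /* xabort argument (bits 31:24), meaningful only if explicit_abort */
} abort_info;

/* eax can also be 0, e.g. when the abort reason is not reported */
static abort_info decode_abort(uint32_t eax) {
    abort_info info;
    info.explicit_abort = (eax & ABORT_EXPLICIT) != 0;
    info.may_retry = (eax & ABORT_RETRY) != 0;
    info.conflict = (eax & ABORT_CONFLICT) != 0;
    info.capacity = (eax & ABORT_CAPACITY) != 0;
    info.code = (uint8_t)(eax >> 24);
    return info;
}
```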
Intel makes no guarantee that the processor will be able to commit a transaction successfully.
While HLE has many very specific conditions, RTM is a "best effort" feature, which is why fallback code is mandatory.
RTM is lower level than HLE; it lets the programmer use transactional memory with or without locks.
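A common way to combine RTM with a lock is: try the critical section a few times as a transaction, and fall back to really taking the lock. A sketch, assuming GCC or Clang on x86 (the target("rtm") attribute and the _xbegin/_xend/_xabort intrinsics from <immintrin.h>); the retry count of 3 and the names counter, fallback_lock and increment are hypothetical. On CPUs or targets without RTM it always takes the lock.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

static int fallback_lock; /* 0 = free, 1 = held */
static long counter;      /* shared data protected by the lock */

static void lock_fallback(void) {
    while (__atomic_exchange_n(&fallback_lock, 1, __ATOMIC_ACQUIRE))
        while (__atomic_load_n(&fallback_lock, __ATOMIC_RELAXED))
            ;
}

static void unlock_fallback(void) {
    __atomic_store_n(&fallback_lock, 0, __ATOMIC_RELEASE);
}

#if HAVE_X86
static bool cpu_has_rtm(void) {
    unsigned int a, b, c, d;
    return __get_cpuid_count(7, 0, &a, &b, &c, &d) && ((b >> 11) & 1u);
}

__attribute__((target("rtm")))
static bool try_rtm_increment(void) {
    unsigned int status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Reading the lock puts it in our read-set: if another thread takes
           the lock, we abort. If it is already held, give up explicitly. */
        if (__atomic_load_n(&fallback_lock, __ATOMIC_RELAXED))
            _xabort(0xff);
        counter++;
        _xend(); /* commit */
        return true;
    }
    return false; /* aborted: execution resumed here, status holds the reason */
}
#endif

void increment(void) {
#if HAVE_X86
    if (cpu_has_rtm())
        for (int attempt = 0; attempt < 3; attempt++)
            if (try_rtm_increment()) return;
#endif
    /* Fallback path: mandatory, since RTM is only best effort */
    lock_fallback();
    counter++;
    unlock_fallback();
}
```

Here RTM is used "with a lock"; the same skeleton without fallback_lock in the transaction would use transactional memory directly, but it would still need some non-transactional fallback.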
Mixing HLE and RTM
Quoting Intel:
The behavior when HLE and RTM are nested together (HLE inside RTM or RTM inside HLE) is implementation specific. However, in all cases, the implementation will maintain HLE and RTM semantics. An implementation may choose to ignore HLE hints when used inside RTM regions, and may cause a transactional abort when RTM instructions are used inside HLE regions. In the latter case, the transition from transactional to non-transactional execution occurs seamlessly, since the processor will re-execute the HLE region without actually doing elision, and then execute the RTM instructions.