This is allowed, but it can cost you a huge performance hit: the lock may not be containable inside the cache and may escalate to a full bus lock (effectively stalling the whole system).
See, for example, https://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures :
In the days of Intel 486 processors, the lock prefix used to assert a lock on the bus along with a large hit in performance. Starting with the Intel Pentium Pro architecture, the bus lock is transformed into a cache lock. A lock will still be asserted on the bus in most modern architectures if the lock resides in uncacheable memory or if the lock extends beyond a cache line boundary, splitting cache lines. Both of these scenarios are unlikely, so most lock prefixes will be transformed into a cache lock, which is much less expensive.
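To make the "extends beyond a cache line boundary" case concrete, here is a minimal C sketch that checks whether a memory range would straddle two cache lines (a so-called split lock). It assumes 64-byte cache lines, which is common on current x86 CPUs but not guaranteed; `crosses_boundary` and `is_split_lock` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical for current x86 CPUs but not
   architecturally guaranteed. */
#define CACHE_LINE_SIZE 64u

/* Returns true if the byte range [addr, addr + size) touches more than one
   aligned block of `block` bytes. `block` must be a power of two. */
static bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t block) {
    assert(size > 0);
    assert(block != 0 && (block & (block - 1)) == 0);
    uintptr_t mask  = ~(block - 1);
    uintptr_t first = addr & mask;              /* block holding the first byte */
    uintptr_t last  = (addr + size - 1) & mask; /* block holding the last byte */
    return first != last;
}

/* True if a LOCK-prefixed access to this range would split across two
   cache lines (under the assumed line size), i.e. the case the quote says
   still falls back to a bus lock. */
static bool is_split_lock(uintptr_t addr, size_t size) {
    return crosses_boundary(addr, size, CACHE_LINE_SIZE);
}
```

For example, an 8-byte operand at `0x103C` covers bytes `0x103C..0x1043`, which fall in the lines starting at `0x1000` and `0x1040`, so `is_split_lock(0x103C, 8)` is true, while the same operand at `0x1038` stays within one line.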
The details may vary between processor models, but note another consideration: crossing a cache line boundary can also mean crossing a page boundary, which is even harder to keep atomic (and therefore even more likely to be downgraded to a bus lock).
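A common way to stay on the cheap cache-lock path is to align lock variables so they cannot straddle either boundary. The following is a minimal C11 sketch under stated assumptions: 64-byte cache lines and 4 KiB pages (so a line-aligned object of at most 64 bytes can cross neither); `padded_counter` and `counter_increment` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; check the real value on the target
   machine (for example via CPUID). */
#define CACHE_LINE_SIZE 64

/* Aligning the atomic to a full cache line means it can never straddle two
   lines, and because 4096 is a multiple of 64 it can never straddle a 4 KiB
   page either. Padding the struct to a whole line also keeps unrelated data
   out of that line (avoiding false sharing). */
typedef struct {
    _Alignas(CACHE_LINE_SIZE) atomic_int value;
} padded_counter;

static_assert(sizeof(padded_counter) == CACHE_LINE_SIZE,
              "padded_counter should occupy exactly one cache line");

/* Atomically increments the counter and returns the new value. On x86,
   compilers typically lower atomic_fetch_add to a LOCK XADD instruction. */
static int counter_increment(padded_counter *c) {
    return atomic_fetch_add(&c->value, 1) + 1;
}
```

The alignment is honored for static and automatic objects; for heap allocations, plain `malloc` does not promise 64-byte alignment, so something like C11 `aligned_alloc(CACHE_LINE_SIZE, sizeof(padded_counter))` would be needed to keep the same guarantee.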
Leeor