The CPU does not see your labels; it just goes from instruction to instruction.
If the current instruction is not some kind of jump (call and ret are also kinds of jumps), then once the processor has finished executing it, the processor moves on to the instruction that follows it in memory.
When you execute call my_function, the CPU executes all the instructions inside the function, and after executing ret it returns to the instruction following the call.
In your code, that next instruction is the first instruction of my_function again, so the function runs a second time. On hitting ret the second time, execution really gets lost, who knows where (ret takes the value at the top of the stack and uses it as the address of the next instruction, so whatever happened to be on the stack at the second ret decides where your code continues...).
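A minimal sketch of the kind of layout that produces this behaviour. It assumes my_function sits directly after the call in the source; the main label and the elided function body are placeholders, not your actual code.

```nasm
main:
    call my_function    ; 1st run: ret returns to the instruction after this call
                        ; nothing here stops the CPU, so it simply continues...
my_function:            ; ...into the function body a 2nd time, without any call
    ; ... function body (placeholder) ...
    ret                 ; 2nd ret: no return address was pushed this time, so it
                        ; pops whatever is on top of the stack and jumps there
```

The label my_function produces no machine code of its own, so there is nothing between the call and the function body for the CPU to stop at.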
The assembly source is not just a list of instructions; by placing one instruction after another you also position them in memory and control the flow of the code. The CPU executes them sequentially, line by line, just as you wrote them (unless you change the control flow with some kind of jump, in which case you can skip over several lines of the source).
So, if you want the CPU to stop after your main has "finished" and you are writing a bootloader, i.e. there is nothing to return to (no OS or anything like that), you create a dead end at the end of main in the form of an infinite loop, for example:
    dead_end_loop:
        pause               ; give CPU hint this is idling loop
                            ; so it will save power by switching off some circuitry
        jmp dead_end_loop
Put this "end of main" right after the call my_function. my_function itself should be defined outside of main, for example after this infinite loop.
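Put together, the layout could look roughly like the sketch below. It assumes NASM syntax and a 16-bit BIOS boot sector loaded at 0x7C00; the stack setup and the elided function body are illustrative assumptions, not taken from your code.

```nasm
bits 16
org 0x7C00                  ; assumption: BIOS loads the boot sector here

main:
    xor ax, ax
    mov ss, ax
    mov sp, 0x7C00          ; assumption: stack grows down from below the code
    call my_function        ; pushes the return address, jumps to my_function
                            ; ret brings execution back to the next line
dead_end_loop:
    pause                   ; idle-loop hint for the CPU
    jmp dead_end_loop       ; execution never falls through past this jump

my_function:                ; defined outside of main, only reachable via call
    ; ... function body (placeholder) ...
    ret

times 510 - ($ - $$) db 0   ; pad the sector to 510 bytes
dw 0xAA55                   ; boot signature
```

Because the loop sits between the call and my_function, the only way into the function body is through call, so every ret finds a valid return address on the stack.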
You may have missed what jmp $ does and what its target is in the source. In this context the $ symbol means "the address of the current instruction/line" to the assembler, so jmp $ can be read as "jump to this same line". That makes it an infinite loop: the processor will never execute anything other than this jmp $ (unless it has been configured to handle certain interrupt signals; in that case any such external signal makes the CPU switch execution to the corresponding interrupt handler, as set up by the programmer/OS before entering the infinite loop).
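To make the meaning of $ concrete, here is a small sketch in NASM syntax; the label name here is made up for illustration.

```nasm
    jmp $           ; $ = address of this very instruction, so it jumps to itself

here:
    jmp here        ; the same infinite loop, written with an explicit label
```

Both forms are a jump whose target is the jump itself; the $ form just saves you from inventing a label.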
Another idea: you can open https://schweigi.imtqy.com/assembler-simulator/ and "step" through the example several times to see that the processor does not see the source code, only bytes of machine code (visible on the right side as "memory"), and how it moves from one instruction to the next, how IP changes, etc.