What is the use of self-modifying code?

Is there any real use for self-modifying code?

I know that it can be used to create worms and viruses, but I was wondering whether there are good reasons why a programmer might have to use self-modifying code.

Any ideas? Hypothetical situations are also welcome.

+46
Feb 05 '09 at 16:34
14 answers

It turns out that the Wikipedia entry "self-modifying code" has an excellent list:

  • Semi-automatic optimization of a state-dependent loop.
  • Run-time code generation, or specialization of an algorithm at run time or load time (which is popular, for example, in real-time graphics), such as a general sort utility preparing code to perform the key comparison described in a specific invocation.
  • Altering the inlined state of an object, or simulating the high-level construct of closures.
  • Patching subroutine call addresses, as is usually done when dynamic libraries are loaded, or, on each invocation, patching the subroutine's internal references to its parameters so that they use their actual addresses. Whether this counts as "self-modifying code" is a matter of terminology.
  • Evolutionary computing systems such as genetic programming.
  • Hiding code to prevent reverse engineering with a disassembler or debugger.
  • Hiding code to evade detection by virus and spyware scanning software and the like.
  • Filling 100% of memory (on some architectures) with a rolling pattern of repeating opcodes, to erase all programs and data, or to burn in hardware.
  • Compressing code that is to be unpacked and executed at run time, for example, when memory or disk space is limited.
  • Some very limited instruction sets leave no choice but to use self-modifying code to achieve certain functionality. For example, a "One Instruction Set Computer" machine that uses only the subtract-and-branch-if-negative "instruction" cannot do an indirect copy (something like the equivalent of "*a = **b" in the C programming language) without using self-modifying code.
  • Altering instructions for fault tolerance.
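The sort-utility item in the list can be illustrated with a small run-time code generation sketch. This is only an analogy in Python, not native self-modifying code: the helper name `make_key_function`, the field names, and the restriction to numeric fields are all assumptions made for the example.

```python
def make_key_function(fields):
    """Generate, at run time, a sort key function specialized for `fields`.

    `fields` is a list of (name, descending) pairs. Only numeric fields are
    assumed, so descending order can be expressed by negation.
    """
    parts = []
    for name, descending in fields:
        sign = "-" if descending else ""
        parts.append(f"{sign}record[{name!r}]")
    source = "def key(record):\n    return (" + ", ".join(parts) + ",)\n"
    namespace = {}
    exec(source, namespace)  # compile the generated source into a real function
    return namespace["key"], source


rows = [
    {"age": 30, "score": 7},
    {"age": 25, "score": 9},
    {"age": 30, "score": 9},
]

# Specialize for "age ascending, then score descending".
key, source = make_key_function([("age", False), ("score", True)])
ordered = sorted(rows, key=key)
```

The generated function contains no loop over the field list and no branching on sort direction; those decisions were made once, when the code was generated, which is the point of specializing per invocation.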

On the point about thwarting hackers with self-modifying code:

Over the course of several firmware updates, DirecTV slowly assembled a program on its smart cards that destroyed cards which had been hacked to receive unpaid channels illegally. See Jeff's Coding Horror article on the Black Sunday Hack for more information.

+46
Feb 05 '09 at 16:37

I've seen self-modifying code used for:

  • speed optimization, by having the program write more code for itself on the fly

  • obfuscation, to make reverse engineering much harder

+12
Feb 05 '09 at 16:37

In the old days, when RAM was limited, self-modifying code was used to save memory. Nowadays, application compression utilities such as UPX use it to unpack and modify their own code after a compressed image of the application has been loaded.
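The unpack-then-run idea can be sketched in a few lines. This is a Python analogy under stated assumptions, not how UPX handles native executables: the payload, the `unpack_and_load` stub, and the use of `zlib` are all illustrative choices.

```python
import zlib

PAYLOAD_SOURCE = '''
def greet(name):
    return "hello, " + name
'''

# "Build time": compress the code, as a packer would compress an executable image.
packed = zlib.compress(PAYLOAD_SOURCE.encode("utf-8"))


def unpack_and_load(blob):
    """Stub: decompress the stored code and turn it into live functions."""
    source = zlib.decompress(blob).decode("utf-8")
    namespace = {}
    exec(compile(source, "<unpacked>", "exec"), namespace)
    return namespace


# "Run time": the code only exists in executable form after unpacking.
module = unpack_and_load(packed)
result = module["greet"]("world")
```

Only the small stub has to be stored in directly runnable form; everything else is produced in memory when the program starts.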

+11
Feb 05 '09 at 16:40

Because the Commodore 64 doesn't have many registers and has a 1 MHz processor. When you need to read a memory address offset by a value, it is easier to modify the code in place.

    @Reader:
        LDA $C000
        STA $D020
        INC Reader+1
        JMP Reader

That was the last time I wrote self-modifying code :-)
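To see what the loop does, here is a minimal Python sketch that interprets just the four instructions used above. The opcode bytes are the standard 6502 absolute-mode encodings; the load address `0x1000`, the sample data at `$C000`, and the step limit are assumptions for the demonstration, and processor flags are ignored.

```python
LDA_ABS, STA_ABS, INC_ABS, JMP_ABS = 0xAD, 0x8D, 0xEE, 0x4C


def run(memory, pc, max_steps):
    """Interpret four 6502 absolute-mode opcodes; return values stored to $D020."""
    a = 0
    border_writes = []
    for _ in range(max_steps):
        opcode = memory[pc]
        address = memory[pc + 1] | (memory[pc + 2] << 8)  # little-endian operand
        pc += 3
        if opcode == LDA_ABS:
            a = memory[address]
        elif opcode == STA_ABS:
            memory[address] = a
            if address == 0xD020:
                border_writes.append(a)
        elif opcode == INC_ABS:
            memory[address] = (memory[address] + 1) & 0xFF
        elif opcode == JMP_ABS:
            pc = address
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x}")
    return border_writes


memory = bytearray(0x10000)
READER = 0x1000  # hypothetical load address of the routine
program = [
    LDA_ABS, 0x00, 0xC0,                               # Reader: LDA $C000
    STA_ABS, 0x20, 0xD0,                               #         STA $D020
    INC_ABS, (READER + 1) & 0xFF, (READER + 1) >> 8,   #         INC Reader+1
    JMP_ABS, READER & 0xFF, READER >> 8,               #         JMP Reader
]
memory[READER:READER + len(program)] = bytes(program)
memory[0xC000:0xC004] = bytes([1, 2, 3, 4])  # sample data to walk over

writes = run(memory, READER, max_steps=16)  # 4 instructions x 4 loop passes
```

`INC Reader+1` increments the low byte of the `LDA` operand, so each pass reads the next address without using an index register.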

+6
Feb 05 '09 at 17:02

The assembly languages of the 1960s used self-modifying code to implement function calls without a stack.

Knuth, vol. 1, 1st ed., p. 182:

    MAX100  STJ   EXIT    ;Subroutine linkage
            ENT3  100     ;M1. Initialize
            JMP   2F
    1H      CMPA  X,3     ;M3. Compare
            JGE   *+3
    2H      ENT2  0,3     ;M4. Change m
            LDA   X,3     ;(New maximum found)
            DEC3  1       ;M5. Decrease k
            J3P   1B      ;M2. All tested?
    EXIT    JMP   *       ;Return to main program

In a larger program containing this coding as a subroutine, the single instruction "JMP MAX100" would cause register A to be set to the current maximum value of locations X+1 through X+100, and the position of the maximum would appear in rI2. Subroutine linkage in this case is achieved by the instructions "MAX100 STJ EXIT" and, later, "EXIT JMP *". Because of the way the J-register operates, the exit instruction will then jump to the location following the place where the original reference to MAX100 was made.

Edit: It may be difficult to see what is going on here, even with the brief explanation. In the line MAX100 STJ EXIT, MAX100 is the label of the instruction (and therefore of the procedure as a whole), STJ means STORE the jump register (which holds where we just came from), and EXIT means that the memory location labelled "EXIT" is the target of the STORE. EXIT, as we see later, is the label of the last instruction. So this is code overwriting code! But many instructions (including STJ here) implicitly overwrite only the address part of the instruction word. Thus the JMP remains untouched, and the * is a dummy token: there is nothing meaningful to put there, since it only gets overwritten.
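The linkage trick can be mimicked on a toy machine. The following Python sketch is not MIX: the instruction set (`JMP`, `STJ`, `LOG`, `HLT`), the program layout, and the `-1` placeholder standing in for `*` are assumptions chosen to show only the patching of the exit jump.

```python
def run(program, start):
    """Run a toy machine whose only return mechanism is patching a JMP."""
    pc, rJ, trace = start, 0, []
    while True:
        op, arg = program[pc]
        if op == "JMP":
            rJ = pc + 1            # jump register remembers where we came from
            pc = arg
        elif op == "STJ":
            program[arg][1] = rJ   # overwrite only the address field of the target
            pc += 1
        elif op == "LOG":
            trace.append(arg)
            pc += 1
        elif op == "HLT":
            return trace
        else:
            raise ValueError(f"unknown instruction {op!r}")


SUB, EXIT = 6, 8
program = [
    ["LOG", "main: start"],          # 0
    ["JMP", SUB],                    # 1  first call
    ["LOG", "main: between calls"],  # 2
    ["JMP", SUB],                    # 3  second call
    ["LOG", "main: done"],           # 4
    ["HLT", None],                   # 5
    ["STJ", EXIT],                   # 6  SUB   STJ EXIT
    ["LOG", "sub: body"],            # 7
    ["JMP", -1],                     # 8  EXIT  JMP *  (operand is a placeholder)
]

trace = run(program, 0)
```

Each call rewrites the operand of the final `JMP`, so the same subroutine returns to a different place each time with no stack involved. The cost is that the subroutine cannot be re-entered before it returns.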




Self-modifying code is also used where indirect addressing is unavailable, and yet the address you need is sitting right there in a register. PDP-1 LISP:

    dap .+1   ;deposit address part of accumulator in (IP+1)
    lac xy    ;load accumulator with (ADDRESS) [xy is a dummy symbol, just like * above]

These two instructions perform ACC := (ACC) by modifying the operand of the load instruction.

Modifications like these are relatively safe, and on ancient architectures they are necessary.

+6
Sep 17 '11 at 5:53

Lots of reasons. Off the top of my head:

  • Run-time class construction and metaprogramming. For example, having a factory class that takes a connection to an SQL table and generates a client class specialized for that table (with accessors for the columns, finder methods, etc.).

  • Then, of course, there are the well-known examples of bitblt and regular expressions.

  • Dynamic optimization based on run-time information, a la tracing JITs.

  • Subtype specialization of Ada-style generic functions in an accretionary environment.
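The first item, run-time class construction, can be sketched with Python's three-argument `type()`. Everything named here is hypothetical: `make_record_class`, the `find_by` finder, and the list of dicts standing in for an SQL table are assumptions for illustration, not any particular ORM's API.

```python
def make_record_class(table_name, columns):
    """Build a class at run time with one read-only property per column."""

    def __init__(self, **values):
        missing = [c for c in columns if c not in values]
        if missing:
            raise TypeError(f"missing columns: {missing}")
        self._values = {c: values[c] for c in columns}

    def make_accessor(column):
        # A separate function so each property captures its own column name.
        return property(lambda self: self._values[column])

    @classmethod
    def find_by(cls, rows, **criteria):
        """Return records for the rows that match every criterion."""
        return [cls(**row) for row in rows
                if all(row[k] == v for k, v in criteria.items())]

    namespace = {"__init__": __init__, "find_by": find_by}
    for column in columns:
        namespace[column] = make_accessor(column)
    return type(table_name.title(), (object,), namespace)


# A list of dicts stands in for the SQL table in this sketch.
rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
Users = make_record_class("users", ["id", "name"])
matches = Users.find_by(rows, name="grace")
```

The class and its accessors do not exist in the source text; they are assembled from the column list when `make_record_class` runs.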

- MarkusQ

+5
Feb 05 '09 at 17:37

Because it's really cool, and sometimes that is reason enough.

+5
Jan 6

Dynamic linking is a kind of self-modification (fixing up absolute and/or relative jump targets) ... which is usually done by the O/S program loader.
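A rough Python analogy of this kind of fix-up is a jump table whose slots start out as resolver stubs and are patched on first use. This sketches the lazy-binding variant rather than a loader that patches everything up front, and the `JumpTable` class and its fields are assumptions for illustration; only `importlib.import_module` is a real library call.

```python
import importlib


class JumpTable:
    """A table of call targets; unresolved slots hold a resolver stub."""

    def __init__(self, imports):
        self.lookups = 0  # how many symbol lookups were actually performed
        self.slots = {
            name: self._make_stub(name, module, attr)
            for name, (module, attr) in imports.items()
        }

    def _make_stub(self, name, module, attr):
        def stub(*args):
            # First call: look the symbol up, then patch the slot so that
            # later calls go straight to the real target.
            target = getattr(importlib.import_module(module), attr)
            self.lookups += 1
            self.slots[name] = target
            return target(*args)
        return stub

    def call(self, name, *args):
        return self.slots[name](*args)


table = JumpTable({"sqrt": ("math", "sqrt")})
first = table.call("sqrt", 16.0)   # goes through the stub, which patches the slot
second = table.call("sqrt", 25.0)  # goes directly to math.sqrt
```

After the first call the stub is gone from the table, which is the same effect a loader achieves by rewriting a jump target.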

+4
Feb 05 '09 at 16:37

Artificial Intelligence?

+4
Feb 05 '09 at 20:09

Neural networks are a kind of self-modifying code.

Then there are evolutionary algorithms that change themselves.

+3
Feb 05 '09 at 16:38

LOL - I have written self-modifying code on two kinds of occasion:

  • when I was first learning assembly language, before I understood indexed indirect access
  • by accident, as pointer bugs in assembly language and C

I can imagine that there may be scenarios where self-modifying code would be more efficient than the alternatives, but nothing obvious comes to mind. In general it should be avoided (it is a debugging nightmare, etc.) unless you are intentionally trying to obfuscate, as mentioned above.

+3
Feb 05 '09 at 16:54

Mike Abrash described the Pixomatic code generator in Dr. Dobb's Journal a while ago: http://www.ddj.com/architect/184405807 . It is a software 3D rasterizer, DX7 (?) compatible.

+2
Feb 05 '09 at 16:51

Applications that implement their own scripting languages often do this. For example, database servers often compile stored procedures (or queries) this way.

+1
Feb 05 '09 at 16:49

Dynamic code generation in SwiftShader is a form of self-modifying code that allows it to implement Direct3D 9 efficiently on the CPU.

0
Feb 06 '09 at 13:58


