This is achieved by running a web server inside the WebDriver add-on for Firefox. The language bindings then make REST-ish HTTP calls to that server to perform actions such as clicking and typing text.
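The following is a minimal sketch of that round trip, using only the Python standard library. `FakeDriverServer` is a hypothetical stand-in for the server inside the add-on, and `send_command` plays the role of a language binding. The command paths (`/session/{id}/element/{id}/click` and `/session/{id}/element/{id}/value`) are modeled on the JSON Wire Protocol's element commands. The session and element IDs are made up, and a real driver also handles session creation, element lookup and error codes.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the server embedded in the Firefox add-on.
# It records every command it receives and answers with a JSON Wire-style
# success reply ({"sessionId": ..., "status": 0, "value": null}).
RECEIVED = []

class FakeDriverServer(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        RECEIVED.append((self.path, body))
        reply = json.dumps({"sessionId": "abc", "status": 0, "value": None}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json;charset=UTF-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet

def send_command(base_url, path, payload=None):
    """What a language binding does: POST a JSON body to a command URL."""
    data = json.dumps(payload or {}).encode()
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Start the fake server on a free local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), FakeDriverServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

session, element = "abc", "0"  # hypothetical IDs
click_reply = send_command(base, f"/session/{session}/element/{element}/click")
type_reply = send_command(
    base, f"/session/{session}/element/{element}/value", {"value": list("hello")}
)

server.shutdown()
server.server_close()
```

Each user-level action becomes one HTTP request with a small JSON body. This is why any language that can speak HTTP and JSON can drive the browser.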
The REST interface is publicly documented; we call it the JSON Wire Protocol. We also provide more native events through the Advanced User Interactions API, which generates clicks and text input at the OS level.
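For contrast with a single element-level click, here is a sketch of the lower-level, interaction-style commands. It assumes the JSON Wire Protocol's interaction endpoints (`moveto`, `buttondown`, `buttonup`, `keys`) and their body fields. The helper `press_and_type` is hypothetical. It only builds the request list and does not send anything.

```python
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body)
Command = Tuple[str, str, Optional[dict]]

LEFT_BUTTON = 0  # JSON Wire Protocol button codes: 0 = left, 1 = middle, 2 = right

def press_and_type(session_id: str, element_id: str, text: str) -> List[Command]:
    """Hypothetical helper: the low-level command sequence for
    'move the mouse onto an element, press and release the left button,
    then send key strokes to whatever element has focus'."""
    base = f"/session/{session_id}"
    return [
        ("POST", f"{base}/moveto", {"element": element_id}),
        ("POST", f"{base}/buttondown", {"button": LEFT_BUTTON}),
        ("POST", f"{base}/buttonup", {"button": LEFT_BUTTON}),
        ("POST", f"{base}/keys", {"value": list(text)}),
    ]

for method, path, body in press_and_type("abc", "0", "hi"):
    print(method, path, body)
```

Splitting a click into separate move, press and release steps is what lets the driver replay them as OS-level input. Selenium's Python bindings expose the same idea through `ActionChains`.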