What are the main differences between a user space driver and a kernel driver?
User space drivers run in user space. Kernel drivers run in kernel space.
What are the limitations of both of them?
A kernel driver can do everything the kernel can do, so you could say it has no limits. But kernel drivers are much harder to "prove correct" and to debug. It is easy to introduce race conditions, or to call a kernel function in the wrong context or while holding the wrong lock. Things will appear to work for a while, then cause problems (up to crashing the whole system) down the road. Drivers must also be careful when reading any input (both from the device and from user space), because invalid data can cause crashes.
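To make the "invalid data" point concrete, here is a minimal sketch of the kind of checking a driver has to do before trusting a frame. The frame layout (1-byte type, 2-byte little-endian length, then payload), the MAX_PAYLOAD limit and the parse_frame name are all made up for illustration; they do not come from any real device. The sketch is in Python for brevity, but the same checks apply in kernel C.

```python
import struct

# Hypothetical frame layout: 1-byte type, 2-byte little-endian payload length.
HEADER = struct.Struct("<BH")
MAX_PAYLOAD = 512  # assumed upper bound for this made-up protocol


def parse_frame(buf: bytes):
    """Return (frame_type, payload); raise ValueError if the frame is malformed."""
    if len(buf) < HEADER.size:
        raise ValueError("frame shorter than header")
    frame_type, length = HEADER.unpack_from(buf)
    # Never trust a length field that came from the device or from user space.
    if length > MAX_PAYLOAD:
        raise ValueError("declared length exceeds protocol maximum")
    if len(buf) - HEADER.size < length:
        raise ValueError("declared length exceeds bytes actually received")
    return frame_type, buf[HEADER.size:HEADER.size + length]
```

In user space a missed check like this usually ends in an exception or a crashed process. In the kernel the same mistake can read or write past a buffer and take the whole system down.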
A user space driver usually needs a small shim in the kernel to do its bidding. Usually that shim provides a simpler API. For example, the FUSE layer lets people write file systems in any language. They can be mounted, read from and written to, then unmounted. The shim must also protect the kernel against any invalid input.
User space drivers have many limitations. For example, the kernel reserves some memory for use during emergencies, and that memory is not available to user space. Under memory pressure, the kernel will kill random user space programs, but it will never kill kernel threads. User space programs may be swapped out, which could leave your device unavailable for several seconds. (Kernel code cannot be swapped out.) Running code in user space requires several context switches, and these waste a lot of CPU time. If your device is a 300 baud modem, no one will notice. But if it is a gigabit Ethernet card, and every packet has to go through your user space driver before it reaches the real user, the system will have major bottlenecks.
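A rough back-of-envelope calculation shows why the modem and the Ethernet card differ so much. The line rates and frame sizes below are standard (a 300 baud 8N1 link carries about 30 bytes per second; a gigabit Ethernet frame occupies between 84 and 1538 bytes on the wire, counting preamble and inter-frame gap). The two context switches per event and the 5 microseconds per switch are assumptions picked only for illustration; real costs vary widely by hardware and kernel.

```python
def switch_overhead(events_per_second, switches_per_event=2, switch_cost_us=5.0):
    """Fraction of one CPU core spent purely on context switches.

    switches_per_event and switch_cost_us are illustrative assumptions,
    not measurements.
    """
    return events_per_second * switches_per_event * switch_cost_us / 1_000_000


# 300 baud, 8N1 framing: 10 bits on the wire per byte, one event per byte.
MODEM_BYTES_PER_S = 300 / 10

# Gigabit Ethernet, one event per frame.
GIGE_MIN_FRAMES_PER_S = 1_000_000_000 / (84 * 8)     # minimum-size frames
GIGE_FULL_FRAMES_PER_S = 1_000_000_000 / (1538 * 8)  # full-size frames

if __name__ == "__main__":
    for name, rate in [
        ("300 baud modem", MODEM_BYTES_PER_S),
        ("GigE, full-size frames", GIGE_FULL_FRAMES_PER_S),
        ("GigE, minimum-size frames", GIGE_MIN_FRAMES_PER_S),
    ]:
        print(f"{name}: {switch_overhead(rate):.2%} of one core")
```

With these assumed costs the modem uses well under 0.1% of a core, while minimum-size gigabit frames would need more than one entire core just for switching, before any real work is done. This is why fast user space drivers batch many packets per switch or avoid the switch altogether.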
User space programs are also "harder" to use, because you have to install the user space software, which often has many library dependencies. Kernel modules "just work."
Why are user space drivers commonly used and currently preferred over kernel drivers?
The question is: "Does this complexity really have to be in the kernel?"
I worked for a company that made USB keys that spoke a particular protocol. We could have written a full kernel driver, but instead we just wrote our program on top of libusb.
Advantages: the program was portable between Linux, Mac, and Windows. We did not have to worry about our code and the GPL.
Disadvantages: if the device needed to send data to the PC and get a response quickly, there was no guarantee that would happen. For example, if we had needed a real-time control loop on the PC, it would have been harder to get bounded response times. (Perhaps not entirely impossible on Linux.)
If there is a way to do it in user space, I would try that first. Only move it into the kernel if there are significant performance bottlenecks or significant complexity in keeping it in user space. Even then, consider the "shim" approach (a small kernel piece with the rest in user space) and/or the "emulator" approach (where your kernel module makes your device look like a serial port or a block device).
On the other hand, if there are already several kernel modules similar to what you want, then start there.