Intel Skylake has a single "unified" scheduler (the picture on the left, from WikiChip).
AMD Zen uses a separate scheduler for each integer execution unit and one scheduler for the floating-point execution units (the picture on the right, from WikiChip, which took it from an AMD presentation).


What are the advantages and disadvantages of either design?
How does this affect micro-optimization of x86 code? (I know that this can be quite complicated and subtle; see, for example, "How are x86 uops scheduled, exactly?" for a related question, which at the time of writing was more specifically about Intel CPUs with their unified scheduler.)