Morten has already pointed out the theory in his answer. With the help of two examples, I will demonstrate the problems you run into when using a generated clock instead of the original clock.
Clock distribution
First, you need to make sure that the clock arrives (almost) simultaneously at all destination flip-flops. Otherwise, even a simple shift register with two stages like this one will fail:
    process(clk_gen)
    begin
      if rising_edge(clk_gen) then
        tmp <= d;    -- first stage
        q   <= tmp;  -- second stage
      end if;
    end process;
The intended behavior of this example is that q gets the value of d after two rising edges of the generated clock clk_gen. If the generated clock is not buffered by a global clock buffer, it will be routed through general-purpose routing, so the delay will be different for each destination flip-flop. The behavior of the shift register can then be described as follows, with some explicit delays:
    library ieee;
    use ieee.std_logic_1164.all;

    entity shift_reg is
      port (
        clk_gen : in  std_logic;
        d       : in  std_logic;
        q       : out std_logic);
    end shift_reg;

    architecture rtl of shift_reg is
      signal ff_0_q : std_logic := '0';  -- output of flip-flop 0
      signal ff_1_q : std_logic := '0';  -- output of flip-flop 1
      signal ff_0_c : std_logic;         -- clock input of flip-flop 0
      signal ff_1_c : std_logic;         -- clock input of flip-flop 1
    begin  -- rtl

      -- different clock delay per flip-flop if general-purpose routing is used
      ff_0_c <= transport clk_gen after 500 ps;
      ff_1_c <= transport clk_gen after 1000 ps;

      -- two closely packed registers with clock-to-output delay of 100 ps
      ff_0_q <= d      after 100 ps when rising_edge(ff_0_c);
      ff_1_q <= ff_0_q after 100 ps when rising_edge(ff_1_c);

      q <= ff_1_q;
    end rtl;
The following test bench simply applies “1” to input d, so q should be “0” after one clock edge and “1” after two clock edges.
    library ieee;
    use ieee.std_logic_1164.all;

    entity shift_reg_tb is
    end shift_reg_tb;

    architecture sim of shift_reg_tb is
      signal clk_gen : std_logic;
      signal d       : std_logic;
      signal q       : std_logic;
    begin  -- sim
      DUT: entity work.shift_reg
        port map (clk_gen => clk_gen, d => d, q => q);

      WaveGen_Proc: process
      begin
        -- Note: registers inside DUT are initialized to zero
        d       <= '1';  -- shift in '1'
        clk_gen <= '0';
        wait for 2 ns;
        clk_gen <= '1';  -- just one rising edge
        wait for 2 ns;
        assert q = '0' report "Wrong output" severity error;
        wait;
      end process WaveGen_Proc;
    end sim;
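But the simulation waveform shows that q already gets “1” after the first clock edge (at 3.1 ns), which is not the intended behavior. This is because FF 1 already sees the new value from FF 0 when the clock edge arrives there: FF 0 is clocked at 2.5 ns and updates its output at 2.6 ns, whereas the same clock edge reaches FF 1 only at 3.0 ns.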

This problem can be solved by distributing the generated clock via a clock tree, which has low skew. To access one of the FPGA's clock trees, you must use a global clock buffer, such as BUFG on Xilinx FPGAs.
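As a minimal sketch of the effect of a low-skew clock tree, the shift register model from above can be rewritten so that both flip-flops see the clock edge at the same time. The entity name shift_reg_low_skew, the signal clk_tree, and the 1000 ps tree delay are assumptions made for illustration only; in a real design the clock tree is reached by instantiating the vendor's clock buffer, not by a delay statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of shift_reg: same flip-flops, but one common clock delay
entity shift_reg_low_skew is
  port (
    clk_gen : in  std_logic;
    d       : in  std_logic;
    q       : out std_logic);
end shift_reg_low_skew;

architecture rtl of shift_reg_low_skew is
  signal ff_0_q   : std_logic := '0';  -- output of flip-flop 0
  signal ff_1_q   : std_logic := '0';  -- output of flip-flop 1
  signal clk_tree : std_logic;         -- clock as seen by both flip-flops
begin  -- rtl

  -- Assumption: the clock tree adds the same delay (here 1000 ps) for
  -- every destination flip-flop, i.e. the skew is modeled as zero.
  clk_tree <= transport clk_gen after 1000 ps;

  -- Same clock-to-output delay of 100 ps as in the original model
  ff_0_q <= d      after 100 ps when rising_edge(clk_tree);
  ff_1_q <= ff_0_q after 100 ps when rising_edge(clk_tree);

  q <= ff_1_q;
end rtl;
```

With this model, FF 1 samples the old value of FF 0 at each clock edge, so q stays “0” after the first rising edge and becomes “1” only after the second one.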
Data transfer
The second problem is the handover of multi-bit signals between two clock domains. Suppose we have 2 registers with 2 bits each. Register 0 is clocked by the original clock, and register 1 is clocked by the generated clock. The generated clock is already distributed via a clock tree.
Register 1 simply samples the output of register 0. But now the different wire delays of the two bits between the registers play an important role. They are modeled explicitly in the following design:
    library ieee;
    use ieee.std_logic_1164.all;
    library unisim;
    use unisim.vcomponents.all;

    entity handover is
      port (
        clk_orig : in  std_logic;                      -- original clock
        d        : in  std_logic_vector(1 downto 0);   -- data input
        q        : out std_logic_vector(1 downto 0));  -- data output
    end handover;

    architecture rtl of handover is
      signal div_q   : std_logic := '0';  -- output of clock divider
      signal bufg_o  : std_logic := '0';  -- output of clock buffer
      signal clk_gen : std_logic;         -- generated clock
      signal reg_0_q : std_logic_vector(1 downto 0) := "00";  -- output of register 0
      signal reg_1_d : std_logic_vector(1 downto 0);          -- data input of register 1
      signal reg_1_q : std_logic_vector(1 downto 0) := "00";  -- output of register 1
    begin  -- rtl

      -- Generate a clock by dividing the original clock by 2.
      -- The 100 ps delay is the clock-to-output time of the flip-flop.
      div_q <= not div_q after 100 ps when rising_edge(clk_orig);

      -- Add global clock-buffer as well as mimic some delay.
      -- Clock arrives at (almost) same time on all destination flip-flops.
      clk_gen_bufg : BUFG port map (I => div_q, O => bufg_o);
      clk_gen <= transport bufg_o after 1000 ps;

      -- Sample data input with original clock
      reg_0_q <= d after 100 ps when rising_edge(clk_orig);

      -- Different wire delays between register 0 and register 1 for each bit
      reg_1_d(0) <= transport reg_0_q(0) after 500 ps;
      reg_1_d(1) <= transport reg_0_q(1) after 1500 ps;

      -- All flip-flops of register 1 are clocked at the same time due to clock buffer.
      reg_1_q <= reg_1_d after 100 ps when rising_edge(clk_gen);

      q <= reg_1_q;
    end rtl;
Now just load the new data value "11" through register 0 using this test bench:
    library ieee;
    use ieee.std_logic_1164.all;

    entity handover_tb is
    end handover_tb;

    architecture sim of handover_tb is
      signal clk_orig : std_logic := '0';
      signal d        : std_logic_vector(1 downto 0);
      signal q        : std_logic_vector(1 downto 0);
    begin  -- sim
      DUT: entity work.handover
        port map (clk_orig => clk_orig, d => d, q => q);

      WaveGen_Proc: process
      begin
        -- Note: registers inside DUT are initialized to zero
        d        <= "11";
        clk_orig <= '0';
        for i in 0 to 7 loop  -- 4 clock periods
          wait for 2 ns;
          clk_orig <= not clk_orig;
        end loop;  -- i
        wait;
      end process WaveGen_Proc;
    end sim;
As can be seen in the following simulation output, the output of register 1 toggles to the intermediate value “01” at 3.1 ns, because the input of register 1 (reg_1_d) is still changing when the rising edge of the generated clock occurs. The intermediate value was not intended and can lead to undesired behavior. The correct value does not appear until the next rising edge of the generated clock.

To solve this problem, you can use:
- special codes where only one bit flips at a time, for example a Gray code, or
- a cross-clock FIFO, or
- handshaking with the help of single control bits.
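To illustrate the first option, here is a minimal sketch of a binary-to-Gray conversion and its inverse. The package name gray_pkg and the function names bin2gray and gray2bin are hypothetical and chosen only for this example. Such an encoding helps only when the transferred value changes by one step at a time, as a counter does; a jump such as “00” to “11” in the test bench above still flips two bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package, not part of any standard library
package gray_pkg is
  function bin2gray(b : std_logic_vector) return std_logic_vector;
  function gray2bin(g : std_logic_vector) return std_logic_vector;
end package gray_pkg;

package body gray_pkg is

  -- Gray bit i is the XOR of binary bits i+1 and i; the MSB is copied.
  -- Assumes a vector of at least one bit.
  function bin2gray(b : std_logic_vector) return std_logic_vector is
    alias bn   : std_logic_vector(b'length - 1 downto 0) is b;
    variable g : std_logic_vector(b'length - 1 downto 0);
  begin
    g(g'high) := bn(bn'high);
    for i in g'high - 1 downto 0 loop
      g(i) := bn(i + 1) xor bn(i);
    end loop;
    return g;
  end function bin2gray;

  -- Inverse conversion: binary bit i is the XOR of the already decoded
  -- binary bit i+1 and Gray bit i.
  function gray2bin(g : std_logic_vector) return std_logic_vector is
    alias gn   : std_logic_vector(g'length - 1 downto 0) is g;
    variable b : std_logic_vector(g'length - 1 downto 0);
  begin
    b(b'high) := gn(gn'high);
    for i in b'high - 1 downto 0 loop
      b(i) := b(i + 1) xor gn(i);
    end loop;
    return b;
  end function gray2bin;

end package body gray_pkg;
```

In a handover like the one above, the Gray-coded value would be registered in the source clock domain before it is sampled by the destination register, so that at most one bit is in transition at any sampling edge and the sampled value is either the old or the new one.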