An alternative method of creating low clock frequencies in VHDL

In the past, I asked a question about resets and about how to divide a high clock frequency down into a series of lower-frequency square waves, where each output is a harmonic of the others; for example, the first output is 10 Hz, the second is 20 Hz, etc.

I got some really useful answers, recommending what appears to be the convention of using a clock enable pin to create the lower frequencies.

An alternative has since occurred to me: use an n-bit number that is constantly incremented, and take the last (most significant) x bits of the number as the clock outputs, where x is the number of outputs.

It works in synthesis for me, but I have never seen it mentioned anywhere on the Internet or on SO, so I'm curious: am I missing something that makes it a really terrible idea, and am I just creating problems for later?

I know that the limitation is that I can only produce frequencies that are the input frequency divided by a power of 2, so most of the time it will only approximate the desired output frequency (but it will still be of the right order of magnitude). Is this restriction the only reason it is not recommended?

Thank you very much!

David

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
    library UNISIM;
    use UNISIM.VComponents.all;
    use IEEE.math_real.all;

    ENTITY CLK_DIVIDER IS
        GENERIC(INPUT_FREQ : INTEGER;  --Can only divide the input frequency by a power of 2
                OUT1_FREQ  : INTEGER
        );
        PORT(SYSCLK  : IN  STD_LOGIC;
             RESET_N : IN  STD_LOGIC;
             OUT1    : OUT STD_LOGIC;   --Actual divider is 2^(ceiling[log2(input/freq)])
             OUT2    : OUT STD_LOGIC);  --Actual output is input over value above
    END CLK_DIVIDER;

    architecture Behavioral of Clk_Divider is
        constant divider      : integer := INPUT_FREQ / OUT1_FREQ;
        constant counter_bits : integer := integer(ceil(log2(real(divider))));
        signal counter : unsigned(counter_bits - 1 downto 0) := (others => '0');
    begin
        proc : process(SYSCLK)
        begin
            if rising_edge(SYSCLK) then
                counter <= counter + 1;
                if RESET_N = '0' then
                    counter <= (others => '0');
                end if;
            end if;
        end process;

        OUT1 <= counter(counter'length - 1);
        OUT2 <= not counter(counter'length - 2);
    end Behavioral;
2 answers

Functionally, the two outputs OUT1 and OUT2 can be used as clocks, but this method of creating clocks does not scale and is likely to cause implementation problems, so it is a bad habit. It is, of course, important to understand why this is so.

The reason it does not scale is that every signal used as a clock in an FPGA must be distributed over a dedicated clock net, where latency and skew are well defined, so that all flip-flops and memories on each clock are updated synchronously. The number of such clock nets is very limited, usually in the range of 10 to 40 in an FPGA device, and restrictions on their use and placement typically make it even more critical to plan how they are used. It is therefore usually necessary to reserve clock nets for truly asynchronous clocks, where there is no alternative to using a clock net.

The reason it is likely to cause problems is that clocks derived from bits of a counter have no guaranteed timing relationship. So if data has to move between these clock domains, additional synchronization constraints are required to make sure that the clock domain crossing (CDC) is handled correctly. This is done through constraints for synthesis and/or static timing analysis (STA), and it is usually a little tricky to get right, so using a design methodology that simplifies STA is a habit that saves design time.

So, in designs where it is possible to use a common clock and then generate synchronous clock enable signals, that should be the preferred approach. For the specific design above, a clock enable can be generated simply by detecting the '0' to '1' transition of the relevant counter bit, and then asserting the clock enable for the single cycle in which the transition is detected. A single clock net can then be used, together with 2 clock enable signals such as CE1 and CE2, and no special STA constraints are required.
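As a rough illustration of that idea, here is a minimal sketch of a clock enable generator. The entity name clk_enable_gen, the COUNTER_BITS generic and the lower-case port names are assumptions made for this example, not part of the question's code; the counter width is passed in directly instead of being derived from frequencies.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: all names and the COUNTER_BITS generic are illustrative.
entity clk_enable_gen is
  generic (
    COUNTER_BITS : positive := 4);          -- assumed to be at least 2
  port (
    sysclk  : in  std_logic;
    reset_n : in  std_logic;                -- synchronous, active-low reset
    ce1     : out std_logic;                -- one pulse every 2**COUNTER_BITS cycles
    ce2     : out std_logic);               -- one pulse every 2**(COUNTER_BITS-1) cycles
end entity clk_enable_gen;

architecture rtl of clk_enable_gen is
  signal counter : unsigned(COUNTER_BITS - 1 downto 0) := (others => '0');
  signal bit1_d  : std_logic := '0';        -- previous value of the top counter bit
  signal bit2_d  : std_logic := '0';        -- previous value of the second-highest bit
begin
  process (sysclk)
  begin
    if rising_edge(sysclk) then
      if reset_n = '0' then
        counter <= (others => '0');
        bit1_d  <= '0';
        bit2_d  <= '0';
        ce1     <= '0';
        ce2     <= '0';
      else
        counter <= counter + 1;
        bit1_d  <= counter(counter'high);
        bit2_d  <= counter(counter'high - 1);
        -- Assert the enable for exactly one sysclk cycle on a '0' to '1' transition.
        ce1 <= counter(counter'high) and not bit1_d;
        ce2 <= counter(counter'high - 1) and not bit2_d;
      end if;
    end if;
  end process;
end architecture rtl;
```

Downstream logic then stays on sysclk and only updates when the relevant enable is '1', so everything remains in one clock domain.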

Morten has already pointed out the theory in his answer. With two examples, I will demonstrate the problems you run into when using a generated clock instead of clock enables.

Clock distribution

First, you must make sure that the clock arrives at (almost) the same time at all destination flip-flops. Otherwise, even a simple shift register with two stages like this one will fail:

    process(clk_gen)
    begin
      if rising_edge(clk_gen) then
        tmp <= d;
        q   <= tmp;
      end if;
    end process;

The intended behavior of this example is that q gets the value of d after two rising edges of the generated clock clk_gen. If the generated clock is not buffered by a global clock buffer, the delay will be different for each destination flip-flop, because the clock is routed over general-purpose routing. The behavior of the shift register can then be described as follows, with some explicit delays:

    library ieee;
    use ieee.std_logic_1164.all;

    entity shift_reg is
      port (
        clk_gen : in  std_logic;
        d       : in  std_logic;
        q       : out std_logic);
    end shift_reg;

    architecture rtl of shift_reg is
      signal ff_0_q : std_logic := '0';  -- output of flip-flop 0
      signal ff_1_q : std_logic := '0';  -- output of flip-flop 1
      signal ff_0_c : std_logic;         -- clock input of flip-flop 0
      signal ff_1_c : std_logic;         -- clock input of flip-flop 1
    begin  -- rtl
      -- different clock delay per flip-flop if general-purpose routing is used
      ff_0_c <= transport clk_gen after 500 ps;
      ff_1_c <= transport clk_gen after 1000 ps;

      -- two closely packed registers with clock-to-output delay of 100 ps
      ff_0_q <= d      after 100 ps when rising_edge(ff_0_c);
      ff_1_q <= ff_0_q after 100 ps when rising_edge(ff_1_c);

      q <= ff_1_q;
    end rtl;

The following test bench simply feeds a '1' into input d, so q should still be '0' after one clock edge and '1' after two clock edges.

    library ieee;
    use ieee.std_logic_1164.all;

    entity shift_reg_tb is
    end shift_reg_tb;

    architecture sim of shift_reg_tb is
      signal clk_gen : std_logic;
      signal d       : std_logic;
      signal q       : std_logic;
    begin  -- sim
      DUT: entity work.shift_reg
        port map (clk_gen => clk_gen, d => d, q => q);

      WaveGen_Proc: process
      begin
        -- Note: registers inside DUT are initialized to zero
        d       <= '1';  -- shift in '1'
        clk_gen <= '0';
        wait for 2 ns;
        clk_gen <= '1';  -- just one rising edge
        wait for 2 ns;
        assert q = '0' report "Wrong output" severity error;
        wait;
      end process WaveGen_Proc;
    end sim;

But the simulation waveform shows that q already becomes '1' after the first clock edge (at 3.1 ns), which is not the intended behavior. This is because FF 1 already sees the new value from FF 0 by the time the clock edge arrives there.

shift register simulation result

This problem can be solved by distributing the generated clock through a clock tree, which has low skew. To get access to one of the FPGA's clock trees, you must use a global clock buffer, such as a BUFG on Xilinx FPGAs.
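Where the design allows it, another way to avoid the problem is to keep both flip-flops on the original clock and gate them with a clock enable. A minimal sketch, assuming a hypothetical entity shift_reg_ce with an enable input ce that is high for one clk cycle per slow period (these names are illustrative and not taken from the question):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of the two-stage shift register using a clock enable.
entity shift_reg_ce is
  port (
    clk : in  std_logic;   -- original clock, routed on a global clock net
    ce  : in  std_logic;   -- clock enable, synchronous to clk (assumed)
    d   : in  std_logic;
    q   : out std_logic);
end entity shift_reg_ce;

architecture rtl of shift_reg_ce is
  signal tmp : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Both stages share the same low-skew clock and only shift when enabled.
      if ce = '1' then
        tmp <= d;
        q   <= tmp;
      end if;
    end if;
  end process;
end architecture rtl;
```

Both flip-flops are then clocked by the same low-skew net, so the second stage samples the old value of the first stage as intended, and no extra clock buffer is spent on the divided clock.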

Data transfer

The second problem is the transfer of multi-bit signals between two clock domains. Suppose we have 2 registers with 2 bits each. Register 0 is clocked by the original clock, and register 1 is clocked by the generated clock. The generated clock is already distributed through a clock tree.

Register 1 simply samples the output of register 0. But now the different wire delays between the two registers for each bit play an important role. They are modeled explicitly in the following design:

    library ieee;
    use ieee.std_logic_1164.all;
    library unisim;
    use unisim.vcomponents.all;

    entity handover is
      port (
        clk_orig : in  std_logic;                      -- original clock
        d        : in  std_logic_vector(1 downto 0);   -- data input
        q        : out std_logic_vector(1 downto 0));  -- data output
    end handover;

    architecture rtl of handover is
      signal div_q   : std_logic := '0';  -- output of clock divider
      signal bufg_o  : std_logic := '0';  -- output of clock buffer
      signal clk_gen : std_logic;         -- generated clock
      signal reg_0_q : std_logic_vector(1 downto 0) := "00";  -- output of register 0
      signal reg_1_d : std_logic_vector(1 downto 0);          -- data input of register 1
      signal reg_1_q : std_logic_vector(1 downto 0) := "00";  -- output of register 1
    begin  -- rtl
      -- Generate a clock by dividing the original clock by 2.
      -- The 100 ps delay is the clock-to-output time of the flip-flop.
      div_q <= not div_q after 100 ps when rising_edge(clk_orig);

      -- Add global clock-buffer as well as mimic some delay.
      -- Clock arrives at (almost) same time on all destination flip-flops.
      clk_gen_bufg : BUFG port map (I => div_q, O => bufg_o);
      clk_gen <= transport bufg_o after 1000 ps;

      -- Sample data input with original clock
      reg_0_q <= d after 100 ps when rising_edge(clk_orig);

      -- Different wire delays between register 0 and register 1 for each bit
      reg_1_d(0) <= transport reg_0_q(0) after 500 ps;
      reg_1_d(1) <= transport reg_0_q(1) after 1500 ps;

      -- All flip-flops of register 1 are clocked at the same time due to clock buffer.
      reg_1_q <= reg_1_d after 100 ps when rising_edge(clk_gen);

      q <= reg_1_q;
    end rtl;

Now just load the new data value "11" through register 0 using this test bench:

    library ieee;
    use ieee.std_logic_1164.all;

    entity handover_tb is
    end handover_tb;

    architecture sim of handover_tb is
      signal clk_orig : std_logic := '0';
      signal d        : std_logic_vector(1 downto 0);
      signal q        : std_logic_vector(1 downto 0);
    begin  -- sim
      DUT: entity work.handover
        port map (clk_orig => clk_orig, d => d, q => q);

      WaveGen_Proc: process
      begin
        -- Note: registers inside DUT are initialized to zero
        d        <= "11";
        clk_orig <= '0';
        for i in 0 to 7 loop  -- 4 clock periods
          wait for 2 ns;
          clk_orig <= not clk_orig;
        end loop;  -- i
        wait;
      end process WaveGen_Proc;
    end sim;

As can be seen in the following simulation output, register 1 captures the intermediate value "01" at the rising edge of the generated clock at 3.1 ns, because the input of register 1 (reg_1_d) is still changing when that edge arrives. The intermediate value was not intended and can lead to undesired behavior. The correct value does not appear until the next rising edge of the generated clock.

handover simulation output

To solve this problem, you can use:

  • special codes where only one bit flips at a time, for example a Gray code, or
  • clock-crossing FIFOs, or
  • handshaking with the help of single control bits.
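As an illustration of the first option, here is a minimal sketch of a counter with a registered Gray-coded output. The entity name gray_counter and the WIDTH generic are assumptions made for this example. Note that a Gray code only helps for values that move one step at a time, such as counters or FIFO pointers; it does not protect an arbitrary jump like "00" to "11".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the names and the WIDTH generic are illustrative.
entity gray_counter is
  generic (
    WIDTH : positive := 2);
  port (
    clk  : in  std_logic;
    gray : out std_logic_vector(WIDTH - 1 downto 0) := (others => '0'));
end entity gray_counter;

architecture rtl of gray_counter is
  signal bin : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
    variable nxt : unsigned(WIDTH - 1 downto 0);
  begin
    if rising_edge(clk) then
      nxt := bin + 1;
      bin <= nxt;
      -- Binary to Gray conversion: g = b xor (b shifted right by one).
      -- The result is registered, so only one output bit changes per clock edge.
      gray <= std_logic_vector(nxt xor shift_right(nxt, 1));
    end if;
  end process;
end architecture rtl;
```

With WIDTH = 2 the output steps through 00, 01, 11, 10, so a register in the other clock domain that samples during a transition sees either the old or the new value, never an unrelated one.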

Source: https://habr.com/ru/post/1238381/

