"Extended arithmetic" is a weak spot of the C language family. The language gives you no access to the processor's integer carry/overflow flag, so there is no portable way to write an optimal 128-bit integer class.
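To illustrate what "portable but not optimal" looks like, here is a minimal sketch of a two-limb addition that recovers the carry by comparison rather than by reading a flag. The `u128` struct and the `add` and `add_demo` names are hypothetical and exist only for this example; whether the compiler turns the comparison into a single add-with-carry instruction depends on the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-limb 128-bit value (illustrative names only).
struct u128 {
    std::uint64_t lo;  // low 64 bits
    std::uint64_t hi;  // high 64 bits
};

// Adds two 128-bit values modulo 2^128.
// Unsigned addition wraps around, so a carry out of the low limb
// happened exactly when the wrapped sum is smaller than an operand.
inline u128 add(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;
    const std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;
    return r;
}

// Small self-check: (2^64 - 1) + 1 must carry into the high limb.
inline void add_demo() {
    const u128 a = {UINT64_MAX, 0};
    const u128 b = {1, 0};
    const u128 s = add(a, b);
    assert(s.lo == 0 && s.hi == 1);
}
```

The comparison costs an extra instruction or two unless the optimizer recognizes the pattern, which is the overhead the carry flag would otherwise avoid.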
For better performance (to compete with other crypto libraries), you may need a static library with custom assembly inside. Unfortunately, I do not know of a portable (widely supported) interface for this.
If you just need a mapping from each fundamental type with N bits to the one with 2N bits, a simple metafunction will do:
    template< typename half >
    struct double_bits;

    template<>
    struct double_bits< std::uint8_t > { typedef std::uint16_t type; };

    template<>
    struct double_bits< std::uint16_t > { typedef std::uint32_t type; };

    template<>
    struct double_bits< std::uint32_t > { typedef std::uint64_t type; };
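As a sketch of how such a metafunction might be used, the example below repeats the definitions so it compiles on its own and adds a hypothetical `widening_mul` helper (not part of the original answer) that returns the full 2N-bit product of two N-bit operands.

```cpp
#include <cassert>
#include <cstdint>

// Metafunction from the answer: maps an N-bit type to the 2N-bit type.
template <typename half> struct double_bits;
template <> struct double_bits<std::uint8_t>  { typedef std::uint16_t type; };
template <> struct double_bits<std::uint16_t> { typedef std::uint32_t type; };
template <> struct double_bits<std::uint32_t> { typedef std::uint64_t type; };

// Hypothetical helper: multiplies two N-bit values without losing
// the upper half of the product. Both operands are widened first so
// the multiplication itself is carried out in at least 2N bits.
template <typename T>
typename double_bits<T>::type widening_mul(T a, T b) {
    typedef typename double_bits<T>::type wide;
    return static_cast<wide>(static_cast<wide>(a) * static_cast<wide>(b));
}

// Small self-check on the 8-bit case: 255 * 255 needs 16 bits.
inline void widening_mul_demo() {
    const std::uint8_t x = 255;
    const std::uint16_t p = widening_mul(x, x);
    assert(p == 65025);
}
```

There is deliberately no specialization for `std::uint64_t`: standard C++ has no 128-bit fundamental type to map it to, which is the limitation described at the top of this answer.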