Implementing DBNS Arithmetic
Published in Vassil Dimitrov, Graham Jullien, Roberto Muscedere, Multiple-Base Number System, 2017
Basic multiplication algorithms produce a product, z, from a multiplier, x, and a multiplicand, y: z = x × y. The classical multiplication algorithm generates the product as a sum of partial products, each computed by multiplying one digit of the multiplier by the multiplicand. In a standard binary representation, each partial product is simply the multiplicand shifted left by the multiplier digit position (assuming positive integer multiplication), as shown in Equation (3.7) for B-bit integers.
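The shift-and-add scheme described above can be sketched in a few lines of Python (an illustrative sketch only; the function name and the 8-bit default are assumptions, not the book's code):

```python
def schoolbook_multiply(x: int, y: int, bits: int = 8) -> int:
    """Classical multiplication of two positive B-bit integers.

    Each partial product is the multiplicand y shifted left by the
    position of a set multiplier bit, and the product is their sum.
    """
    product = 0
    for i in range(bits):
        if (x >> i) & 1:           # multiplier digit x_i
            product += y << i      # partial product: y shifted left by i
    return product
```

For example, `schoolbook_multiply(13, 11)` sums the partial products 11, 44 and 88 (for the set bits of 13 = 1101) to give 143.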
Computer Arithmetic Design Using Verilog HDL
Published in Joseph Cavanagh, Verilog HDL Design Examples, 2017
This section presents the multiplication of two fixed-point binary operands in the 2s complement number representation. An n-bit multiplicand is multiplied by an n-bit multiplier to produce a 2n-bit product. The multiplication algorithm consists of multiplying the multiplicand by the low-order multiplier bit to obtain a partial product. If the multiplier bit is 1, the multiplicand becomes the partial product; if the multiplier bit is 0, the partial product is all zeroes. The partial product is then shifted left one bit position.
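The bit-serial scheme above can be modelled in Python (a hedged sketch: handling the multiplier's sign bit by subtracting its partial product is a standard 2s-complement technique, and is an assumption here rather than a transcription of the book's Verilog):

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2s-complement signed value."""
    return v - (1 << n) if v & (1 << (n - 1)) else v

def twos_comp_multiply(a: int, b: int, n: int) -> int:
    """n-bit x n-bit 2s-complement multiply producing a 2n-bit product."""
    sa = to_signed(a, n)                 # signed multiplicand
    product = 0
    for i in range(n):
        if (b >> i) & 1:                 # examine each multiplier bit
            pp = sa << i                 # partial product: shifted multiplicand
            if i == n - 1:
                product -= pp            # multiplier MSB carries weight -2^(n-1)
            else:
                product += pp
    return product & ((1 << (2 * n)) - 1)  # keep the low 2n bits
```

For instance, multiplying the 4-bit patterns 1110 (−2) and 0011 (3) yields the 8-bit pattern 11111010 (−6).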
Efficient Implementation
Published in Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone, Handbook of Applied Cryptography, 2018
The classical algorithm for multiplication (Algorithm 14.12) takes O(n^2) bit operations for multiplying two n-bit integers. A recursive algorithm due to Karatsuba and Ofman [661] reduces the complexity of multiplying two n-bit integers to O(n^1.58). This divide-and-conquer method is based on the following simple observation. Suppose that x and y are n-bit integers and n = 2t. Then x = 2^t·x1 + x0 and y = 2^t·y1 + y0, where x1, y1 are the t high-order bits of x and y, respectively, and x0, y0 are the t low-order bits. Furthermore, x·y = u2·2^(2t) + u1·2^t + u0, where u0 = x0·y0, u2 = x1·y1 and u1 = (x0 + x1)·(y0 + y1) − u0 − u2. It follows that x·y can be computed by performing three multiplications of t-bit integers (as opposed to one multiplication with 2t-bit integers) along with two additions and two subtractions. For large values of t, the cost of the additions and subtractions is insignificant relative to the cost of the multiplications. With appropriate modifications, u0, u2 and (x0 + x1)·(y0 + y1) can each be computed similarly. This procedure is continued on the intermediate values until the size of the integers reaches the word size of the computing device, where multiplication can be accomplished efficiently. Due to the recursive nature of the algorithm, a number of intermediate results must be stored, which can add significant overhead and detract from the algorithm's efficiency for relatively small integers. Combining the Karatsuba-Ofman method with classical multiplication may have some practical significance. For a more detailed treatment of the Karatsuba-Ofman algorithm, see Knuth [692], Koc [698], and Geddes, Czapor, and Labahn [445].
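The recurrence described above translates almost directly into code (an illustrative Python sketch; the 2^32 base-case threshold stands in for "the word size of the computing device"):

```python
def karatsuba(x: int, y: int) -> int:
    """Karatsuba-Ofman multiplication: three half-size multiplies per level."""
    if x < 2**32 or y < 2**32:              # base case: word-size operands
        return x * y
    t = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> t, x & ((1 << t) - 1)     # t high-order / low-order parts of x
    y1, y0 = y >> t, y & ((1 << t) - 1)     # t high-order / low-order parts of y
    u0 = karatsuba(x0, y0)                  # three recursive multiplications
    u2 = karatsuba(x1, y1)
    u1 = karatsuba(x0 + x1, y0 + y1) - u0 - u2
    return (u2 << (2 * t)) + (u1 << t) + u0
```

The three recursive calls replace the four half-size multiplications of the classical method, which is the source of the O(n^1.58) complexity.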
Teaching redundant residue number system for electronics and computer students
Published in International Journal of Mathematical Education in Science and Technology, 2022
Algorithm 2 (RRNS (2^n ± 1, redundant) Multiplication): Assume two redundant residues A and B in the RRNS, where A and B are k-digit numbers. The RRNS multiplication algorithm is composed of three main steps:
1. Digit-product computation: digit Ai is multiplied by Bj for 0 ≤ i, j < k;
2. Rotation and arrangement of the digit products to generate the partial products; and
3. Partial-product accumulation according to the RRNS modular addition rules.
The multiplication of two k-digit RRNS numbers for modulo 2^n + 1 is depicted in Figure 12. RRNS multiplication contains a step for rotating the partial products that redundant multiplication does not. In addition, the partial-product accumulation is performed by RRNS addition.
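The rotation step is easiest to see for the companion modulus 2^n − 1, where a left shift of a partial product becomes a cyclic rotation (an illustrative Python sketch of this principle only, not the exact 2^n + 1 algorithm of Figure 12):

```python
def rotl(v: int, s: int, n: int) -> int:
    """Cyclic left rotation of an n-bit value: equals v * 2^s mod (2^n - 1)."""
    s %= n
    return ((v << s) | (v >> (n - s))) & ((1 << n) - 1)

def modmul_2n_minus_1(a: int, b: int, n: int) -> int:
    """Multiply a*b mod 2^n - 1 by rotating partial products.

    Each set bit i of the multiplier contributes rotl(a, i), because a
    left shift by i is a cyclic rotation modulo 2^n - 1.
    """
    m = (1 << n) - 1
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            acc += rotl(a, i, n)         # rotated partial product
    return acc % m                       # accumulate under the modulus
```

For modulo 2^n + 1 the rotated-out bits re-enter with a sign change rather than wrapping directly, which is the extra arrangement step the algorithm above elides.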
A GPU-Accelerated Filtered Density Function Simulator of Turbulent Reacting Flows
Published in International Journal of Computational Fluid Dynamics, 2020
M. Inkarbekov, A. Aitzhan, A. Kaltayev, S. Sammak
Here IMRI, IMRE, IMLI and IMLE are the matrices of boundary integrals. Most of the elements of these matrices are zero, so a sparse matrix-vector multiplication algorithm can be employed for higher efficiency. We use the coordinate format, also known as 'ijv' or 'triplet', for storing the sparse matrices and for the matrix-vector multiplication (Goharian, Jain, and Sun 2003; Buluç, Gilbert, and Shah 2011). It is noticed that each value of FH is used 4 times in the kernel, meaning global memory is accessed 4 times, which results in 4 × 350 = 1400 clock cycles. This number can be reduced by using shared memory, as each access to shared memory costs only 28 clock cycles. For this, we load the values of FH from global memory into shared memory, which takes 350 + 28 = 378 clock cycles, and then each value is read from shared memory 4 times, an additional 4 × 28 = 112 clock cycles. Summing up, we get 378 + 112 = 490 clock cycles in total.
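A coordinate-format ('ijv'/'triplet') matrix-vector product can be sketched as follows (a minimal Python illustration of the storage scheme, not the authors' GPU kernel):

```python
def coo_matvec(rows, cols, vals, x):
    """Sparse matrix-vector product with a COO ('ijv'/'triplet') matrix.

    rows[k], cols[k], vals[k] describe the k-th nonzero entry; only
    these entries contribute, so zero elements cost nothing.
    """
    y = [0.0] * (max(rows) + 1)
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]                 # accumulate nonzero contributions
    return y
```

For the 2×2 matrix [[1, 0], [3, 2]] stored as rows = [0, 1, 1], cols = [0, 0, 1], vals = [1, 3, 2], multiplying by x = [4, 5] gives [4, 22].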
A programmable ternary CPU using hybrid CMOS/memristor circuits
Published in International Journal of Parallel, Emergent and Distributed Systems, 2018
Daniel Wust, Dietmar Fey, Johannes Knödtel
As the input data for the algorithms use a constant word length, the adder unit yields a constant energy-delay product, since there are no transitions at positions beyond the constant word length. Figure 6 summarises the results for the multiplier and divider. The multiplication algorithm scales nearly linearly, due to the increasing number of stages with increasing word length. The division algorithm additionally shows a logarithmic component in its complexity, due to the normalization in the preprocessing steps. A break-even point exists between word lengths of 32 and 64 digits, beyond which the signed-digit algorithms outperform the conventional algorithms with respect to the energy-delay product.