Pipeline Architecture
Published in Pranabananda Chakraborty, Computer Organisation and Architecture, 2020
Last but not least, another important design decision in multicore system organisation is whether the individual cores will be superscalar or will implement SHMT. In recent years, there has been a clear shift in system design away from the instruction-level parallelism exploited by superscalar cores and towards support for fine-grained thread-level parallelism (hardware threads). The basic driver behind this shift is simple: achieving maximum performance for a given system cost. The development of multicore SHMT architectures is a direct consequence of it. For example, the Intel Core Duo uses superscalar cores, whereas the more advanced Intel Core i7 uses SHMT cores. One of the main attractions of this approach is that SHMT has the effect of scaling up the number of hardware-level threads that the multicore system supports. Thus, a multicore system with four cores and SHMT supporting three simultaneous threads in each core appears, at the application level, to be the same as a multicore system with 12 cores. As software development for a given range of applications proceeds to meet the grand challenge of fully exploiting the parallel resources offered by steadily growing VLSI capabilities, the SHMT approach appears more conducive to that goal than its superscalar counterpart.
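As a minimal illustration of this application-level equivalence (a sketch, not taken from the text; the core and thread counts are the assumed values from the example above), a program can simply query the operating system, which reports the product of physical cores and hardware threads per core:

```python
import os

# os.cpu_count() reports *logical* processors: physical cores times
# hardware threads per core. On the hypothetical 4-core, 3-way SHMT
# system described above it would report 12, indistinguishable at this
# level from a 12-core machine without SHMT.
logical = os.cpu_count()
print(f"OS-visible hardware threads: {logical}")

# Illustrative arithmetic for the example in the text (assumed values):
physical_cores = 4     # assumption: 4 physical cores
threads_per_core = 3   # assumption: 3 simultaneous threads per core
print(f"4 cores x 3 threads/core = {physical_cores * threads_per_core} "
      "application-visible processors")
```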
Parallel and high-performance systems
Published in Joseph D. Dumas, Computer Architecture, 2016
On the plus side, the NUMA architecture scales well to a large number of processors, although the gap between local and remote memory access times does tend to widen as the system grows larger. Not only may a given system be built with many (in some cases, hundreds or thousands of) CPUs, but most NUMA-based machines are designed to be highly configurable, such that a user may start out with a smaller system and add extra processor and memory boards as needed. This is possible because while the interconnection structure within local groups of processors is like that of an SMP and can only handle a limited number of components, the global interconnection network (although slower) is generally much more flexible in structure. Because of their power and flexibility, systems with this type of architecture became increasingly popular during the fifth and sixth generations of computing. Notable historical examples include the Data General/EMC Numaline and AViiON systems, the Sequent (later IBM) NUMA-Q, the Hewlett-Packard/Convex Exemplar line of servers, and the Silicon Graphics Origin 2000 and 3000 supercomputers. Most servers built around modern Intel (Core i7 and Xeon) and AMD (Opteron) x86-64 family processors use NUMA architectures.
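The performance consequence of that local/remote differential can be illustrated with a simple weighted-average model (an illustrative sketch, not drawn from the text; the latencies and locality fractions below are assumed values):

```python
def effective_access_time(t_local_ns, t_remote_ns, local_fraction):
    """Weighted-average memory access time seen by one NUMA node.

    local_fraction is the share of references satisfied by local memory.
    """
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns

# Assumed illustrative latencies: 80 ns local, 200 ns remote.
for f in (0.95, 0.80, 0.50):
    t = effective_access_time(80.0, 200.0, f)
    print(f"{f:>4.0%} local references -> {t:6.1f} ns average access time")
```

The model makes the scaling trade-off concrete: as a system grows and more references cross the (slower) global interconnect, keeping data placement local becomes the dominant performance concern.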
A loosely-coupling hardware and software video compression for recording affordable MOOCs content with various scenes
Published in Amir Hussain, Mirjana Ivanovic, Electronics, Communications and Networks IV, 2015
Cheng-Yu Tsai, Jenq-Muh Hsu, Hung-Hsu Tsai, Zhi-Cheng Dai, Pao-Ta Yu
In this section, experiments on real-time video compression are conducted to compare the compression performance of our proposed system against legacy software-based compression. To measure how well real-time compression is sustained, four video frame rates, 15, 18, 20, and 24 frames per second, are used to evaluate performance. Two video resolutions, HD and Full HD, are generated for the same purpose. Four scenes are also used to increase the content complexity when comparing the effect of video compression between our proposed system and legacy software-based video compression. The four designed scenes are shown in Figure 4. Scene 1 contains one Full HD video source. Scene 2 contains two Full HD video sources. Scene 3 has three video sources, two Full HD and one HD. Scene 4 has four video sources, two Full HD and two HD. Two types of GPU accelerator are chosen: an Intel HD Graphics 4600 and an NVIDIA Quadro K2000. The experimental platform is based on an Intel Core i7-4770 processor.
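The real-time criterion implied here is that the encoder must compress frames at least as fast as the target frame rate delivers them. A minimal sketch of that measurement follows (an assumed harness, not the paper's code; encode_frame and the dummy frames are placeholders for a real encoder and capture source):

```python
import time

def realtime_ratio(encode_frame, frames, target_fps):
    """Return achieved_fps / target_fps; a value >= 1.0 means the
    encoder keeps up with the source in real time.

    encode_frame is a callable that compresses one raw frame;
    frames is an iterable of raw frames.
    """
    start = time.perf_counter()
    n = 0
    for frame in frames:
        encode_frame(frame)
        n += 1
    elapsed = time.perf_counter() - start
    achieved_fps = n / elapsed
    return achieved_fps / target_fps

# Usage sketch with a dummy no-op encoder and a 24 fps target
# (assumed values matching the highest frame rate tested above):
dummy_frames = [bytes(100)] * 240
ratio = realtime_ratio(lambda f: None, dummy_frames, target_fps=24)
print(f"real-time ratio: {ratio:.2f} (>= 1.0 means compression keeps up)")
```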
Bayesian closed-loop robust process design considering model uncertainty and data quality
Published in IISE Transactions, 2020
Linhan Ouyang, Jianxiong Chen, Yizhong Ma, Chanseok Park, Jionghua (Judy) Jin
By contrast, if a second-order regression model is used, the objective function is a fourth-order function of the design variables and requires a numerical optimization procedure. The objective function may not be convex, and hence the solution may not be unique. To avoid local optima, the optimization search is repeated with different starting points randomly selected from the feasible region, and the solution that yields the smallest objective value is chosen as the global optimum. The solution of the second-order process model is therefore more computationally intensive than that of a first-order model. Examining the data quality of every incoming response measure also increases the computational cost. To compare the computational cost of the online design approaches based on the second-order models, a CPU time criterion is used in the simulation. A fixed number of manufacturing runs is assumed, and a total of 10 initial points for the numerical optimization are randomly generated. Computations are performed on an Intel Core i5 dual-core processor with a 2.4 GHz clock speed. The detailed comparison results can be seen in the Online Supplement of this article. We conclude that the updating time is acceptable for the milling process in our case study; hence the proposed online approach is not prohibitively expensive for real-time applications in terms of average CPU time.
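The multi-start strategy described here can be sketched in a few lines. The following Python example uses a placeholder non-convex fourth-order objective and an assumed feasible region (neither is the paper's actual model); it repeats a local search from 10 random initial points and keeps the best result, exactly as the passage describes:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def objective(x):
    # Placeholder fourth-order, non-convex objective in two design
    # variables (stands in for the second-order-model-based criterion).
    return (x[0]**2 - 1.0)**2 + (x[1]**2 - 2.0)**2 + 0.1 * x[0] * x[1]

bounds = [(-3.0, 3.0), (-3.0, 3.0)]  # assumed feasible region

# Multi-start: repeat the local search from 10 random starting points
# and keep the solution with the smallest objective value.
best = None
for _ in range(10):
    x0 = rng.uniform([b[0] for b in bounds], [b[1] for b in bounds])
    res = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res

print("best design point:", best.x, "objective value:", best.fun)
```

Because each restart is an independent local search, the CPU time grows roughly linearly with the number of starting points, which is why the number of restarts directly drives the computational-cost comparison above.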