Title: A Comparison of Full and Partial Predicated Execution Support for ILP Processors
Authors: Scott A. Mahlke, Richard E. Hank, James E. McCormick, David I. August and Wen-mei W. Hwu
Conference: ISCA '95

What is it about?
Analyzes the benefit of full versus partial predicated execution support in processors.

Problem?
Branches are a major impediment to exploiting ILP, so remove them using predication. However, predication requires modifications to the ISA and hardware: full support needs extensive changes, partial support far fewer. The trade-off is how much performance is given up by choosing the cheaper option.

Motivation:
Branches are a major pain because 1) they impose control dependencies, and 2) the processor resources for handling branch instructions are limited. A processor may be able to handle only one branch per cycle, so multiple-issue processors take a hit.

Predicated execution (in hardware) + if-conversion (in compiler).
Benefits: reduces pressure on branch prediction; the compiler can expose multiple execution paths to the hardware.

Full Predicated Execution:
Every instruction gets an additional source operand that holds a predicate. This requires extensive changes to the ISA and processor cores.
Hardware Changes:
Suppression of Execution: requires a predicate register file. An instruction whose predicate is false is suppressed either in write-back or in issue/decode. The Cydra 5 suppresses in the issue/decode stage.
Expression of Condition: a set of new instructions (predicate define instructions) sets the predicate registers. These instructions are themselves predicated. Predicate types define how the predicate registers are set when these instructions execute.
Compiler Changes:
Basic blocks are systematically included in a hyperblock, which tries to capture a large fraction of the likely control flow paths. Including too many blocks may saturate the processor's resources, since instructions from every included path occupy issue slots whether or not their path is taken.

Partial Predicated Execution:
Hardware Changes:
Conditional Move: if (cond) dest = src. SPARC V9 and DEC Alpha provide this instruction.
Select: dest = (cond ? src1 : src2).
Compiler Changes:
Fully predicated code is generated first, then converted to partially predicated code using predicate promotion and basic conversions. This adds speculative code: promoted instructions execute regardless of their original predicate.
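To make the difference between the two levels of support concrete, here is a minimal Python sketch (not taken from the paper) that models a tiny branch-free register machine and if-converts one if-then-else, `if (a > b) x = a - b; else x = b - a;`, three ways: fully predicated, and partially predicated with either a conditional move or a select. `Insn`, `run`, `reference`, the program names, and the register names are all hypothetical, and suppression is modeled only at write-back. The `pred_define` helper illustrates predicate types, assuming the U/OR/AND semantics from HP Labs' PlayDoh specification; that exact set of types is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]


@dataclass
class Insn:
    """One straight-line instruction: dest <- op(regs), optionally guarded by a predicate."""
    dest: str
    op: Callable[[Regs], int]
    pred: Optional[str] = None  # None means the instruction is unpredicated


def run(program: List[Insn], regs: Regs) -> Regs:
    """Execute the program in order; the program itself contains no branches."""
    for insn in program:
        # Suppression of execution, modeled as squashing the write-back
        # when the guarding predicate register holds 0.
        if insn.pred is None or regs[insn.pred]:
            regs[insn.dest] = insn.op(regs)
    return regs


def reference(a: int, b: int) -> int:
    """Original branchy code: if (a > b) x = a - b; else x = b - a;"""
    return a - b if a > b else b - a


# Full predication: a predicate define sets p1 = (a > b) and p2 = !(a > b)
# (modeled as two writes); both arms are guarded, only one takes effect.
FULL = [
    Insn("p1", lambda r: int(r["a"] > r["b"])),
    Insn("p2", lambda r: int(not (r["a"] > r["b"]))),
    Insn("x", lambda r: r["a"] - r["b"], pred="p1"),
    Insn("x", lambda r: r["b"] - r["a"], pred="p2"),
]

# Partial predication with a conditional move: the then-arm is promoted and runs
# speculatively into temporary t1; the else-arm writes x directly (x is written
# on both paths anyway); then cmov: if (c) x = t1.
PARTIAL_CMOV = [
    Insn("c", lambda r: int(r["a"] > r["b"])),
    Insn("t1", lambda r: r["a"] - r["b"]),
    Insn("x", lambda r: r["b"] - r["a"]),
    Insn("x", lambda r: r["t1"] if r["c"] else r["x"]),
]

# Partial predication with a select: both arms run speculatively into
# temporaries, then x = c ? t1 : t2.
PARTIAL_SELECT = [
    Insn("c", lambda r: int(r["a"] > r["b"])),
    Insn("t1", lambda r: r["a"] - r["b"]),
    Insn("t2", lambda r: r["b"] - r["a"]),
    Insn("x", lambda r: r["t1"] if r["c"] else r["t2"]),
]


def pred_define(ptype: str, cond: bool, p_in: bool, old: int) -> int:
    """New value of one destination predicate (assumed PlayDoh-style types)."""
    if ptype == "U":    # always written: cleared whenever p_in is false
        return int(p_in and cond)
    if ptype == "OR":   # can only set the predicate to 1, else keeps old value
        return 1 if (p_in and cond) else old
    if ptype == "AND":  # can only clear the predicate to 0, else keeps old value
        return 0 if (p_in and not cond) else old
    raise ValueError(f"unknown predicate type: {ptype}")


if __name__ == "__main__":
    for a, b in [(5, 3), (3, 5), (4, 4)]:
        for prog in (FULL, PARTIAL_CMOV, PARTIAL_SELECT):
            assert run(prog, {"a": a, "b": b})["x"] == reference(a, b)
    # OR-type defines accumulate a disjunction: p = c1 || c2.
    p = 0
    for c in (False, True):
        p = pred_define("OR", c, True, p)
    assert p == 1
    print("all predicated versions match the branchy reference")
```

The sketch exposes the design trade-off: the fully predicated version needs predicate registers and a predicate operand on every instruction, while the partial versions need only an ordinary compare plus one new instruction, at the cost of executing promoted instructions speculatively. Promoting an instruction that can trap (for example, a division guarded by `b != 0`) would not be safe in this form.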
The compiler must ensure that the speculative code only modifies temporary registers or memory values and does not cause program-terminating exceptions. Follow with peephole optimizations.

Hyperblock: a collection of connected basic blocks in which control may enter only at the first block. Control may leave from one or more blocks in the hyperblock. All control flow between basic blocks inside a hyperblock is eliminated via if-conversion.

Evaluation: emulation-driven simulation. Emulation produces correct traces; simulation performs the timing analysis.
The results do not show the impact on branch prediction. Does predication remove the branches that are difficult to predict?

So why didn't predicated execution support take off? Did branch prediction become accurate enough to remove the performance gains of predication? Did Intel decide not to change x86 (which offers only partial support through CMOV)?
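The hyperblock definition in these notes (control enters only at the first block, but may leave from several blocks) can be checked mechanically on a control flow graph. A minimal Python sketch, assuming a CFG given as a dict from block name to successor list; `is_hyperblock`, `exit_blocks`, and the example graphs are hypothetical, and the check covers only the single-entry and connectivity conditions, not how blocks are actually selected for inclusion.

```python
from typing import Dict, List, Set

CFG = Dict[str, List[str]]  # block name -> successor block names


def predecessors(cfg: CFG) -> Dict[str, Set[str]]:
    """Invert the successor map."""
    preds: Dict[str, Set[str]] = {b: set() for b in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds.setdefault(dst, set()).add(src)
    return preds


def is_hyperblock(cfg: CFG, blocks: Set[str], entry: str) -> bool:
    """True if `blocks` is connected from `entry` and control can enter only at `entry`."""
    if entry not in blocks:
        return False
    preds = predecessors(cfg)
    # Single entry: no block other than `entry` may have a predecessor outside the region.
    for b in blocks - {entry}:
        if not preds.get(b, set()) <= blocks:
            return False
    # Connected: every block is reachable from `entry` along edges inside the region.
    seen, stack = {entry}, [entry]
    while stack:
        for succ in cfg.get(stack.pop(), []):
            if succ in blocks and succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen == blocks


def exit_blocks(cfg: CFG, blocks: Set[str]) -> Set[str]:
    """Blocks with at least one successor outside the region (several exits are allowed)."""
    return {b for b in blocks if any(s not in blocks for s in cfg.get(b, []))}


# Diamond A -> {B, C} -> D, with extra exits C -> F and D -> E.
DIAMOND: CFG = {"A": ["B", "C"], "B": ["D"], "C": ["D", "F"], "D": ["E"], "E": [], "F": []}
# Same graph plus a side entrance X -> C.
SIDE_ENTRY: CFG = {**DIAMOND, "X": ["C"]}
REGION = {"A", "B", "C", "D"}

print(is_hyperblock(DIAMOND, REGION, "A"))     # True
print(sorted(exit_blocks(DIAMOND, REGION)))    # ['C', 'D']
print(is_hyperblock(SIDE_ENTRY, REGION, "A"))  # False
```

Side entrances like `X -> C` are what hyperblock formation removes by tail duplication before if-conversion; this sketch only detects them.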