Code Review for 6384206

Prepared by: Thomas Rodriguez (never) on Mon Aug 25 15:16:37 PDT 2008
Workspace: /export/ws/rpo
Compare against: /export/ws/baseline
Summary of changes: 1334 lines changed: 917 ins; 234 del; 183 mod; 29413 unchg
Patch of changes: 6384206.patch
Author comments:
6384206: Phis which are later unneeded are impairing our ability to inline based on static types
Summary:
Reviewed-by:

This changes C2's parser into a real reverse post order (RPO) parser.
Previously C2 used a pre order which was close to RPO but not quite
the same. RPO parsing helps guarantee that, as often as possible, all
predecessors of a block are parsed before we start parsing it. This
helps code fold up more quickly and feeds more exact types into
inlining. There's also a new lightweight bytecode analysis during
ciTypeFlow that identifies phis at loop heads which aren't needed.
This also helps with our inlining decisions. It's really a band-aid
until we introduce real post-parse inlining.
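
To illustrate the ordering property described above, here is a minimal
sketch of computing a reverse post order over a control flow graph. It
is not the code from this change: the Cfg alias, the integer block ids
and the function names are all hypothetical, and C2's real parser works
on ciTypeFlow blocks rather than plain vectors.

```cpp
#include <vector>

// Hypothetical CFG for illustration: succs[b] lists the successors of block b.
using Cfg = std::vector<std::vector<int>>;

// Depth-first walk that records a block only after all of its
// not-yet-visited successors have been recorded (post order).
static void post_order_visit(const Cfg& succs, int b,
                             std::vector<bool>& visited,
                             std::vector<int>& post) {
  visited[b] = true;
  for (int s : succs[b]) {
    if (!visited[s]) {
      post_order_visit(succs, s, visited, post);
    }
  }
  post.push_back(b);
}

// Returns the blocks reachable from entry in reverse post order.
std::vector<int> reverse_post_order(const Cfg& succs, int entry) {
  std::vector<bool> visited(succs.size(), false);
  std::vector<int> post;
  post_order_visit(succs, entry, visited, post);
  // Reversing the post order puts every block ahead of its successors,
  // except along edges that close a loop.
  return std::vector<int>(post.rbegin(), post.rend());
}
```

For a diamond with edges 0->1, 0->2, 1->3 and 2->3, this yields the
order 0, 2, 1, 3, so the merge block 3 is reached only after both of
its predecessors. A plain depth-first pre order over the same graph
visits 0, 1, 3, 2 and reaches block 3 before predecessor 2.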

Additionally, ciTypeFlow now performs real loop detection, which is
used to drive loop head cloning in ciTypeFlow. Previously the cloning
was based on bcis, which was unreliable, and the cloning could
introduce irreducible loops.
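
As a rough illustration of graph-based loop detection, as opposed to
comparing bcis, the sketch below marks a block as a loop head when a
depth-first walk finds an edge back to a block that is still on the
walk's stack. This is an assumption-laden example, not the ciTypeFlow
implementation: the Cfg alias, the Mark enum and the function names are
hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical CFG for illustration: succs[b] lists the successors of block b.
using Cfg = std::vector<std::vector<int>>;

enum class Mark { Unvisited, OnStack, Done };

// Depth-first walk; an edge whose target is still on the walk's stack
// closes a cycle, so its target is recorded as a loop head.
static void find_heads(const Cfg& succs, int b,
                       std::vector<Mark>& mark,
                       std::set<int>& heads) {
  mark[b] = Mark::OnStack;
  for (int s : succs[b]) {
    if (mark[s] == Mark::OnStack) {
      heads.insert(s);  // edge b -> s returns to an active block
    } else if (mark[s] == Mark::Unvisited) {
      find_heads(succs, s, mark, heads);
    }
  }
  mark[b] = Mark::Done;
}

// Returns the loop heads found among the blocks reachable from entry.
std::set<int> loop_heads(const Cfg& succs, int entry) {
  std::vector<Mark> mark(succs.size(), Mark::Unvisited);
  std::set<int> heads;
  find_heads(succs, entry, mark, heads);
  return heads;
}
```

For the graph 0->1, 1->2, 2->1, 2->3 this reports block 1 as the only
loop head. In an irreducible graph the heads found this way depend on
the visit order, so a real implementation would also need a dominance
check before treating such a block as a proper loop head.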

I added a new assert in loop opts to check that we don't create
irreducible loops during optimization but unfortunately it currently
fails periodically because of the split_flow_path optimization in
cfgnode.cpp. Until I can do something about that I'm going to leave
the assert commented out.

I also modified CTW so that the CompileTheWorldStartAt flag only
controls when compilation begins, which helps with reproducing bugs
under CTW. If you start loading later then it's often the case that
you don't end up with the same inlining decisions in other compiles.
I also changed it to make the code not entrant after the compile, which
helps keep the code cache from filling up. After this change only 3
of the jar files managed to fill up the code cache during CTW.

I added some LogCompilation output to track the loops we encounter
during compilation. The current output looks like this:

<loop_tree>
  <loop idx='7598' >
    <loop idx='8114' inner_loop='1' pre_loop='7614' >
    </loop>
    <loop idx='9506' inner_loop='1' main_loop='9506' >
    </loop>
    <loop idx='8049' inner_loop='1' post_loop='7614' >
    </loop>
  </loop>
  <loop idx='8265' inner_loop='1' pre_loop='7630' >
  </loop>
  <loop idx='9118' inner_loop='1' main_loop='9118' >
  </loop>
  <loop idx='8189' inner_loop='1' post_loop='7630' >
  </loop>
  <loop idx='7601' inner_loop='1' >
  </loop>
</loop_tree>

I haven't finished the performance results yet, though mostly they
should be neutral and that's what I've seen so far. The main effect
of this is to get static binding of call sites for which we previously
relied on type profiling to get the right results. In all the
interesting cases I've looked at, previously we generally got the right
answer but now we consistently get the right answer, so it might make
our performance more stable.

Tested with CTW, runThese and nsk on i586, amd64, sparc and sparcv9.

Bug id: 6384206: Phis which are later unneeded are impairing our ability to inline based on static types
Legend: Modified file
Deleted file
New file

src/share/vm/ci/ciTypeFlow.hpp

241 lines changed: 219 ins; 7 del; 15 mod; 692 unchg

src/share/vm/ci/ciTypeFlow.cpp

740 lines changed: 545 ins; 109 del; 86 mod; 2274 unchg

src/share/vm/ci/ciMethodBlocks.hpp

4 lines changed: 2 ins; 1 del; 1 mod; 120 unchg

src/share/vm/ci/ciMethodBlocks.cpp

9 lines changed: 5 ins; 0 del; 4 mod; 389 unchg

src/share/vm/opto/parse.hpp

19 lines changed: 10 ins; 3 del; 6 mod; 546 unchg

src/share/vm/opto/parse1.cpp

165 lines changed: 58 ins; 60 del; 47 mod; 2059 unchg

src/share/vm/opto/compile.hpp

3 lines changed: 3 ins; 0 del; 0 mod; 732 unchg

src/share/vm/opto/compile.cpp

2 lines changed: 2 ins; 0 del; 0 mod; 2530 unchg

src/share/vm/opto/bytecodeInfo.cpp

37 lines changed: 0 ins; 37 del; 0 mod; 506 unchg

src/share/vm/opto/doCall.cpp

1 line changed: 0 ins; 0 del; 1 mod; 863 unchg

src/share/vm/opto/cfgnode.cpp

5 lines changed: 4 ins; 0 del; 1 mod; 2002 unchg

src/share/vm/opto/graphKit.cpp

2 lines changed: 0 ins; 0 del; 2 mod; 3175 unchg

src/share/vm/opto/loopTransform.cpp

2 lines changed: 2 ins; 0 del; 0 mod; 1731 unchg

src/share/vm/opto/loopnode.cpp

46 lines changed: 36 ins; 6 del; 4 mod; 2832 unchg

src/share/vm/opto/loopopts.cpp

4 lines changed: 4 ins; 0 del; 0 mod; 2711 unchg

src/share/vm/opto/memnode.cpp

4 lines changed: 4 ins; 0 del; 0 mod; 3902 unchg

src/share/vm/classfile/classLoader.cpp

48 lines changed: 21 ins; 11 del; 16 mod; 1247 unchg

src/share/vm/includeDB_compiler2

2 lines changed: 2 ins; 0 del; 0 mod; 1102 unchg

This code review page was prepared using /net/smite.sfbay/never/bin/hgwebrev (vers 23.12-hg-never).