<div dir="ltr"><div>I am working on an application that computes a lot of SHA-1 hashes.  I am trying to profile it and generate flame graphs (see <a href="http://www.brendangregg.com/flamegraphs.html">http://www.brendangregg.com/flamegraphs.html</a>), but profiling tools cannot successfully backtrace when the processor is running the optimized SHA1 code on x86_64.  This patch adds CFI directives when the code is built with the GNU assembler, so that tools that understand DWARF debugging information can backtrace in this circumstance.</div><div><br></div><div>I don't have a build environment for win64, but I did verify that the Perl code does not generate the CFI directives if we are not generating code for the GNU assembler (i.e., if $cfi is not set).</div><div><br></div><div>    -Matt</div><div><br></div><div><br></div><div>commit 9522d706fa58679abd0b6f923aad623fad39abe5</div><div>Author: Matt Cross <<a href="mailto:matt.cross@gmail.com" target="_blank">matt.cross@gmail.com</a>></div><div>Date:   Wed Mar 25 14:15:37 2015 -0400</div><div><br></div><div>    Add CFI directives to the x86_64 SHA1 implementation to allow DWARF aware utilities to backtrace through these routines.</div><div><br></div><div>diff --git a/crypto/sha/asm/sha1-x86_64.pl b/crypto/sha/asm/sha1-x86_64.pl</div><div>index 9bb6b49..9fe7b2b 100755</div><div>--- a/crypto/sha/asm/sha1-x86_64.pl</div><div>+++ b/crypto/sha/asm/sha1-x86_64.pl</div><div>@@ -95,6 +95,7 @@ die "can't locate x86_64-xlate.pl";</div><div> if (`$ENV{CC} -Wa,-v -c -o /dev/null -x assembler /dev/null 2>&1`</div><div> <span style="white-space:pre-wrap">           </span>=~ /GNU assembler version ([2-9]\.[0-9]+)/) {</div><div> <span style="white-space:pre-wrap">  </span>$avx = ($1>=2.19) + 
($1>=2.22);</div><div>+<span style="white-space:pre-wrap">   </span>$cfi = 1;</div><div> }</div><div> </div><div> if (!$avx && $win64 && ($flavour =~ /nasm/ || $ENV{ASM} =~ /nasm/) &&</div><div>@@ -247,6 +248,8 @@ $code.=<<___;</div><div> .type<span style="white-space:pre-wrap">   </span>sha1_block_data_order,\@function,3</div><div> .align<span style="white-space:pre-wrap">       </span>16</div><div> sha1_block_data_order:</div><div>+`".cfi_startproc" if $cfi`</div><div>+</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>OPENSSL_ia32cap_P+0(%rip),%r9d</div><div> <span style="white-space:pre-wrap"> </span>mov<span style="white-space:pre-wrap">     </span>OPENSSL_ia32cap_P+4(%rip),%r8d</div><div> <span style="white-space:pre-wrap"> </span>mov<span style="white-space:pre-wrap">     </span>OPENSSL_ia32cap_P+8(%rip),%r10d</div><div>@@ -275,17 +278,35 @@ $code.=<<___;</div><div> .align<span style="white-space:pre-wrap">  </span>16</div><div> .Lialu:</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rsp,%rax</div><div>+`".cfi_def_cfa_register rax" if $cfi`</div><div> <span style="white-space:pre-wrap">       </span>push<span style="white-space:pre-wrap">    </span>%rbx</div><div>+# The CFA (Canonical Frame Address) is just above the pushed return address, so RBX was just stored at CFA - 16:</div><div>+`".cfi_offset rbx,-16" if $cfi`</div><div> <span style="white-space:pre-wrap"> </span>push<span style="white-space:pre-wrap">    </span>%rbp</div><div>+`".cfi_offset rbp,-24" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r12</div><div>+`".cfi_offset r12,-32" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r13</div><div>+`".cfi_offset r13,-40" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span 
style="white-space:pre-wrap">    </span>%r14</div><div>+`".cfi_offset r14,-48" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rdi,$ctx<span style="white-space:pre-wrap">       </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>sub<span style="white-space:pre-wrap">     </span>\$`8+16*4`,%rsp</div><div> <span style="white-space:pre-wrap">        </span>mov<span style="white-space:pre-wrap">     </span>%rsi,$inp<span style="white-space:pre-wrap">       </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>and<span style="white-space:pre-wrap">     </span>\$-64,%rsp</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rdx,$num<span style="white-space:pre-wrap">       </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rax,`16*4`(%rsp)</div><div>+# This adds a "CFA expression" to say that the CFA is calculated by reading the value at RSP+0x40, and adding 8 to it:</div><div>+# DW_CFA_def_cfa_expression    0x0f           : says CFA is calculated by evaluating the following expression</div><div>+# BLOCK</div><div>+#   length (ULEB128)           0x06           : number of bytes remaining</div><div>+#   DW_OP_breg7 0x40           0x77 0xc0 0x00 : read RSP, add 0x40, and push onto stack - note SLEB128 encoding of 0x40</div><div>+#                                               requires 2 bytes to avoid sign extension</div><div>+#   DW_OP_deref                0x06           : read from addr on top of stack</div><div>+#   DW_OP_plus_uconst 0x8      0x23 0x08      : pop top of stack, add 8, push back onto stack</div><div>+</div><div>+`".cfi_escape 0x0f,0x06,0x77,0xc0,0x00,0x06,0x23,0x08" if $cfi`</div><div>+</div><div> .Lprologue:</div><div> </div><div> <span style="white-space:pre-wrap">     </span>mov<span 
style="white-space:pre-wrap">     </span>0($ctx),$A</div><div>@@ -319,14 +340,22 @@ $code.=<<___;</div><div> <span style="white-space:pre-wrap">     </span>jnz<span style="white-space:pre-wrap">     </span>.Lloop</div><div> </div><div> <span style="white-space:pre-wrap">        </span>mov<span style="white-space:pre-wrap">     </span>`16*4`(%rsp),%rsi</div><div>+`".cfi_def_cfa rsi,8" if $cfi`</div><div> <span style="white-space:pre-wrap">      </span>mov<span style="white-space:pre-wrap">     </span>-40(%rsi),%r14</div><div>+`".cfi_restore r14" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-32(%rsi),%r13</div><div>+`".cfi_restore r13" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-24(%rsi),%r12</div><div>+`".cfi_restore r12" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-16(%rsi),%rbp</div><div>+`".cfi_restore rbp" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-8(%rsi),%rbx</div><div>+`".cfi_restore rbx" if $cfi`</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%rsi),%rsp</div><div>+`".cfi_def_cfa rsp,8" if $cfi`</div><div> .Lepilogue:</div><div> <span style="white-space:pre-wrap">        </span>ret</div><div>+`".cfi_endproc" if $cfi`</div><div> .size<span style="white-space:pre-wrap">     </span>sha1_block_data_order,.-sha1_block_data_order</div><div> ___</div><div> if ($shaext) {{{</div><div>@@ -342,6 +371,7 @@ $code.=<<___;</div><div> .align<span style="white-space:pre-wrap"> </span>32</div><div> sha1_block_data_order_shaext:</div><div> _shaext_shortcut:</div><div>+`".cfi_startproc" if $cfi`</div><div> ___</div><div> $code.=<<___ if ($win64);</div><div> <span style="white-space:pre-wrap">     </span>lea<span 
style="white-space:pre-wrap">     </span>`-8-4*16`(%rsp),%rsp</div><div>@@ -440,6 +470,7 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap"> </span>ret</div><div>+`".cfi_endproc" if $cfi`</div><div> .size<span style="white-space:pre-wrap">     </span>sha1_block_data_order_shaext,.-sha1_block_data_order_shaext</div><div> ___</div><div> }}}</div><div>@@ -473,12 +504,19 @@ $code.=<<___;</div><div> .align<span style="white-space:pre-wrap">      </span>16</div><div> sha1_block_data_order_ssse3:</div><div> _ssse3_shortcut:</div><div>+`".cfi_startproc" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>%rsp,%rax</div><div>+`".cfi_def_cfa_register rax" if $cfi`</div><div> <span style="white-space:pre-wrap">       </span>push<span style="white-space:pre-wrap">    </span>%rbx</div><div>+`".cfi_offset rbx,-16" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%rbp</div><div>+`".cfi_offset rbp,-24" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r12</div><div>+`".cfi_offset r12,-32" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r13<span style="white-space:pre-wrap">            </span># redundant, done to share Win64 SE handler</div><div>+`".cfi_offset r13,-40" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>push<span style="white-space:pre-wrap">    </span>%r14</div><div>+`".cfi_offset r14,-48" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>lea<span style="white-space:pre-wrap">     </span>`-64-($win64?6*16:0)`(%rsp),%rsp</div><div> ___</div><div> $code.=<<___ if ($win64);</div><div>@@ -492,6 +530,7 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap"> 
</span>mov<span style="white-space:pre-wrap">     </span>%rax,%r14<span style="white-space:pre-wrap">       </span># original %rsp</div><div>+`".cfi_def_cfa_register r14" if $cfi`</div><div> <span style="white-space:pre-wrap"> </span>and<span style="white-space:pre-wrap">     </span>\$-64,%rsp</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rdi,$ctx<span style="white-space:pre-wrap">       </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rsi,$inp<span style="white-space:pre-wrap">       </span># reassigned argument</div><div>@@ -907,14 +946,22 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap">      </span>lea<span style="white-space:pre-wrap">     </span>(%r14),%rsi</div><div>+`".cfi_def_cfa_register rsi" if $cfi`</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>-40(%rsi),%r14</div><div>+`".cfi_restore r14" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-32(%rsi),%r13</div><div>+`".cfi_restore r13" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-24(%rsi),%r12</div><div>+`".cfi_restore r12" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-16(%rsi),%rbp</div><div>+`".cfi_restore rbp" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-8(%rsi),%rbx</div><div>+`".cfi_restore rbx" if $cfi`</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%rsi),%rsp</div><div>+`".cfi_def_cfa_register rsp" if $cfi`</div><div> .Lepilogue_ssse3:</div><div> <span style="white-space:pre-wrap">   
</span>ret</div><div>+`".cfi_endproc" if $cfi`</div><div> .size<span style="white-space:pre-wrap">     </span>sha1_block_data_order_ssse3,.-sha1_block_data_order_ssse3</div><div> ___</div><div> </div><div>@@ -935,12 +982,19 @@ $code.=<<___;</div><div> .align<span style="white-space:pre-wrap">   </span>16</div><div> sha1_block_data_order_avx:</div><div> _avx_shortcut:</div><div>+`".cfi_startproc" if $cfi`</div><div> <span style="white-space:pre-wrap">       </span>mov<span style="white-space:pre-wrap">     </span>%rsp,%rax</div><div>+`".cfi_def_cfa_register rax" if $cfi`</div><div> <span style="white-space:pre-wrap">       </span>push<span style="white-space:pre-wrap">    </span>%rbx</div><div>+`".cfi_offset rbx,-16" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%rbp</div><div>+`".cfi_offset rbp,-24" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r12</div><div>+`".cfi_offset r12,-32" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r13<span style="white-space:pre-wrap">            </span># redundant, done to share Win64 SE handler</div><div>+`".cfi_offset r13,-40" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>push<span style="white-space:pre-wrap">    </span>%r14</div><div>+`".cfi_offset r14,-48" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>lea<span style="white-space:pre-wrap">     </span>`-64-($win64?6*16:0)`(%rsp),%rsp</div><div> <span style="white-space:pre-wrap">       </span>vzeroupper</div><div> ___</div><div>@@ -955,6 +1009,7 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap">      </span>mov<span style="white-space:pre-wrap">     </span>%rax,%r14<span style="white-space:pre-wrap">       </span># original %rsp</div><div>+`".cfi_def_cfa_register r14" if 
$cfi`</div><div> <span style="white-space:pre-wrap"> </span>and<span style="white-space:pre-wrap">     </span>\$-64,%rsp</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rdi,$ctx<span style="white-space:pre-wrap">       </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rsi,$inp<span style="white-space:pre-wrap">       </span># reassigned argument</div><div>@@ -1271,14 +1326,22 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%r14),%rsi</div><div>+`".cfi_def_cfa_register rsi" if $cfi`</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>-40(%rsi),%r14</div><div>+`".cfi_restore r14" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-32(%rsi),%r13</div><div>+`".cfi_restore r13" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-24(%rsi),%r12</div><div>+`".cfi_restore r12" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-16(%rsi),%rbp</div><div>+`".cfi_restore rbp" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-8(%rsi),%rbx</div><div>+`".cfi_restore rbx" if $cfi`</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%rsi),%rsp</div><div>+`".cfi_def_cfa_register rsp" if $cfi`</div><div> .Lepilogue_avx:</div><div> <span style="white-space:pre-wrap">     </span>ret</div><div>+`".cfi_endproc" if $cfi`</div><div> .size<span style="white-space:pre-wrap">     </span>sha1_block_data_order_avx,.-sha1_block_data_order_avx</div><div> 
___</div><div> </div><div>@@ -1302,12 +1365,19 @@ $code.=<<___;</div><div> .align<span style="white-space:pre-wrap">     </span>16</div><div> sha1_block_data_order_avx2:</div><div> _avx2_shortcut:</div><div>+`".cfi_startproc" if $cfi`</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rsp,%rax</div><div>+`".cfi_def_cfa_register rax" if $cfi`</div><div> <span style="white-space:pre-wrap">       </span>push<span style="white-space:pre-wrap">    </span>%rbx</div><div>+`".cfi_offset rbx,-16" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%rbp</div><div>+`".cfi_offset rbp,-24" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r12</div><div>+`".cfi_offset r12,-32" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r13</div><div>+`".cfi_offset r13,-40" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>push<span style="white-space:pre-wrap">    </span>%r14</div><div>+`".cfi_offset r14,-48" if $cfi`</div><div> <span style="white-space:pre-wrap">  </span>vzeroupper</div><div> ___</div><div> $code.=<<___ if ($win64);</div><div>@@ -1322,6 +1392,7 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>%rax,%r14<span style="white-space:pre-wrap">               </span># original %rsp</div><div>+`".cfi_def_cfa_register r14" if $cfi`</div><div> <span style="white-space:pre-wrap"> </span>mov<span style="white-space:pre-wrap">     </span>%rdi,$ctx<span style="white-space:pre-wrap">               </span># reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rsi,$inp<span style="white-space:pre-wrap">               </span># 
reassigned argument</div><div> <span style="white-space:pre-wrap">  </span>mov<span style="white-space:pre-wrap">     </span>%rdx,$num<span style="white-space:pre-wrap">               </span># reassigned argument</div><div>@@ -1750,14 +1821,22 @@ $code.=<<___ if ($win64);</div><div> ___</div><div> $code.=<<___;</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%r14),%rsi</div><div>+`".cfi_def_cfa_register rsi" if $cfi`</div><div> <span style="white-space:pre-wrap">     </span>mov<span style="white-space:pre-wrap">     </span>-40(%rsi),%r14</div><div>+`".cfi_restore r14" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-32(%rsi),%r13</div><div>+`".cfi_restore r13" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-24(%rsi),%r12</div><div>+`".cfi_restore r12" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-16(%rsi),%rbp</div><div>+`".cfi_restore rbp" if $cfi`</div><div> <span style="white-space:pre-wrap">   </span>mov<span style="white-space:pre-wrap">     </span>-8(%rsi),%rbx</div><div>+`".cfi_restore rbx" if $cfi`</div><div> <span style="white-space:pre-wrap">    </span>lea<span style="white-space:pre-wrap">     </span>(%rsi),%rsp</div><div>+`".cfi_def_cfa_register rsp" if $cfi`</div><div> .Lepilogue_avx2:</div><div> <span style="white-space:pre-wrap">    </span>ret</div><div>+`".cfi_endproc" if $cfi`</div><div> .size<span style="white-space:pre-wrap">     </span>sha1_block_data_order_avx2,.-sha1_block_data_order_avx2</div><div> ___</div><div> }</div><div><br></div></div>
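<div><br></div>
<div>For anyone who wants to double-check the hand-assembled bytes in the .cfi_escape line of the patch, here is a minimal Python sketch (not part of the patch) that rebuilds them from the pieces described in the comment block. The helper names sleb128, uleb128, cfa_from_saved_rsp and cfi_escape are made up for illustration; the opcode values are the standard DWARF ones, and 7 is assumed to be the DWARF register number of %rsp on x86_64.</div>

```python
# Sketch only: regenerate the DW_CFA_def_cfa_expression bytes used in the patch.

DW_CFA_def_cfa_expression = 0x0f
DW_OP_deref = 0x06
DW_OP_plus_uconst = 0x23
DW_OP_breg0 = 0x70      # DW_OP_breg0 + n selects DWARF register n
RSP = 7                 # assumed DWARF register number of %rsp on x86_64


def uleb128(value):
    """Encode a non-negative integer as ULEB128 (list of byte values)."""
    out = []
    while True:
        byte = value % 128          # low 7 bits
        value //= 128
        if value == 0:
            out.append(byte)
            return out
        out.append(byte + 128)      # set the continuation bit


def sleb128(value):
    """Encode a signed integer as SLEB128 (list of byte values)."""
    out = []
    while True:
        byte = value % 128          # low 7 bits; Python's % is never negative here
        value //= 128               # floor division acts as an arithmetic shift by 7
        sign_bit_set = byte >= 64   # bit 6 is the sign bit of the final byte
        # Stop once what remains is only sign extension of this byte.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(byte)
            return out
        out.append(byte + 128)      # set the continuation bit


def cfa_from_saved_rsp(slot_offset, adjust):
    """CFA = *(RSP + slot_offset) + adjust, as a def_cfa_expression byte list."""
    expr = [DW_OP_breg0 + RSP] + sleb128(slot_offset)
    expr += [DW_OP_deref, DW_OP_plus_uconst] + uleb128(adjust)
    return [DW_CFA_def_cfa_expression] + uleb128(len(expr)) + expr


def cfi_escape(slot_offset, adjust):
    """Format the byte list as a GNU assembler .cfi_escape directive."""
    data = cfa_from_saved_rsp(slot_offset, adjust)
    return ".cfi_escape " + ",".join("0x%02x" % b for b in data)


if __name__ == "__main__":
    # The original %rsp is saved at 16*4 = 0x40 bytes above the aligned %rsp.
    print(cfi_escape(0x40, 8))
```

<div>Calling cfi_escape(0x40, 8) should reproduce the directive used in the patch, including the two-byte SLEB128 encoding of 0x40 that avoids sign extension.</div>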