<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Silicon Chip Cookies]]></title><description><![CDATA[Posts about computers.]]></description><link>https://www.siliconchipcookies.com</link><image><url>https://substackcdn.com/image/fetch/$s_!dDVW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d1172b-c6b0-4c0d-b39f-0ef1508c50ca_608x608.png</url><title>Silicon Chip Cookies</title><link>https://www.siliconchipcookies.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 11 May 2026 19:23:45 GMT</lastBuildDate><atom:link href="https://www.siliconchipcookies.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Andrew Furey]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[andrewfurey21@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[andrewfurey21@substack.com]]></itunes:email><itunes:name><![CDATA[Andrew Furey]]></itunes:name></itunes:owner><itunes:author><![CDATA[Andrew Furey]]></itunes:author><googleplay:owner><![CDATA[andrewfurey21@substack.com]]></googleplay:owner><googleplay:email><![CDATA[andrewfurey21@substack.com]]></googleplay:email><googleplay:author><![CDATA[Andrew Furey]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Sections vs Segments in the ELF Format]]></title><description><![CDATA[I watched a really great video about writing an ELF file from scratch, byte by byte, to understand the structure of the file format.]]></description><link>https://www.siliconchipcookies.com/p/sections-vs-segments-in-the-elf-format</link><guid 
isPermaLink="false">https://www.siliconchipcookies.com/p/sections-vs-segments-in-the-elf-format</guid><dc:creator><![CDATA[Andrew Furey]]></dc:creator><pubDate>Thu, 26 Mar 2026 23:58:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sNgZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sNgZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sNgZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 424w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 848w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 1272w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sNgZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3961397,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/192117485?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sNgZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 424w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 848w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 1272w, https://substackcdn.com/image/fetch/$s_!sNgZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde5f7cf6-07da-44b1-949c-7b09ae5cef5f_3420x1624.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p>I watched a really great video about writing an <a href="https://www.youtube.com/watch?v=JM9jX2aqkog">ELF file from scratch</a>, byte by byte, to understand the structure of the file format. It kind of left me wanting a bit more, since I still wasn&#8217;t sure about some concepts like the difference between sections and segments. I mean, they&#8217;re synonyms! Why did they choose that!? Maybe runtime information vs linktime information? Anyways&#8230;</p><p>An ELF file is made up of a few parts. 
There is an ELF header, the ELF Program headers, the ELF Section headers, and then just binary data. The ELF header comes first. It doesn&#8217;t matter what comes next, since the ELF header specifies where the program and section header tables are. By table, I just mean a sequence of structures, each of which is the same size. A program header table is just a sequence of <code>struct Elf64_Phdr</code>, for example.</p><p>ELF Segments are a runtime notion. They are the portions of the ELF file that get mapped into memory when you start the program. An ELF Segment is simply a contiguous block of the ELF file described by one ELF Program header. The kernel uses the ELF Program headers to map the parts of the file they describe into memory at runtime.</p><p>An ELF Section is a contiguous block of the ELF file that is described by an ELF Section header and is used by linkers, debuggers, etc. The bytes that an ELF Section refers to in the file can also be part of a segment. 
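</p><p>To make the idea of a header table concrete, here is a minimal sketch in C that walks the program header table of an ELF64 image already loaded into a buffer and counts the <code>PT_LOAD</code> entries, which are the segments that actually get mapped. The function name <code>count_load_segments</code> is made up for this example, the structs and constants come from <code>elf.h</code>, and the sketch assumes the file has the same byte order as the machine running the code.</p><pre><code>#include &lt;elf.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: count the PT_LOAD program headers in an ELF64
   image held in memory. Returns -1 if the buffer does not look like a
   valid ELF64 file. */
int count_load_segments(const unsigned char *image, size_t len)
{
    Elf64_Ehdr ehdr;
    if (len &lt; sizeof ehdr)
        return -1;
    memcpy(&amp;ehdr, image, sizeof ehdr); /* the ELF header comes first */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (ehdr.e_phnum == 0)
        return 0; /* no program header table, so nothing to map */
    if (ehdr.e_phentsize != sizeof(Elf64_Phdr) || ehdr.e_phoff &gt; len)
        return -1;

    int loads = 0;
    for (size_t i = 0; i &lt; ehdr.e_phnum; i++) {
        /* The table is a plain array that starts at file offset e_phoff. */
        size_t off = (size_t)ehdr.e_phoff + i * sizeof(Elf64_Phdr);
        if (off &gt; len || len - off &lt; sizeof(Elf64_Phdr))
            return -1; /* entry would run past the end of the buffer */
        Elf64_Phdr phdr;
        memcpy(&amp;phdr, image + off, sizeof phdr);
        if (phdr.p_type == PT_LOAD)
            loads++;
    }
    return loads;
}</code></pre><p>A relocatable object file normally has no program headers (<code>e_phnum</code> is 0), so this returns 0 for it, which is another way of saying there is nothing for the kernel to map.</p><p>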
ELF Section headers have nothing to do with runtime. In fact, you can strip all the section header information with:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;fc9a42aa-e0a3-4eec-a2dd-a7af569fae85&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">strip --strip-section-headers ./executable</code></pre></div><p>If you run that stripped executable, it will still run perfectly.</p><p>An ELF Section can be code, data, or information about the code, like symbols used by linkers. <a href="https://github.com/andrewfurey21/elf-from-scratch/blob/master/reloc.c">Here is an example of writing an object file, byte by byte.</a> It includes the ELF header, section headers, a symbol table, code, a section header string table and a string table. All of that allows gcc to link it with a main.c that calls the function described in that object file.</p><p>I only really understood the format once I wrote it out myself. I really recommend watching the <a href="https://www.youtube.com/watch?v=JM9jX2aqkog">elf by hand video</a>, then writing an object file in a similar fashion and getting gcc to link with it. Doing it in C is really handy, since you can include <code>elf.h</code> and that header comes with all the structs you need. If you don&#8217;t know any assembly, I included a print function at <a href="https://github.com/andrewfurey21/elf-from-scratch/blob/master/print_string.s">print_string.s</a> and you can use the following to get the binary form:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;b59c05f5-4b72-442e-92cb-7a15e0d4f322&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">as print_string.s -c -o print_string.o
objdump -S print_string.o</code></pre></div><p>This is an example object file layout:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0ege!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0ege!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 424w, https://substackcdn.com/image/fetch/$s_!0ege!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 848w, https://substackcdn.com/image/fetch/$s_!0ege!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 1272w, https://substackcdn.com/image/fetch/$s_!0ege!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0ege!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png" width="1456" height="995" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:995,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:547130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/192117485?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0ege!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 424w, https://substackcdn.com/image/fetch/$s_!0ege!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 848w, https://substackcdn.com/image/fetch/$s_!0ege!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 1272w, https://substackcdn.com/image/fetch/$s_!0ege!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5a023a9-09b2-4160-bb7a-b6a75ce54956_2298x1570.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Note that you can&#8217;t run this file, because the OS has no idea how to map code and data to memory when the program starts.</p><p>Try writing it by hand using <code>elf.h</code>! It can be annoying to get all the sections right if you do do it by hand, so good luck.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.siliconchipcookies.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Silicon Chip Cookies! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How I used Address Sanitizer to solve a memory bug]]></title><description><![CDATA[or why I should spend a lot more time learning how to debug]]></description><link>https://www.siliconchipcookies.com/p/how-i-used-address-sanitizer-to-solve</link><guid isPermaLink="false">https://www.siliconchipcookies.com/p/how-i-used-address-sanitizer-to-solve</guid><dc:creator><![CDATA[Andrew Furey]]></dc:creator><pubDate>Mon, 02 Jun 2025 10:12:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!s-P3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been contributing to an open-source machine learning library called <a href="https://github.com/mlpack/mlpack">mlpack</a> recently. I had this (in retrospect stupid) memory-related bug where the program was crashing at the end. When this happens, I default to using Valgrind because in the past it&#8217;s saved me so much time combing through hundreds, if not thousands, of lines of code. So I tried Valgrind with Memcheck. Valgrind is a dynamic binary instrumentation framework, and Memcheck is its tool for catching these kinds of memory issues.</p><p>Programs like Valgrind often use a technique called shadow memory that &#8220;shadows&#8221; every byte of memory. It gives us some useful metadata about the byte that we are accessing, like whether or not it has been allocated. 
During execution, these tools update the shadow memory whenever a memory read, write, free or allocation occurs.</p><p>A lot of tools use this technique to catch different kinds of issues, like data races and security-related bugs. The problem with Valgrind is that it&#8217;s painfully slow from all the virtualization (~20x slowdown). From reading up on Valgrind, it takes in your binary, converts it into an internal representation and runs it with a bunch of instrumentation to analyze memory accesses, among other things. Valgrind&#8217;s main benefit is that you don&#8217;t need the source code, just the binary. It&#8217;s also very thorough. From what I&#8217;ve read, it can also do some profiling.</p><p>I have the source code, and since I&#8217;m working on machine learning workloads with big tensors doing matrix multiplications and convolutions, I need speed. So instead of Valgrind, I learned how to use Address Sanitizer. Address Sanitizer (ASan) is compile-time instrumentation and is a lot more lightweight. You need the source code, and your compiler needs to support it. GCC supports ASan. It only has a ~2x slowdown. 
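</p><p>To make the shadow memory idea from earlier concrete, here is a toy sketch in C. It is not how Valgrind or ASan are actually implemented: the real tools use much more compact encodings and instrument every access automatically. Every name in it (<code>toy_alloc</code>, <code>toy_free</code>, <code>checked_write</code>, <code>ARENA_SIZE</code>, <code>REDZONE</code>) is made up for illustration, and it simply keeps one shadow byte per byte of a small arena.</p><pre><code>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

#define ARENA_SIZE 64
#define REDZONE 8 /* unaddressable gap left after every allocation */

static unsigned char arena[ARENA_SIZE];  /* the application's memory */
static unsigned char shadow[ARENA_SIZE]; /* 1 = allocated, 0 = not   */
static size_t next_free = 0;

/* Bump-allocate n bytes; returns an offset into arena, or (size_t)-1. */
static size_t toy_alloc(size_t n)
{
    if (n &gt; ARENA_SIZE - next_free)
        return (size_t)-1;
    size_t off = next_free;
    memset(shadow + off, 1, n); /* mark the new bytes as allocated */
    next_free += n + REDZONE;   /* the redzone keeps shadow value 0 */
    if (next_free &gt; ARENA_SIZE)
        next_free = ARENA_SIZE;
    return off;
}

/* Freeing only flips the shadow bytes back to "not allocated". */
static void toy_free(size_t off, size_t n)
{
    memset(shadow + off, 0, n);
}

/* Consult the shadow for every byte before doing the real write. */
static bool checked_write(size_t off, const void *src, size_t n)
{
    for (size_t i = 0; i &lt; n; i++)
        if (off + i &gt;= ARENA_SIZE || !shadow[off + i])
            return false; /* a real tool would print a report here */
    memcpy(arena + off, src, n);
    return true;
}</code></pre><p>With an 8-byte allocation, an 8-byte write goes through, while a 10-byte write spills into the redzone and is rejected, which is the toy version of a heap-buffer-overflow report. A write after <code>toy_free</code> is rejected for the same reason.</p><p>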
After compiling with <code>-fsanitize=address </code>I quickly got useful information.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s-P3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s-P3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 424w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 848w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 1272w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s-P3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png" width="1161" height="566" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:566,&quot;width&quot;:1161,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109318,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/164925529?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s-P3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 424w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 848w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 1272w, https://substackcdn.com/image/fetch/$s_!s-P3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981f2602-5a9c-441f-b023-9e91c93284d5_1161x566.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>From this I assumed I was allocating too little memory for what I needed.</p><pre><code>ERROR: AddressSanitizer: heap-buffer-overflow ...
... WRITE of size 692224 ...
... mlpack/methods/ann/dag_network_impl.hpp:593
... allocated by thread T0 here:
... mlpack/methods/ann/dag_network_impl.hpp:509</code></pre><p>I think this is telling me that at line <code>593</code> I&#8217;m trying to write past the end of a buffer that I allocated at line <code>509</code>.</p><pre><code>509: layerOutputMatrix = MatType(1, batchSize * forwardMemSize);</code></pre><p>This gives me the hint that <code>forwardMemSize</code> is probably wrong. It turns out that when computing the size of a concatenation operation, I was doing it like so:</p><pre><code>size_t concatSize = layers[i]-&gt;InputDimensions()[0];
for (size_t j = 1; j &lt; layers.size(); j++)
   concatSize += layers[i]-&gt;InputDimensions()[j]; // bug: adds the dimensions instead of multiplying them</code></pre><p>when it should clearly be done by multiplication, since the size of a tensor is the product of its dimensions, not their sum. In retrospect, this is a very stupid bug that took way too long to find, but learning how Address Sanitizer works and how to use it saved me a lot of time. The lesson here is to get better at using debugging tools.</p><div><hr></div><h2>Useful links</h2><ul><li><p><a href="https://github.com/google/sanitizers/wiki/AddressSanitizer">Useful wiki on ASan</a></p></li><li><p><a href="https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools">Table comparing ASan to tools like Valgrind</a></p></li><li><p><a href="https://lemire.me/blog/2019/05/16/building-better-software-with-better-tools-sanitizers-versus-valgrind/">Blog on why you should use ASan</a></p></li></ul><p></p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.siliconchipcookies.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Silicon Chip Cookies! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Gshare Branch Predictor]]></title><description><![CDATA[Programs tend to be predictable and follow patterns.]]></description><link>https://www.siliconchipcookies.com/p/gshare-branch-predictor</link><guid isPermaLink="false">https://www.siliconchipcookies.com/p/gshare-branch-predictor</guid><dc:creator><![CDATA[Andrew Furey]]></dc:creator><pubDate>Thu, 24 Apr 2025 10:51:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g5UQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Programs tend to be predictable and follow patterns. We can use that to predict whether or not a branch will be taken, and so increase instruction throughput. This post discusses how the <a href="https://american.cs.ucdavis.edu/academic/readings/papers/mcfarling.pdf">Gshare branch predictor</a> works.</p><p>Gshare uses two structures to make predictions: a pattern history table and a global history register.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.siliconchipcookies.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Silicon Chip Cookies! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Pattern History Table</h2><p>A pattern history table (PHT) is an array of 2-bit saturating counters. <a href="https://en.wikipedia.org/wiki/Saturation_arithmetic">Saturating arithmetic</a> is when the output of any operation is clamped to a fixed minimum and maximum value. With a 2-bit saturating counter, this means that there is no overflow. If the value of the counter is <code>11</code> and we increase the value, it is still <code>11</code>. Likewise, decreasing a counter that is at <code>00</code> leaves it at <code>00</code>. We use this saturating arithmetic to represent a finite state machine with four states: strongly taken, weakly taken, weakly not-taken and strongly not-taken.</p><p>Here is the finite state machine of a 2-bit saturating counter:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g5UQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g5UQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 424w, https://substackcdn.com/image/fetch/$s_!g5UQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 848w, 
https://substackcdn.com/image/fetch/$s_!g5UQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 1272w, https://substackcdn.com/image/fetch/$s_!g5UQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g5UQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png" width="728" height="289.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:579,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:128657,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/162031059?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g5UQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 424w, 
https://substackcdn.com/image/fetch/$s_!g5UQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 848w, https://substackcdn.com/image/fetch/$s_!g5UQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 1272w, https://substackcdn.com/image/fetch/$s_!g5UQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe232382a-fcba-4a17-aa57-719c7fff6a2a_1693x673.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Every time a branch instruction is resolved, its PHT counter is updated with the outcome. If the counter is in the weakly taken state and the branch is actually taken, it moves to the strongly taken state; if the branch is not taken, it moves one step the other way, to weakly not-taken.</p><p>This structure keeps track of a branch&#8217;s local history. If its counter is in the strongly not-taken state, the branch will very likely not be taken next time either. To make a prediction for a given branch, we can index into this array of counters with the program counter (PC), in practice a few of its low-order bits, and predict from that local history.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9lCn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9lCn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 424w, https://substackcdn.com/image/fetch/$s_!9lCn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 848w, https://substackcdn.com/image/fetch/$s_!9lCn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9lCn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9lCn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png" width="1456" height="967" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:967,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/162031059?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9lCn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 424w, https://substackcdn.com/image/fetch/$s_!9lCn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 848w, 
https://substackcdn.com/image/fetch/$s_!9lCn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!9lCn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf110947-c4c3-4b3f-be50-33eb8b95b7aa_1551x1030.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Global History Register</h2><p>It has been shown that taking global history into account alongside local history can give better predictions. Global branch history is typically a sequence of 1s and 0s, where bit i records whether the i-th most recent branch was taken. Instead of looking at what the current branch has done so far, as a PHT does, global history looks at what the last N branches have done.</p><p>We can represent this as a shift register. Every time a branch executes, we shift the bits left by one and write the branch outcome into the vacated lowest bit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZhAp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZhAp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 424w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 848w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 1272w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZhAp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png" width="1411" height="781" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd611efa-a4cf-4af7-8888-64952601f069_1411x781.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1411,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:40021,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/162031059?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZhAp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 424w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 848w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 1272w, https://substackcdn.com/image/fetch/$s_!ZhAp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd611efa-a4cf-4af7-8888-64952601f069_1411x781.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Instead of indexing with the program counter alone, gshare XORs the global history with the program counter and uses the result to index into the PHT.</p><p>Here is the full diagram:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vW3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vW3_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 424w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 848w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 1272w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vW3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png" width="1456" height="950" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.siliconchipcookies.com/i/162031059?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vW3_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 424w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 848w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 1272w, https://substackcdn.com/image/fetch/$s_!vW3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F483d3523-cb45-4922-8bc3-27b784339d5c_1742x1137.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Depthwise Separable Convolutions]]></title><description><![CDATA[what do they do?]]></description><link>https://www.siliconchipcookies.com/p/depthwise-separable-convolutions</link><guid isPermaLink="false">https://www.siliconchipcookies.com/p/depthwise-separable-convolutions</guid><dc:creator><![CDATA[Andrew Furey]]></dc:creator><pubDate>Sun, 26 Jan 2025 11:01:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qJWa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qJWa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qJWa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 424w, https://substackcdn.com/image/fetch/$s_!qJWa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 848w, 
https://substackcdn.com/image/fetch/$s_!qJWa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 1272w, https://substackcdn.com/image/fetch/$s_!qJWa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qJWa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png" width="1406" height="657" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:657,&quot;width&quot;:1406,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qJWa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 424w, https://substackcdn.com/image/fetch/$s_!qJWa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 848w, 
https://substackcdn.com/image/fetch/$s_!qJWa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 1272w, https://substackcdn.com/image/fetch/$s_!qJWa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ef521e9-54c5-4271-b192-807b37ad1dde_1406x657.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h4>Regular Convolution</h4><p>A normal convolution is like a linear layer inside a neural network, except that it works on patches of the input instead of taking the entire image into account, as a fully connected layer does. The weights of a convolutional layer are organised into kernels. At the start of training these kernels hold random numbers; after training they let the network pick out the patterns that make it useful for image classification or object detection.</p><p>A convolution uses m kernels, where m is the number of output channels. Each kernel has a width and height (like 3x3 or 5x5) and a number of channels equal to the number of input channels. For example, with a 224x224 image with 3 channels (RGB), we could &#8216;convolve&#8217; the image with eight 3x3 kernels, outputting a 222x222x8 feature map (in this case there is no padding, no dilation and the stride is 1). A convolution multiplies each number in its kernel by the value at the corresponding place in the image, sums those products, shifts the kernel over by one (or whatever the stride is), and repeats for the entire input.</p><p>The most common convolution you&#8217;ll see is the 3x3 convolution (VGG, MobileNet and YOLO are examples). With the same number of input and output channels, it needs fewer multiplications than a larger kernel covering the same area. One 5x5 convolution takes in the same region of the input as two 3x3 convolutions applied one after the other, but the two 3x3 convolutions need fewer multiplications. For example, on a 224x224x16 feature map, a 5x5 convolution with 16 output channels takes 220*220*16*5*5*16 = 309,760,000 multiplications, while two 3x3 convolutions with 16 output channels each take (222*222*16*3*3*16)+(220*220*16*3*3*16) = 225,063,936 multiplications. 
The second 3x3 layer, however, effectively sees a 5x5 region of the original feature map.</p><p>I&#8217;ll use 3x3 convolutions as the example for the rest of the post, with no padding or dilation and a stride of 1.</p><h4>Groups</h4><p>Groups are another parameter of a convolution operation. You&#8217;ll see it in the <a href="https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html">PyTorch documentation for Conv2d</a>. Having n groups means splitting the input <strong>channels</strong> into n pieces and using n groups of kernels, where each group of kernels only processes its corresponding group of input channels.</p><h4>Depthwise Separable Convolution</h4><p>It turns out we can reduce the amount of computation required in a convolution by using depthwise separable convolutions, which take groups to the extreme, as shown in the <a href="https://arxiv.org/pdf/1704.04861">MobileNet paper</a>.</p><p>I&#8217;ll start from a regular convolution to illustrate why this reduces computational cost. If we have an input feature map of shape 28x28x8 and a convolutional layer of 3x3 kernels with 16 output channels, we have to do 26*26*3*3*8*16 = 778,752 multiplications. We get a 26x26x16 feature map as the output.</p><p>There are two steps to a depthwise separable convolution: first a depthwise convolution, then a pointwise convolution. Together they produce an output of the same shape as the regular convolution while reducing the number of calculations needed.</p><p>We separate the incoming feature map into n groups, where n is the number of input channels, so each group has exactly one channel. We apply one 3x3 kernel to each group. This is the depthwise convolution. For the same example, it outputs a 26x26x8 feature map.</p><p>Next we use 16 1x1 convolution kernels, which form our pointwise convolution. 
This outputs a 26x26x16 feature map, exactly the same shape as a regular 3x3 convolution would produce, with each of the 16 kernels having 8 channels.</p><p>How does this improve performance? The depthwise convolution uses 26*26*3*3*8 = 48,672 multiplications. The pointwise layer uses 26*26*8*16 = 86,528 multiplications. That is 135,200 multiplications in total, about an 83% reduction from the 778,752 of the regular convolution. The <a href="https://arxiv.org/pdf/1704.04861">MobileNet paper</a> shows that the cost relative to a regular convolution is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{N} + \\frac{1}{D_K^2}&quot;,&quot;id&quot;:&quot;TUXJKJHVNC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where N is the number of output channels (one per pointwise kernel) and D_K is the width and height of the kernels. For our example that is 1/16 + 1/9, or about 0.17. Since most layers use 3x3 kernels, the ratio approaches 1/9 as you increase the number of output channels, which is an 89% reduction in multiplications.</p><h4>Conclusion</h4><p>Using depthwise separable 3x3 convolutions massively reduces the amount of computation needed, while producing feature maps of the same shape as a regular convolution.</p>]]></content:encoded></item><item><title><![CDATA[Implementing CRC32 in Python]]></title><description><![CDATA[In this post I&#8217;ll explain how cyclic redundancy checks (CRC) work, and implement a basic version of CRC32 in Python.]]></description><link>https://www.siliconchipcookies.com/p/implementing-crc32-in-python</link><guid isPermaLink="false">https://www.siliconchipcookies.com/p/implementing-crc32-in-python</guid><dc:creator><![CDATA[Andrew Furey]]></dc:creator><pubDate>Thu, 15 Aug 2024 15:40:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!64cB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post I&#8217;ll explain how cyclic redundancy checks (CRC) work, 
and implement a basic version of CRC32 in Python. In my opinion, the best way to learn something is by doing it, so I recommend implementing this yourself. If you want to do some CRC calculations, here is a <a href="https://crccalc.com/">good calculator</a>.</p><h2>Why?</h2><p>Cyclic redundancy checks are values you append to data in transit (over a network, or even when transferring from storage to memory) that let the receiver detect, with high probability, whether the data was corrupted along the way. Moving data around without some sort of error detection would be pretty useless.</p><h2>How?</h2><p>The maths involved is interesting, but because I&#8217;m not an expert on finite fields I&#8217;m not going to delve too deeply into it. You can find <a href="https://arxiv.org/pdf/2408.07499">decent books on the theory behind Galois fields</a> and proofs of how well CRC works on the internet. The main principle of cyclic redundancy checks is to represent the data as a polynomial over GF(2), that is, a polynomial whose coefficients are each either 1 or 0. Since computers represent data in binary, this is a perfect fit. 
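</p><p>As a rough sketch of that mapping in Python, assuming the data arrives as a string of bits (<code>bits_to_polynomial</code> is a hypothetical helper made up for this illustration, not a library function):</p><pre><code>
```python
def bits_to_polynomial(bits: str) -> str:
    """Write a bit string such as "11010" as a polynomial over GF(2)."""
    degree = len(bits) - 1  # the leftmost bit belongs to the highest power
    terms = []
    for position, bit in enumerate(bits):
        if bit == "1":  # a 0 coefficient contributes no term
            power = degree - position
            terms.append("1" if power == 0 else f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(bits_to_polynomial("11010"))  # x^4 + x^3 + x^1
```
</code></pre><p>Only the positions holding a 1 survive as terms, which is why a plain integer is enough to store one of these polynomials.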
</p><p>Here is an example:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;11010 \\Rightarrow  1x^{4} + 1x^{3} + 0x^{2} + 1x^{1} + 0x^{0} \\Rightarrow 1x^{4} + 1x^{3}  +1x^{1}&quot;,&quot;id&quot;:&quot;CNDGLPDLRG&quot;}" data-component-name="LatexBlockToDOM"></div><p>We then decide on a generator polynomial. We&#8217;ll use this to calculate the number that we append at the end of the message. An example polynomial might be:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;111 \\Rightarrow 1x^2 + 1x^1 + 1x^{0} \\Rightarrow 1x^{2} + 1x^{1} + 1&quot;,&quot;id&quot;:&quot;NOLMBZEREX&quot;}" data-component-name="LatexBlockToDOM"></div><p>We will divide our data polynomial by this generator polynomial. This is exactly the long division algorithm you would have learned in school for dividing polynomials and finding the remainder, except that we&#8217;re using <a href="https://en.wikipedia.org/wiki/GF(2)">GF(2)</a> arithmetic, where subtraction and addition are equivalent and are both the same as an XOR operation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!64cB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!64cB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 424w, https://substackcdn.com/image/fetch/$s_!64cB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 848w, 
https://substackcdn.com/image/fetch/$s_!64cB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 1272w, https://substackcdn.com/image/fetch/$s_!64cB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!64cB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png" width="362" height="264.6924778761062" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:661,&quot;width&quot;:904,&quot;resizeWidth&quot;:362,&quot;bytes&quot;:16770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!64cB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 424w, https://substackcdn.com/image/fetch/$s_!64cB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 848w, 
https://substackcdn.com/image/fetch/$s_!64cB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 1272w, https://substackcdn.com/image/fetch/$s_!64cB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4857e38-7b73-4cdc-b042-10076ad679dd_904x661.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The remainder <code>01 </code>is our redundant information that we will append to our outgoing message <code>11010</code>. 
The sender will transmit <code>1101001</code>, and the receiver, who also has the generator polynomial, will divide <code>111</code> into the first five bits, <code>11010</code>, and check that the remainder matches the two bits that arrived after them. I left out the quotient because it&#8217;s irrelevant to CRC. All we need is the remainder.</p><h2>Basic Implementation</h2><p>Our generator polynomial for CRC32 has degree 32, so written out in full it is 33 bits long: a leading 1 followed by 32 more coefficients. The polynomial we&#8217;ll be using is <code>0x04C11DB7</code>, which stores only those lower 32 bits. The biggest thing to note here is that in an actual implementation we don&#8217;t need the leading 1: we only XOR the generator in when the bit lined up with it is 1, and every time we XOR 1 with 1 it will be 0, so that bit can simply be shifted out.</p><pre><code><code># Basic implementation of crc32
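# Warm-up sketch, not from the original post: the worked example above as
# string-based long division, so every XOR step is visible. The helper name
# gf2_remainder is made up for illustration, and it assumes that bits is at
# least as long as generator.
def gf2_remainder(bits, generator):
    work = [int(ch) for ch in bits]
    gen = [int(ch) for ch in generator]
    for i in range(len(work) - len(gen) + 1):
        if work[i] == 1:  # a leading 1 means the generator "goes into" this position
            for j, g in enumerate(gen):
                work[i + j] ^= g  # subtraction in GF(2) is XOR
    # Whatever is left in the last len(gen) - 1 positions is the remainder.
    return "".join(str(bit) for bit in work[len(work) - len(gen) + 1:])

print(gf2_remainder("11010", "111"))  # prints 01, the remainder from the figure

# The 32-bit version of the same division, on integers instead of strings: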
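def crc32(message: bytearray, poly: int) -&gt; int:
    """Remainder of the message bits divided by poly (the top x^32 term is implied)."""
    bitmask = 0xFFFFFFFF
    crc = 0

    for byte in message:
        for _ in range(8):
            b = (byte &amp; (1 &lt;&lt; 7)) != 0  # next message bit, most significant first
            # If the top bit is set, the generator "goes into" the register.
            divide = bitmask if (crc &amp; (1 &lt;&lt; 31)) != 0 else 0
            # Shift the next bit in and drop bit 32 so the int never grows.
            crc = ((crc &lt;&lt; 1) | b) &amp; bitmask
            crc ^= (poly &amp; divide)
            byte &lt;&lt;= 1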
    return (crc &amp; bitmask) # to keep it at 32bits long</code></code></pre><h2>Some little details</h2><p>There are a few things that can be improved upon above, before we go implementing anything.</p><p>First, it&#8217;s going to be a pain having to wait for the entire message to come in before we get to calculate the remainder. We want this to be really fast, and start calculating the remainder while the message is coming in.</p><p>To do this, in our little example above, we will append 2 zeroes to our outgoing message, before we calculate the remainder. Like so</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U7sE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U7sE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 424w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 848w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 1272w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!U7sE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png" width="322" height="326.3562570462232" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:899,&quot;width&quot;:887,&quot;resizeWidth&quot;:322,&quot;bytes&quot;:24505,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U7sE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 424w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 848w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 1272w, https://substackcdn.com/image/fetch/$s_!U7sE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d77669-8c5b-4c5c-8ea6-e90ac97fb698_887x899.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So, our outgoing message will be <code>1101011</code>. This allows the receiver to calculate the remainder as the message is coming in, running the check bits through the same division as the data. If the final remainder is 0, we most likely have the correct data.</p><p>What happens if leading 0&#8217;s get added or deleted during transmission? The message will still be divisible by the generator, but it is obviously the wrong data. The solution is to prepend a certain number of 1&#8217;s (in our case 32, since we&#8217;re going to implement a 32-bit CRC) to the data. In code, this amounts to starting the remainder at a nonzero value instead of 0.</p><p>What happens if trailing 0&#8217;s get added or deleted during transmission? Well, this wouldn&#8217;t be a problem if our message (data + remainder) ended with a 1, but if it ends with a 0 this error may go undetected. 
To fix this, we XOR the final remainder with a fixed value at the end of the function.</p><p>The resulting code becomes:</p><pre><code># Better implementation of crc32, leading/trailing zeros
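# Sketch, not from the original post: the 5-bit example with its two appended
# zeros, on plain integers. gf2_mod is a made-up helper name. Shifts are
# written as multiplications by powers of two here.
def gf2_mod(value, generator):
    # Cancel the leading 1 until the value is shorter than the generator.
    while value.bit_length() >= generator.bit_length():
        shift = value.bit_length() - generator.bit_length()
        value ^= generator * 2 ** shift  # generator lined up under the leading 1
    return value

data, gen = 0b11010, 0b111
check = gf2_mod(data * 4, gen)  # data with two zeros appended
sent = data * 4 + check         # the check bits take the place of the zeros
print(format(sent, "b"), gf2_mod(sent, gen))  # prints 1101011 0
print(gf2_mod(sent ^ 0b1000, gen))            # a flipped bit leaves a nonzero remainder

# crc32_improved applies the same idea with a 32-bit generator: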
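def crc32_improved(message: bytearray, poly: int, init: int = 0, final_xor: int = 0) -&gt; int:
    """Bitwise CRC, most significant bit first, with a start value and a final XOR."""
    bitmask = 0xFFFFFFFF
    crc = init

    for byte in message:
        for _ in range(8):
            # Compare the next message bit with the top bit of the remainder;
            # the generator is XORed in only when exactly one of them is set.
            b = bitmask if (byte &amp; (1 &lt;&lt; 7)) != 0 else 0
            divide = bitmask if (crc &amp; (1 &lt;&lt; 31)) != 0 else 0
            # Mask on every step so the int stays at 32 bits.
            crc = ((crc &lt;&lt; 1) ^ (poly &amp; (b ^ divide))) &amp; bitmask
            byte &lt;&lt;= 1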
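    return (crc &amp; bitmask) ^ final_xor</code></pre><div><hr></div><p>There are some other optimizations you could make. For example, you could precompute a lookup table holding the CRC of every possible byte, cache it, and then process the message one byte at a time instead of one bit at a time:</p><pre><code>from functools import lru_cache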
@lru_cache
def create_lut(poly):
    # CRC of every possible single byte, computed once per polynomial.
    return [crc32_improved(bytearray([x]), poly) for x in range(256)]

# Implementation of crc32 with look up table
def crc32_lut(message: bytearray, poly: int) -&gt; int:
    """
    Generates a crc with a look up table for improved speed
    """
    bitmask = 0xFFFFFFFF
    crc = 0

    lut = create_lut(poly)
    for m in message:
        # The top byte of the remainder, XORed with the next message byte,
        # selects the table entry.
        index = (m ^ (crc &gt;&gt; 24)) &amp; 0xFF
        crc = ((crc &lt;&lt; 8) ^ lut[index]) &amp; bitmask

    return crc &amp; bitmask</code></pre>]]></content:encoded></item></channel></rss>