<p>Linux, python and startups - Romain Dorgueil (https://romain.dorgueil.net/)</p>
<h1><a class="reference external" href="https://romain.dorgueil.net/blog/en/talk/2017/07/12/simple-etl-with-bonobo-europython-2017.html">Simple ETL with Bonobo - Europython 2017</a></h1>
<p>Romain Dorgueil, 2017-07-12</p>
<p class="first last">I presented Bonobo ETL at EuroPython 2017. My talk, video, slides, links etc. are here.</p>
<img alt="Europython 2017" class="chapo" src="https://romain.dorgueil.net/images/europython-2017.png" />
<p>This year, I had the chance to attend EuroPython 2017 in Rimini, Italy. I was also really happy and proud to present
<a class="reference external" href="https://www.bonobo-project.org/">Bonobo ETL</a> there, EuroPython having been my favourite tech conference for quite a few
years now.</p>
<div class="section" id="conference-abstract">
<h2>Conference Abstract</h2>
<p>Simple is better than complex, right? That’s true for data pipelines too.</p>
<p>For more than 5 years, I hacked together extract-transform-load (ETL) processes in various positions (ETL is
just a fancy term for «a bunch of things that take data somewhere and put it elsewhere, possibly transformed»).</p>
<p>I did it as a founder, as a consultant, as a technical co-founder, for some side projects, and now as a technical
advisor in two (corporate) start-up accelerators.</p>
<p>In each case, I felt frustrated with the tools available, and in some serious cases, I had to hack things myself to get
the job done. Bonobo is the repackaging of my past experiences for Python 3.5+, and grasping the basics should not take
longer than the presentation itself.</p>
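<p>To make the «extract, transform, load» idea concrete, here is a minimal sketch built on plain Python generators. It deliberately does not use Bonobo's API: the function names and the sample rows are made up for illustration only.</p>

```python
def extract():
    """Yield raw rows from a (hypothetical) in-memory source."""
    yield from ["alice", "bob", "carol"]


def transform(rows):
    """Apply a transformation to each row, one at a time."""
    for row in rows:
        yield row.title()


def load(rows, target):
    """Append every transformed row to the target list and return it."""
    for row in rows:
        target.append(row)
    return target


if __name__ == "__main__":
    # Chain the three steps: data flows lazily from extract to load.
    print(load(transform(extract()), []))
```

<p>Each step only knows about rows coming in and rows going out, which is what makes such a pipeline easy to rearrange.</p>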
</div>
<div class="section" id="video">
<h2>Video</h2>
<div style="text-align:center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/OrNkstD_1O8?rel=0" frameborder="0" allowfullscreen></iframe>
</div></div>
<div class="section" id="slides">
<h2>Slides</h2>
<div style="text-align:center">
<iframe src="//www.slideshare.net/slideshow/embed_code/key/p1whVHDnPZyQ6l" width="560" height="315" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="https://www.bonobo-project.org/post/bonobo-was-at-europython-2017-in-rimini" title="EuroPython 2017 - Bonobo - Simple ETL in python 3.5+" target="_blank">EuroPython 2017 - Bonobo - Simple ETL in python 3.5+</a> </strong> from <strong><a href="https://romain.dorgueil.net/" target="_blank">Romain Dorgueil</a></strong> </div>
</div></div>
<div class="section" id="sprints">
<h2>Sprints</h2>
<p>I can't thank all the participants of the Bonobo sprint enough. It took place during the weekend that followed the conference. Yes, a lot of
people were tempted by Rimini's sun and beaches, and it takes real courage to spend a good part of the weekend in
the Palacongressi, hacking on open-source projects!</p>
<p>But yes, you guys are warriors, and you were there!</p>
<img alt="The Bonobo ETL sprint, at Europython 2017! Hard forking action in progress!" class="body half-width" src="https://romain.dorgueil.net/images/europython-2017-sprints.jpg" />
<p>Thank you <a class="reference external" href="https://github.com/alekzvik">Alex Vykaliuk</a>, <a class="reference external" href="https://github.com/alexmach77">Alexmach77</a>,
<a class="reference external" href="https://github.com/carloca">Carloca</a>, <a class="reference external" href="https://github.com/faic">Faic</a>, <a class="reference external" href="https://github.com/Parthiv20">Parthiv20</a>,
<a class="reference external" href="https://github.com/vit-">Vitalii Vokhmin</a> and anyone I may have forgotten here (unlikely, but still, against my
will if I did).</p>
</div>
<div class="section" id="links">
<h2>Links</h2>
<p>Not there? Time to discover <a class="reference external" href="https://www.bonobo-project.org/">Bonobo ETL</a>!</p>
<ul class="simple">
<li><a class="reference external" href="https://www.bonobo-project.org/">Bonobo Website</a></li>
<li><a class="reference external" href="http://docs.bonobo-project.org/">Bonobo Documentation</a></li>
</ul>
<p>Also, I'll be at PyconDE 2017, so if you attend, let's talk!</p>
</div>
<h1><a class="reference external" href="https://romain.dorgueil.net/blog/en/startup/2016/10/10/your-best-seo-strategy-is-to-ignore-it.html">Your best SEO strategy is to ignore it</a></h1>
<p>Romain Dorgueil, 2016-10-10</p>
<p class="first last">Stop wasting your time on Search Engine Optimization (SEO): what your business needs is a product that your users
love, and bots will follow them.</p>
<img alt="Focus on building a product that your users love" class="float-xs-left chapo f-l-280" src="https://romain.dorgueil.net/images/focus-on-building-a-product-your-users-love.jpg" />
<p>There are a lot of myths about magical recipes that put you at the top of Google's first page (preferably for highly competitive
keywords). I've seen a lot of founders think of search engine optimization as a kind of holy grail, some almighty
wonder-channel sending them humongous amounts of free traffic...</p>
<p>But what an early stage startup wants to work on is its value proposition, its product and finding its product-market fit.</p>
<p>Your only focus should be to <strong>build an amazing solution to a painful problem</strong>, in the simplest and most efficient way
you can imagine. Got a way? You need to do even better.</p>
<p>Whatever takes you away from this goal is a distraction.</p>
<br class="clearfix"><div class="jumbotron docutils container">
<p><em class="fa left-icon fa-quote-left"></em>You want to build for users, not for search engine bots.</p>
<p>A start-up company can't do both.<em class="fa right-icon fa-quote-right"></em></p>
</div>
<ul class="simple">
<li>Bots won't become customers.</li>
<li>Bots won't help you validate hypotheses.</li>
<li>Bots won't take part in a community.</li>
<li>Bots won't call your sales team.</li>
<li>Fucking bots have no money!</li>
</ul>
<div class="section" id="search-engine-optimization-for-dummies">
<h2>Search engine optimization for dummies</h2>
<p>Let's step back for a second: What the fuck is search engine optimization? If you're already familiar with the term and
its different meanings, feel free to skip this.</p>
<div class="jumbotron docutils container">
<p>[<strong>tl;dr</strong>] An attempt to define Search Engine Optimization:</p>
<ul class="simple">
<li>You're either trying to game a $500B+ business, and good luck with that.</li>
<li>Or you're making a product that rocks, and search engines will love it.</li>
</ul>
</div>
<div class="section" id="what-s-the-main-goal-of-search-engine-companies">
<h3>What's the main goal of search engine companies?</h3>
<p>You can't speak about SEO without understanding a bit of the search engines' goals.</p>
<p>Google's business is to make information available to people, and they monetize this information. The more accurate
the information they get to the user, the more money they'll make in the long run.</p>
<div class="jumbotron docutils container">
<p><em class="fa left-icon fa-quote-left"></em>Search engines want <strong>happy users</strong>.</p>
<p>If your pages make their users happy, they will send traffic.<em class="fa right-icon fa-quote-right"></em></p>
</div>
<ul class="simple">
<li>Happy users that find what they were looking for.</li>
<li>Happy users that won't come back to search for the same thing 10 seconds after clicking your link.</li>
<li>Happy users that got relevant information from their favorite engine.</li>
<li>Happy users that won't try a competing service, since they get such great information here.</li>
<li>Happy users that the engine knows so well it can push highly relevant ads that users will be even more satisfied to
read and click.</li>
</ul>
<p>If your goal is to lie to users or trick them into visiting your site, you have a conflicting goal, and search engines won't
be your friends. I'll talk about lying to search engines below. And if, as a start-up company, your way to acquire
customers is to lie to them, you won't last long.</p>
</div>
<div class="section" id="how-does-search-engines-work">
<h3>How do search engines work?</h3>
<p>Search engines mostly work by having crawlers that download as much content as they can from the Internet, following links
from page to page, and ranking pages on thousands of criteria to feed a kind of inverted index (an index from keywords to
documents). If you know software like Apache Solr or Elasticsearch, it's pretty much the same concept. Search engine
companies just built their own super-specialized inverted index to work well with the complex problems they have.</p>
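<p>As a rough illustration of the «keywords to documents» idea, here is a toy inverted index in Python. It is only a sketch: the page names and texts are hypothetical, and real engines add crawling, ranking, stemming and a lot more on top of this.</p>

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, standing in for crawled content.
pages = {
    "page-1": "simple data pipelines in python",
    "page-2": "search engines love happy users",
    "page-3": "python for search engines",
}
index = build_inverted_index(pages)
```

<p>Looking up a query is then a matter of intersecting a few sets, instead of re-reading every document.</p>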
</div>
<div class="section" id="making-your-web-or-mobile-product-easier-to-read">
<h3>Making your (web or mobile) product easier to read</h3>
<p>This should not be a feature, it should be «by design». If you know your tools and use web standards, you should not
spend more than a few minutes a day on this part and it will be sufficient to handle 80% of «on-site» SEO. As a
<a class="reference external" href="http://romain.dorgueil.net/essays/pareto-pragmatic-entrepreneur/">corollary to Pareto's law</a>,
you should ignore the leftover optimizations that will eat up an infinite amount of time for little to no value.</p>
<div class="jumbotron docutils container">
<p><em class="fa left-icon fa-quote-left"></em>You should focus on the 20% causes that produce 80% of the effects, then simply ignore the rest.<em class="fa right-icon fa-quote-right"></em></p>
</div>
<p>Have clean and accessible URLs, generate your content server side, use heading tags for titles, write
alternate texts for images, use standard markup... And that's about it for the technical part.</p>
<p>Have a well thought out content structure and organization, and a clean navigation (a.k.a. a way for your visitors to find
their way through the information)... That's user centric, bots will love it and you'll be good on the structural part.</p>
<p>In fact, that is not SEO, that's only web development done right, and that's what the search bots want to see. Don't
worry about your not-so-perfect titles, your flaky semantics, your lack of keywords... That's only time-consuming
bullshit, from a search engine marketing point of view.</p>
<p>Search engines got very good at ignoring those unimportant mistakes, in favor of criteria related to real user satisfaction.</p>
<p>Focus on your users. Provide them the best content you can; the best service you can; the most value possible; an
amazing user experience. This is what makes a good product. This is the only viable way to search engine wonderland.</p>
<p>To sum up, don't craft for bots, craft for users. And as a bonus, that is what search engines need (amazing results for
their users, yay), so you'll get more credit from them.</p>
</div>
<div class="section" id="making-yourself-a-reputation-be-an-expert">
<h3>Building yourself a reputation: be an expert</h3>
<p>Good products make people chat, and that's the root of modern search (as invented by Google in the late 90s, mostly).
It was once called PageRank; the rules are now much more complex, but the root idea is the same. The more relevant people say
your product is amazing, the more your product is considered relevant.</p>
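<p>For the curious, the textbook version of that root idea fits in a few lines. This is a simplified PageRank sketch, not what any search engine runs today; the damping factor, the iteration count and the link graph used to try it are arbitrary assumptions.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # ...plus a share of the score of each page linking to it.
            # Pages without outgoing links share their score with everyone.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

<p>Calling <tt class="docutils literal">pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})</tt> gives the highest score to <tt class="docutils literal">c</tt>, the page that the others point to.</p>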
<p>Oh wait, isn't that how humans already think about things?</p>
<p>It is what you want to create as a company: an amazing service your users will love, cherish, and be proud to tell
their friends or colleagues about.</p>
<p>If you create amazingness, you'll get press for it, you'll get people referring to you because they love your work, the
way you present it, the philosophy behind it... If you have the best service for X out there (let's say,
sheep-shaving-as-a-service), people will talk about you for this expertise, and as a bonus, search engines will start
to trust your authority on the subject.</p>
</div>
<div class="section" id="getting-nasty-with-a-giant">
<h3>Getting nasty with a giant</h3>
<p>Now, what a lot of people refer to when speaking of SEO is often something completely different.</p>
<p>Sometimes referred to as «black hat SEO», it is about cheating, abusing the system, and getting a lot of this
valuable traffic at all costs.</p>
<p>If you're building a «start-up» company, trying to give birth to a new, previously nonexistent product and business
model, this is one of the worst ideas you can have, for three main reasons:</p>
<ul class="simple">
<li>You're losing focus, and your product will, as a result, suck.</li>
<li>You're getting nasty with a giant, and the giant will win at this game.</li>
<li>You're building a business that depends on tricking the system. If the system decides to unplug your tricks, you're dead.</li>
</ul>
</div>
<div class="section" id="still-wanna-do-seo-for-the-sake-of-seo">
<h3>Still wanna do SEO for the sake of SEO?</h3>
<p>If you think all this is bullshit, and that SEO should take a lot of your energy, I suggest you take the following
document as your root directives. It was created and made available by
<a class="reference external" href="http://searchengineland.com/seotable">Search Engine Land</a> (you know, they created a valuable piece of content, they
bring value to the table, and wow, they get free referrals by doing so...)</p>
<img alt="Periodic table of SEO, 2015 version" class="body full-width" src="https://romain.dorgueil.net/images/periodic-table-of-seo-2015.png" />
<p>Take each item, prioritize it, and do it, ignoring the 20% that will eat up 80% of your time.</p>
<p>Done? Now get back to your product, and focus 200% on your product, because in the end, you will lose if you don't.</p>
<p>And don't forget one thing: if you focus on things like making bots think you're better than you actually are, the worst
that can happen is that they may eventually end up thinking you are indeed a very good answer to a given intent. And I
hope the bots have money, because you do need customers.</p>
</div>
</div>
<div class="section" id="the-underestimated-cost-of-seo">
<h2>The (underestimated) cost of SEO</h2>
<p>Also known as «The myth of the free inbound channel», I've seen a lot of people say that they will «do SEO» because it's
the only way to get traffic for free.</p>
<p>Free, huh?</p>
<p>Let's try to evaluate the different hidden costs. This can't be universal, but try to do it honestly:</p>
<ul class="simple">
<li>Cash-burn related: engineering time, content marketing time, product design time...</li>
<li>Non cash-burn related: impact on product quality, on your time to market, on your team happiness ...</li>
</ul>
<p>Now give a number, do you think this channel is still free, or cheap?</p>
</div>
<div class="section" id="focus-on-building-a-product-that-your-users-love">
<h2>Focus on building a product that your users love</h2>
<p>I hope I convinced you that as a product creator, company builder or hacker of the 21st century, the only thing that matters
is the value you create for your customers.</p>
<p>Bots aren't customers and won't be.</p>
<p>Having good vanity metrics (like organic traffic) but no traction because of a poor product is the worst that can happen
to you. It will kill you slowly and painfully.</p>
<p>So don't fall for the usual mistake:</p>
<div class="jumbotron docutils container">
<p>Your best SEO strategy as a start-up company is to ignore it, plain and simple.</p>
</div>
</div>
<h1><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Instrument, Measure and Learn</a></h1>
<p>Romain Dorgueil, 2016-09-06</p>
<p class="first last">Web Performance (part 4/4): Let's look at a few tools that help measure how well you've done (Pagespeed
statistics, Search Console, Google Analytics and Pagespeed Insights).</p>
<img alt="Flying website" class="chapo full-width" src="https://romain.dorgueil.net/images/the-need-for-speed.jpg" />
<div class="sidebar">
<p class="first sidebar-title">Serving the Web at light-speed Series</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/1-light-speed-web-performance.html">Overview</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Nginx, pagespeed and a backend</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Content delivery with Cloudfront</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Instrument, Measure and Learn</a></li>
</ul>
</div>
<div class="section" id="we-built-it-we-tested-it-now-let-s-learn">
<h2>We built it, we tested it... Now let's learn!</h2>
<p>There are a few tools you can use to measure the impact of pagespeed. If you're going to optimize your HTTP front boxes,
you probably want to measure whether it has a good or bad impact on your response time.</p>
<p>Interesting metrics:</p>
<ul class="simple">
<li><strong>Connection time.</strong> The time a user's browser needs to establish communication with your server.</li>
<li><strong>HTTP Response Time.</strong> The time a user's browser needs to get your HTTP response (HTML, for example).</li>
<li><strong>Request-to-rendered time.</strong> The time it takes for a user to see the actually rendered page as the result of their request.</li>
</ul>
<p>Even if all three metrics can give you very interesting insights into how your server is behaving, we'll consider here
that the one metric that matters is the last one, as what we want to achieve is a better end-user experience. We want the
pages to load fast, and to a user, "load" means "displayed".</p>
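<p>The first two metrics can be approximated from any machine with a few lines of Python. The sketch below uses only the standard library; the function name and the returned keys are my own choices. It cannot measure the request-to-rendered time, which requires a real browser.</p>

```python
import http.client
import time


def measure_timings(host, port=80, path="/"):
    """Return connection and HTTP response times, in seconds, for one request."""
    start = time.perf_counter()
    connection = http.client.HTTPConnection(host, port, timeout=10)
    connection.connect()  # TCP connection only, no request sent yet
    connected = time.perf_counter()
    connection.request("GET", path)
    response = connection.getresponse()
    response.read()  # wait until the whole body is downloaded
    done = time.perf_counter()
    connection.close()
    return {
        "connection_time": connected - start,
        "http_response_time": done - start,
        "status": response.status,
    }
```

<p>A single call is noisy, so run it several times against your own hostname and look at the median rather than at one value.</p>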
</div>
<div class="section" id="a-few-tools">
<h2>A few tools...</h2>
<div class="section" id="pagespeed-statistics">
<h3>Pagespeed Statistics</h3>
<p>First and foremost, let's have a look at the administration and statistic tools bundled with <tt class="docutils literal">ngx_pagespeed</tt>.</p>
<p>By default, they're disabled (for security reasons, you don't want to expose this to the world). Let's enable them.</p>
<div class="highlight"><pre><span></span><span class="c1"># Enable statistics and statistics logging</span>
<span class="k">pagespeed</span> <span class="s">Statistics</span> <span class="no">on</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">StatisticsLogging</span> <span class="no">on</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">LogDir</span> <span class="s">/var/log/pagespeed</span><span class="p">;</span>
<span class="c1"># Admin locations (needs to be enabled later, see admin.conf)</span>
<span class="k">location</span> <span class="s">/ngx_pagespeed_statistics</span> <span class="p">{</span> <span class="kn">allow</span> <span class="mi">127</span><span class="s">.0.0.1</span><span class="p">;</span> <span class="kn">allow</span> <span class="mi">172</span><span class="s">.17.0.0/24</span><span class="p">;</span> <span class="kn">deny</span> <span class="s">all</span><span class="p">;</span> <span class="p">}</span>
<span class="k">location</span> <span class="s">/ngx_pagespeed_message</span> <span class="p">{</span> <span class="kn">allow</span> <span class="mi">127</span><span class="s">.0.0.1</span><span class="p">;</span> <span class="kn">allow</span> <span class="mi">172</span><span class="s">.17.0.0/24</span><span class="p">;</span> <span class="kn">deny</span> <span class="s">all</span><span class="p">;</span> <span class="p">}</span>
<span class="k">location</span> <span class="s">/pagespeed_console</span> <span class="p">{</span> <span class="kn">allow</span> <span class="mi">127</span><span class="s">.0.0.1</span><span class="p">;</span> <span class="kn">allow</span> <span class="mi">172</span><span class="s">.17.0.0/24</span><span class="p">;</span> <span class="kn">deny</span> <span class="s">all</span><span class="p">;</span> <span class="p">}</span>
<span class="k">location</span> <span class="p">~</span> <span class="sr">^/pagespeed_admin</span> <span class="p">{</span> <span class="kn">allow</span> <span class="mi">127</span><span class="s">.0.0.1</span><span class="p">;</span> <span class="kn">allow</span> <span class="mi">172</span><span class="s">.17.0.0/24</span><span class="p">;</span> <span class="kn">deny</span> <span class="s">all</span><span class="p">;</span> <span class="p">}</span>
<span class="c1"># Administrative modules</span>
<span class="k">pagespeed</span> <span class="s">StatisticsPath</span> <span class="s">/ngx_pagespeed_statistics</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">MessagesPath</span> <span class="s">/ngx_pagespeed_message</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">ConsolePath</span> <span class="s">/pagespeed_console</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">AdminPath</span> <span class="s">/pagespeed_admin</span><span class="p">;</span>
</pre></div>
<p>You'll need to adjust the allowed IP addresses so that only you or some trusted network can access it.</p>
<p>Two of the most interesting instruments you'll now be able to access are the nginx pagespeed statistics...</p>
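<p>If you're unsure whether a given client address falls inside the ranges you allowed, Python's <tt class="docutils literal">ipaddress</tt> module can tell you before you reload nginx. This is only a helper sketch that mirrors the <tt class="docutils literal">allow</tt> / <tt class="docutils literal">deny all</tt> rules above; it is not part of nginx or pagespeed.</p>

```python
import ipaddress

# Same ranges as the "allow" directives in the nginx snippet above.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("172.17.0.0/24"),
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

<p>Note that <tt class="docutils literal">172.17.0.0/24</tt> only covers 172.17.0.0 to 172.17.0.255, so an address such as 172.17.1.1 is rejected.</p>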
<img alt="Pagespeed statistics." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/pagespeed-statistics.png" />
<p>... and even more interesting, the histograms, showing, amongst other timings, your real users' full page load time
(it uses a beacon to measure time, so you'll know how much time is spent between the server connection and the fully
rendered page on the client side):</p>
<img alt="Pagespeed histograms." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/pagespeed-histograms.png" />
</div>
<div class="section" id="pagespeed-insights">
<h3>Pagespeed Insights</h3>
<p>A great tool from Google called Pagespeed Insights (not to be confused with the Apache module and nginx extension that we just
used) scores a page from 0 to 100 on a bunch of speed criteria, both for desktop and mobile usage.</p>
<img alt="Welcome, speed." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/google-pagespeed-insights-example.png" />
</div>
<div class="section" id="google-analytics">
<h3>Google Analytics</h3>
<p>I never really understood the timings shown by Analytics. However, you need to know that the tool tries to measure
the speed of page display from a user point of view and records it so you can act.</p>
<img alt="Some fuckingly slow website." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/analytics-site-speed-example.png" />
</div>
<div class="section" id="google-search-console-previously-google-webmaster-tools">
<h3>Google Search Console (previously Google Webmaster Tools)</h3>
<p>This one will have a few days of lag before updating, but going to the crawl statistics will give you the average time
Googlebot takes to retrieve your website's pages. Try to keep this one as low as possible (under 200 ms is excellent,
500 ms is acceptable, 1 s is badly lagging and more than that means that your software needs urgent attention if you wanna
do anything serious with it).</p>
<img alt="Crawl speed in webmaster tools." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/google-search-console-speed-example.png" />
<p>Be aware that it's not the user render time, so being fast here (let's say, 300 ms) does not mean the website will feel
fast for your users. You're better off using the ngx_pagespeed statistics to know that.</p>
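<p>If you want to track this number over time, the rule of thumb above is easy to encode. In this small sketch, the exact boundaries between categories (for instance, what happens between 200 and 500 ms) are my own reading of those rough thresholds.</p>

```python
def rate_crawl_time(milliseconds):
    """Classify an average Googlebot download time using rough thresholds."""
    if milliseconds < 200:
        return "excellent"
    if milliseconds <= 500:
        return "acceptable"
    if milliseconds <= 1000:
        return "badly lagging"
    return "needs urgent attention"
```

<p>Feed it the average reported by the crawl statistics page, in milliseconds.</p>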
</div>
</div>
<div class="section" id="that-s-all-folks">
<h2>That's all folks!</h2>
<p>That's it for today! I really hope you enjoyed this little series.</p>
<p>Do you have some more tools for measuring? Some techniques that are missing here? Please share them in the comments, and
I'll do my best to update the articles as fast as I humanly can!</p>
<br></div>
<h1><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Amazon Cloudfront as our Content Delivery Network</a></h1>
<p>Romain Dorgueil, 2016-09-06</p>
<p class="first last">Web Performance (part 3/4): Adding a geo-distributed CDN in front of nginx+pagespeed to serve optimized assets
even faster.</p>
<img alt="Flying website" class="chapo full-width" src="https://romain.dorgueil.net/images/the-need-for-speed.jpg" />
<p>Now that <tt class="docutils literal">nginx+pagespeed</tt> performs a lot of automated optimizations, we would like to offload the task of serving
assets that do not change often to a content delivery network service, and I'll demonstrate the trick with Cloudfront.</p>
<p>We'll gain the following:</p>
<ul class="simple">
<li>Offload our webserver by serving static assets from the CDN.</li>
<li>Serve assets from a different, cookieless, domain.</li>
<li>Serve the assets from a geographically closer datacenter.</li>
</ul>
<p>Note that as Cloudfront needs HTTP access to your webserver, you won't be able to test the content
delivery locally.</p>
<p>Our target architecture, and in green, the parts we focus on in this article:</p>
<img alt="target architecture with cloudfront and nginx highlighted" class="body half-width m-y-3" src="https://romain.dorgueil.net/images/web-performance/architecture-amazon-cloudfront-pagespeed.png" />
<div class="section" id="create-a-cloudfront-distribution">
<h2>Create a Cloudfront distribution</h2>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>Cloudfront is a paid service from Amazon Web Services. I find it pretty inexpensive for what it does, but you
should probably pay attention to the pricing before trying out stuff in this section.</p>
<p class="last">Also, I consider that you're already familiar with Amazon Web Services, and that you already have an account that
you can use for this purpose.</p>
</div>
<p>An HTTP endpoint relaying assets from another server is called a <tt class="docutils literal">distribution</tt> in Amazon Cloudfront.</p>
<p>Let's create one, and tune a few settings (you can also edit everything after the distribution is created).</p>
<ol class="arabic simple">
<li>Log in to your AWS console, and create a new (web) distribution that will be used as your website's CDN.</li>
</ol>
<img alt="Cloudfront distributions." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/cloudfront-distributions.png" />
<ol class="arabic simple" start="2">
<li>Set up your web application FQDN</li>
</ol>
<ul class="simple">
<li><strong>Origin Domain Name</strong>: your publicly accessible FQDN</li>
</ul>
<img alt="Cloudfront distribution creation, part 1: fully qualified domain name." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/cloudfront-create-distribution.png" />
<ol class="arabic simple" start="3">
<li>Whitelist Origin HTTP header (will help with CORS)</li>
</ol>
<ul class="simple">
<li><strong>Forward Headers</strong>: Whitelist</li>
<li><strong>Whitelist Headers</strong>: add <tt class="docutils literal">Origin</tt></li>
</ul>
<img alt="Cloudfront distribution creation, part 2: whitelisting the CORS headers." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/cloudfront-create-distribution-2.png" />
<ol class="arabic" start="4">
<li><p class="first">Click <strong>«Create distribution»</strong>, grab your distribution's domain name (for me, <tt class="docutils literal">d13rn6zvbpb4kh.cloudfront.net</tt>)
and let's go back to the nginx configuration while Amazon spreads this distribution's configuration to the various
edge locations you selected.</p>
<p>You can also grab a coffee (not required).</p>
</li>
</ol>
</div>
<div class="section" id="configure-nginx-pagespeed-to-rewrite-assets-urls-to-use-our-cdn-distribution">
<h2>Configure nginx+pagespeed to rewrite assets' URLs to use our CDN distribution</h2>
<p>Yet another great thing about <tt class="docutils literal">nginx+pagespeed</tt> is that a simple configuration option enables rewriting all asset
URLs that point to a given domain so that they point to another one. Which means you can, without changing anything in your backend
service code, change all the image, script and style URLs so they hit the CDN.</p>
<p>Just adapt and include the two following lines in the server section of the nginx configuration that is relevant to your
website.</p>
<div class="highlight"><pre><span></span><span class="k">pagespeed</span> <span class="s">Domain</span> <span class="s">http://ngxps.rdc.li/</span><span class="p">;</span>
<span class="k">pagespeed</span> <span class="s">MapRewriteDomain</span> <span class="s">https://d13rn6zvbpb4kh.cloudfront.net</span> <span class="s">http://ngxps.rdc.li</span><span class="p">;</span>
</pre></div>
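<p>To picture what this mapping does to a single URL, here is a rough Python equivalent. It only illustrates the domain swap, using the two domains from the configuration above; the function name is made up, and the real module also parses the HTML and optimizes the resources themselves.</p>

```python
from urllib.parse import urlsplit, urlunsplit

ORIGIN = "http://ngxps.rdc.li"
CDN = "https://d13rn6zvbpb4kh.cloudfront.net"


def map_rewrite_domain(url, from_domain=ORIGIN, to_domain=CDN):
    """Point url at to_domain if it lives on from_domain, else return it unchanged."""
    source = urlsplit(url)
    origin = urlsplit(from_domain)
    target = urlsplit(to_domain)
    if (source.scheme, source.netloc) != (origin.scheme, origin.netloc):
        return url
    # Keep path, query string and fragment; swap only scheme and host.
    return urlunsplit(
        (target.scheme, target.netloc, source.path, source.query, source.fragment)
    )
```

<p>Assets hosted on any other domain are left untouched, which is also what you want for third-party scripts.</p>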
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>I used the HTTPS version of the Cloudfront distribution here, because it makes it easier to transition your website from
HTTP to HTTPS, as browsers will refuse HTTP assets if you're serving over SSL. The opposite is not true, so it won't
harm, but feel free to use the HTTP distribution if you prefer it.</p>
<p class="last">Of course, you can use the same technique with another CDN, but this is outside the scope of this article, even if the
pagespeed configuration will look the same.</p>
</div>
</div>
<div class="section" id="wrapping-up">
<h2>Wrapping up</h2>
<p>That's it!</p>
<p>You'll need a publicly accessible HTTP endpoint to test your new setup, backed by the Cloudfront CDN. Just launch both
containers in your favorite scheduler (for example, using a Kubernetes Deployment), and you should be set.</p>
<p>As I can't cover all the specifics of running it on a given production site (probably shared with other services), I won't go into
details here, but if you don't use a scheduler, just fire up the two containers as you would have done locally and notice
the rewrites.</p>
<img alt="CDN is here!" class="body half-width" src="https://romain.dorgueil.net/images/web-performance/prod-server-with-cdn.png" />
<p class="btn btn-primary pull-left"><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">« Previous</a></p>
<p class="btn btn-primary pull-right"><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Next in the series: Instrument, Measure and Learn »</a></p>
<br class="clearfix">
<br>
<br></div>
<div class="section" id="caveats">
<h2>Caveats</h2>
<ul>
<li><p class="first">With multiple nginx frontends, the CDN can get lost, as the hash won't be the same on each of them. I am currently
working on this, as it can be useful, for example, to have a round-robin DNS pointing to more than one front HTTP
server or, if you're using something like Kubernetes, to have more than one pod serving the service connected to your ingress.</p>
</li>
<li><p class="first">You may hit cross-domain asset failures:</p>
<div class="highlight"><pre><span></span>Font from origin 'https://d13rn6zvbpb4kh.cloudfront.net' has been blocked from loading by Cross-Origin Resource Sharing policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Origin 'http://ngxps.rdc.li' is therefore not allowed access.
</pre></div>
<p>A few solutions exist for this, but the easiest one (and probably not the most secure) is to add a rather permissive
header to all responses sent by nginx:</p>
<div class="highlight"><pre><span></span><span class="c1"># Cross domain insecure header</span>
<span class="k">add_header</span> <span class="s">"Access-Control-Allow-Origin"</span> <span class="s">"*"</span><span class="p">;</span>
</pre></div>
<p>Still not working? Keep in mind your assets go from your webserver to Amazon Cloudfront, then to the user, and the
CDN-in-the-middle caches not only the content but also the response headers. You should thus create a cache
invalidation within your Cloudfront distribution to tell Amazon to refresh its information (and wait a bit, as the
invalidation will take a few minutes to propagate through the edge locations).</p>
</li>
</ul>
</div>
Nginx, pagespeed and a simple node.js backend2016-09-06T09:00:00+02:002016-09-06T09:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-09-06:/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html<p class="first last">Web Performance (part 2/4): Nginx and pagespeed setup (with a node.js backend) in docker containers for
automatic assets optimization without any new code!</p>
<img alt="Flying website" class="chapo full-width" src="https://romain.dorgueil.net/images/the-need-for-speed.jpg" />
<div class="sidebar">
<p class="first sidebar-title">Serving the Web at light-speed Series</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/1-light-speed-web-performance.html">Overview</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Nginx, pagespeed and a backend</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Content delivery with Cloudfront</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Instrument, Measure and Learn</a></li>
</ul>
</div>
<div class="section" id="basic-setup-where-nginx-and-nodejs-shares-a-mojito">
<h2>Basic setup — Where nginx and nodejs share a mojito</h2>
<p>In this first step, we will focus on getting a local setup working, in docker containers, with an nginx+pagespeed
container reverse-proxying requests to a universal NodeJS backend, a.k.a. <a class="reference external" href="http://rdc.li/leanjs">LeanJS</a>.</p>
<p>If you wish, you can replace the latter with anything in a container that can serve HTTP requests.</p>
<p>Our target architecture (in green, the parts we focus on in this article):</p>
<img alt="target architecture, nginx and the backend highlighted" class="body half-width m-y-3" src="https://romain.dorgueil.net/images/web-performance/architecture-nginx-pagespeed.png" />
</div>
<div class="section" id="backend-setup">
<h2>Backend setup</h2>
<p>As an example backend, we will use a little (node) starter project I wrote a while ago to learn React, called LeanJS.
It is pre-packaged as a container image and it will serve us well for demo purposes. You can, of course, use anything
else that serves HTTP.</p>
<div class="highlight"><pre><span></span>docker run --name backend -p <span class="m">3080</span>:3080 -d rdorgueil/leanjs
</pre></div>
<p>After a little while (the first run needs to download the <tt class="docutils literal">rdorgueil/leanjs</tt> image), you should have a backend
running as a background container, creatively named <tt class="docutils literal">backend</tt>.</p>
<p>Open a browser and check that something answers on <a class="reference external" href="http://localhost:3080/">localhost:3080</a>.</p>
<img alt="Welcome, beautiful universal LeanJS default homepage!" class="body half-width" src="https://romain.dorgueil.net/images/web-performance/welcome-leanjs.png" />
<p>Good? Great.</p>
</div>
<div class="section" id="nginx-pagespeed-setup">
<h2>Nginx / Pagespeed setup</h2>
<p>One of the cons of <tt class="docutils literal">ngx_pagespeed</tt> is that it requires you to compile your own <tt class="docutils literal">nginx</tt> with it, and you will
probably get your system dirty doing so.</p>
<p>The good news is I already wrote a little recipe to compile <tt class="docutils literal">ngx_pagespeed</tt> along with <tt class="docutils literal">nginx</tt> into a Debian package,
based on the official nginx Debian releases.</p>
<p>You can use the <a class="reference external" href="https://github.com/okdocker/image-builder">docker image factory</a> to rebuild it for yourself, or just
<a class="reference external" href="https://hub.docker.com/r/okdocker/nginx/">use the image from docker hub</a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>Schematically, the <tt class="docutils literal">okdocker/nginx</tt> recipe does the following:</p>
<ul class="last simple">
<li>create a build container</li>
<li>install the build dependencies</li>
<li>download the source package</li>
<li>add ngx_pagespeed to the source</li>
<li>change the .deb manifest/metadata files</li>
<li>compile the shit</li>
<li>create a clean runtime container</li>
<li>install the newly built Debian package</li>
</ul>
</div>
<p>To use this image to serve an actual backend, you'll need to configure nginx. I usually start with the
<a class="reference external" href="https://github.com/okdocker/server-configs-nginx">okdocker fork of H5BP's nginx configuration boilerplate</a> that
should cover pretty much all your basic needs.</p>
<p>Let's build and run a container based on this <tt class="docutils literal">okdocker/nginx</tt> image:</p>
<div class="highlight"><pre><span></span>git clone https://github.com/okdocker/server-configs-nginx.git nginx-container
<span class="nb">cd</span> nginx-container
make release
make run
</pre></div>
<p>That should be it: an <tt class="docutils literal">nginx</tt> web server is now running on your docker engine's host (most probably localhost),
listening on <tt class="docutils literal">:80</tt>, with a rather unimpressive page served.</p>
<img alt="Welcome, stupid default hello world page!" class="body half-width" src="https://romain.dorgueil.net/images/web-performance/welcome-nginx.png" />
<p>Yet, you can look at the source and notice how it differs (mostly removed whitespace and some obscure JavaScript used
for instrumentation) from the original in
<a class="reference external" href="https://github.com/okdocker/server-configs-nginx/blob/master/www/default/index.html">the nginx config repository</a>:</p>
<img alt="Woot! Where are those whitespaces?" class="body half-width" src="https://romain.dorgueil.net/images/web-performance/welcome-nginx-source.png" />
</div>
<div class="section" id="plugging-our-backend-where-nginx-is-teinted-with-magic">
<h2>Plugging our backend — Where nginx is tinted with magic</h2>
<p>You should still have the LeanJS backend running in a container named <tt class="docutils literal">backend</tt>. If you don't,
go back to the backend setup step and start over.</p>
<p>Let's change the default host to proxy our backend. Open <tt class="docutils literal"><span class="pre">sites-enabled/000_default</span></tt> (or <tt class="docutils literal"><span class="pre">sites-available/default</span></tt>,
which is the former's symlink target) and change it so it looks like this:</p>
<div class="highlight"><pre><span></span><span class="k">server</span> <span class="p">{</span>
<span class="kn">listen</span> <span class="s">[::]:80</span> <span class="s">default_server</span> <span class="s">deferred</span><span class="p">;</span>
<span class="kn">listen</span> <span class="mi">80</span> <span class="s">deferred</span><span class="p">;</span>
<span class="c1"># Path for static files</span>
<span class="kn">root</span> <span class="s">/var/www/default</span><span class="p">;</span>
<span class="c1"># Specify a charset</span>
<span class="kn">charset</span> <span class="s">utf-8</span><span class="p">;</span>
<span class="c1"># Custom 404 page</span>
<span class="kn">error_page</span> <span class="mi">404</span> <span class="s">/404.html</span><span class="p">;</span>
<span class="c1"># Pagespeed basic config</span>
<span class="kn">include</span> <span class="s">pagespeed/basic.conf</span><span class="p">;</span>
<span class="kn">include</span> <span class="s">pagespeed/aggressive.conf</span><span class="p">;</span>
<span class="kn">include</span> <span class="s">pagespeed/admin.conf</span><span class="p">;</span>
<span class="c1"># Pass requests to the internal backend proxy location.</span>
<span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
<span class="kn">try_files</span> <span class="nv">$uri</span> <span class="s">@backend</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1"># Internal backend proxy location.</span>
<span class="kn">location</span> <span class="s">@backend</span> <span class="p">{</span>
<span class="kn">internal</span><span class="p">;</span>
<span class="kn">if</span> <span class="s">(</span><span class="nv">$request_filename</span> <span class="p">~</span><span class="sr">*</span> <span class="s">\.(jpg|jpeg|gif|png|bmp|ico|pdf|flv|swf|txt|css|js|otf|eot|svg|ttf|woff|woff2|map)</span>$<span class="s">)</span> <span class="p">{</span>
<span class="kn">expires</span> <span class="s">7d</span><span class="p">;</span>
<span class="kn">add_header</span> <span class="s">Cache-control</span> <span class="s">public</span><span class="p">;</span>
<span class="kn">access_log</span> <span class="no">off</span><span class="p">;</span>
<span class="p">}</span>
<span class="kn">include</span> <span class="s">proxy/basic.conf</span><span class="p">;</span>
<span class="kn">proxy_pass</span> <span class="s">http://backend:3080</span><span class="p">;</span>
<span class="kn">proxy_pass_header</span> <span class="s">Cache-Control</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Build and run the nginx container again, with a docker link to the backend:</p>
<div class="highlight"><pre><span></span><span class="nb">cd</span> nginx-container
<span class="nv">DOCKER_RUN_OPTIONS</span><span class="o">=</span><span class="s2">"--link backend:backend"</span> make release run
</pre></div>
<p>Open <a class="reference external" href="http://localhost/">localhost:80</a> again in your favorite web browser and stare for a few seconds at your
backend's output.</p>
<img alt="Now, nginx is serving our backend." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/welcome-leanjs-in-nginx.png" />
<p>Once again, you can have a look at the source and see whitespace removed, instrumentation added, and assets rewritten
(you may notice they're not rewritten on the first load: pagespeed progressively enhances the output over time,
learning from what users actually request).</p>
<img alt="And the wizard-esque source..." class="body half-width" src="https://romain.dorgueil.net/images/web-performance/welcome-leanjs-in-nginx-source.png" />
<p>Amazing right? Isn't that <a class="reference external" href="http://discworld.wikia.com/wiki/Octarine">octarine</a>-esque?</p>
<p class="btn btn-primary pull-left"><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/1-light-speed-web-performance.html">« Previous</a></p>
<p class="btn btn-primary pull-right"><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Next in the series: Amazon Cloudfront as our Content Delivery Network »</a></p>
<br class="clearfix">
<br>
<br></div>
<div class="section" id="caveats">
<h2>Caveats</h2>
<ul>
<li><p class="first">Caching relies less on the browser and more on nginx, which can lead to problems with resources you want to
cache client side (I had problems with Edge Side Includes, for example). If that bites you, you can ask pagespeed
to leave the original caching headers alone:</p>
<div class="highlight"><pre><span></span><span class="k">pagespeed</span> <span class="s">ModifyCachingHeaders</span> <span class="no">off</span><span class="p">;</span>
</pre></div>
<p>Beware though as this setting can have pretty unexpected side-effects on the pagespeed behavior. Try to avoid it if
you can, and be conscious that you can cause yourself headaches with this setting.</p>
</li>
<li><p class="first">Some pagespeed filters are riskier than others and can break your website, depending on what you're already doing.
You probably won't have any problems for small and very static websites, but if you have a lot of JavaScript magic,
you can encounter a few problems. If you start to experience glitches, try disabling all the filters, see if it solves
the problem, and if it does, reactivate the filters one by one (or bisect the activated filters to find the culprit,
to debug it in log(n) passes instead of n).</p>
<p>For example, when getting this article ready, the <tt class="docutils literal">remove_comments</tt> filter broke the React app. I first confirmed
that pagespeed was the culprit by adding a <tt class="docutils literal">pagespeed off;</tt> directive in the config, then bisected the active
filters (by disabling half, re-enabling half of the half, etc.) to find the dubious filter. By following this process,
it should not take you more than a few minutes to debug pagespeed-caused issues.</p>
</li>
</ul>
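<p>The bisection described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it supposes exactly one filter is at fault and that it breaks the site on its own, and <tt class="docutils literal">breaks_site</tt> is a hypothetical stand-in for "reload nginx with only these filters enabled and check the page by hand".</p>

```python
def find_breaking_filter(filters, breaks_site):
    """Bisect `filters` to find the one that makes `breaks_site(enabled)` true.

    Assumes a single culprit that breaks the site on its own.
    Returns the culprit and the number of configurations that were tried.
    """
    candidates = list(filters)
    passes = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        passes += 1
        if breaks_site(half):
            candidates = half                    # culprit is in the enabled half
        else:
            candidates = candidates[len(half):]  # culprit is in the other half
    return candidates[0], passes


# A few real pagespeed filter names, used here as sample data only.
FILTERS = [
    "collapse_whitespace", "remove_comments", "combine_css", "combine_javascript",
    "rewrite_images", "extend_cache", "inline_css", "lazyload_images",
]


def breaks_site(enabled):
    # Hypothetical check; in real life this is you, reloading the page.
    return "remove_comments" in enabled


print(find_breaking_filter(FILTERS, breaks_site))
```

<p>With eight filters, three configurations are enough, where testing filters one by one could take up to eight.</p>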
</div>
Serving the web at light-speed2016-09-06T08:00:00+02:002016-09-06T08:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-09-06:/blog/en/web-performance/2016/09/06/1-light-speed-web-performance.html<p class="first last">Web Performance (part 1/4): Tools and techniques to optimize your website for load speed (4-article series). Leverages
Nginx, Pagespeed and Amazon Cloudfront CDN.</p>
<img alt="Flying website" class="chapo full-width" src="https://romain.dorgueil.net/images/the-need-for-speed.jpg" />
<div class="sidebar">
<p class="first sidebar-title">Serving the Web at light-speed Series</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/1-light-speed-web-performance.html">Overview</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Nginx, pagespeed and a backend</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Content delivery with Cloudfront</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Instrument, Measure and Learn</a></li>
</ul>
</div>
<p>In this series, I'll show how to use a few tools and techniques to optimize your website for load speed.</p>
<p>Everyone loves speed, right?</p>
<p>We will only focus on universal web performance techniques, which you can use with whatever language/framework you
have, as long as it serves HTTP, and on techniques that won't need any (or only very tiny) adjustments to your code base.</p>
<div class="section" id="tl-dr">
<h2>TL;DR</h2>
<p>No time for introductions? Jump right in!</p>
<ul class="simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Part 1 — Nginx, pagespeed and a simple node.js backend</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Part 2 — Amazon Cloudfront as our Content Delivery Network</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Part 3 — Instrument, Measure and Learn</a></li>
</ul>
</div>
<div class="section" id="the-need-for-speed">
<h2>The need for speed</h2>
<div class="jumbotron docutils container">
<p>You want your web applications to load fast. Faster. The fastest you can.</p>
<p class="pull-right tweet-this"><a class="reference external" href="https://twitter.com/intent/tweet?text=You+want+you+web+applications+to+load+fast.+Faster.+The+fastest+you+can.+__URL__+%28via+%40rdorgueil%29">tweet this</a></p>
</div>
<p>One of the main reasons is that slow websites damage the user experience and, the competition being fierce,
there is a chance your visitor will bounce elsewhere.</p>
<p>I firmly believe a product should always put users and their needs first. If you're not a "user-first" believer, there
are still other minor reasons that include, non-exhaustively, a
<a class="reference external" href="https://webmasters.googleblog.com/2010/04/using-site-speed-in-web-search-ranking.html">better SEO ranking</a>, a better
mobile experience, possibly more webserver resources to serve the next user (a worker that is done serving a request
can serve the next one; that is less true with async models, but it still holds), etc.</p>
<p>One problem is that if you are convinced web performance is important (and you should be), you may start investing
endless working hours in micro-optimisations that will hardly have any impact on your overall site speed. And you know,
engineer time is expensive, so better use it on valuable things.</p>
<p>Over the years, I found a few tools that are cheap to implement but could easily save you a few hundred milliseconds
of load time.</p>
<p>Those tools are not expert language-specific low-level micro-optimisations, they just "talk the web", are easy and fast
to implement and their impact can be measured pretty fast.</p>
</div>
<div class="section" id="tools-of-the-trade">
<h2>Tools of the trade</h2>
<p>We will use:</p>
<ul class="simple">
<li><a class="reference external" href="http://nginx.org/">nginx</a> to serve HTTP(s) content; already using it? you can just tune your current instance's
configuration;</li>
<li><a class="reference external" href="https://developers.google.com/speed/pagespeed/module/">ngx_pagespeed</a> to optimize the HTML, scripts, CSS and
images; lots of possible options, we'll use reasonable defaults, it's up to you to choose the best deal
by reading the (fucking) docs.</li>
<li><a class="reference external" href="https://aws.amazon.com/fr/cloudfront/">amazon cloudfront</a> as a geo-CDN, a <a class="reference external" href="https://en.wikipedia.org/wiki/Content_delivery_network">content delivery network</a> that will take into account your visitors' geolocation to
serve assets from an edge location closer to them than your backend servers; it is also used to serve assets from
a separate, cookie-less domain, a useful feature to circumvent the browsers' simultaneous-downloads-per-host limit;
of course, this only applies if you're using HTTP or regular HTTPS, and is moot if you're using SPDY or HTTP/2;</li>
<li><a class="reference external" href="https://www.docker.com/">docker</a> (optional) to run the various services in isolated Linux containers.</li>
</ul>
<p>Here is our target architecture:</p>
<img alt="target architecture of this article series" class="body half-width m-y-3" src="https://romain.dorgueil.net/images/architecture-nginx-pagespeed-cloudfront.png" />
<p>The steps to achieve this will be detailed in the following articles:</p>
<ul class="simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Part 1 — Nginx, pagespeed and a simple node.js backend</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/3-amazon-cloudfront-cdn.html">Part 2 — Amazon Cloudfront as our Content Delivery Network</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/4-instrumentation-measure-learn.html">Part 3 — Instrument, Measure and Learn</a></li>
</ul>
<p class="btn btn-primary pull-right"><a class="reference external" href="https://romain.dorgueil.net/blog/en/web-performance/2016/09/06/2-nginx-pagespeed-nodejs.html">Next in the series: Nginx, pagespeed and a simple node.js backend »</a></p>
<br class="clearfix">
<br>
<br></div>
<div class="section" id="caveats">
<h2>Caveats</h2>
<ul class="simple">
<li>For Apache users, you can use <a class="reference external" href="https://developers.google.com/speed/pagespeed/module/download">mod_pagespeed</a> to
achieve the same goals, although its setup is outside the scope of this article.</li>
</ul>
</div>
Planning poker is not about estimating2016-08-02T00:00:00+02:002016-08-02T00:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-08-02:/blog/en/agile/2016/08/02/planning-poker-is-not-about-estimating.html<p class="first last">Planning-poker is a common estimation method used for iterative
software development. But more often than not, people dismiss it as
being «only» an estimation technique, although it's way more.</p>
<img alt="Don't lie around and think nobody will notice ..." class="chapo" src="https://romain.dorgueil.net/images/poker-aces.jpg" />
<p>Once in a while, I get the opportunity to talk about planning-poker with various
kinds of people. I have implemented it in teams of various sizes, but it was often
misunderstood by my management as «only a way to get estimations».</p>
<p>In my opinion, it is a very nice and simple management artifact, unduly reduced
to estimating time and budget, although it actually addresses one
(if not the one) of the most costly parts of any organization: <strong>communication
between people</strong>.</p>
<br class="clearfix"><div class="section" id="some-background">
<h2>Some background</h2>
<p>Planning and estimating software development projects is tough, yet it is one of the most
common things asked of a development team. The reason is simple: one always wants to
have an answer to the simple question «How much?».</p>
<p>Some people tried to apply techniques borrowed from construction; the same people once
thought that two people can do twice as much as one in a given period of time, or that
nine women could give birth to a child in one month.</p>
<p>But... No. <a class="reference external" href="https://en.wikipedia.org/wiki/Waterfall_model">Waterfall mostly died</a>,
everybody is shouting «agile» whenever they can, but people still want to know
beforehand how much something will cost, yet will tell you that «we'll see later
about the feature details, you know man, be "agile"...». That is a pretty cheap negotiation
technique, and it will probably get you an awful product.</p>
<p>A few really important things to grasp about (software) product development:</p>
<ul class="simple">
<li>The velocity you can get directly depends on the team that will execute the work
(and I'm not talking about individuals, but about the team as a group).</li>
<li>Most often, even if you have detailed blueprints of your application, the development
process is non-deterministic and things <em>will</em> be discovered along the way.</li>
<li>Software development is a bridge between art and science. It uses science as a tool,
but it is a creative process.</li>
</ul>
<p>If you're saying «this project needs 50 man-days» you're either a liar, or you forgot to
say «for this exact team I know very well, based on past events, and assuming our
understanding of the project today is not too far from where it will end up». Most
of the time, people are liars, and don't even know it. They don't lie because they intend
to harm, they lie because they don't know but still want to say something.</p>
<p>So what? I don't want to be a liar. But I need to say <em>something</em>. What can I do?</p>
</div>
<div class="section" id="meet-planning-poker">
<h2>Meet Planning-Poker</h2>
<p>Planning poker is a simple «serious game» (well, no, it's not a game, in fact...)
often used to estimate the amount of work that can be done in a «sprint»,
meaning a short iteration of development.</p>
<p>More widely, it can be used as a tool to estimate any task that is not completely
defined, or any task whose time to completion depends heavily on who's executing
it.</p>
<p>It's a simple artifact to use with your whole team, that will be used to
calculate a backlog's complexity, an arbitrary statistical measure that can
then be used to make predictions based on what already happened.</p>
<img alt="Planning poker cards using a rounded fibonacci sequence" class="body" src="https://romain.dorgueil.net/images/planning-poker.gif" />
<p>A simple way to run it is to have all team members (players, let's say) hold a
hand of around 10 cards with Fibonacci numbers on them (1, 2, 3, 5, 8, ...), and
ask them how hard each task is compared to the others. Each «player» should
assign a complexity value to the task and everyone should reveal it at
the same time, to avoid the various biases of human influence in communication. Put
shortly, everyone must make a choice, only one choice, and on their own.</p>
<p>For each task, two things can happen.</p>
<p>The first possibility is that everybody agrees, or has a pretty close estimation,
and you can just grab the number (or the averaged number) and jump to the next task
(if you're using some kind of kanban, you can write it down right on the card.
If you have a web-based bug tracker, you should find a way to save the
complexity number attached to the ticket).</p>
<p>The second possibility is that there is no consensus, and then interesting things
happen, because you need to get to one. My way to do it is to take the two
people that chose the most extreme values, and let them debate why they
think it's so easy or so hard to realize. Once they have a better understanding
of the task, a second round happens and, except for very complex tasks, you should
get a consensus pretty soon.</p>
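<p>Here is how one round could look in Python. It is a sketch under assumptions, not a reference implementation: the deck, the <tt class="docutils literal">play_round</tt> name and the rule "votes at most one card apart count as close enough" are all mine, so pick whatever threshold suits your team.</p>

```python
# Hypothetical deck; real decks often have a few more cards.
DECK = [1, 2, 3, 5, 8, 13, 21]


def play_round(votes, max_gap=1):
    """Evaluate one round of votes, given as {player: card}.

    If the lowest and highest cards are at most `max_gap` positions apart in
    the deck, the round is settled and the average is kept. Otherwise the two
    players with the most extreme values are named, so they can debate.
    """
    low = min(votes, key=votes.get)
    high = max(votes, key=votes.get)
    gap = DECK.index(votes[high]) - DECK.index(votes[low])
    if gap <= max_gap:
        return {"complexity": sum(votes.values()) / len(votes), "debaters": None}
    return {"complexity": None, "debaters": (low, high)}


first = play_round({"Jane": 2, "John": 13, "Ada": 5})
print(first)   # no consensus: Jane and John explain their cards
second = play_round({"Jane": 5, "John": 8, "Ada": 5})
print(second)  # close enough: the average is kept
```

<p>Averaging is only one option. Some teams prefer keeping the highest card of a settled round, which is a one-line change.</p>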
<p>What to do with all those numbers? Easy. Sum them up, compute how many of those points
are resolved in the first few days, and with a simple division you'll get an estimation
of the overall time needed to resolve the whole backlog.</p>
<p>It's basically statistics, so you'll get more precision with more real-world data,
but you should have a view of the time budget you need to allocate even after
just a few days of work.</p>
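<p>The arithmetic fits in a few lines. The numbers below (a 120-point backlog, 18 points resolved in 3 days) are made up for the example, and <tt class="docutils literal">project_backlog</tt> is just an illustrative name.</p>

```python
def project_backlog(total_points, points_done, days_elapsed):
    """Project the duration of a backlog from the velocity observed so far."""
    velocity = points_done / days_elapsed   # points resolved per day, by the whole team
    total_days = total_points / velocity    # projected duration of the whole backlog
    return velocity, total_days, total_days - days_elapsed


velocity, total_days, remaining = project_backlog(120, 18, 3)
print(f"{velocity:.1f} points/day, {total_days:.1f} days overall, {remaining:.1f} days left")
```

<p>Rerun it every few days: the projection gets steadier as more real data comes in.</p>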
</div>
<div class="section" id="what-you-should-not-do">
<h2>What you should <em>not</em> do</h2>
<p>Don't try to get this value for single persons by taking the average complexity resolved
per day and dividing it by the number of people in the team. The whole process
works because the team works as a team, and you will most likely not get the same
value if you take only one person as "a team". Use it for what it's good at.</p>
<p>A corollary of this «don't» is that you can't say «Hey, I need to complete the
project 1.5 times faster than the estimation says, so I'll just do the maths
and grow the team by 1.5». That does not work, and you can even
lower the complexity-per-day (also called velocity) of the team by doing so, at
least in the first days or weeks.</p>
<p>A team should be able to work together as a group, taking into consideration the
work of each member as a part of the whole, using efficient communication.
When you double a team's headcount, what happens first is that the cost of
communication rises: merging code is communication; understanding the standards
for one project is communication; feeling the right balance between code quality
and intellectual waste is communication; not having two people solving the
same issue is communication; understanding the business stakes is communication; etc.</p>
<p>That's why I think the most important role of <strong>planning poker is not estimating</strong>.
Correct estimations are just collateral damage you'll get for free.</p>
</div>
<div class="section" id="collective-intelligence">
<h2>Collective intelligence</h2>
<p>Planning poker is a small amount of time dedicated to making your team's
communication skills grow.</p>
<p>Planning poker is an inexpensive way to exploit the collective knowledge of
a team to its maximum.</p>
<p>How? Why?</p>
<p>Think about what happens when two members of a team debate after having given
complexities of «2» and «13» to the same task. What they say can sound like
the following:</p>
<blockquote class="pull-quote">
<p><strong>Jane —</strong> «Hey, I think it's easy to do because we have a framework that can scaffold
this user interface for us in a breeze.»</p>
<p><strong>John —</strong> «Oh really? I did not think about it, it should make this a lot easier, but did you
think about the impact on the logistics department? They should be able to see
quotations and invoices right when they're issued, and they use a system that's
hell to interact with... I had a minor patch to apply to it last week and what
should have taken 5 minutes kept me busy for days...»</p>
<p><strong>Jane —</strong> «Mmmh right, I did not think about that... Maybe we should ask them if a daily
export would be enough?»</p>
</blockquote>
<p>You get the idea... What's happening here? The whole team is getting
priceless information about what each member knows about a given task: the tools
that make their life easier, the gotchas about some external services... They
can even start to make proposals on ways to lower the development cost...</p>
<p>Seniors in the team will share information that usually only they can use,
juniors will make better contributions, and everyone can concentrate on their
real value, being members of one team, instead of being a few individuals acting
as such in something their managers called a «team».</p>
<p>Isn't it worth it to waste two hours a week on getting better at working together?</p>
<hr class="docutils" />
<p>I'm not a «tool» or «process» advocate, for the sake of it. As the agile
manifesto would say, I value individuals and interactions over processes and
tools. But that is exactly what it is.</p>
<p>Interactions between people. Efficient and fast. Using a small simple artifact
that any smart child can learn in 10 minutes.</p>
<p>Still not convinced? Just invest some man-hours once and observe the result.</p>
<hr class="docutils" />
<small class="credits">
Photo credits:
<a href="https://www.flickr.com/photos/36417205@N08/5112547263/" target="_blank" rel="nofollow">fitzsean</a> <a href="http://creativecommons.org/licenses/by-nd/2.0/" rel="license nofollow" target="_blank">cc by-nd</a>
</small></div>
Future is now2016-07-28T00:00:00+02:002016-07-28T00:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-07-28:/blog/en/europython/2016/07/28/future-is-now.html<p class="first last">What's next? Where is Python heading? This is in no way exhaustive or authoritative, but here are some hints about what may come, faster than we may think.</p>
<img alt="Future of Python?" class="chapo" src="https://romain.dorgueil.net/images/back-to-the-future-de-laurean.jpg" />
<div class="sidebar">
<p class="first sidebar-title">EuroPython 2016</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/26/europython-2016.html">My notes on EuroPython 2016</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/asynchronous-programming.html">Asynchronous programming</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/web-scraping.html">Web scraping</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/data-processing.html">Data processing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/testing.html">Testing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/28/future-is-now.html">Future is now</a></li>
</ul>
</div>
<p>A few amazing shiny brand new things you can try ...</p>
<div class="section" id="class-attributes-definition-order">
<h2>Class attributes definition order</h2>
<p><a class="reference external" href="https://www.python.org/dev/peps/pep-0520/">PEP 520 — Preserving Class Attribute Definition Order</a>. It defines
a new special class attribute called <cite>__definition_order__</cite> that you can use to avoid awful definition counters on
descriptors (which a looot of frameworks are using, for example django in forms and orm).</p>
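<p>A minimal sketch of what this makes possible. One caveat: in Python 3.6 as it was finally released, the dedicated
<cite>__definition_order__</cite> attribute was dropped, because the class namespace itself preserves definition order.
The example below therefore reads the order from <cite>vars(cls)</cite>. The <cite>Field</cite>, <cite>ContactForm</cite> and
<cite>declared_fields</cite> names are made up for illustration.</p>

```python
class Field:
    """Hypothetical form field: note there is no creation counter."""

    def __init__(self, label):
        self.label = label


class ContactForm:
    name = Field('Name')
    email = Field('Email')
    message = Field('Message')


def declared_fields(cls):
    # The class namespace keeps definition order (Python 3.6+), so iterating
    # over vars(cls) yields the fields in the order they were written.
    return [key for key, value in vars(cls).items() if isinstance(value, Field)]


print(declared_fields(ContactForm))
```

<p>The fields come back in source order without any counter being incremented in <cite>Field.__init__</cite>.</p>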
</div>
<div class="section" id="naming-descriptors">
<h2>Naming descriptors</h2>
<p><a class="reference external" href="https://www.python.org/dev/peps/pep-0487/">PEP 487 — Simpler customisation of class creation</a> (still a draft at the
time of writing) is another "awful hack killer" that allows a class to tell its descriptor attributes about their
name, avoiding the common but clumsy iteration over type attributes to set the names afterwards. It also lifts the
restriction that a type cannot be patched using those mechanics once the metaclass has been instantiated
(ever tried to add a field to a django model after the class definition?).</p>
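<p>Here is a small sketch of the descriptor-naming hook, using <cite>__set_name__</cite> as it shipped in Python 3.6.
The <cite>Field</cite> and <cite>User</cite> classes are hypothetical and only there to show the mechanics.</p>

```python
class Field:
    """Hypothetical descriptor that learns its own attribute name."""

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created (Python 3.6+).
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class User:
    login = Field()
    email = Field()


user = User()
user.login = 'romain'
print(User.login.name, User.email.name, user.login)
```

<p>No metaclass and no loop over the class attributes is needed: each descriptor is told its name once, at class creation.</p>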
</div>
<div class="section" id="jupyter-notebooks">
<h2>Jupyter Notebooks</h2>
<p>How many of you have ever heard of... No just kidding.</p>
<p>But some lightning talks presented a few novelties around Jupyter:</p>
<ul class="simple">
<li>Jupyter Lab, the next major version, got an amazing interface revamp, that includes tabs, split panes and fully
functional terminals. You should give it a try, it's amazing</li>
<li>Try to <cite>!pip install something</cite> in a notebook.</li>
<li>Amazing extensions like kernel gateways, dashboards ...</li>
<li><cite>%lsmagic</cite></li>
</ul>
</div>
<div class="section" id="python-4">
<h2>Python 4?</h2>
<p><a class="reference external" href="https://github.com/pyparallel/pyparallel/blob/b65ae9845dfd4def11daa05909586ecd85ec070b/README.md">PyParallel, a solution for removing the limitation of the Python Global Interpreter Lock (GIL) without needing to actually remove it at all.</a></p>
<p>That's all folks!</p>
</div>
Asynchronous programming with python2016-07-27T00:00:00+02:002016-07-27T00:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-07-27:/blog/en/europython/2016/07/27/asynchronous-programming.html<p class="first last">Asynchronous programming with modern python (3.4+) using asyncio and friends.
Some goodies from the future. (Notes from EuroPython 2016, Bilbao.)</p>
<img alt="Let's be async" class="chapo full-width" src="https://romain.dorgueil.net/images/clocks-around-the-globe.png" />
<div class="sidebar">
<p class="first sidebar-title">EuroPython 2016</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/26/europython-2016.html">My notes on EuroPython 2016</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/asynchronous-programming.html">Asynchronous programming</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/web-scraping.html">Web scraping</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/data-processing.html">Data processing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/testing.html">Testing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/28/future-is-now.html">Future is now</a></li>
</ul>
</div>
<p>There has been a lot of fuss around <tt class="docutils literal">asyncio</tt> and the tools for asynchronous programming integrated into the python standard library
starting with python 3.4 and, without much of a surprise, a lot of talks were about this topic.</p>
<p>Although it would be hard to summarize all the talks, here are a few notes I took.</p>
<div class="section" id="random-asyncio-patterns">
<h2>Random asyncio patterns</h2>
<p><strong>Executing some synchronous code in a task/future wrapped thread executor...</strong></p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">import</span> <span class="nn">time</span>


<span class="k">def</span> <span class="nf">long_task</span><span class="p">():</span>
    <span class="c1"># Blocking call: run directly, it would freeze the event loop.</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">'starting ...'</span><span class="p">)</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">'done!'</span><span class="p">)</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="c1"># new_event_loop() avoids relying on an implicit current loop.</span>
    <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">new_event_loop</span><span class="p">()</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="c1"># executor=None selects the loop's default thread pool executor.</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">loop</span><span class="o">.</span><span class="n">run_in_executor</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">long_task</span><span class="p">))</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
<p><strong>Waiting on a list of futures and acting as soon as one is completed...</strong></p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">asyncio</span>


<span class="n">async</span> <span class="k">def</span> <span class="nf">my_sleeping_task</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">secs</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">'Starting {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>
    <span class="n">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">secs</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">'Finished {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">name</span>


<span class="n">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1"># asyncio.wait() expects tasks/futures; recent Python versions reject bare coroutines.</span>
    <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span>
        <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">my_sleeping_task</span><span class="p">(</span><span class="s1">'one'</span><span class="p">,</span> <span class="mi">3</span><span class="p">)),</span>
        <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">my_sleeping_task</span><span class="p">(</span><span class="s1">'two'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)),</span>
        <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">my_sleeping_task</span><span class="p">(</span><span class="s1">'three'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
    <span class="p">]</span>
    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">tasks</span><span class="p">):</span>
        <span class="c1"># wait() returns two sets: the finished tasks and the still pending ones.</span>
        <span class="n">done</span><span class="p">,</span> <span class="n">tasks</span> <span class="o">=</span> <span class="n">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">wait</span><span class="p">(</span><span class="n">tasks</span><span class="p">,</span> <span class="n">return_when</span><span class="o">=</span><span class="n">asyncio</span><span class="o">.</span><span class="n">FIRST_COMPLETED</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">done</span><span class="p">:</span>
            <span class="k">print</span><span class="p">(</span><span class="n">task</span><span class="p">,</span> <span class="n">task</span><span class="o">.</span><span class="n">result</span><span class="p">())</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">new_event_loop</span><span class="p">()</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last"><tt class="docutils literal"><span class="pre">asyncio.gather(...)</span></tt> can be used to get the results of a futures/coroutines collection, but you won't get the
results as soon as the first future or coroutine is completed.</p>
</div>
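<p>For comparison, here is a minimal sketch of the <tt class="docutils literal"><span class="pre">asyncio.gather(...)</span></tt> variant.
The task names and the (shortened) delays are arbitrary; the point is only that the results arrive all at once, in
argument order rather than in completion order.</p>

```python
import asyncio


async def my_sleeping_task(name, secs):
    await asyncio.sleep(secs)
    return name


async def main():
    # gather() waits for every awaitable, then returns the results as a list
    # ordered like its arguments, whatever the completion order was.
    return await asyncio.gather(
        my_sleeping_task('one', 0.3),
        my_sleeping_task('two', 0.2),
        my_sleeping_task('three', 0.1),
    )


loop = asyncio.new_event_loop()
try:
    results = loop.run_until_complete(main())
finally:
    loop.close()
print(results)
```

<p>Here 'three' finishes first, but it still comes last in the returned list.</p>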
</div>
<div class="section" id="twisted-s-still-alive">
<h2>Twisted's still alive</h2>
<p>Yes, Twisted's features overlap with asyncio's, but in fact it's good news for them. They will be able to remove
the async part from the library to focus on things asyncio is not meant to do.</p>
<p>Still, Twisted is a huge library with high stability requirements, so it's not likely to happen before at least 2020.</p>
</div>
<div class="section" id="new-tools">
<h2>New tools</h2>
<p>Replace the asyncio event loop with <tt class="docutils literal">uvloop</tt>, based on <tt class="docutils literal">libuv</tt>:</p>
<ul class="simple">
<li>Benchmarks: <a class="reference external" href="http://magic.io/blog/uvloop-blazing-fast-python-networking/">http://magic.io/blog/uvloop-blazing-fast-python-networking/</a></li>
<li>Sources: <a class="reference external" href="https://github.com/MagicStack/uvloop">https://github.com/MagicStack/uvloop</a></li>
</ul>
<p>Talk with your database asynchronously:</p>
<ul class="simple">
<li><a class="reference external" href="https://magicstack.github.io/asyncpg/">Async driver for postgres</a></li>
</ul>
</div>
<div class="section" id="references">
<h2>References</h2>
<ul class="simple">
<li><a class="reference external" href="https://ep2016.europython.eu/conference/talks/python-and-async-programming">Python and async programming - Nicolas Lara - EuroPython 2016</a></li>
<li><a class="reference external" href="https://www.youtube.com/watch?v=7cC3_jGwl_U">Building protocol libraries the right way - Cory Benfield - PyCon 2016</a></li>
<li><a class="reference external" href="https://www.youtube.com/watch?v=l4Nn-y9ktd4">Thinking in coroutines - Łukasz Langa - PyCon 2016</a></li>
</ul>
</div>
Data processing2016-07-27T00:00:00+02:002016-07-27T00:00:00+02:00Romain Dorgueiltag:romain.dorgueil.net,2016-07-27:/blog/en/europython/2016/07/27/data-processing.html<p class="first last">A few notes from EuroPython 2016 about data processing, natural language processing, machine learning...</p>
<img alt="Tadaaa-Tadaaa (Dataaa Dataaa?)" class="chapo" src="https://romain.dorgueil.net/images/data-scrabble.jpg" />
<div class="sidebar">
<p class="first sidebar-title">EuroPython 2016</p>
<ul class="last simple">
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/26/europython-2016.html">My notes on EuroPython 2016</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/asynchronous-programming.html">Asynchronous programming</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/web-scraping.html">Web scraping</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/data-processing.html">Data processing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/27/testing.html">Testing</a></li>
<li><a class="reference external" href="https://romain.dorgueil.net/blog/en/europython/2016/07/28/future-is-now.html">Future is now</a></li>
</ul>
</div>
<p>One of the keynotes, by Gaël Varoquaux, was called
<a class="reference external" href="http://www.slideshare.net/GaelVaroquaux/scientit-meets-web-dev-how-python-became-the-language-of-data">«How python became the language of data?»</a>.
And while it's probably not the only language that is good with data, the fact that python is widely used for this purpose
by people with different educational/professional backgrounds, like engineers and (hard-core) scientists, means the
libraries available in this domain are of very good quality.</p>
<p>Most of the greatest tools available are summarized on <a class="reference external" href="http://pydata.org/downloads.html">PyData</a>.</p>
<div class="section" id="managing-data-processing-tasks-at-scale">
<h2>Managing data processing tasks at scale</h2>
<p><strong>Airflow</strong> (from Airbnb, now an Apache incubator project)</p>
<ul class="simple">
<li><a class="reference external" href="http://airbnb.io/projects/airflow/">http://airbnb.io/projects/airflow/</a></li>
<li><a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls">https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls</a></li>
</ul>
<p><strong>Luigi</strong> (from Spotify, with built-in Hadoop/HDFS support)</p>
<ul class="simple">
<li><a class="reference external" href="http://luigi.readthedocs.io/">http://luigi.readthedocs.io/</a></li>
</ul>
</div>
<div class="section" id="natural-language-processing">
<h2>Natural Language Processing</h2>
<p>An amazing talk by <a class="reference external" href="http://kjamistan.com/">Katharine Jarmul</a> was about the state of the art in natural language processing.
You can find her slides on her website: <a class="reference external" href="http://kjamistan.com/i-hate-you-nlp/">I hate you NLP</a>.</p>
</div>
<div class="section" id="machine-learning-for-dummies">
<h2>Machine Learning for Dummies</h2>
<p>What I know about machine learning is between nothing and not much:</p>
<ul class="simple">
<li>Start-ups love to include it in their pitches even if what they're actually doing is a few regexp matches.</li>
<li>Neural networks helped AlphaGo learn how to beat Lee Sedol, one of the best (if not the best) Go players in the world.</li>
<li>ML is mostly transforming some vague human-friendly thing into an n-dimensional vector in a given n-dimensional
space, so it becomes easy to compute similarity as the geometrical distance between two vectors.</li>
</ul>
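<p>To make the last point a bit more concrete, here is a rough sketch of "similarity as distance" in plain python.
The three vectors are made-up 3-dimensional values, not the output of any real model, and the two helper functions
are my own illustrative names.</p>

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance between two vectors of the same dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: close to 1.0 means "same direction".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only.
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(euclidean_distance(cat, dog), euclidean_distance(cat, car))
print(cosine_similarity(cat, dog), cosine_similarity(cat, car))
```

<p>With these values, "cat" ends up much closer to "dog" than to "car" under both measures.</p>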
<p>I added a few bits to my understanding of the concept during the conference talks:</p>
<ul class="simple">
<li>Tensors are n-dimensional generalizations of matrices. But now that I'm reading
<a class="reference external" href="https://en.wikipedia.org/wiki/Tensor">the wikipedia page</a>, I'm lost again.</li>
<li>Neural networks are networks of "neurons" that transform the input by applying a <tt class="docutils literal">y = softmax(weight * x + bias)</tt>
transformation.</li>
<li>Machine learning is an iterative process adjusting the weights and biases of each neuron on each iteration, trying to get
a network actually able to transform some abstract matrix (like an image, a document, a word, a go board ...) into
a computer-intelligible concept (like "dog", "happy", "sad", "good move", "bad move").</li>
</ul>
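<p>Taking the formula above at face value, a single layer can be sketched in a few lines of plain python. The
weights, biases and input below are hand-picked numbers, and in real networks softmax is usually only the activation
of the output layer, so treat this as an illustration of the arithmetic and not as how TensorFlow does it.</p>

```python
import math


def softmax(values):
    # Subtract the max before exponentiating, for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def layer(weights, biases, x):
    # One row of weights per output neuron: z_i = sum_j(w_ij * x_j) + b_i
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(z)


# Hypothetical layer with 2 inputs and 3 outputs.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, -1.0]
y = layer(weights, biases, [2.0, 0.5])
print(y)
```

<p>The output is a list of three positive numbers summing to 1, which is why softmax outputs are often read as
probabilities for each candidate label. Training would then consist of nudging <cite>weights</cite> and <cite>biases</cite>
until those probabilities match the expected labels.</p>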
<p>Sorry if I'm taking shortcuts or showing severe gaps here, please feel free to comment to say how incorrect I am.</p>
<p><a class="reference external" href="https://www.tensorflow.org/">TensorFlow</a> is a python library that makes it "easy" to create neural network
representations in python, and then to execute them either locally or on a distributed cluster of machines. <a class="reference external" href="https://www.ianlewis.org/">Ian Lewis</a>, from the Google Cloud Platform team, gave a great talk about it.</p>
</div>
<div class="section" id="data-visualization">
<h2>Data visualization</h2>
<p>Although I haven't found the time to play with the tool yet, we had a great presentation of
<a class="reference external" href="http://bokeh.pydata.org/">Bokeh</a>, an amazing data visualization library with a lot of really useful features. It
lets you describe graphs in pure python, either in a notebook or in a regular script, and can display them either in
a client-server way or even write them out as plain static HTML files (think reporting).</p>
<p>Definitely worth checking out.</p>
</div>
<div class="section" id="misc">
<h2>Misc</h2>
<p>As always, using Jupyter notebooks is highly recommended to create your models.</p>
<p>Also, I found <a class="reference external" href="http://drivendata.github.io/cookiecutter-data-science/">the cookiecutter data science template</a> to
be a great directory structure starting point for data processing projects.</p>
<p>And, how many of you have heard of or used Jupyter Notebooks?</p>
<hr class="docutils" />
<small class="credits">
Photo credits:
<a href="https://www.flickr.com/photos/jannekestaaks/14204781667/" target="_blank" rel="nofollow">jannekestaaks</a>
<a href="http://creativecommons.org/licenses/by-nc/2.0/" rel="license nofollow" target="_blank">cc-by-nc</a>
</small></div>