Notes from FOSDEM 2016
02 Feb 2016The Free and Open Source Developers’ European Meeting (FOSDEM) just concluded its 2016 edition in Brussels, Belgium. It was my first time attending this huge, fascinating, registration-free conference, and I really enjoyed the chance to become more familiar with the FOSS community and its ideals, and learn about some very cool projects and technologies. Here’s a quick summary of what I took away from FOSDEM, along with my notes from the talks I attended.
Some things I learned
- The meaning of “free software” and the four freedoms, as told by Richard Stallman
- Several things about CPython that can make Python code slow, and several ways to speed it up
- Interesting things going on in Lua land: cool projects are using it for web development, LuaJIT is crazy fast (almost suspiciously so) and is even being used for networking
- Accessible software: design-for-all principles and some FOSS projects concerned with accessibility
- Tools & strategies developers can use to make their (FOSS) projects easier to localize/translate
- The Fairphone 2 is pretty awesome (well, I already new that), and plays nicely with free software
- Codes of Conduct are hella important, and not contradictory to the ideals of the FOSS community
- The Humanitarian OpenStreetMap Team are my heroes
See the notes below for more!
Disclaimer: The notes below represent my impressions of the content of these talks, jotted down as I listened. They may or may not be totally accurate, or precisely/adequately represent what the speakers said or think. If you want to get it from the horse’s mouth, follow the links to the FOSDEM schedule entry to find the abstract, slides, and/or other resources!
Day 1 - Saturday 30.01.2016
Python Tips, Tricks, and Dark Magic
Jordi Soucheiron
- Some useful syntax tricks, etc.
- See slides on GitHub
Design for All vs. Design for One and Adaptive User Interfaces
C. Strobbe (Stuttgart Media Uni/liberte0.org)
Github: cstrobbe
http://cstrobbe.github.io/AccessibilityResources/
GPII: universal preferences for accessibility settings gpii.net cloud4all.info
- See slides on FOSDEM talk page
OpenSouceDesign.net
Freenote: #openSourceDeisgn
Github: OpenSourceDesigns
Exploring CPython
Stephane Wirbel
(matrixise on GitHub, Twitter)
Slides: https://speakerdeck.com/matrixise/exploring-our-python-interpreter
When you execute a Python command at the command line, the flow is:
- command line
- lexer (
tokens
) - parser (
ast
) - compiler (bytecode)
- interpreter
- standard library
How to get started contributing to Python:
- DevGuide
- pythonmentors.com - program for new contributors to Python core (mailing list, see below)
- mailing lists
- ML = https://mail.python.org/mailman/listinfo
- ML/new-bugs-announce
- ML/python-bugs-list
- ML/core-mentorship
- Create user account on https://bugs.python.org (sign contributor agreement)
Overview of directories (see slides)
Use ast
to explore the structure of your python code:
Use dis
to explore bytecode (see Include/opcode.h for the codes)
Peepholer
- optimizes the bytecode by omitting e.g. instructions from the python source that won’t ever be executed (“dead branches”)
- Python/peephole.c >
PyCode_Optimize
- In python, use via:
bytecodes = optimizer(bytecodes)
Interpreter: everything is in Python/ceval.c, especially in the funciton PyEval_EvalFrameEx
(huge switch statement that Allison Kaptur was talking about?)
Python Tricks from Mercurial
Pierre-Yves David
Imports
- can be slow
- you might not need everything you’re importing => lazy imports: only load module when accessed
In Python, everything is stored as dictionaries (not entirely true, but for simplicity we’ll say it is)
When python looks for wally
, it searches:
- Local variables (mostly)
- Module variables
- Global variables
When Python looks for wally.hat
, it searches:
So if you put an attribute lookup inside of a loop, e.g.
This will be slow because python has to look up the attribute each time. So you should do something like:
Similarly, you should reassign globals/builtins to local variables for faster lookup inside of a loop:
Something about __contains__
…?
__slots__
:
When Python creates an object, it creates a dictionary to store the object’s properties.
It doesn’t know how many properties there’ll be, so it allocates a bunch of space.
If you’re going to create a bunch of small objects (only a few properties),
and you know they won’t need other properties, you should use
__slots__
to tell Python only to create a dictionary of the right size,
and not to allocate unnecessary space for the dictionary
Garbage collection can be turned off in certain situations to save time
PyPy sped up hg log
from 3x to 12x
- just in time compilation
- finds hot loops and compiles to machine code
CPython also has room for improvement
LOAD_METHOD
- FAT Python static optimizer
Things not to do:
- Testing
None
as a boolean, e.g.if a:
- useif a is not None:
instead - Using mutable default values, e.g.
def babar(hat=[])
- this value is created once when the function is defined, and then used by all subsequent calls to the function. So if the first call changes it, the changed version will be seen by the second call - A closure in a closure in a closure - try to make an actual function and reduce the number of arguments you want to pass
Going beyond CPython: CFFI and PyPy
Armin Rego
PyPy: http://pypy.org/ CFFI: http://cffi.readthedocs.org/
CFFI (C Foreign Function Interface): Basically lets you execute C code right from python
PyPy: Python interpreter in Python
- 3-10x faster than CPython
- Starts working like an interpreter, then uses a JIT compiler
- Generate & execute machine code from the program
- Different garbage collection techniques than CPython
- instead of using reference counters (as in CPYthon), stores objects in a “nursery” and then moves out the surviving objects when the nursery is full
- doesn’t work with CPython C extensions (e.g.
numpy
), because these expect reference counters (assumes that the python object in question is PyObject*, that it won’t be moving around in memory and it has a reference counter)
- PyPy has some support for C extensions, via
cpyext
module: this emulates objects for C extensions with the expected stuff (fixed memory location and reference counter) - butcpyext
is really, really slow
CPython C API
- Makes it easy to add things to Python
- Tools built on top of it: SWIG, Cython, CFFI, ….
- CFFI is sort of an alternative to it, because it essentially allows you to do C things in python via
ffi.*()
- and CFFI works better with non-CPython interpreters
FAT Python - static optimizer
Victor Stinner
(haypo on GitHub, @victorstinner on Twitter)
PDF of slides on FOSDEM site
http://faster-cpython.readthedocs.org/fat_python.html
- CPython is slower than C, also slower than JavaScript (fast JIT compilers)
- There are other, faster versions of python: e.g. PyPy JIT, Pyston JIT, Numba JIT for numpy, Cython static optimizer
- But none of these have replaced CPython yet as thte reference implementation for new features
Problem: everythin in Python is mutable, e.g. you can replace builtin functions, function code, global variables… if you optimize based on the assumption that e.g. builtins haven’t been replaced, the optimization “breaks” python syntax and won’t work for all applications
Solution: implement “guards” to check assumptions before attempting to optimize the code; if assumptions don’t hold, skip the optimization
CPython already has optimization via the Peephole optimizer, but this is in C and is very limited
Optimizing AST:
- Call builtin functions: you can replace a call to a pure, builtin function with the result (e.g. len(‘abc’)=>3)
- If you have a call like
frozenset('abc')
, they object will be created at runtime. Instead, since the object won’t change, you can create the object at compile time and load it as a constant, so that it can be looked up directly in an array (or something?) - Simplify iterables (e.g. range(3)=> (0,1,2))
- Loop unrolling: rewrite
for x in [1,2,3]: print(x)
asx=1; print(x); x=2; print(x);
etc. - Copy constants: rewrite
x=1; print(x); x=2; print(x);
asx=1; print(1); x=2; print(2);
etc. - Constant folding: when you have operations on constants, compute everything you can at compile time: e.g. +(5)=>5,
'P' in 'Python'=>True
,python2.7
[:-2]=>’python’;x in [1,2,3]
=>x in (1,2,3)
- Copy globals to constants (LOAD_CONST is faster than LOAD_GLOBAL)
- Remove dead code (code that will never be executed)
PEPs related to FAT:
PEP 509: dict version - Each dict (e.g. builtins) has a version identifier, such that when something is changed in the dictionary, the version changes. This allows you to check whether e.g. something has been replaced in builtins dict.
PEP 510: Specialize - Adds PyFunction_Specialize() C function, and modifies Python/ceval.c to decide whether to specialize the code or not (e.g. based on result of guard checks)
PEP 511: Transformer - This one is controversial, still under discussion and may change
Lua: The Language of the Web?
Paul Cuthbertson
@paulcuth
Starlight project: Lua to ES6 transpiler https://github.com/paulcuth/starlight
Project started when he saw cool features of ES6 and noticed similarities with Lua (?) e.g. generators, …, proxies, destructuring, multiple return values, block scoping
Starlight transpiles Lua to ES6 (need to also include Babel to run as ES5 in browsers)
Allows you to write scripts in Lua and even import JS into the Lua env
Has a DOM API (same as Moonshine)
But not all features of Lua can translate to ES6
Also: Sailor MVC framework (uses Starlight) http://sailorproject.org/index.lua
Lua for scripting OSX: Hammerspoon
http://www.hammerspoon.org/
Mail2Voice - Accessible email
- http://www.mail2voice.org/en/
- https://git.framasoft.org/Mail2Voice/Mail2Voice_v1
- http://www.mail2voice.org/wiki/doku.php
- Mailing list (see slides)
-
Slides on FOSDEM site
- In version 1.0, now being re-worked for version 2.0
- Uses speech synthesis to read emails out loud to users
- Has users record an audio message and sends it as mp3
- Doesn’t use recognition because target users may have speech impairment etc. which affects their ability to be recognized
Day 2 - Sunday 31.01.2016
coala Code Analyzer
Continuous translation with Weblate
Michal Čihař
- Solves the problem: by the time translators have finished translating (part of) a project, the source has been updated such that some of the translations are outdated
- Allows translation to progress continuously throughout the project
- Weblate pulls from your VCS (git or mercurial)
- Translators commit new translations to Weblate (doesn’t require technical proficiency with git - uses web interface)
- Weblate pushes the translations back to your project
- Tracks authorship so you can use git blame to control quality
“Some of my best friends are localizers” - tips for developers to help localizers
Dwayne Bailey
dwayne@translate.co.za
- Translate.co.za builds tools
- Driven by need in SA for localized systems so that people can learn/use software even if they’re not literate in English
- PO - GNU gettext format
http://www.translate.org.za/10-things-localizers-would-like-developers-to-keep-in-mind/
- Can’t assume technical skills on the part of localizers/translators (especially true for lower-resource languages)
- Complex contribution processes can be a barrier to localization - don’t force localizers to become coders
- Best localizers are users of your project/passionate about the domain
- Translators use tools (e.g. with translation memories, variable highlighting) - if you add e.g. a new file format to the project, it might not be compatible with their tools. Forcing them to not use their tools is like forcing a vim user to use emacs
- Use standard formats (PO, XLIFF)
- or what is expected in your domain (if you’re doing Java, do it the Java way, etc.)
- Assumptions of language: direct translations might not work in all languages
- try to choose e.g. a register that works internationally (instead of only over-hyped SV marketing speak) - be aware that localizers may need ot use a different register/tone in their language
- keep an eye on idioms/pop culture references - most translators won’t know what to do with these and if there are no instructions on what to do with them, they’re stuck
- don’t assume that e.g. capitalization or punctuation will work the same in other languages - e.g. Django automatically capitalizes the name of the model and adds a “:” - this doesn’t work for some languages
- plurals will work differently in other languages - don’t make assumptions about them
- number/date formatting
- Names - don’t assume “first name” and then “surname”, not all names break down like that
- Keep strings to translate together - e.g. don’t split sentences over multiple lines
- Terminology can be complicated (e.g. words that are unknown like “rasterization”, words that take on special meaning like “justify” in Word)
- Comments are important - allow localizers to comment on strings
- Don’t take away context (e.g. by sorting a list of strings alphabetically - terrible!)
- Keep internationalization/localization in mind from the start
- Don’t assume Left-To-Right layout - keep RTL a possibility from the outset
- Use UTF8
Hacking on the Fairphone 2
Kees Jongenburger
Freenode: #fairphone http://forum.fairphone.com
- Fairphone is the first modular phone on the market
- OS Alternatives
- Building your own android on Fairphone - http://code.fairphone.com/
- Sailfish OS/Jolla phone
- Google Free (?)
- Firefox OS
- Device not locked by default - easy to root
- Alternative cases from Shapeways etc.
- Software team - 6 white dudes
Comparing Codes of Conduct to Copyleft Licenses
Sumana Harihareswara
- Copyleft licenses restrict certain behaviors in order to make the community a better, more freer place
- Is a certain level of safety a prerequisite to liberty?
- Also, excluding certain groups of potential contributors has negative effects on the amount/quality of free software
- Differences between the two:
- CoCs generally deal with face-to-face contexts/events
- Does the restriction of behavior feel like a contract, or governance?
- Copyleft licenses feel like a contract - specific, relatively easy to determine if someone has complied
- Rules restraining (speech) acts feel more like governance - explicit about who gets to make/implement rules affecting a community (e.g. a committee to make policy decisions), takes work to decide who has complied
- Advantages of governance model over contract model:
- Contracts have to be self-contained, and are generally inflexible - the governance model enables more flexibility and adaptability over time
- Contracts have to enumerate cases of (non-)compliance, while governance model allows use of e.g. examples of cases of (non-)compliance without explicitly defining those as the all-and-only situations in which the code has been violated
- Advice for getting your community to implement CoCs
- If the community isn’t ready to accept the idea that the community itself is an artifact we construct together, try starting out with CoCs that apply to things that are already thought of as such artifacts (e.g. rules for wiki pages, slide decks etc.)
- Try to understand your own theory of change and that of the other(s)
- Maybe individual open source projects aren’t the right context in which to implement CoCs (what would a better context be?)
- In addition to the contract-governance spectrum, we should also consider the hospitality-liberty spectrum
Questions:
- Are there standard CoCs such that I can say “The license is GPL and the CoC is X”?
- Geek Feminism wiki CoC seems to be most popular
- Subfictional blog: like you should get stakeholders to check in/reach some consensus re the license, you should have them check in re the CoC - otherwise CoC can become superficial
Putting 8 Million People on the Map
Blake Girardot
Humanitarian OpenStreetMap Team (HOT)
- Many people (esp. in the “global south”) live in places that are not on any map, let alone open maps - e.g. Gueckedou, Guinea: center of Ebola epidemic, has 250,000 people but had virtually no map data before HOT effort
- HOT trains people to be able to map their own communities; e.g. points of interest (your home, the doctor), infrastructure, etc.
- Also does “remote mapping” - extract maps (vectorized data) of e.g. roads, buildings from aerial imagery
- Crisis mapping - no time to walk around with paper & pencil during a crisis
- Also important to map before a disaster happens, e.g. in areas susceptible to malaria outbreaks, flooding, …
- Turning vectorized data from remote mapping into useful maps: can see a building, but no idea what its for - ask volunteers to go around noting what the structures are
- OSM Tasking Manager (written in Python with Pyramids framework, uses SQLAlchemy/GISAlchemy ) - prioritizes areas for mapping, tracks what’s been validated, etc. - important for crisis situations. This tool integrates with OSM mapping tools (different tools, created by OSM community)
- OSM Export Tool - exports map data for use by e.g. disaster responders
- HOT was founded in response to the Haiti crisis in 2010 (? check that)
- In the Nepal earthquake 2015, the Kathmandu valley had already been mapped by a team (who realized that this type of data would be necessary in the future, as there were bound to be more earthquakes in Nepal), allowing the HOT team to focus on the more remote mountainous regions that were also affected
- OpenAerialMap (Node.js app) - collects aerial imagery from drones, balloons, kites, etc., makes index available for people to find the best possible aerial imagery
- Missing Maps - project founded by HOT and other organizations