OCaml Planet
The OCaml Planet aggregates various blogs from the OCaml community. If you would like to be added, read the Planet syndication HOWTO.





An in-depth Look at OCaml’s new “Best-fit” Garbage Collector Strategy — OCamlPro, Mar 23, 2020
The Garbage Collector is probably OCaml’s greatest unsung hero. Its pragmatic approach allows us to allocate without much fear of efficiency loss. In a way, the fact that most OCaml hackers know little about it is a good sign: you want a runtime to gracefully do its job without having to mind it all the time.
But as OCaml 4.10.0 has now hit the shelves, a very exciting feature is in the changelog:
- #8809, #9292: Add a best-fit allocator for the major heap; still
experimental, it should be much better than current allocation
policies (first-fit and next-fit) for programs with large heaps,
reducing both GC cost and memory usage.
This new best-fit is not (yet) the default; set it explicitly with
OCAMLRUNPARAM="a=2" (or Gc.set from the program). You may also want
to increase the `space_overhead` parameter of the GC (a percentage,
80 by default), for example OCAMLRUNPARAM="o=85", for optimal
speed.
(Damien Doligez, review by Stephen Dolan, Jacques-Henri Jourdan,
Xavier Leroy, Leo White)
At OCamlPro, some of the tools that we develop, such as the package manager Opam, the Alt-Ergo SMT solver or the Flambda optimizer, can be quite demanding in memory usage, so we were curious to better understand the properties of this new allocator.
Minor heap and Major heap: the GC in a nutshell
Not all values are allocated equal. Some will only be useful for the span of local calculations, some will last as long as the program lives. To handle those two kinds of values, the runtime uses a Generational Garbage Collector with two spaces:
- The minor heap uses the Stop-and-copy principle. It is fast but has to stop the computation to perform a full iteration.
- The major heap uses the Mark-and-sweep principle. It has the perk of being incremental and behaves better for long-lived data.
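To get a feel for the two spaces, you can watch the GC counters from within a program. Here is a minimal sketch using the standard Gc module (the exact numbers will vary with the compiler version and its optimizations):

let () =
  (* Allocate a lot of short-lived values: they live and die in the minor heap. *)
  for _ = 1 to 1_000_000 do ignore (Sys.opaque_identity (ref 0)) done;
  let s = Gc.stat () in
  Printf.printf "minor collections: %d, major collections: %d\n"
    s.Gc.minor_collections s.Gc.major_collections;
  Printf.printf "words promoted to the major heap: %.0f\n" s.Gc.promoted_words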
Allocation in the minor heap is straightforward and efficient: values are stored sequentially, and when there is no space anymore, a minor collection empties it: surviving values get promoted to the major heap while dead values are simply forgotten, for free. The major heap, however, is a bit trickier, since random allocations and deallocations will eventually leave the memory scattered. This is called fragmentation, and it means that you’re using more memory than necessary. Thankfully, the GC has two strategies to counter that problem:
- Compaction: a heavyweight reallocation of everything that will remove those holes in our heap. OCaml’s compactor is cleverly written to work in constant space, and would be worth its own specific article!
- Free-list Allocation: allocating incoming data in the holes (the free-list) in memory, de-scattering it in the process.
Of course, asking the GC to be smarter about how it allocates data makes the GC slower. Coding a good GC is a subtle art: you need to have something smart enough to avoid fragmentation but simple enough to run as fast as possible.
Where and how to allocate: the 3 strategies
OCaml used to offer two free-list allocation strategies: next-fit, the default, and first-fit. Version 4.10 of OCaml introduces the new best-fit strategy. Let’s compare them:
Next-fit, the original and remaining champion
OCaml’s original (and default) “next-fit” allocating strategy is pretty simple:
- Keep a (circular) list of every hole in memory ordered by increasing addresses;
- Have a pointer on an element of that list;
- When an allocation is needed, if the currently pointed-at hole is big enough, allocate in it;
- Otherwise, try the next hole, and so on.
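As an illustration, here is a toy OCaml model of that loop; the real allocator is C code walking a linked list of memory blocks, so this sketch only captures the idea:

(* Toy next-fit over a circular array of hole sizes. *)
let next_fit (holes : int array) (cursor : int ref) request =
  let n = Array.length holes in
  let rec search tried i =
    if tried = n then None (* no hole fits: time to collect or grow the heap *)
    else if holes.(i) >= request then begin
      holes.(i) <- holes.(i) - request; (* carve the block out of the hole *)
      cursor := i;                      (* stay here for the next allocation *)
      Some i
    end
    else search (tried + 1) ((i + 1) mod n)
  in
  search 0 !cursor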
This strategy is extremely efficient, but a big hole might end up fragmented by very small data while small holes stay unused. In some cases, the GC would trigger costly compactions that could have been avoided.
First-fit, the unsuccessful contender
To counteract that problem, the “first-fit” strategy was implemented in 2008 (OCaml 3.11.0):
- Same idea as next-fit, but with an extra allocation table.
- Put the pointer back at the beginning of the list for each allocation.
- Use the allocation table to skip some parts of the list.
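In the same toy model as above, first-fit always restarts its scan from the beginning of the list (the allocation table, not modeled here, lets the real implementation skip whole runs of too-small holes):

(* Toy first-fit: take the first hole from the start that fits. *)
let first_fit (holes : int array) request =
  let n = Array.length holes in
  let rec search i =
    if i = n then None
    else if holes.(i) >= request then begin
      holes.(i) <- holes.(i) - request;
      Some i
    end
    else search (i + 1)
  in
  search 0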
Unfortunately, that strategy is slower than the previous one. This is an example of how making the GC smarter can end up making it slower. It does, however, reduce fragmentation. It was still useful to have this strategy at hand for cases where compaction would be too costly (on a 100GB heap, for instance). An application that requires low latency might want to disable compaction and use that strategy.
Best-fit: a new challenger enters!
This leads us to the brand new “best-fit” strategy. This strategy is actually composite and will have different behaviors depending on the size of the data you’re trying to allocate.
- On small data (up to 32 words), segregated free lists will allow allocation in (mostly) constant time.
- On big data, a general best-fit allocator based on splay trees takes over.
This allows for the best of the two worlds, as you can easily allocate your numerous small blocks in the small holes in your memory while you take a bit more time to select a good place for your big arrays.
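Here is a toy sketch of that composite dispatch; note that a plain linear scan stands in for the splay tree on the large path, an assumption made purely to keep the example short:

(* One free list per small size; best-fit search for everything else. *)
let small_limit = 32 (* in words *)

let segregated : int list array = Array.make (small_limit + 1) []
(* bucket s holds (toy) addresses of free blocks of exactly s words *)

let alloc_small size =
  match segregated.(size) with
  | addr :: rest -> segregated.(size) <- rest; Some addr (* constant time *)
  | [] -> None (* bucket empty: fall back to the general allocator *)

let alloc_large (holes : int array) request =
  (* best fit: the smallest hole that is still big enough *)
  let best = ref None in
  Array.iteri
    (fun i size ->
       if size >= request then
         match !best with
         | Some (_, best_size) when best_size <= size -> ()
         | _ -> best := Some (i, size))
    holes;
  !best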
How will best-fit fare? Let’s find out!
Try it!
First, let us remind you that this is still an experimental feature, which, coming from the OCaml development team, means “We’ve tested it thoroughly on different systems, but only for months and not on a scale as large as the whole OCaml ecosystem”.
That being said, we’d advise against using it in production code just yet.
Why you should try it
Benchmarking this new strategy could be beneficial both for you and for the language at large: the dev team is hoping for feedback, and the more quality feedback you give, the better the future GC will be tuned for your needs.
In 2008, the first-fit strategy was released with the hope of improving memory usage by reducing fragmentation. However, the lack of feedback meant that the developers were not aware that it didn’t meet the users’ needs. If more feedback had been given, it’s possible that work on improving the strategy or on better strategies would have happened sooner.
Choosing the allocator strategy
Now, there are two ways to control the GC behavior: through the code or through environment variables.
First method: Adding instructions in your code
This method should be used by those of us who have code that already does some GC fine-tuning. As early as possible in your program, you want to execute the following lines:
let () =
  Gc.(set
        { (get ()) with
          allocation_policy = 2; (* Use the best-fit strategy *)
          space_overhead = 100;  (* Let the major GC work a bit less since it's more efficient *)
        })
You might also want to add verbose = 0x400; or verbose = 0x404; in order to get some GC debug information. See the documentation of the Gc module for more details.
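To double-check which settings are actually in effect at runtime, you can read the control record back; a minimal sketch using the standard Gc module:

let () =
  let c = Gc.get () in
  Printf.printf "allocation_policy = %d, space_overhead = %d\n"
    c.Gc.allocation_policy c.Gc.space_overhead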
Of course, you’ll need to recompile your code, and this will apply only after the runtime has initialized itself, triggering a compaction in the process. Also, since you might want to easily switch between different allocation policies and overhead specifications, we suggest you use the second method.
Second method: setting $OCAMLRUNPARAM
At OCamlPro, we develop and maintain a program that any OCaml developer should want to run smoothly. It’s called Opam, maybe you’ve heard of it? Though most commands take a few seconds, some administrative-heavy commands can be a strain on our computer. In other words: those are perfect for a benchmark.
Here’s what we did to benchmark Opam:
$ opam update
$ opam switch create 4.10.0
$ opam install opam-devel # or build your own code
$ export OCAMLRUNPARAM='b=1,a=2,o=100,v=0x404'
$ cd my/local/opam-repository
$ perf stat ~/.opam/4.10.0/lib/opam-devel/opam admin check --installability # requires the right to execute perf; time can do the trick
If you want to compile and run your own benchmarks, here are a few details on OCAMLRUNPARAM:
- b=1 means “print the backtrace in case of uncaught exception”
- a=2 means “use best-fit” (default is 0, first-fit is 1)
- o=100 means “do less work” (default is 80, lower means more work)
- v=0x404 means “have the GC be verbose” (0x400 is “print statistics at exit”, 0x4 is “print when changing heap size”)
See the manual for more details on OCAMLRUNPARAM.
You might want to compare how your code fares on all three different GC strategies (and fiddle a bit with the overhead to find your best configuration).
Our results on opam
Our contribution in this article is to benchmark opam with the different allocation strategies:
| Strategy: | Next-fit | First-fit | Best-fit | Best-fit | Best-fit |
|---|---|---|---|---|---|
| Overhead: | 80 | 80 | 80 | 100 | 120 |
| Cycles used (Gcycles) | 2,040 | 3,808 | 3,372 | 2,851 | 2,428 |
| Maximum heap size (kB) | 793,148 | 793,148 | 689,692 | 689,692 | 793,148 |
| User time (s) | 674 | 1,350 | 1,217 | 1,016 | 791 |
A quick word on these results. Most of opam’s calculations are done by dose and rely heavily on small interconnected blocks. We don’t really have big chunks of data to allocate, so best-fit won’t give us the bonus you might see elsewhere, as this workload falls squarely into the best-case scenario of the next-fit strategy. As a matter of fact, not a single GC compaction happened with any of the strategies. However, best-fit still allows for a lower memory footprint!
Conclusions
If your software’s memory usage is critical, you should definitely try the new best-fit strategy and stay tuned for its future development. If your software requires good performance, checking whether it performs better with best-fit (and giving feedback on the results) might help you in the long run.
The different strategies are:
- Next-fit: generally good and fast, but has very bad worst cases with big heaps.
- First-fit: mainly useful for very big heaps that must avoid compaction as much as possible.
- Best-fit: almost the best of both worlds, with a small performance hit for programs that fit well with next-fit.
Remember that whatever works best for you, it’s still better than having to malloc and free by hand. Happy allocating!
About OCamlPro
OCamlPro is an R&D lab founded in 2011, with the mission to help industrial users harness the state-of-the-art OCaml programming language.
We design, create and implement custom ad-hoc software for our clients in state-of-the-art languages (OCaml, Rust…). We also have long experience in developing and maintaining open-source tooling for OCaml, such as Opam and ocp-indent, and we contribute to the core development of OCaml, notably with our work on the Flambda optimizer branch. Another area of expertise is formal methods, with tools such as our SMT solver Alt-Ergo (check out the Alt-Ergo Users’ Club). We also provide vocational training in OCaml and Rust, and we can build courses on formal methods on demand. Do not hesitate to reach out by email: contact@ocamlpro.com.





Sliding Tile Puzzle, Self-Contained OCaml Webapp — Psellos, Mar 21, 2020
I just finished coding up another webapp in OCaml. I thought it would be cool to publish the sources of a small, completely self-contained app. It’s a sliding tile puzzle coded entirely in OCaml, using a few of the BuckleScript extensions. There are no dependencies on any frameworks or the like aside from the Js modules of BuckleScript. The app itself consists of just one JavaScript file—no images, nothing else.
You can try out the puzzle or get the sources at the Slide24 page.
I also tried to make a clean DOM interface based on experience with the Cassino and Schnapsen apps. I think it came out well, at least for these self-contained webapps.
The idea for the DOM interface is that it should expose only abstract types, but that there should be a subtype relation based on the JavaScript DOM. It turns out that you can do this pretty easily using private declarations. Types for contents of a document look like this:
type node
type text = private node
type element = private node
type canvas = private element
type box = private element
type div = private box
type span = private box
type button = private box
So, the interface reveals nothing whatsoever about the types except that they participate in a subtype relation. text and element are subtypes of node, canvas is a subtype of element, and so on. node is the supertype of all the document content types.
What this means is that you can pass a canvas or a button to a function that expects an element, and so on. I don’t find the necessity to use explicit supertype coercion to be too much trouble when working with non-parameterized types.
let style = Pseldom.(element_style (mybutton :> element)) in
. . .
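For a self-contained illustration of the technique (a toy module invented for this example, not the actual Pseldom interface), private type abbreviations in a signature expose only the subtype relation while hiding the representation:

module Dom : sig
  type node
  type element = private node
  type div = private element
  val div : unit -> div
  val tag_of : element -> string
end = struct
  type node = { tag : string } (* hidden representation *)
  type element = node
  type div = element
  let div () = { tag = "div" }
  let tag_of e = e.tag
end

let () =
  let d = Dom.div () in
  (* A div is usable wherever an element is expected, via an explicit coercion. *)
  print_endline (Dom.tag_of (d :> Dom.element))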
It seems to me this captures pretty much everything that I want from the object-oriented approach without the usual complicated baggage. I think inheritance is more often a hindrance than a help. It’s too dependent on implementation details, and is associated with too many informal (unenforceable) descriptions of how to write subclasses of each class without messing up the semantics.
I’ve never used private declarations before, but they seem to have created exactly the structure I was hoping for.
Posted by: Jeffrey




New version of Try OCaml in beta! — OCamlPro, Mar 16, 2020
We are happy to announce that our venerable “Try Ocaml” service is being retired and replaced by a new, modern version based upon our work on Learn-OCaml.
→ Try it here ←
The new interface provides an editor panel beside the familiar top-level, highlighting of error and warning positions, the latest OCaml release (4.10.0), local storage of your session, and more.
The service is still in beta, so it would be helpful if you could tell us about any hiccups you may encounter on the Discuss thread.




Frama-Clang 0.0.8 is out. Download it here. — Frama-C, Mar 10, 2020





A reasonable TyXML release | Drup's thingies — Gabriel Radanne, Mar 06, 2020
I have the pleasure to announce the release of TyXML 4.4.0, with special Reason support!





Alt-Ergo Users’ Club Annual Meeting — OCamlPro, Mar 03, 2020
The second annual meeting of the Alt-Ergo Users’ Club was held in mid-February. Our annual meeting is the perfect place to review each partner’s needs regarding Alt-Ergo. This year, we had the pleasure of receiving our partners to discuss the roadmap for future Alt-Ergo developments and enhancements.
Alt-Ergo is an automatic mathematical formula checker, jointly developed by LRI and OCamlPro (since 2014). For more information or to join the club, visit https://alt-ergo.ocamlpro.com.
Our club has several goals, the first of which is to ensure the sustainability of Alt-Ergo by fostering collaboration among club members and strengthening collaboration with formal method communities such as Why3. One of our priorities is to increase our user community by extending our tool to new sectors such as Model Checking. Participation in international competitions is also a way to gain visibility. Finally, the last objective of the club is to find new projects or contracts for the development of long-term functionalities.
We thank all our members for their support and bid a warm welcome to Mitsubishi Electric R&D Centre Europe, which joins AdaCore and CEA List as a member of the club this year. We would also like to highlight the Why3 development team, with whom we are working to improve our tools.
Our members are particularly interested in:
- Better generation of models and counter-examples.
- The addition of sequence theory.
- The improvement of non-linear arithmetic support in Alt-Ergo.
These features are now top priorities for us. For more information, you can read our articles on this blog.




OCaml iOS Apps Ported to Browser — Psellos, Feb 24, 2020
Something like ten years ago we produced two iOS card game apps written in OCaml, partly as a proof of concept for compiling OCaml to iOS and partly because we enjoy card games. Unfortunately we weren’t able to spark a worldwide craze for writing iOS apps in OCaml or for playing Schnapsen, as we had hoped. Consequently there was very little financial return and we all had to move on to other projects.
Both apps play a very good two-player card game. The apps are essentially a kind of solitaire where you play against the app as your opponent. The games are:
Cassino, a classic fishing game said without substantiation to be of Italian origin (per Wikipedia).
Schnapsen, a kind of miniature Pinochle very popular in the territories of the former Austro-Hungarian Empire (again per Wikipedia).
The Schnapsen app also plays a closely related game called Sixty-Six, named after this number in many different languages.
The images and layout algorithms for the apps quickly fell behind the formats and display capabilities of later iOS devices, so I was thinking we might just as well release the apps as is, for free. However, there is a lot of rigamarole (and some cost) associated with the App Store if all you want to do is release some free apps.
[App icons and links: Cassino · Master Schnapsen/66]
Recently I wondered if it wouldn’t be possible to revive the apps in a browser environment. These days you can compile OCaml to JavaScript using BuckleScript or Js_of_ocaml. The HTML 5 canvas element has an interface a lot like the two-dimensional graphics used by the apps. It seems like it should be possible.
So, in fact, that’s what I did. I ported the two card game apps to run as webapps. Visually they run in a smallish rectangle exactly the size of the original iPhone screen. I was able to retain the iPhone behavior almost unchanged. Computationally they run completely in the browser, and make no further contact with psellos.com (unless you want to access the game pages at psellos.com).
The OCaml code is compiled to JavaScript using the BuckleScript compiler. Because the target language is JavaScript, there’s no need for any stubs or supporting code (as there was in iOS). All of the code for the apps is in OCaml.
Once the basic graphics primitives were in place, a lot of the code worked without any change. The part of the code that actually plays the game (the “engine”) didn’t change at all.
As an unexpected and very welcome side effect, two of the old iOS app team members got interested in the project again. They’re working on making the Schnapsen app into an even better player. I added some extra features to the Schnapsen GUI to make it easier to keep track of what’s happened in a hand.
Neither of the webapps is quite finished yet. But they both are already playable and in fact quite enjoyable. You can try them by clicking on the icons above. You can also read about the apps and the games they play on their own separate pages.
If you have any comments, suggestions, or encouragement, leave them below or email me at jeffsco@psellos.com.
Posted by: Jeffrey




Watch all of Jane Street's tech talks — Jane Street, Feb 20, 2020
Jane Street has been posting tech talks from internal speakers and invited guests for years—and they’re all available on our YouTube channel.





Mercurial: prettyconfig extension — Marc Simpson, Feb 16, 2020
Mercurial: prettyconfig extension
Since the Bitbucket migration, I’ve found myself tinkering[1] with Mercurial and its extensions system again (after a long hiatus).
One byproduct of this was a simple, single function extension for listing aliases in a user-friendly way. I subsequently realised that the same behaviour would be useful for arbitrary config sections (aliases, paths, schemes)… and so, the prettyconfig fork.
The prettyconfig extension defines a single command, prettyconfig, for colourising and neatly displaying config values. Without arguments, all config name/value pairs are output, where names are qualified with section prefixes (akin to hg showconfig). Single sections are displayed with the -s (or --section) switch, e.g.:
$ hg prettyconfig -s alias # display aliases
$ hg prettyconfig --section alias # same thing
Colouring can be applied to the output by setting the following two labels in an .hgrc file (either globally or locally):
So far, I’ve found the following aliases[2] useful enough to store globally in ~/.hgrc:
Hopefully others will find prettyconfig useful too. Feedback is welcome.
[1] Perusing APIs, making old extensions Python 3 compatible, tweaking web templates, etc.
[2] Note that the aliases alias effectively does the same thing as enabling hg-aliases (from which this extension was derived).





Mercurial extensions (update) — Marc Simpson, Feb 05, 2020
Mercurial extensions (update)
In my previous post, I mentioned that a couple of old Mercurial extensions are archived on this server: hg-prettypaths and hg-persona.
Both have now been lightly tidied and updated to work with newer versions of Mercurial (tested on 4.5.3, 5.3).





2019 at OCamlPro — OCamlPro, Feb 04, 2020
OCamlPro was created to help OCaml and formal methods spread into the industry. We grew from 1 to 21 engineers, still strongly sharing this ambitious goal! The year 2019 at OCamlPro was very lively, with fantastic accomplishments all along!
Let’s quickly review the past years’ works, first in the world of OCaml (flambda2 & compiler optimisations, opam 2, our Rust-based UI for memprof, tools like tryOCaml, ocp-indent, and supporting the OCaml Software Foundation), then in the world of formal methods (new versions of our SMT Solver Alt-Ergo, launch of the Alt-Ergo Users’ Club, the Love language, etc.).
Flambda & Compilation Team
* Work by Pierre Chambart, Vincent Laviron, Guillaume Bury and Pierrick Couderc

Pierre and Vincent’s considerable work on Flambda 2 (the optimizing intermediate representation of the OCaml compiler – on which inlining occurs), in close cooperation with Jane Street (Mark Shinwell, Leo White and their team) aims at overcoming some of flambda’s limitations. We have continued our work on making OCaml programs always faster: internal types are clearer, more concise, and possible control flow transformations are more flexible. Overall a precious outcome for industrial users. In 2019, the major breakthrough was to go from the initial prototype to a complete compiler, which allowed us to compile simple examples first and then to bootstrap it.
On the OCaml compiler side, we also worked with Leo on two new features: functorized compilation units and functorized packs, and recursive packs. The former will allow any developer to implement ‘.ml’ files as if they were functors and, more importantly, to generate packs that are real functors; this makes it possible to split big functors into several files, or to parameterize libraries over other modules. The latter allows two distinct usages: recursive interfaces, to implement recursive types in distinct `.mli`s as long as they do not need any implementation; and recursive packs, whose components are typed and compiled as recursive modules.
- These new features are described on the new RFC repository for OCaml (a similar idea was suggested and implemented in 2011 by Fabrice Le Fessant).
- The implementation is available on GitHub for both functorized packs and recursive packs. Be aware that both are based on an old version of OCaml for now, but should be in sync with the current trunk in the near future.
- See also Vincent’s OCamlPro’s compiler team work update of August 2019.
This work is allowed thanks to Jane Street’s funding.
Work on a formalized type system for OCaml
* Work of Pierrick Couderc
At the end of 2018, Pierrick defended his PhD on “Checking type inference results of the OCaml language“, leading to a formalized type systems and semantics for a large subset of OCaml, or at least its unique typed intermediate language: the Typedtree. This work led us to work on new aspects of the OCaml compiler as recursive and functorized packs described earlier, and we hope this proves useful in the future for the evolution of the language.
The OPAM package manager
*Work of Raja Boujbel and Louis Gesbert
OPAM is maintained and developed at OCamlPro by Louis and Raja. Thanks to their thorough efforts, the first release candidate of opam 2.1 is soon to be published!
Back in 2018, the long-awaited opam 2.0 version was finally released. It embedded many changes, in opam itself as well as for the community. The opam file format was redefined to simplify it and to add new features. With the close collaboration of OCamlLabs and the opam repository maintainers, we were able to manage a smooth transition of the repository and the whole ecosystem from the opam 1.2 format to the new – and richer – opam 2.0 format. Other emblematic features include the integrated mccs solver for practicality, sandboxed builds for security (we care about your filesystem!), and the reworked ‘pin’ command for usability.
While the 2.1.0 version is in preparation, the 2.0 branch is still updated with minor releases to fix issues. The latest release, 2.0.6, is fresh from January.
In the meantime, we continued to improve opam by integrating some opam plugins (opam lock, opam depext), recursively discovering opam files in the file tree when pinning, adding a new definition of the switch compiler, the possibility to use a z3 backend instead of mccs, etc.
All these new features – among others – will be integrated in the 2.1.0 release, whose beta is planned for February. The best is yet to come!
- More details on http://opam.ocaml.org
- Releases on https://github.com/ocaml/opam/releases and on our blog
This work is allowed thanks to Jane Street’s funding.
Encouraging OCaml adoption
OCaml Expert trainings for professional programmers
We proposed in 2019 some OCaml expert training specially designed for developers who want to use advanced features and master all the open-source tools and libraries of OCaml.
The “Expert” OCaml course is for already experienced OCaml programmers who want to better understand advanced type system possibilities (objects, GADTs), discover GC internals, and write “compiler-optimizable” code. These sessions are also an opportunity to discuss with our OPAM & Flambda lead developers and core contributors in Paris.
- Next session: 3-4 March 2020, Paris (registration)
Our cheat-sheets on OCaml, the stdlib and opam
* Work of Thomas Blanc, Raja Boujbel and Louis Gesbert
Thomas announced the release of our up-to-date cheat-sheets for the OCaml language, standard library and opam. Our original cheat-sheets dated back to 2011; this was an opportunity to update them after the many changes in the language, library and ecosystem overall.
Cheat-sheets are helpful to refer to, as an overview of the documentation when you are programming, especially when you’re starting in a new language. They are meant to be printed and pinned on your wall, or kept handy on a spare screen. They come in handy when your rubber duck is rubbish at debugging your code!
- More details on Thomas’ blog post
Open Source Tooling and Web IDEs
And let’s not forget the other tools we develop and maintain! We have tools for education such as our interactive editor OCaml-top and Try-OCaml (from the previous work on the learn-OCaml platform for the OCaml Fun MOOC) which you can use to code in your browser. Developers will appreciate tools like our indentation tool ocp-indent, and ocp-index which gives you easy access to the interface information of installed OCaml libraries for editors like Emacs and Vim.
Supporting the OCaml Software Foundation
OCamlPro was proud to be one of the first supporters of Inria’s new OCaml Software Foundation. We remain committed to the adoption of OCaml as an industrial language:
“[…] As a long-standing supporter of the OCaml language, we have always been committed to helping spread OCaml’s adoption and increase the accessibility of OCaml to beginners and students. […] We value close and friendly collaboration with the main actors of the OCaml community, and are proud to be contributing to the OCaml language and tooling.” (August 2019, Advisory Board of the OCSF, ICFP Berlin)
- More information on the OCaml Software Foundation
In the World of Formal Methods
* By Mohamed Iguernlala, Albin Coquereau, Guillaume Bury
In 2018, we welcomed five new engineers with a background in formal methods. They consolidated the formal methods department at OCamlPro, and in particular help develop and maintain our SMT solver Alt-Ergo.
Release of Alt-Ergo 2.3.0, and version 2.0.0 (free)
After the release of Alt-Ergo 2.2.0 (with a new front-end that supports the SMT-LIB 2 language and extended prenex polymorphism, implemented as a standalone library) came version 2.3.0 in 2019, with new features: dune support, ADTs / algebraic datatypes, and improved support for if-then-else, let-in, and data types.
- More information on the Alt-Ergo SMT Solver
- Albin Coquereau defended his PhD thesis in December 2019: “Improving performance of the SMT solver Alt-Ergo with a better integration of efficient SAT solver”
- We participated in the SMT-COMP 2019 during the 22nd SAT conference. The results of the competition are detailed here.
The launch of the Alt-Ergo Users’ Club
Getting closer to our users, gathering industrial and academic supporters, collecting their needs into the Alt-Ergo roadmap is key to Alt-Ergo’s development and sustainability.
The Alt-Ergo Users’ Club was officially launched at the beginning of 2019. The first yearly meeting took place in February 2019. We were happy to welcome our first members AdaCore, CEA List, Trust-In-Soft, and now Mitsubishi MERCE.
- More information on the Alt-Ergo Users’ Club
Harnessing our language-design expertise: Love

* Work by David Declerck & Steven de Oliveira
Following the launch of the Dune network, the Love language for smart contracts was born from the collaboration of OCamlPro and Origin Labs. This new language, whose syntax is inspired by OCaml and Liquidity, is an alternative to Dune’s native smart contract language, Michelson. Love is based on System F, a type system requiring no type inference and allowing polymorphism. The language has successfully been integrated on the network and the first smart contracts are being written.
- LOVE: a New Smart Contract Language for the Dune Network
- The Love Smart Contract Language: Introduction & Key Features — Part I
Rust-related activities
The OCaml & Rust combo should be a candidate for any ambitious software project!
- A Rust-based UI for memprof: in 2019, we started working in collaboration with the memprof developer team on a Rust-based UI for memprof. See Pierre and Albin’s talk at JFLA2020, “Gardez votre mémoire fraiche avec Memthol” (“Keep your memory fresh with Memthol”) (Pierre Chambart, Albin Coquereau and Jacques-Henri Jourdan).
- Rust training: Rust borrows heavily from functional programming languages to provide very expressive abstraction mechanisms. Because it is a systems language, these mechanisms are almost always zero-cost: for instance, polymorphic code has no runtime cost compared to a monomorphic version. This concern for efficiency also means that Rust lets developers keep a very high level of control and freedom for optimizations. Rust has no garbage collection or any form of runtime memory inspection to decide when to free, allocate or re-use memory. But because manual memory management is almost universally regarded as dangerous, or at least very difficult to maintain, the Rust compiler has a borrow-checker which is responsible for (i) proving that the input program is memory-safe (and thread-safe), and (ii) generating a safe and “optimal” allocation/deallocation strategy. All of this is done at compile time.
- Next sessions: April 20-24th 2020 (registration)
OCamlPro around the world
OCamlPro’s team members attended many events throughout the world:
- ICFP 2019 (Berlin)
- The JFLA’2019 (Les Rousses, Haut-Jura)
- The POSS’2019 (Paris)
- MirageOS Retreat (Marrakech)
As a committed animator of the OCaml ecosystem, we organized OCaml meetups too (see the famous OUPS meetups in Paris!).
Now let’s jump into the new year 2020, with a team that keeps expanding and new projects ahead: keep posted!
Past projects: blockchain-related achievements (2018-beginning of 2019)
Many people ask us about what happened in 2018! That was an incredibly active year on blockchain-related achievements, and at that time we were hoping to attract clients that would be interested in our blockchain expertise.
But that is history now! Still interested? Check the Origin Labs team and their partner The Garage on Dune Network!
For the record:
- (April 2019) We had started Techelson: a testing framework for Michelson and Liquidity
- (Nov 2018) An Introduction to Tezos RPCs: Signing Operations / An Introduction to Tezos RPCs: a Basic Wallet / Liquidity Tutorial: A Game with an Oracle for Random Numbers / First Open-Source Release of TzScan
- (Oct 2018) OCamlPro’s TZScan grant proposal accepted by the Tezos Foundation – joint press release
- (Jul 2018) OCamlPro’s Tezos block explorer TzScan’s last updates
- (Feb 2018) Release of a first version of TzScan.io, a Tezos block explorer / OCamlPro’s Liquidity-lang demo at JFLA2018 – a smart-contract design language. We were developing Liquidity, a high-level smart contract language, human-readable, purely functional, statically-typed, whose syntax was very close to the OCaml syntax.
- To garner interest and adoption, we also developed the online editor Try Liquidity. Smart-contract developers could design contracts interactively, directly in the browser, compile them to Michelson, run them and deploy them on the alphanet network of Tezos. Future plans included a full-fledged web-based IDE for Liquidity. Worth mentioning was a neat feature: decompiling a Michelson program back to its Liquidity version, whether it was generated from Liquidity code or not.





Bitbucket repository migration — Marc Simpson, Feb 03, 2020
Bitbucket repository migration
Since Bitbucket are discontinuing Mercurial support in a few months’ time (see here), I’ve started migrating a few old repositories:
- hugo-unix is now hosted on GitHub
- Extensions hg-prettypaths and hg-persona are now archived on this server.
- Note that hg-prettypaths is deprecated (broken in newer versions of Mercurial).
- hg-persona has been updated to work with hg ≥4.3.





Troubleshooting systemd with SystemTap — Jane Street, Feb 03, 2020
When we set up a schedule on a computer, such as a list of commands to run every day at particular times via Linux cron jobs, we expect that schedule to execute reliably. Of course we’ll check the logs to see whether the job has failed, but we never question whether the cron daemon itself will function. We always assume that it will, as it always has done; we are not expecting mutiny in the ranks of the operating system.





Ocsigen Start updated — Ocsigen project (The Ocsigen Team), Jan 20, 2020
New release: Ocsigen Start 2.15
Ocsigen Start is a template for client-server Web and/or mobile app in OCaml or ReasonML. It contains many standard features like user management, notifications, and many code examples. Use it to learn Web/mobile development in OCaml or to quickly create your own Minimum Viable Product. See an online demo.
Latest features include:
- Get rid of all remaining Camlp4 code
- Compatibility with OCaml 4.09










opam 2.0.6 release — OCamlPro, Jan 16, 2020
We are pleased to announce the minor release of opam 2.0.6.
This new version contains some small backported fixes and build update:
- Don’t remove git cache objects that may be used [#3831 @AltGr]
- Don’t include .gitattributes in index.tar.gz [#3873 @dra27]
- Update FAQ uri [#3941 @dra27]
- Lock: add warning in case of missing locked file [#3939 @rjbou]
- Directory tracking: fix cached entries retrieving with precise tracking [#4038 @hannesm]
- Build:
- Shell:
- Sandbox:
- opam-devel file: avoid copying extraneous files in opam-devel example [#3999 @maroneze]
As sandbox scripts have been updated, don’t forget to run opam init --reinit -ni to update yours.
Note: To homogenise the macOS name on system detection, we decided to keep macos, and to convert darwin to macos in opam. For the moment, to not break jobs & CIs, we keep uploading both darwin & macos binaries, but from the 2.1.0 release on, only macos ones will be kept.
Installation instructions (unchanged):
- From binaries: run sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) or download manually from the Github “Releases” page to your PATH. In this case, don’t forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed, or to update your sandbox script.
- From source, using opam: opam update; opam install opam-devel (then copy the opam binary to your PATH as explained, and don’t forget to run opam init --reinit -ni to enable sandboxing if you had version 2.0.0~rc manually installed, or to update your sandbox script).
- From source, manually: see the instructions in the README.
We hope you enjoy this new minor version, and remain open to bug reports and suggestions.
NOTE: this article is cross-posted on opam.ocaml.org and ocamlpro.com. Please head to the latter for the comments!
If you need more opam in your life, print our cheat sheet!










Hackers and climate activists join forces in Leipzig — MirageOS (Damien Leloup), Jan 08, 2020
Hackers and climate activists join forces in Leipzig
By Damien Leloup, special correspondent, Le Monde. Originally published by Le Monde on December 30, 2019. English translation by the MirageOS Core Team.
The Chaos Communication Congress, the world's largest self-managed event dedicated to IT security, hosted its 36th edition this weekend in Germany.
In front of Leipzig station, around fifty university and high-school students are gathered. It's Sunday, but the local branch of the Fridays for Future movement, which organizes demonstrations on Fridays at the call of activist Greta Thunberg, made an exception to its usual calendar to take advantage of the presence, a few kilometers away, of the Chaos Communication Congress (CCC).
Organized each year for thirty-six years, this gigantic gathering of hackers and activists - 18,000 participants, dozens of conference talks around the clock - invades the Leipzig convention center for four days, in an atmosphere that is both anarcho-libertarian and very family oriented. For the first time, the slogan of the 2019 edition is devoted to the environment: “Resource exhaustion”, a reference both to a computer attack technique and to the preservation of the planet.
"It makes sense: it's a major issue, and the environmental movement is a highlight of the year, isn't it?", notes, hair dyed pink and megaphone in hand, Rune, one of the organizers of the event. “In any case, we are very happy that the CCC opened its doors to us and supports us."
The conference program gave pride of place to ecological themes and to organizations such as Fridays for Future or Extinction Rebellion, which were all featured in the main talks. The audience of the event, which leans to the far left politically, is quite sensitive to environmental issues. Fridays for Future's review of the year was sold out; the track where some scientists explained how they build their climate models was full and could not accommodate all the attendees.
Safety of power plants and the right to repair
But if the CCC gave a lot of space to environmental issues, it did so in its own way. In this mecca of cyber-security, one could for example discover long lists of vulnerabilities affecting the on-board systems used to manage the turbines of power plants. Do not panic: "These flaws do not block a power plant, or cut the power of a city," said Radu Motspan, the author of the study, putting the risk in perspective. Some of them have been corrected; for others, it is up to plant managers to carry out verifications. The researcher and his associates produced a small turnkey guide to help them: "No need to hire expensive consultants, you can do everything yourself."
This “do it yourself” spirit, omnipresent in hacker culture in general and at the CCC in particular, easily lends itself to an environmental translation. The collective Runder Tisch Reparatur, which campaigns for the adoption at European level of a "right to repair", was thus invited for the first time to the conference. The philosophy of the movement, which aims above all to reduce the amount of waste produced by obsolescence whether or not it is programmed, is very similar to that of the free software movement, say Eduard and Erik, who run the stand of the association. "An object that you cannot repair does not really belong to you," they believe, just as the promoters of free software believe that software that you cannot modify yourself deprives you of certain freedoms.
But the main issue, at the heart of many talks during the four days of the CCC, is that of the energy impact of the Internet. No one in the aisles of the Leipzig congress center plans to reduce their use of the Internet; but everyone concedes that the network consumes a lot of electricity unnecessarily, or uses too much fossil energy. "There are simple things to do to improve the carbon footprint of a site or service," said Chris Adams, environmental activist and member of the Green Web Foundation. “If your service uses Amazon Web Service [AWS, a very popular cloud service], you can choose the data center you want to use, for example. The one assigned to you by default may be in a country that produces little renewable electricity and uses a lot of coal for its power plants… ”
Existing non-digital systems (like boilers) already have ways to function more efficiently, such as off-peak hours at night, when electricity is both cheaper and more eco-friendly. There are equivalent options for more modern, digital systems, for instance: Chris Adams advocates the use of the Low Carbon Kubernetes Scheduler, an orchestration tool which allows to optimise the power consumption of a server in order to reduce its environmental impact.
Safety and minimum consumption, get the best of both worlds
Despite everything, the “greenest” electricity remains that which we do not consume in the first place. There too, promising solutions exist: Hannes Mehnert, a German computer scientist, presented at the opening of the CCC the MirageOS project, an operating system for ultra-minimalist servers, coded in a language renowned for its lightness, and which runs each process in a dedicated virtual machine. A radical approach - and reserved for connoisseurs - which allows the software to embed only the bare minimum of lines of code in each compiled version. "Reducing complexity mathematically reduces the number of calculation operations required," explains Mehnert. Result: "A carbon footprint that drops drastically, with ten times less computing power used by the processor, and up to twenty-five times less memory used", according to his measurements.
Above all, and this is a strong argument at a conference dedicated to computer security, minimalism is also a real advantage in terms of potential vulnerabilities: the more compact the code is, the less it risks containing flaws or errors. And such systems are also better protected and safer than more conventional systems, as they are not vulnerable to memory-safety issues.
But the collaboration between environmentalists and privacy advocates seemed much broader than a focus on technical issues alone. That mix was everywhere: for instance, the corridor walls had graffiti concerned with the excessive consumption of CO2 on the planet close to others highlighting the fact that every human being generates, on average, 1.7 MB of data per second. The posters of the anarchist or anti-fascist movements were also mixed with the flyers of the collective Hackers against Climate Change, which attracted the curious with a joke typical of the place: "cli/mate crisis is real", in reference to Club-Mate, a ubiquitous drink at the event.
This community of views between hackers and climate activists comes as little surprise in Germany, where both movements are very present, and even less in Leipzig, the flagship city of the former GDR where pre-Internet mass surveillance tools from the Stasi were also directed against environmental activists in the 1980s. Some environmental movements feel naturally close to the anarchist spirit of the German hackers of the Chaos Computer Club, which organizes the CCC: a talk by Extinction Rebellion detailed the security measures they took to ensure their privacy, and how they didn't depend on third-party tools from Facebook, Google or Amazon - responsible, in their eyes, both for mass surveillance and green washing.
However, in this atmosphere of global collaboration some questions remained unanswered. In some specific cases, better IT security can also consume more resources. A talk dedicated to encrypted messaging thus presented many tools which make it possible to reinforce the confidentiality of exchanges, but also require using more computing power, to encrypt or decrypt messages, or even request the sending of large quantities of data to scramble the origin or volume of a message. This first CCC under the sign of environmental protection did not really address this contradiction - pending, perhaps, a next edition?




Deploying authoritative OCaml-DNS servers as MirageOS unikernels — Hannes Mehnert (hannes), Dec 23, 2019
Goal
Have your domain served by OCaml-DNS authoritative name servers. Data is stored in a git remote, and let's encrypt certificates can be requested via DNS. This software has been deployed for more than two years for several domains, such as nqsb.io and robur.coop. This post presents the authoritative server side and the certificate library of the OCaml-DNS implementation formerly known as µDNS.
Prerequisites
You need to own a domain, and be able to delegate the name service to your own servers. You also need two spare public IPv4 addresses (in different /24 networks) for your name servers. A git server or remote repository reachable via git over ssh. Servers which support solo5 guests, and have the corresponding tender installed. A computer with opam (>= 2.0.0) installed.
Data preparation
Figure out a way to get the DNS entries of your domain in a "master file format", i.e. what bind uses.
This is a master file for the mirage domain, defining $ORIGIN to avoid typing the domain name after each hostname (use @ if you need the domain name only; if you need to refer to a hostname in a different domain end it with a dot (.), i.e. ns2.foo.com.). The default time to live $TTL is an hour (3600 seconds).
The zone contains a start of authority (SOA) record containing the nameserver, hostmaster, serial, refresh, retry, expiry, and minimum.
Also, a single name server (NS) record ns1 is specified with an accompanying address (A) records pointing to their IPv4 address.
git-repo> cat mirage
$ORIGIN mirage.
$TTL 3600
@ SOA ns1 hostmaster 1 86400 7200 1048576 3600
@ NS ns1
ns1 A 127.0.0.1
www A 1.1.1.1
git-repo> git add mirage && git commit -m initial && git push
Installation
On your development machine, you need to install various OCaml packages. You don't need privileged access if common tools (a C compiler, make, libgmp) are already installed, and you have opam installed.
Let's create a fresh switch for the DNS journey:
$ opam init
$ opam update
$ opam switch create udns 4.09.0
# waiting a bit, a fresh OCaml compiler is getting bootstrapped
$ eval `opam env` #sets some environment variables
The last command set environment variables in your current shell session; please use the same shell for the commands that follow (or run eval $(opam env) in another shell and proceed there - the output of opam switch should point to udns).
Validation of our zonefile
First let's check that OCaml-DNS can parse our zonefile:
$ opam install dns-cli #installs ~/.opam/udns/bin/ozone and other binaries
$ ozone <git-repo>/mirage # see ozone --help
successfully checked zone
Great. Error reporting is not great, but line numbers are indicated (ozone: zone parse problem at line 3: syntax error), lexer and parser are lex/yacc style (PRs welcome).
FWIW, ozone accepts --old <filename> to check whether an update from the old zone to the new is fine. This can be used as pre-commit hook in your git repository to avoid bad parse states in your name servers.
Getting the primary up
The next step is to compile the primary server and run it to serve the domain data. Since the git-via-ssh client is not yet released, we need to add a custom opam repository to this switch.
# git via ssh is not yet released, but this opam repository contains the branch information
$ opam repo add git-ssh git+https://github.com/roburio/git-ssh-dns-mirage3-repo.git
# get the `mirage` application via opam
$ opam install lwt mirage
# get the source code of the unikernels
$ git clone -b future https://github.com/roburio/unikernels.git
$ cd unikernels/primary-git
# let's build the server first as unix application
$ mirage configure --prng fortuna #--no-depext if you have all system dependencies
$ make depend
$ make
# run it
$ ./primary_git
# starts a unix process which clones https://github.com/roburio/udns.git
# attempts to parse the data as zone files, and fails on parse error
$ ./primary_git --remote=https://my-public-git-repository
# this should fail with EACCES since the DNS server tries to listen on port 53
# which requires a privileged user, i.e. su, sudo or doas
$ sudo ./primary_git --remote=https://my-public-git-repository
# leave it running, run the following programs in a different shell
# test it
$ host ns1.mirage 127.0.0.1
ns1.mirage has address 127.0.0.1
$ dig any mirage @127.0.0.1
# a DNS packet printout with all records available for mirage
That's exciting: a DNS server serving answers from a remote git repository.
Securing the git access with ssh
Let's authenticate the access by using ssh, so we feel ready to push data there as well. The primary-git unikernel already includes an experimental ssh client; all we need to do is set up credentials - in the following, an RSA keypair and the server fingerprint.
# collect the RSA host key fingerprint
$ ssh-keyscan <git-server> > /tmp/git-server-public-keys
$ ssh-keygen -l -E sha256 -f /tmp/git-server-public-keys | grep RSA
2048 SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 <git-server> (RSA)
# we're interested in the SHA256:yyy only
# generate a ssh keypair
$ awa_gen_key # installed by the make depend step above in ~/.opam/udns/bin
seed is pIKflD07VT2W9XpDvqntcmEW3OKlwZL62ak1EZ0m
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5b2cSSkZ5/MAu7pM6iJLOaX9tJsfA8DB1RI34Zygw6FA0y8iisbqGCv6Z94ZxreGATwSVvrpqGo5p0rsKs+6gQnMCU1+sOC4PRlxy6XKgj0YXvAZcQuxwmVQlBHshuq0CraMK9FASupGrSO8/dW30Kqy1wmd/IrqW9J1Cnw+qf0C/VEhIbo7btlpzlYpJLuZboTvEk1h67lx1ZRw9bSPuLjj665yO8d0caVIkPp6vDX20EsgITdg+cFjWzVtOciy4ETLFiKkDnuzHzoQ4EL8bUtjN02UpvX2qankONywXhzYYqu65+edSpogx2TuWFDJFPHgcyO/ZIMoluXGNgQlP awa@awa.local
# please run your own awa_gen_key, don't use the numbers above
The public key is in standard OpenSSH format and needs to be added to the list of accepted keys on your server - the exact steps depend on your git server. If you're running your own with gitosis, add it as a new public key file and grant that key access to the data repository. If you use gitlab or github, you may want to create a new user account and associate the generated key with it.
The private key is not displayed, only the seed required to re-generate it when using the same random number generator - in our case fortuna, implemented by nocrypto and used by both awa_gen_key and primary_git. The seed is provided as a command-line argument when starting primary_git:
# execute with git over ssh, authenticator from ssh-keyscan, seed from awa_gen_key
$ ./primary_git --authenticator=SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 --seed=pIKflD07VT2W9XpDvqntcmEW3OKlwZL62ak1EZ0m --remote=ssh://git@<git-server>/repo-name.git
# started up, you can try the host and dig commands from above if you like
To wrap up, we now have a primary authoritative name server for our zone running as a Unix process, which clones a remote git repository via ssh on startup and then serves it.
Authenticated data updates
Our remote git repository is the source of truth. If you need to add a DNS entry to the zone, you git pull, edit the zone file (remember to increase the serial in the SOA line), run ozone, git commit, and push to the repository.
So primary_git needs to be informed of git pushes. This requires a communication channel from the git server (or somewhere else, e.g. your laptop) to the DNS server. I prefer in-protocol solutions over adding yet another protocol stack - no way my DNS server will talk HTTP REST.
The DNS protocol has an extension for notifications of zone changes (as a DNS packet), usually used between the primary and secondary servers. The primary_git accepts these notify requests (i.e. bends the standard slightly), and upon receipt pulls the remote git repository and serves the fresh zone files. Since a git pull may be rather expensive in terms of CPU cycles and network bandwidth, only authenticated notifications are accepted.
Another extension of the DNS protocol specifies authentication (DNS TSIG) with transaction signatures on DNS packets, including a timestamp and fudge to avoid replay attacks. As key material, hmac secrets distributed to both communication endpoints are used.
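To make the replay protection concrete, here is a minimal OCaml sketch of the freshness check such a transaction signature implies: a signed packet carries a timestamp, and the fudge is the tolerated clock difference in seconds. This is an illustration only, not the actual OCaml-DNS code; the function name and the 300-second fudge are assumptions.
(* reject packets whose timestamp is outside now +/- fudge seconds *)
let tsig_fresh ~now ~timestamp ~fudge =
  abs_float (now -. timestamp) <= float_of_int fudge

let () =
  let now = Unix.gettimeofday () in
  (* signed 100 seconds ago, fudge of 300 seconds: accepted *)
  assert (tsig_fresh ~now ~timestamp:(now -. 100.) ~fudge:300);
  (* a replayed packet from an hour ago: rejected *)
  assert (not (tsig_fresh ~now ~timestamp:(now -. 3600.) ~fudge:300))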
To recap, the primary server is configured with command-line parameters (for the remote repository url and ssh credentials), and serves data from a zonefile. If the secrets were provided via the command line, a restart would be necessary for adding and removing keys. If put into the zonefile, they would be publicly served on request. So instead we'll use another file, still in zone file format, in the top-level domain _keys, i.e. the mirage._keys file contains keys for the mirage zone. All files ending in ._keys are parsed with the normal parser, but put into an authentication store instead of the domain data store, which is served publicly.
For encoding hmac secrets into DNS zone file format, the DNSKEY format is used (designed for DNSsec). The bind software comes with dnssec-keygen and tsig-keygen to generate DNSKEY output: the flags field is 0, protocol is 3, and the algorithm identifier for SHA256 is 163 (SHA384 164, SHA512 165). This is reused by the OCaml DNS library. The key material itself is base64 encoded.
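As a quick illustration, emitting such a DNSKEY line from an already base64-encoded secret is a one-liner in OCaml; the helper below is hypothetical, but the field values (flags 0, protocol 3, algorithm 163 for SHA256) match the format just described.
let dnskey_line ~name ~base64_secret =
  Printf.sprintf "%s. DNSKEY 0 3 163 %s" name base64_secret

let () =
  print_endline
    (dnskey_line ~name:"personal._update.mirage"
       ~base64_secret:"kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=")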
Access control and naming of keys follow the DNS domain name hierarchy - a key has the form name._operation.domain, and is granted access to domain and all its subdomains. Two operations are supported: update and transfer. In the future there may be a dedicated notify operation; for now we'll use update. The name part is ignored for the update operation.
Since we now embed secret information in the git repository, it is a good idea to restrict access to it, i.e. make it private and not publicly cloneable or viewable. Let's generate a first hmac secret and send a notify:
$ dd if=/dev/random bs=1 count=32 | b64encode -
begin-base64 644 -
kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
====
[..]
git-repo> echo "personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=" > mirage._keys
git-repo> git add mirage._keys && git commit -m "add hmac secret" && git push
# now we need to restart the primary git to get the git repository with the key
$ ./primary_git --seed=... # arguments from above, remote git, host key fingerprint, private key seed
# now test that a notify results in a git pull
$ onotify 127.0.0.1 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
# onotify was installed by dns-cli in ~/.opam/udns/bin/onotify, see --help for options
# further changes to the hmac secrets don't require a restart anymore, a notify packet is sufficient :D
Ok, this onotify command line could be set up as a git post-commit hook, or run manually after each git push.
Secondary
It's time to figure out how to integrate the secondary name server. An already existing bind or something else that accepts notifications and issues zone transfers with hmac-sha256 secrets should work out of the box. If you encounter interoperability issues, please get in touch with me.
The secondary subdirectory of the cloned unikernels repository is another unikernel that acts as a secondary server. Its only command-line argument is a list of hmac secrets used for authenticating that the received data originates from the primary server. Data is initially transferred by a full zone transfer (AXFR); later updates (upon refresh timer or a notify request sent by the primary) use incremental transfer (IXFR). Zone transfer requests and data are authenticated with transaction signatures again.
A convenience of OCaml DNS is that transfer key names matter: they are of the form <primary-ip>.<secondary-ip>._transfer.domain, i.e. 1.1.1.1.2.2.2.2._transfer.mirage if the primary server is 1.1.1.1 and the secondary 2.2.2.2. Encoding the IP addresses in the name allows both parties to start the communication: the secondary starts by requesting a SOA for all domains for which keys are provided on the command line, and if an authoritative SOA answer is received, the AXFR is triggered. The primary server emits notification requests on startup, and then on every zone change (i.e. via git pull), to all secondary IP addresses of transfer keys present for the specific zone, in addition to notifications to the NS records in the zone.
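To illustrate the naming convention, here is a small OCaml sketch (not the library's API) that builds a transfer key name from both addresses and recovers the secondary's address from such a name - the property that lets either endpoint derive its peer:
let transfer_key ~primary ~secondary ~zone =
  Printf.sprintf "%s.%s._transfer.%s" primary secondary zone

(* an IPv4 address occupies four dot-separated labels, so the secondary
   is labels 5-8, right before the "_transfer" label *)
let secondary_of_key key =
  match String.split_on_char '.' key with
  | _p1 :: _p2 :: _p3 :: _p4 :: s1 :: s2 :: s3 :: s4 :: "_transfer" :: _zone ->
    Some (String.concat "." [ s1; s2; s3; s4 ])
  | _ -> None

let () =
  let key = transfer_key ~primary:"10.0.42.2" ~secondary:"10.0.42.3" ~zone:"mirage" in
  assert (key = "10.0.42.2.10.0.42.3._transfer.mirage");
  assert (secondary_of_key key = Some "10.0.42.3")
With that convention in mind, let's build and run the secondary: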
$ cd ../secondary
$ mirage configure --prng fortuna
# make depend should not be needed since all packages are already installed by the primary-git
$ make
$ ./secondary
IP addresses and routing
Both the primary and the secondary serve the data on the DNS port (53) on UDP and TCP. To run both on the same machine and bind them to different IP addresses, we'll use a layer 2 network (ethernet frames) with a software switch on the host system (a bridge interface named service), running the unikernels as virtual machines (or seccomp-sandboxed) via the solo5 backend. Using xen is possible as well. As IP address range we'll use 10.0.42.0/24, and the host system uses 10.0.42.1.
The primary git needs connectivity to the remote git repository, thus on a laptop in a private network we need network address translation (NAT) from the bridge where the unikernels speak to the Internet where the git repository resides.
# on FreeBSD:
# configure NAT with pf, you need to have forwarding enabled
$ sysctl net.inet.ip.forwarding=1
$ echo 'nat pass on wlan0 inet from 10.0.42.0/24 to any -> (wlan0)' >> /etc/pf.conf
$ service pf restart
# make tap interfaces UP on open()
$ sysctl net.link.tap.up_on_open=1
# bridge creation, naming, and IP setup
$ ifconfig bridge create
bridge0
$ ifconfig bridge0 name service
$ ifconfig bridge0 10.0.42.1/24
# two tap interfaces for our unikernels
$ ifconfig tap create
tap0
$ ifconfig tap create
tap1
# add them to the bridge
$ ifconfig service addm tap0 addm tap1
Primary and secondary setup
Let's update our zone slightly to reflect the IP changes.
git-repo> cat mirage
$ORIGIN mirage.
$TTL 3600
@ SOA ns1 hostmaster 2 86400 7200 1048576 3600
@ NS ns1
@ NS ns2
ns1 A 10.0.42.2
ns2 A 10.0.42.3
# we also need an additional transfer key
git-repo> cat mirage._keys
personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
10.0.42.2.10.0.42.3._transfer.mirage. DNSKEY 0 3 163 cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20=
git-repo> git commit -m "updates" . && git push
Ok, the git repository is ready, now we need to compile the unikernels for the virtualisation target (see other targets for further information).
# back to primary
$ cd ../primary-git
$ mirage configure -t hvt --prng fortuna # or e.g. -t spt (and solo5-spt below)
# installs backend-specific opam packages, recompiles some
$ make depend
$ make
[...]
$ solo5-hvt --net:service=tap0 -- primary_git.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --seed=.. --authenticator=.. --remote=ssh+git://...
# should now run as a virtual machine (kvm, bhyve), and clone the git repository
$ dig any mirage @10.0.42.2
# should reply with the SOA and NS records, and also the name server address records in the additional section
# secondary
$ cd ../secondary
$ mirage configure -t hvt --prng fortuna
$ make
$ solo5-hvt --net:service=tap1 -- secondary.hvt --ipv4=10.0.42.3/24 --keys=10.0.42.2.10.0.42.3._transfer.mirage:SHA256:cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20=
# an ipv4-gateway is not needed in this setup, but in real deployment later
# it should start up and transfer the mirage zone from the primary
$ dig any mirage @10.0.42.3
# should now output the same information as from 10.0.42.2
# testing an update and propagation
# edit mirage zone, add a new record and increment the serial number
git-repo> echo "foo A 127.0.0.1" >> mirage
git-repo> vi mirage <- increment serial
git-repo> git commit -m 'add foo' . && git push
$ onotify 10.0.42.2 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
# now check that it worked
$ dig foo.mirage @10.0.42.2 # primary
$ dig foo.mirage @10.0.42.3 # secondary got notified and transferred the zone
You can also check the behaviour when restarting either of the VMs: whenever the primary is available, the zone is synchronised. If the primary is down, the secondary still serves the zone. If the secondary is started while the primary is down, it won't serve any data until the primary is online (the secondary polls periodically; the primary sends notifies on startup).
Dynamic data updates via DNS, pushed to git
DNS is a rich protocol, and it also has built-in updates that are supported by OCaml DNS, again authenticated with hmac-sha256 and shared secrets. Bind provides the command-line utility nsupdate to send these update packets; a simple oupdate unix utility is available as well (i.e. for integration of dynamic DNS clients). You know the drill: add a shared secret to the primary, git push, notify the primary, and voila, we can update dynamically in-protocol. An update received by the primary this way will trigger a git push to the remote git repository, and notifications to the secondary servers as described above.
# being lazy, I reuse the key above
$ oupdate 10.0.42.2 personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= my-other.mirage 1.2.3.4
# let's observe the remote git
git-repo> git pull
# there should be a new commit generated by the primary
git-repo> git log
# test it, should return 1.2.3.4
$ dig my-other.mirage @10.0.42.2
$ dig my-other.mirage @10.0.42.3
So we can deploy further oupdate (or nsupdate) clients, distribute hmac secrets, and have the DNS zone updated. The source of truth is still the git repository, which primary-git pushes to. Merge conflicts and timing of pushes are not yet dealt with; they are unlikely to happen since the primary is notified on pushes and should have up-to-date data in storage. Sorry, I'm unsure about the error semantics - try it yourself.
Let's encrypt!
Let's encrypt is a certificate authority (CA) whose certificate is shipped as a trust anchor in web browsers. They specified a protocol, the automated certificate management environment (ACME), used to get X509 certificates for your services. In the protocol, a certificate signing request (public key and hostname) is sent to let's encrypt servers, which send back a challenge to prove ownership of the hostnames. One widely used way to solve this challenge is running a web server; another is to serve it as a TXT record from the authoritative DNS server.
Since I avoid persistent storage when possible, and also don't want to integrate an HTTP client stack in the primary server, I developed a third unikernel that acts as a (hidden) secondary server, performs the tedious HTTP communication with let's encrypt servers, and stores all data in the public DNS zone.
For encoding of certificates, the DANE working group specified TLSA records in DNS. They are quadruples of usage, selector, matching type, and ASN.1 DER-encoded material. We set usage to 3 (domain-issued certificate), matching type to 0 (no hash), and selector to 0 (full certificate) or 255 (private usage) for certificate signing requests; a sketch of this record shape follows the list below. The interaction is as follows:
- Primary, secondary, and let's encrypt unikernels are running
- A service (ocertify, unikernels/certificate, or the dns-certify.mirage library) demands a TLS certificate, and has an hmac secret for the primary DNS
- The service generates a certificate signing request with the desired hostname(s), and performs an nsupdate with TLSA 255 <DER encoded signing-request>
- The primary accepts the update, pushes the new zone to git, and sends notifies to secondary and let's encrypt unikernels which (incrementally) transfer the zone
- The let's encrypt unikernel notices, while transferring the zone, a signing request without a certificate, and starts the HTTP interaction with let's encrypt
- The let's encrypt unikernel solves the challenge, and sends the response as an update of a TXT record to the primary nameserver
- The primary pushes the TXT record to git, and notifies secondaries (which transfer the zone)
- The let's encrypt servers request the TXT record from either or both authoritative name servers
- The let's encrypt unikernel polls for the issued certificate and sends an update to the primary, TLSA 0 <DER encoded certificate>
- The primary pushes the certificate to git, notifies secondaries (which transfer the zone)
- The service polls the TLSA records for the hostname, and uses the certificate upon retrieval
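As promised above, here is a rough OCaml model of the TLSA quadruple this scheme relies on. The field and constructor names are illustrative, not those of the OCaml-DNS library; only the numeric values come from the description above.
type tlsa = {
  usage : int;          (* 3 = domain-issued certificate *)
  selector : int;       (* 0 = full certificate, 255 = private use (here: CSR) *)
  matching_type : int;  (* 0 = exact match, no hash *)
  data : string;        (* ASN.1 DER-encoded certificate or signing request *)
}

let csr_record der = { usage = 3; selector = 255; matching_type = 0; data = der }
let cert_record der = { usage = 3; selector = 0; matching_type = 0; data = der }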
Note that neither the signing request nor the certificate contain private key material, thus it is fine to serve them publicly. Please also note that the service polls DNS for a certificate for the hostname; if a valid certificate (within its start and end dates) using the same public key is already present, that certificate is used and steps 3-10 are not executed.
The let's encrypt unikernel does not serve anything; it is a reactive system which acts upon notification from the primary. Thus, it can be executed in a private address space (behind a NAT). Since the OCaml DNS server stack needs to push notifications to it, it preserves all incoming signed SOA requests as candidates for notifications on update. The let's encrypt unikernel ensures to always have a connection to the primary to receive notifications.
# getting let's encrypt up and running
$ cd ../lets-encrypt
$ mirage configure -t hvt --prng fortuna
$ make depend
$ make
# run it
$ solo5-hvt --net:service=tap2 -- letsencrypt.hvt --keys=...
# test it
$ ocertify 10.0.42.2 foo.mirage
For actual testing with let's encrypt servers, you need to have the primary and secondary deployed on your remote hosts, and your domain needs to be delegated to these servers. Good luck. And ensure you have a backup of your git repository.
As fine print: while this tutorial was about the mirage zone, you can stick any number of zones into the git repository. If you use a _keys file (without any domain prefix), you can configure hmac secrets for all zones, i.e. something to use in your let's encrypt unikernel and secondary unikernel. Dynamic addition of zones is supported: just create a new zonefile and notify the primary; the secondary will be notified and pick it up. The primary responds to a signed SOA for the root zone (i.e. requested by the secondary) with the SOA response (not authoritative), plus notifications for all domains of the primary.
Conclusion and thanks
This tutorial presented how to use the OCaml-DNS-based unikernels to run authoritative name servers for your domain, using a git repository as the source of truth, with dynamic authenticated updates and let's encrypt certificate issuing.
There are further steps to take, such as monitoring -- have a look at the monitoring branch of the opam repository above, and the future-robur branch of the unikernels repository above, which use a second network interface for reporting syslog and metrics to telegraf / influx / grafana. Some DNS features are still missing, most prominently DNSSec.
I'd like to thank all people involved in this software stack, which would not exist without other key components, including git, irmin 2.0, nocrypto, awa-ssh, cohttp, solo5, mirage, ocaml-letsencrypt, and more.
If you want to support our work on MirageOS unikernels, please donate to robur. I'm interested in feedback, either via twitter, hannesm@mastodon.social or an issue on the data repository.




Reproducible MirageOS unikernel builds — Hannes Mehnert (hannes), Dec 16, 2019
Reproducible builds summit
I'm just back from the Reproducible builds summit 2019. In 2018, several people developing OCaml, opam, and MirageOS attended the Reproducible builds summit in Paris. The notes from last year on opam reproducibility and MirageOS reproducibility are online. After last year's workshop, Raja started developing the opam reproducibility builder orb, which I extended at and after this year's summit. This year, before and after the facilitated summit, there were hacking days, which allowed further interaction with participants, writing some code, and conducting experiments. I again had an exciting time at the summit and hacking days; thanks to our hosts, organisers, and all participants.
Goal
Stepping back a bit, let's first look at the goal of reproducible builds: when compiling source code multiple times, the produced binaries should be identical. It would be sufficient if the binaries were behaviourally equal, but this is pretty hard to check. It is much easier to check bit-wise identity of binaries, which relaxes the burden on the checker -- checking for reproducibility is reduced to computing the hash of the binaries. Let's stick to the bit-wise identical binary definition, which also means software developers have to avoid non-determinism during compilation in their toolchains, dependent libraries, and developed code.
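Under that definition, the check itself is trivial. A minimal OCaml sketch, using the stdlib's Digest (MD5) for brevity - a real setup would prefer a stronger hash, and the file paths here are made up:
let reproducible binary_a binary_b =
  Digest.file binary_a = Digest.file binary_b

let () =
  if reproducible "build1/hello.hvt" "build2/hello.hvt"
  then print_endline "bit-wise identical"
  else print_endline "builds diverge"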
A checklist of potential things leading to non-determinism has been written up by the reproducible builds project. Examples include recording the build timestamp into the binary, or the ordering of code and embedded data. The reproducible builds project also developed disorderfs for testing reproducibility and diffoscope for comparing binaries with file-dependent readers, falling back to objdump and hexdump. A giant test infrastructure with lots of variations between the builds, mostly using Debian, has been set up over the years.
Reproducibility is a precondition for trustworthy binaries. See why does it matter. If there are no instructions for how to get from the published sources to the exact binary, why should anyone trust and use a binary which claims to be the result of the sources? It may as well contain different code, including a backdoor, bitcoin mining code, or code outputting wrong results for specific inputs. Reproducibility does not imply the software is free of security issues or backdoors, but instead of an audit of the binary - which is tedious and rarely done - the source code can be audited; the toolchain (compiler, linker, ..) used for compilation still needs to be taken into account, i.e. trusted or audited to not be malicious. I will only ever publish binaries if they are reproducible.
My main interest at the summit was to enhance existing tooling and conduct some experiments about the reproducibility of MirageOS unikernels -- a unikernel is a statically linked ELF binary to be run as Unix process or virtual machine. MirageOS heavily uses OCaml and opam, the OCaml package manager, and is an opam package itself. Thus, checking reproducibility of a MirageOS unikernel is the same problem as checking reproducibility of an opam package.
Reproducible builds with opam
Testing for reproducibility is achieved by taking the sources and compiling them twice independently; afterwards, the equality of the resulting binaries can be checked. In trivial projects, the source is just a single file, or originates from a single tarball. In OCaml, opam uses a community repository where OCaml developers publish their package releases, but it can also use custom repositories, and in addition pin packages to git remotes (a url including branch or commit) or to a directory on the local filesystem. Manually tracking and updating all dependent packages of a MirageOS unikernel is not feasible: our hello-world compiled for hvt (kvm/BHyve) already has 79 opam dependencies, including the OCaml compiler, which is distributed as an opam package. The unikernel serving this website depends on 175 opam packages.
Conceptually there should be two tools: the initial builder, which takes the latest opam packages that do not conflict and exports the exact package versions used during the build, as well as hashes of the produced binaries; and the rebuilder, which imports the export, conducts a build, and outputs the hashes of the produced binaries.
Opam has the concept of a switch, which is an environment where a package set is installed. Switches are independent of each other, and can already be exported and imported. Unfortunately the export is incomplete: if a package includes additional patches as part of the repository -- sometimes needed for fixing releases where the actual author or maintainer of a package responds slowly -- these patches do not end up in the export. Also, if a package is pinned to a git branch, the branch appears in the export, but it may change over time by pushing more commits or even force-pushing to that branch. In PR #4040 (under discussion and review), also developed during the summit, I propose to embed the additional files as base64-encoded values in the opam file. To solve the latter issue, I modified the export mechanism to embed the git commit hash (PR #4055), and to avoid sources from a local directory or sources which do not have a checksum.
So the opam export contains the information required to gather the exact same sources and build instructions of the opam packages. If the opam repository were self-contained (i.e. did not depend on any other tools), this would be sufficient. But opam does not run in thin air; it requires some system utilities such as /bin/sh, sed, a GNU make, commonly git, a C compiler, a linker, an assembler. Since opam is available on various operating systems, the plugin depext handles host system dependencies; e.g. if your opam package requires gmp to be installed, the package name differs slightly depending on the host system or distribution - take a look at conf-gmp. This also means opam has rather good information about both the opam dependencies and the host system dependencies of each package. Please note that the host system packages used during compilation are not yet recorded (i.e. which gmp package was installed and used during the build; only that a gmp package has to be installed). The base utilities mentioned above (C compiler, linker, shell) are also not recorded yet.
Operating system information available in opam (such as architecture, distribution, version), which in some cases maps to exact base utilities, is recorded in the build-environment, a separate artifact. The environment variable SOURCE_DATE_EPOCH, used for communicating the same timestamp when software is required to record a timestamp into the resulting binary, is also captured in the build environment.
Additional environment variables may be captured or used by opam packages to produce different output. To avoid this, both the initial builder and the rebuilder are run with minimal environment variables: only PATH (normalised to a whitelist of /bin, /usr/bin, /usr/local/bin and /opt/bin) and HOME are defined. Information missing at the moment includes CPU features: some libraries (gmp?, nocrypto) emit different code depending on CPU features.
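For illustration, honouring SOURCE_DATE_EPOCH in a tool is straightforward; a minimal OCaml sketch, assuming (per the spec) the variable contains seconds since the Unix epoch:
let build_timestamp () =
  match Sys.getenv_opt "SOURCE_DATE_EPOCH" with
  | Some s ->
    (* both builds see the same value, so both embed the same timestamp *)
    (match float_of_string_opt s with
     | Some t -> t
     | None -> Unix.gettimeofday ())
  | None -> Unix.gettimeofday ()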
Tooling
TL;DR: A build builds an opam package, and outputs .opam-switch, .build-hashes.N, and .build-environment.N. A rebuild uses these artifacts as input, builds the package and outputs another .build-hashes.M and .build-environment.M.
The command-line utility orb can be installed and used:
$ opam pin add orb git+https://github.com/hannesm/orb.git#active
$ orb build --twice --keep-build-dir --diffoscope <your-favourite-opam-package>
It provides two subcommands, build and rebuild. The build command takes a list of opam repositories (--repos) to take opam packages from (defaults to default), a compiler (either a variant --compiler=4.09.0+flambda, a version --compiler=4.06.0, or a pin to a local development version --compiler-pin=~/ocaml), and optionally an existing switch --use-switch. It creates a switch, builds the packages, and emits the opam export, hashes of all files installed by these packages, and the build environment. The flag --keep-build retains the build products; opam's --keep-build-dir additionally keeps temporary build products and generated source code. If --twice is provided, a rebuild (described next) is executed after the initial build.
The rebuild command takes a directory with the opam export and build environment to build the opam package. It first compares the build environment with the host system, sets SOURCE_DATE_EPOCH and the switch location accordingly, and executes the import. Once the build is finished, it compares the hashes of the resulting files with the previous run. On divergence, if build directories were kept in the previous build, if diffoscope is available, and if --diffoscope was provided, diffoscope is run on the diverging files. If --keep-build-dir was provided as well, diff -ur can be used to compare the temporary build and sources, including build logs.
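The divergence report boils down to comparing two sets of per-file hashes. A sketch of that comparison in OCaml (the actual .build-hashes format is not shown here, so the association-list representation is an assumption):
module M = Map.Make (String)

let of_list l = List.fold_left (fun m (k, v) -> M.add k v m) M.empty l

(* files from the first run whose hash differs or is missing in the second *)
let diverging run1 run2 =
  let m2 = of_list run2 in
  List.filter
    (fun (file, h1) ->
       match M.find_opt file m2 with
       | Some h2 -> h1 <> h2
       | None -> true)
    run1

let () =
  let a = [ "bin/orb", "aaa"; "bin/hello.hvt", "bbb" ]
  and b = [ "bin/orb", "aaa"; "bin/hello.hvt", "ccc" ] in
  List.iter (fun (f, _) -> print_endline f) (diverging a b)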
The builds are run in parallel, as opam does by default; this parallelism did not lead to different binaries in my experiments.
Results and discussion
All MirageOS unikernels I have deployed are reproducible \o/. Also, several binaries such as orb itself, opam, solo5-hvt, and all albatross utilities are reproducible.
The unikernels range from hello world to web servers (e.g. this blog, getting its data on startup via a git clone to memory), authoritative DNS servers, and a CalDAV server. They vary in size between 79 and 200 opam packages, resulting in ELF binaries between 2MB and 16MB (including debug symbols). The unikernel opam repository contains some reproducible unikernels used for testing. Some work-in-progress enhancements were needed to achieve this:
At the moment, the opam package of a MirageOS unikernel is automatically generated by mirage configure, but only used for tracking opam dependencies. I worked on mirage PR #1022 to extend the generated opam package with build and install instructions.
If a locale is set, ocamlgraph needs to be patched to avoid embedding a (locale-dependent) timestamp (see the note on ocamlgraph #90 below).
The OCaml program crunch embeds a subdirectory as OCaml code into a binary, which we use in MirageOS quite regularly for static assets, etc. This plays into reproducibility in several ways: on the one hand, it needs a timestamp for its last_modified functionality (and adheres since June 2018 to the SOURCE_DATE_EPOCH spec, thanks to Xavier Clerc). On the other hand, before version 3.2.0 (released Dec 14th) it used hashtables for storing the file contents, where iteration order is not deterministic (insertion order is not preserved); this was fixed in PR #51 by using a Map instead.
Functoria, a tool used to configure MirageOS devices and their dependencies, can emit the list of opam packages which were required to build the unikernel. This used opam list --required-by --installed --rec <pkgs>, which relies on the cudf graph (thanks to Raja for the explanation) and was dropping some packages during the rebuild. PR #189 avoids this by not using the --rec argument and manually computing the fixpoint instead.
Certainly, the choice of environment variables, and whether to vary them (as Debian does) or to leave them undefined (or normalise them) while building, is debatable. Since MirageOS supports neither time zones nor internationalisation, there is no need to solve this issue prematurely. On a related note, even with different locale settings, MirageOS unikernels are reproducible apart from an issue in ocamlgraph #90: embedding the output of date, which differs depending on LANG and locale (LC_*) settings.
Prior art in reproducible MirageOS unikernels is the mirage-qubes-firewall, which has been reproducible since early 2017. Their approach is different: building in a docker container with the opam repository pinned to an exact git commit.
Further work
I only tested a certain subset of opam packages and MirageOS unikernels, mainly on a single machine (my laptop) running FreeBSD, and am happy if others test the reproducibility of their OCaml programs with the tools provided. There could as well be CI machines rebuilding opam packages and reporting results to a central repository. I'm pretty sure there are more reproducibility issues in the opam ecosystem. I developed a reproducible-testing opam repository with opam packages that do not depend on OCaml, mainly for further tooling development. Some tests were also conducted on a Debian system with the same result. The variations, apart from build time, were using a different user and different locale settings.
As mentioned above, more of the environment, such as CPU features and external system packages, should be captured in the build environment.
When comparing OCaml libraries, some output files (cmt / cmti / cma / cmxa) are not deterministic, but contain minimal divergences whose root cause I was not able to spot. It would be great to fix this, likely in the OCaml compiler distribution. Since the final result, the binary I'm interested in, is not affected by non-identical intermediate build products, I hope someone (you?) is interested in improving on this side. OCaml bytecode output also seems to be non-deterministic. There is a discussion on the coq issue tracker which may be related.
In contrast to initial plans, I did not use the BUILD_PATH_PREFIX_MAP environment variable, which is implemented for OCaml in PR #1515 (and followups). The main reasons are that something in the OCaml toolchain (I suspect the bytecode interpreter) needed absolute paths to find libraries, thus I'd need a symlink from the left-hand side to the current build directory, which was tedious. Also, my installed assembler does not respect the build path prefix map, and BUILD_PATH_PREFIX_MAP is not widely supported. See e.g. the Debian zarith package with different build paths and its effects on the binary.
I'm fine with recording the build path (switch location) in the build environment for now - it turns out to end up only once in MirageOS unikernels, likely via the last linking step, which will hopefully soon be solved by llvm 9.0.
It was fun to compare the unikernel built on Linux with gcc against a build on FreeBSD with clang and lld - spoiler: they emit debug sections with different dwarf versions, and the difference is pretty big. Other fun differences were between OCaml compiler versions: the difference between minor versions (4.08.0 vs 4.08.1) is pretty small (~100kB of human-readable diff), while the difference between major versions (4.08.1 vs 4.09.0) is rather big (~900kB of human-readable diff).
An item on my list for the future is to distribute the opam export, build hashes, and build environment artifacts in an authenticated way. I want to integrate this, in-toto style, into conex, my not-yet-deployed implementation of tuf for opam, which needs further development and a test installation, hopefully in 2020.
If you want to support our work on MirageOS unikernels, please donate to robur. I'm interested in feedback, either via twitter, hannesm@mastodon.social or an issue on the data repository.




Using Python and OCaml in the same Jupyter notebook — Jane Street, Dec 16, 2019





Tarides wins the FIC 2020 startup award — Tarides (Céline Laplassotte), Dec 11, 2019
We are very excited to announce that Tarides has won an award from the International Cybersecurity Forum (FIC 2020).
Organized every year in Lille (France), the International Cybersecurity Forum has become the leading European event on cybersecurity and digital trust. Its main goal is to foster reflection and exchanges within the European cybersecurity ecosystem.
We are very happy to have won the "Coup de Coeur" Prize, which will bring great visibility to our technological innovations. It is also an opportunity for us to meet experts in the cybersecurity sector and to consider additional use-cases for our work. We would like to thank the CEIS for organising the event and the members of the jury for commending MirageOS and OSMOSE.
The next FIC will be held on the 28th, 29th and 30th of January 2020 in Lille. For more details about the FIC and to register, visit their website.





Deep-Learning the Hardest Go Problem in the World — Jane Street, Dec 06, 2019
Updates and a New Run





Frama-C 20.0 (Calcium) is out. Download it here. — Frama-C, Dec 04, 2019





MirageOS talk at the Paris Open Source Summit — Tarides (Céline Laplassotte), Dec 04, 2019
We are thrilled to have been selected by the Paris Open Source Summit committee to talk about “Secure-by-design IoT applications using MirageOS”.
The Paris Open Source Summit is an annual event where you can connect to open-source communities and learn from tech leaders, project committers and CTOs about the latest technical solutions, innovative uses and societal challenges of open digital technology.
Thomas Gazagnaire, Tarides CEO/CTO, will explain what makes MirageOS a good framework to build IoT applications and how we can use embedded devices running on ARMv8, ESP32 or RISC-V to run secure and end-to-end open-source infrastructure services such as VPN proxies and email servers. He will also highlight how this infrastructure will be used to form the basis of OSMOSE: a secure, distributed and privacy-preserving platform to write user-centric IoT applications.
MirageOS is a library operating system (using the MIT license) which enables the construction of unikernels: specialized services where the runtime binary contains only the necessary code for execution and no more. Unikernels have a drastically smaller attack surface than service deployments in traditional operating systems and could lead to 1000x less code for the full application stack. Moreover, as MirageOS is written in a memory safe language (OCaml), a full class of bugs related to memory corruption – representing 70% of the released CVEs in classic operating systems written in C – can no longer appear. These two properties combined (and more!) allow MirageOS to build “secure-by-design” applications where everything – from the high-level business logic to the low-level device drivers – has been designed to be as secure as possible.
To learn more about the project, attend the Paris Open Source Summit! The talk will take place during the 'Embedded & IOT' section at 14:50 – 15:20 on December 10th, 2019.





Testing OCaml releases with opamcheck — GaGallium (Florian Angeletti), Dec 03, 2019
I (Florian Angeletti) have started working at Inria Paris this August. A part of my new job is to help deal with the day-to-day care for the OCaml compiler, particularly during the release process. This blog post is a short glimpse into the life of an OCaml release.
OCaml and the opam repository
Currently, the song of the OCaml development process is a canon with two voices: a compiler release spends the first 6 months of its life as the trunk branch of the OCaml compiler git repository. Then after those 6 first months, it is named and given a branch on its own. For instance, this happened on October 18 2019 for OCaml 4.10. Starting from this point, the branch is feature-frozen: only bug fixes are merged, whereas new feature development happens in trunk. Our objective is then to release the new version 3 months later. If we succeed, there are at most two branches active at the same time.
[Figure: the OCaml release cycle - overlapping timelines of successive branches, each passing through development (Dev), bug-fix-only (Bug), and release-candidate (RCs) phases]
However, the OCaml compiler does not live in isolation. There is little point in releasing a new version of OCaml which is not compatible with other parts of the OCaml ecosystem.
The release cycle of OCaml 4.08 was particularly painful from this point of view: we refactored parts of the compiler API that were not previously versioned by ocaml-migrate-parsetree, making it more difficult to update. In turn, without a working version of ocaml-migrate-parsetree, ppxes could not be built, breaking all packages that depend on them. It took months to correct the issue. This slip of schedule affected the 4.09.0 release and can still be felt in the 4.11 schedule.
Catching knives before the fall
Lesson learned, we need to test the packages in the opam repository more often. Two tools in current usage can automate such testing: opamcheck and opam-health-check.
The two tools attack the problem from different angles. The opam-health-check monitoring tool is developed to check the coherence of the opam repository for released OCaml versions.
In a complementary way, opamcheck was written by Damien Doligez to check how well new versions of the OCaml compiler fare in terms of building the packages in the opam repository.
A typical difference between opamcheck and opam-health-check is that opamcheck is biased towards newer versions of the compiler: if an opam package builds on the latest unreleased version of the compiler, we don’t need to test it with older compilers. After all, we are mostly interested in packages that are broken by the new release. The handful of packages that may be coincidentally fixed by an unreleased compiler are at most a curiosity; pruning those unlikely events saves us some precious time.
Since I started at Inria, in the midst of the first beta of OCaml 4.09.0, I have been working with opamcheck to monitor the state of the opam repository.
The aim here is twofold. First, we want to detect expected breakages that are just a sign that a package needs to be updated before the final release of the new compiler version. The earlier we catch those, the more time the maintainers have to patch their packages. Second, we want to detect unexpected compatibility issues and breakages.
One fun example of such an unexpected compatibility issue appeared in the 4.09.0 release cycle. When I first used opamcheck to test the state of the first 4.09.0 beta, there was a quite problematic broken package: dune. This was quite stunning at first, because the 4.09.0 version of OCaml contained mostly bug fixes and small quality-of-life improvements. That was at least what I had told a few worried people a few days earlier…
So what was happening here? The issue stemmed from a small change of behaviour in the presence of missing compiled interface (cmi) files: dune was relying on unspecified OCaml compiler behaviour in such cases, and this behaviour had been altered by a mostly unrelated improvement in the typechecker.
This change of behaviour was patched, and dune worked fine in the second beta release of 4.09.0. This time, the next run of opamcheck confirmed that 4.09.0 was a quiet release.
This is currently the main use of opamcheck: checking the health status of the opam repository on unreleased versions of OCaml, before opam-health-check's more extensive coverage takes over. One of our objectives for the future 4.10.0 release is to achieve much more extensive test coverage before the first beta.
Opam and the PRs
There is another possible use of opamcheck that is probably much more useful to the anxious OCaml developer: opamcheck can be used to check that a PR or an experimental branch does not break opam packages. A good example is #8900: this PR proposes to remove the special handling of abstract types defined inside the current module. This special case looks nice locally, but it allows writing code which is valid if and only if it is located in the right module, without any possibility to correct this behaviour by making module signatures more precise.
It is therefore quite tempting to try to remove this special case from the typechecker, but is it reasonable?
This was another task for opamcheck. First, I added a new opamcheck option to easily check any pull request on the OCaml compiler. After some work, there was good news: this pattern is mostly unused in the current opam repository.
Knowing whether there are any opam packages that rely on this feature is definitely a big help when taking these decisions.
Using opamcheck
So if you are a worried OCaml developer and want to test your fancy compiler PR on the anvil of the opam repository, what are the magical incantations?
One option is to download the docker image octachron/opamcheck with
docker pull octachron/opamcheck
Beware that the image weighs around 7 GiB. It is also possible to build and run opamcheck locally, as detailed in the readme. However, in this short blog post, I will present the docker option: it has the advantage of being relatively lightweight in terms of configuration, and it makes it easier to test your legions of PRs simultaneously (you don't have legions of PRs, do you?)
Once the image is downloaded, there are three main ways to run it. If you want to compare several versions of the compiler (given as switch names), let's say 4.05.0, 4.08.1+flambda, and 4.10.0+trunk, you can run:
docker run -v opamcheck:/app/log -p 8080:80 --name=opamcheck opamcheck run -online-summary=10 4.05.0 4.08.1+flambda 4.10.0
The --name option is the docker container name. The -p option maps port 80 of the container to port 8080 of the host; this is used to connect to the http server embedded in the image. Finally, the -v flag lets us choose the location of the opamcheck log directory in the host file system (if you forget this option, a random docker volume name will be used). Here, it will be at /var/lib/docker/volumes/opamcheck.
While opamcheck is running, its progress can be checked with either
sudo tail -f /var/lib/docker/volumes/opamcheck_log/_data/results
or by pointing a web browser to localhost:8080/fullindex.html. Note that the first summary is only generated after the OCaml compiler is built and all uninstallable packages have been discovered. On my machine, this adds up to roughly a 15-minute wait before the first summary is generated. Later updates should be more frequent.
The result should look like this summary run for OCaml 4.10.0. The integer parameter in -online-summary=n corresponds to the update period for this html summary. If the option is not provided, the html summary is only built at the end of the run.
If you are more interested in testing a specific PR, for instance #8900, the prmode mode works better:
docker run -p 8080:80 --name=opamcheck opamcheck prmode -online-summary=10 -pr 8900 4.09.0
This command tries to rebase the given PR on top of the given OCaml version (a switch name); it fails immediately if the PR cannot be rebased, in which case you should use the latest trunk switch as base, or use the -branch option described below. When possible, it is a good idea to use a released version as the base: with a released version, there are more unbroken packages to test your patch against.
If the branch that you want to test is not yet a PR, or needs some manual rebasing to be compared against a specific compiler version, there is a -branch flag. For instance, let’s say that you have a branch my_very_experimental_branch at the location nowhere.org. You can run
docker run -p 8080:80 --name=opamcheck opamcheck prmode -online-summary=10 -branch https://nowhere.org:my_very_experimental_branch 4.09.0
This command downloads the branch from nowhere.org and compares it against the 4.09.0 switch.
Currently, a full run of opamcheck takes one or two days: you will likely get the results before your first PR review. A limitation is the false positive rate: most opam package descriptions are incomplete or out of date, so packages will fail for reasons unrelated to your PR. Unfortunately, this means that there is still some manual triage needed after an opamcheck run.
There are four main objectives for opamcheck in the next months:
- improve its usability
- share more code with opam-health-check, at least on the frontend
- reduce the false positive rate
- reduce the time required by a CI run
If you want to check on future development for opamcheck, and a potentially more up-to-date readme, you can have a look at Octachron/opamcheck.




Introducing the GraphQL API for Irmin 2.0 — Tarides (Andreas Garnaes), Nov 27, 2019
With the release of Irmin 2.0.0, we are happy to announce a new package - irmin-graphql, which can be used to serve data from Irmin over HTTP. This blog post will give you some examples to help you get started; there is also a section in the irmin-tutorial with similar information. To avoid writing the same thing twice, this post will cover the basics of getting started, plus a few interesting ideas for queries.
Getting the irmin-graphql server running from the command-line is easy:
$ irmin graphql --root=/tmp/irmin
where /tmp/irmin is the actual path to your repository. This will start the server on localhost:8080, but it's possible to customize this using the --address and --port flags.
The new GraphQL API has been added to address some of the shortcomings that have been identified with the old HTTP API, as well as enable a number of new features and capabilities.
GraphQL
GraphQL is a query language for exposing data as a graph via an API, typically using HTTP as a transport. The centerpiece of a GraphQL API is the schema, which describes the graph in terms of types and relationships between these types. The schema is accessible by the consumer, and acts as a contract between the API and the consumer, by clearly defining all API operations and fully assigning types to all interactions.
Viewing Irmin data as a graph turns out to be a natural and useful model. Concepts such as branches and commits fit in nicely, and the stored application data is organized as a tree. Such highly hierarchical data can be challenging to interact with using REST, but is easy to represent and navigate with GraphQL.
(image from Pro Git)
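To see why the tree model maps so naturally onto a graph query, here is a toy OCaml model of such a store - a node maps path segments to either contents or subtrees, and a lookup walks the path. This illustrates the shape only; it is not Irmin's actual representation:
type tree =
  | Contents of string
  | Node of (string * tree) list

let rec get tree path =
  match tree, path with
  | Contents c, [] -> Some c
  | Node _, [] -> None
  | Contents _, _ :: _ -> None
  | Node children, step :: rest ->
    (match List.assoc_opt step children with
     | Some t -> get t rest
     | None -> None)

let () =
  let t = Node [ "README.md", Contents "The contents of README.md" ] in
  assert (get t [ "README.md" ] = Some "The contents of README.md")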
As a consumer of an API, one of the biggest initial challenges is understanding what operations are exposed and how to use them. Conversely, as a developer of an API, keeping documentation up-to-date is challenging and time consuming. Though no substitute for more free-form documentation, a GraphQL schema provides an excellent baseline for understanding a GraphQL API, one that is guaranteed to be accurate and up-to-date. This was definitely an issue with the old Irmin HTTP API, which was hard to approach for newcomers due to its lack of documentation.
Being able to inspect the schema of a GraphQL API enables powerful tooling. A great example of this is GraphiQL, which is a browser-based IDE for GraphQL queries. GraphiQL can serve both as an interactive API explorer and query designer with intelligent autocompletion, formatting and more.
The combination of introspection and a strongly typed schema also allows creating smart clients using code generation. This is already a quite widespread idea, with Apollo for iOS, Apollo for Android, or graphql_ppx for OCaml/Reason. Though generic GraphQL client libraries will do a fine job interacting with the Irmin GraphQL API, these highlighted libraries offer excellent ergonomics and type-safety out of the box.
One of the problems that GraphQL set out to solve is that of over- and underfetching. When designing REST API response payloads, there is always a tension between including too little data, which will require clients to make more network requests, and including too much data, which wastes resources for both client and server (serialization, network transfer, deserialization, etc).
The existing low-level Irmin HTTP API is a perfect example of this. Fetching the contents of a particular file on the master branch requires at least 4 HTTP requests (fetch the branch, fetch the commit, fetch the tree, fetch the blob), i.e. massive underfetching. By comparison, this is something easily solved with a single request to the new GraphQL API. More generally, the GraphQL API allows you to fetch exactly the data you need in a single request without making one-off endpoints.
For the curious, here's the GraphQL query to fetch the contents of README.md from the branch master:
query {
master {
tree {
get(key: "README.md")
}
}
}
The response will look something like this:
{
"data": {
"master": {
"tree": {
"get": "The contents of README.md"
}
}
}
}
The GraphQL API is not limited to only reading data; you can also write data to your Irmin store. Here's a simple example that will set the key README.md to "foo" and return the hash of that commit:
mutation {
set(key: "README.md", value: "foo") {
hash
}
}
By default, GraphQL allows you to do multiple operations in a single query, so you get bulk operations for free. Here's a more complex example that modifies two different branches, branch-a and branch-b, and then merges branch-b into branch-a, all in a single query:
mutation {
branch_a: set(branch: "branch-a", key: "foo", value: "bar") {
hash
}
branch_b: set(branch: "branch-b", key: "baz", value: "qux") {
hash
}
merge_with_branch(branch: "branch-b", from: "branch-a") {
hash
tree {
list_contents_recursively {
key
value
}
}
}
}
Here's what the response might look like:
{
"data": {
"branch_a": {
"hash": "0a1313ae9dfe1d4339aee946dd76b383e02949b6"
},
"branch_b": {
"hash": "28855c277671ccc180c81058a28d3254f17d2f7b"
},
"merge_with_branch": {
"hash": "7b17437a16a858816d2710a94ccaa1b9c3506d1f",
"tree": {
"list_contents_recursively": [
{
"key": "/foo",
"value": "bar"
},
{
"key": "/baz",
"value": "qux"
}
]
}
}
}
}
Overall, the new GraphQL API operates at a much higher level than the old HTTP API, and offers a number of complex operations that were tricky to accomplish before.
Customizable
With GraphQL, all request and response data is fully described by the schema. Because Irmin allows the user to have custom content types, this leaves the question of what type to assign to such values. By default, the GraphQL API will expose all values as strings, i.e. the serialized version of the data that your application stores. This works quite well when Irmin is used as a simple key-value store, but it can be a very inconvenient scheme when storing more complex values. As an example, consider storing contacts (name, email, phone, tags, etc.) in your Irmin store, where values have the following type:
(* Custom content type: a contact *)
type contact = {
name : string;
email : string;
(* ... *)
}
Fetching such a value will, by default, return the JSON-encoded representation to the client. Assume we're storing a contact under the key john-doe, which we fetch with the following query:
query {
master {
tree {
get(key: "john-doe")
}
}
}The response would then look something like this:
{
"master": {
"tree": {
"get": "{\"name\":\"John Doe\", \"email\": \"john.doe@gmail.com/", ...}"
}
}
}

The client will have to parse this JSON string and cannot choose to fetch only parts of the value (say, only the email). Ideally, we would want the client to get a structured response such as the following:
{
"master": {
"tree": {
"get": {
"name": "John Doe",
"email": "john.doe@gmail.com",
...
}
}
}
}

To achieve this, the new GraphQL API allows providing an "output type" and an "input type" for most of the configurable types in your store (contents, key, metadata, hash, branch). The output type specifies how data is presented to the client, while the input type controls how data can be provided by the client. Let's take a closer look at specifying a custom output type.
Essentially, you have to construct a value of type (unit, 'a option) Graphql_lwt.Schema.typ (from the graphql-lwt package), assuming your content type is 'a. We can construct a GraphQL object type for our example content type contact as follows:
(* (unit, contact option) Graphql_lwt.Schema.typ *)
let contact_schema_typ = Graphql_lwt.Schema.(obj "Contact"
~fields:(fun _ -> [
field "name"
~typ:(non_null string)
~args:[]
~resolve:(fun _ contact ->
contact.name
)
;
(* ... more fields *)
])
)

To use the custom type, you need to instantiate the functor Irmin_unix.Graphql.Server.Make_ext (assuming you're deploying to a Unix target) with an Irmin store (type Irmin.S) and a custom types module (type Irmin_graphql.Server.CUSTOM_TYPES). This requires a bit of plumbing:
(* Instantiate the Irmin functor somehow *)
module S : Irmin.S with type contents = contact =
(* ... *)
(* Custom GraphQL presentation module *)
module Custom_types = struct
(* Construct default GraphQL types *)
module Defaults = Irmin_graphql.Server.Default_types (S)
(* Use the default types for most things *)
module Key = Defaults.Key
module Metadata = Defaults.Metadata
module Hash = Defaults.Hash
module Branch = Defaults.Branch
(* Use custom output type for contents *)
module Contents = struct
include Defaults.Contents
let schema_typ = contact_schema_typ
end
end
module Remote = struct
  (* Assuming the store exposes a remote function, as Git-backed
     stores do *)
  let remote = Some S.remote
end
module GQL = Irmin_unix.Graphql.Server.Make_ext (S) (Remote) (Custom_types)

With this in hand, we can now query specifically for the email of john-doe:
query {
master {
tree {
get(key: "john-doe") {
email
}
}
}
}

... and get a nicely structured JSON response back:
{
"master": {
"tree": {
"get": {
"email": "john.doe@gmail.com"
}
}
}
}

Custom types are very powerful and open the door to transforming or enriching the data at query time, e.g. geocoding the address of a contact, or checking an online status.
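To make that concrete, here is a minimal sketch of a field computed at query time on the Contact object defined earlier. This is our own illustration, not code from irmin-graphql: the emailDomain field and the domain_of_email helper are hypothetical.

(* Illustrative sketch: a field computed at query time. The
   "emailDomain" field is not stored in Irmin; it is derived from the
   stored email during resolution. *)
let domain_of_email email =
  (* Take everything after the '@'; fall back to the whole string. *)
  match String.index_opt email '@' with
  | Some i -> String.sub email (i + 1) (String.length email - i - 1)
  | None -> email

let enriched_contact_typ = Graphql_lwt.Schema.(obj "Contact"
  ~fields:(fun _ -> [
    field "email"
      ~typ:(non_null string)
      ~args:[]
      ~resolve:(fun _ contact -> contact.email)
    ;
    (* Computed, not stored: enrichment at query time *)
    field "emailDomain"
      ~typ:(non_null string)
      ~args:[]
      ~resolve:(fun _ contact -> domain_of_email contact.email)
    ;
  ])
)

Resolvers that need to perform IO (e.g. a geocoding lookup) can use graphql-lwt's io_field instead of field.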
Watches
A core feature of Irmin is the ability to watch for changes to the underlying data store in real time. irmin-graphql takes advantage of GraphQL subscriptions to expose Irmin watches. Subscriptions are a relatively recent addition to the GraphQL spec (June 2018) that allow clients to subscribe to changes. These changes are pushed to the client, as regular GraphQL responses, over a suitable transport mechanism, e.g. websockets, Server-Sent Events, or a chunked HTTP response.
As an example, the following query watches for all changes and returns the new hash:
subscription {
watch {
commit {
hash
}
}
}

For every change, a message like the following will be sent:
{
"watch": {
"commit": {
"hash": "c01a59bacc16d89e9cdd344a969f494bb2698d8f"
}
}
}

Under the hood, subscriptions in irmin-graphql are implemented using Irmin watches, but this is opaque to the client -- it will work with any spec-compliant GraphQL client!
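For OCaml readers, here is a rough sketch of what this looks like on the native side, using Irmin's watch API directly against the store module S from the earlier examples. This is a hand-written illustration of the idea, not the actual irmin-graphql implementation.

(* Print the hash of each new head on master, using Irmin's native
   watch API (illustrative; not irmin-graphql's implementation). *)
open Lwt.Infix

let watch_master (repo : S.repo) =
  S.master repo >>= fun t ->
  S.watch t (fun diff ->
      match diff with
      | `Added c | `Updated (_, c) ->
          let hash = Irmin.Type.to_string S.Hash.t (S.Commit.hash c) in
          Lwt_io.printlf "new head: %s" hash
      | `Removed _ -> Lwt.return_unit)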
Here's a video, which shows how the GraphQL response changes live as the Irmin store is being manipulated:
Note that the current implementation only supports websockets, with more transport options coming soon.
Wrap-up
Irmin 2.0 ships with a powerful new GraphQL API that makes it much easier to interact with Irmin over the network. This makes Irmin available to many more languages and contexts, not just applications using OCaml (or JavaScript). The new API operates at a much higher level than the old API, and offers advanced features such as "bring your own GraphQL types" and watching for changes via GraphQL subscriptions.
We're looking forward to seeing what you'll build with it!




Announcing Irmin 2.0.0 — MirageOS (Anil Madhavapeddy), Nov 26, 2019
We are pleased to announce Irmin 2.0.0, a major release of the Git-like distributed branching and storage substrate that underpins MirageOS. We began the release process for all the components that make up Irmin back in May 2019, and there have been close to 1000 commits since the Irmin 1.4.0 release back in June 2018.
- To celebrate this milestone, we have a new logo and opened a dedicated website: irmin.org.
You can read more details about the new features in the Irmin v2 blog post. Enjoy the new release, and stay tuned for the upcoming Wodan integration in 2020 that will be a stable filesystem for the hypervisor targets for MirageOS that do not have a conventional OS kernel underneath them!




Irmin v2 — Tarides (Thomas Gazagnaire), Nov 21, 2019
We are pleased to announce Irmin 2.0.0, a major release of the Git-like distributed branching and storage substrate that underpins MirageOS. We began the release process for all the components that make up Irmin back in May 2019, and there have been close to 1000 commits since Irmin 1.4.0 released back in June 2018. To celebrate this milestone, we have a new logo and opened a dedicated website: irmin.org.
Our focus this year has been on ensuring the production success of our early adopters -- such as the Tezos blockchain and the Datakit 9P stack -- as well as spawning new research projects into the practical application of distributed and mergeable data stores. We are also very pleased to welcome several new maintainers into the Mirage project for their contributions to Irmin, namely Ioana Cristescu, Craig Ferguson, Andreas Garnaes, Clément Pascutto and Zach Shipko.
New Major Features
New CLI
While Irmin is normally used as a library, it is obviously useful to be able to interact with a data store from a shell. The irmin-unix opam package now provides an irmin binary that is configured via a Yaml file and can perform queries and mutations against a Git store.
$ echo "root: ." > irmin.yml
$ irmin init
$ irmin set foo/bar "testing 123"
$ irmin get foo/bar

Try irmin --help to see all the commands and options available.
Tezos and irmin-pack
Another big user of Irmin is the Tezos blockchain, and we have been optimising the persistent space usage of Irmin as their network grows. Because Tezos doesn't require full Git format support, we created a hybrid backend that borrows the best bits of Git (e.g. the packfile mechanism) and is engineered as a domain-specific backend tailored for Tezos usage. Crucially, because of the way Irmin is split into clean libraries and OCaml modules, we only had to modify a small part of the codebase and could also reuse elements of our OCaml-git codebase.
The irmin-pack backend is available for use in the CLI and provides a significant improvement in disk usage. There is a corresponding Tezos merge request using the Irmin 2.0 code that has been integrated downstream and will become available via their release process in due course.
As part of this development process, we also released an efficient multi-level index implementation (imaginatively dubbed index in opam). Our implementation takes an arbitrary IO implementation and user-supplied content types and supplies a standard key-value interface for persistent storage. Index provides instance sharing by default, so each OCaml runtime shares a common singleton instance.
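As a rough sketch of the interface described above, the result of instantiating index looks something like a standard key-value store. The names and optional arguments below are approximate and hypothetical; check the index package documentation for the real API.

(* Hypothetical sketch of the resulting key-value interface; the
   actual `index` API may differ in names and details. *)
module type INDEX = sig
  type t
  type key
  type value

  (* Open an index rooted at the given path; per the instance sharing
     described above, opening the same path twice yields the same
     instance. *)
  val v : ?fresh:bool -> log_size:int -> string -> t

  val mem : t -> key -> bool
  val find : t -> key -> value      (* raises Not_found if absent *)
  val replace : t -> key -> value -> unit
  val flush : t -> unit             (* persist pending writes *)
end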
Irmin-GraphQL and “browser Irmin”
Another new area of huge interest to us is GraphQL, which provides frontends with a rich query language for Irmin-hosted applications. Irmin 2.0 includes a built-in GraphQL server, so you can manipulate your Git repo via GraphQL.
If you are interested in (for example) compiling elements of Irmin to JavaScript or wasm, for usage in frontends, then the Irmin 2.0 release makes it significantly easier to support this architecture. We’ve already seen some exploratory efforts report issues when doing this, and we’ve had it working ourselves in Irmin 1.0 Cuekeeper, so we are excited by the potential power of applications built using this model. If you have ideas or questions, please get in touch on the issue tracker with your use case.
Wodan
Irmin’s storage layer is also well abstracted, so backends other than a Unix filesystem or Git are supported. Irmin can run in highly diverse and OS-free environments, and so we began engineering the Wodan filesystem as a domain-specific filesystem designed for MirageOS, Irmin and modern flash drives. See the OCaml Workshop 2017 abstract on it for more design rationale.
As part of the Irmin 2.0 release, Wodan is also being prepared for a release, and you can find Irmin 2.0 support in the source. If you’d like a standalone block-device based persistence environment for Irmin, please try this out. This is the preferred backend for using Irmin storage in a unikernel.
Versioned CalDAV
An application pulling all these pieces together is being developed by our friends at Robur: an Irmin-based CalDAV calendaring server that even hosts its DNS server using a versioned Irmin store. We'll blog more about this as the components get released and stabilised, but the unikernel enthusiasts among you may want to browse the Robur unikernels future branch to see how they are deploying them today.
A huge thank you to all our commercial customers, end users and open-source developers who have contributed their time, expertise and financial support to help us achieve our goal of delivering a modern storage stack in the spirit of Git. Our next steps for Irmin are to continue to increase the performance and optimise the storage, and to build more end-to-end applications using the application core on top of MirageOS.
Hide




BAP 2.0 is released — The BAP Blog, Nov 19, 2019
The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a suite of utilities and libraries that enables analysis of programs in their machine representation. BAP is written in OCaml, relies on dynamically loaded plugins for extensibility, and is widely used for security analysis, program verification, and reverse engineering.
This is a major update that includes lots of new features, libraries, and tools, as well as improvements and bug fixes to the existing code. The following sma…
Read more...The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a suite of utilities and libraries that enables analysis of programs in their machine representation. BAP is written in OCaml, relies on dynamically loaded plugins for extensibility, and is widely used for security analysis, program verification, and reverse engineering.
This is a major update that includes lots of new features, libraries, and tools, as well as improvements and bug fixes to the existing code. The following small demo showcases the modern BAP look and feel. In this announcement we would like to focus on two very important features of BAP 2.0:
- knowledge representation and reasoning system;
- the tagless final representation of program semantics.
The Knowledge Base
The Knowledge Representation and Reasoning System, or the Knowledge Base (KB) for short, is the central part of our new architecture. The KB is a step forward from the conventional approach to staging multiple analyses, in which dependent analyses (aka passes) are ordered topologically based on their dependencies. The KB is inspired by logic programming and enables optimal and seamless staging of mutually dependent analyses. Mutually dependent analyses are also present in source-based program analysis, but are much more evident in the field of binary analysis and reverse engineering, where even such basic artifacts as the control flow graph and call graph are not available as ground truth (and in general are not even computable).
Object properties in the KB are represented with directed-complete partially ordered sets. The KB also imposes a monotonicity restriction, which requires that all updates to a property are monotonic, i.e., each consequent value of the same property is a refinement of the previous value. These restrictions enable the KB to compute the least fixed point of any property. A property representation can optionally be refined into a complete lattice, which gives KB users extra control over how properties are computed.
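As a toy illustration of that restriction (our own sketch, not BAP's KB API), think of a property over a flat domain where an update may only refine Unknown, never overwrite a conflicting Known value:

(* Toy model of a monotone property (illustrative; not BAP's API). *)
type 'a prop = Unknown | Known of 'a

exception Conflict

(* Least upper bound: refining Unknown is always allowed, and two
   Known values must agree -- updates can only move "upward". *)
let join eq a b = match a, b with
  | Unknown, x | x, Unknown -> x
  | Known x, Known y when eq x y -> Known x
  | _ -> raise Conflict

(* Folding monotone updates from Unknown converges to the least fixed
   point: the most precise value justified by all the analyses. *)
let run eq analyses =
  List.fold_left (fun acc analyse -> join eq acc (analyse ())) Unknown analyses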
By storing all information in an external location, the KB addresses the scalability issue so relevant to binary analysis and reverse engineering. In the future, we plan to implement a distributed storage for our Knowledge Base, as well as experiment with other inference engines. Soon, it should also be possible to work with the knowledge base from non-OCaml programs, including our BAP Lisp dialect. That, practically, turns the knowledge base into a common runtime for binary analysis. In the current version of BAP, the Knowledge Base state is fully serializable and portable between different versions of BAP, OCaml, and even between native and bytecode runtimes. The Knowledge Base state can be imported into an application and is integrated with the BAP caching system.
New Program Representation
Employing the tagless final embedding together with our new Knowledge Base, we were able to achieve our main goal - to switch to an extensible program representation without compromising any existing code that uses the current, non-extensible, BAP Intermediate Language (BIL). The new representation allows us to add new language features (like floating-point operations or superscalar pipeline manipulations) without breaking (or even recompiling) the existing analyses. The new representation also facilitates the creation of new intermediate languages, which can all coexist with each other, making it easier to write formally verified analyses.
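For readers unfamiliar with the technique, here is a minimal tagless-final sketch in OCaml showing why the representation is extensible. This is a generic illustration, not BIL's actual signatures.

(* The core language is a signature... *)
module type CORE = sig
  type 'a repr
  val int : int -> int repr
  val add : int repr -> int repr -> int repr
end

(* ...and an extension is just a wider signature. Nothing in CORE, or
   in analyses written against CORE, needs to change or recompile. *)
module type WITH_FLOAT = sig
  include CORE
  val float : float -> float repr
  val fadd : float repr -> float repr -> float repr
end

(* One possible interpretation: direct evaluation. *)
module Eval : CORE with type 'a repr = 'a = struct
  type 'a repr = 'a
  let int x = x
  let add = ( + )
end

(* Analyses are functors over the signature they need, so they keep
   working unchanged as the language grows. *)
module Example (L : CORE) = struct
  let term = L.add (L.int 1) (L.int 2)
end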
Links
View older blog posts.
Syndications


- Ahrefs
- Amir Chaudhry
- Andrej Bauer
- Andy Ray
- Anil Madhavapeddy
- Ashish Agarwal
- CUFP
- Cameleon news
- Caml INRIA
- Caml Spotting
- Coherent Graphics
- Coq
- Cranial Burnout
- Cryptosense
- Daniel Bünzli
- Daniel Bünzli (log)
- Dario Teixeira
- David Baelde
- David Mentré
- David Teller
- Eray Özkural
- Erik de Castro Lopo
- Etienne Millon
- Frama-C
- Functional Jobs
- GaGallium
- Gabriel Radanne
- Gaius Hammond
- Gemma Gordon (OCaml Labs)
- Gerd Stolpmann
- GitHub Jobs
- Grant Rettke
- Hannes Mehnert
- Hong bo Zhang
- Jake Donham
- Jane Street
- KC Sivaramakrishnan
- Leo White
- Magnus Skjegstad
- Marc Simpson
- Matthias Puech
- Matías Giovannini
- Mike Lin
- Mike McClurg
- Mindy Preston
- MirageOS
- OCaml Book
- OCaml Labs
- OCaml Labs compiler hacking
- OCaml Platform
- OCaml-Java
- OCamlCore Forge News
- OCamlCore Forge Projects
- OCamlCore.com
- OCamlPro
- ODNS project
- Ocaml XMPP project
- Ocsigen blog
- Ocsigen project
- Opa
- Orbitz
- Paolo Donadeo
- Perpetually Curious
- Peter Zotov
- Psellos
- Richard Jones
- Rudenoise
- Rudi Grinberg
- Sebastien Mondet
- Shayne Fletcher
- Stefano Zacchiroli
- Sylvain Le Gall
- Tarides
- The BAP Blog
- Thomas Leonard
- Till Varoquaux
- Xinuo Chen
- Yan Shvartzshnaider


