New Software: Rewriting the software driving our site

In today’s post we start a blog series about our “New Software”. It will consist of three parts:
1. Rewriting the software driving our site
2. Modernising the Web Frontend
3. The Heart of Gold

As some of the details outlined here are still work in progress, things may change. In general, though, this series is meant to give an idea of what the software team has been working on for the last year, despite being rather silent otherwise. Feedback on these plans is very welcome.

Rewriting the software driving our site

Over the past few months the software team, along with other groups like NRE (New Root and Escrow) and the Policy Group, has been rewriting from scratch the software that the whole CAcert website runs on, and this work is still ongoing. The step had been planned for quite some time, but due to its complexity and the routine work required to keep the current software running it only became possible recently. In this blog post we’d like to describe some of the decisions made in this process, as some of them are commonly questioned and somewhat controversial.

When we sat down to plan the new software, one of the main reasons for doing so was to avoid the maintainability pitfalls we have with the old software. The old software was written back in 2003 and has since only seen patches here and there. If you look closely at the current source you’ll notice about ten different coding styles interleaved with each other. Combined with the quirks of PHP this makes for quite a mess that quickly becomes unmaintainable once you want to implement new features or update the software runtime.

Another important aspect when planning the software was redundancy. As the last year has shown, you want to mistrust your SSL implementation and every other piece of software you use as much as is practical for your purpose. At the same time you want to protect yourself from errors in that software wherever there is a way to do so.

Discussions in the software team meetings were quite heated when the initial thoughts were shared and the basic decisions made. One of those decisions was to drop PHP in the new system due to its quirks and inconsistent API. When surveying options for a replacement we also took into account that we did not want the front-end and the back-end written in the same language, so as to keep the redundancy we currently have (PHP in the front-end, Perl in the back-end). The list of possible languages contained PHP, Perl, Python, Java, C++ and C. Other languages like Groovy, Ruby, Erlang, Haskell and many more were looked at but discarded, either because too few software assessors were confident enough to write security-related code in them or because of weak typing. After heavy discussion we settled on Java for the front-end, a choice that was not undisputed in our team either. When selecting Java we knew of the issues its runtime has, and decided to go this way for mainly three reasons:

1. Given the number of CVEs it’s not worse than PHP, even if many people think it is. When surveying the options we explicitly looked into the security track record and counted how many of the issues would have affected us if we restricted the feature set we use to some extent. By doing so and leaving out several features like JSPs, Java RMI, and many other advanced features we do not need, we found it is possible to cut the attack surface back to an acceptable level.
2. There is really good tooling support for software design, quality control and testing as well as for refactoring. Given the right tools you can prove that your software still works the same before and after a change was made, and all you need to do is write a few rules saying how you get from A to B.
3. There are lots of people who know the language well enough to write code that runs without falling over at every occasion, something which takes years of experience with languages like PHP and C++.

Comparing these aspects to Python, we weren’t as confident about the tooling support and in addition lacked people in our team who could have done the software reviews. Thus, even though we’d love to use something provable like Haskell, there’s no point in doing so if nobody can assess whether a given patch is correct.

A similar situation arose with the signer (currently this part is called CommModule). Although this part basically “works”, it’s always a hassle if anything needs to be changed, as only few people in our team are confident enough to write Perl. Thus Perl was excluded for the signer too. Instead we settled on C++, which might seem counterintuitive considering the common prejudices regarding its memory management. While similar to the broken one from C, it offers a lot of abstractions that actually make life easier. By wrapping the cryptography interfaces of the library used you can keep memory management at a safe distance. Using C++ also enables us to choose freely among all the (broken) SSL/TLS implementations available, something most other languages would not allow without writing some kind of wrapper.
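To illustrate what wrapping a C-style cryptography interface can look like, here is a minimal sketch. It is not taken from the new signer: the `toy_key_*` functions are a made-up stand-in for a real library's allocate/free pair, and the `Key` class and `live_keys_after_scope` helper are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Stand-in for a C-style crypto library (hypothetical, not a real API).
struct toy_key { std::size_t bits; };
static int g_live_keys = 0;  // counts live allocations so leaks become visible

toy_key* toy_key_new(std::size_t bits) {
    if (bits == 0) return nullptr;  // the C API reports failure via nullptr
    ++g_live_keys;
    return new toy_key{bits};
}

void toy_key_free(toy_key* k) {
    if (k) { --g_live_keys; delete k; }
}

// C++ wrapper: the raw handle is owned by a unique_ptr with a custom
// deleter, so it is released exactly once when the Key goes out of scope,
// including when an exception unwinds the stack.
class Key {
public:
    explicit Key(std::size_t bits) : handle_(toy_key_new(bits)) {
        if (!handle_) throw std::runtime_error("key allocation failed");
    }
    std::size_t bits() const { return handle_->bits; }

private:
    struct Deleter {
        void operator()(toy_key* k) const { toy_key_free(k); }
    };
    std::unique_ptr<toy_key, Deleter> handle_;
};

// Creates a key in an inner scope and reports how many raw handles are
// still alive afterwards.
int live_keys_after_scope() {
    {
        Key k(2048);
        assert(k.bits() == 2048);
    }
    return g_live_keys;
}
```

Code that uses `Key` never calls the free function itself, which is the sense in which memory management is kept at a distance. With a real library the deleter would call that library's own release function instead.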

From the cryptography point of view we discussed several of the available implementations but settled one thing right at the start: we will NOT implement any cryptography ourselves on the signer. Thus the cryptography should come from an existing implementation like OpenSSL, LibreSSL, GnuTLS, libNSS, NaCl, CyaSSL/wolfSSL or PolarSSL/mbed TLS. As several implementations have had differing issues in recent times, we decided to keep this part of the new system flexible, so that the implementation can be switched without much effort should problems arise. Our favoured backends here were GnuTLS with OpenSSL/LibreSSL as fallbacks, knowing well that both have their own quite distinct sets of issues.
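One way to keep the cryptography implementation switchable is to hide it behind a small interface and pick the first usable backend from a preference list. The following is only a sketch of that idea under our own assumptions: `CryptoBackend`, `DummyBackend`, `make_backends` and `select_backend` are hypothetical names, and the dummy backend does no cryptography at all. It does not describe the actual signer code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Interface the rest of the signer would program against (hypothetical).
class CryptoBackend {
public:
    virtual ~CryptoBackend() = default;
    virtual std::string name() const = 0;
    virtual bool available() const = 0;
    virtual std::string sign(const std::string& data) const = 0;
};

// Placeholder backend for illustration. A real one would forward to
// GnuTLS, OpenSSL or LibreSSL; this one performs NO cryptography.
class DummyBackend : public CryptoBackend {
public:
    DummyBackend(std::string name, bool available)
        : name_(std::move(name)), available_(available) {}
    std::string name() const override { return name_; }
    bool available() const override { return available_; }
    std::string sign(const std::string& data) const override {
        return name_ + ":" + data;  // tag only, NOT a signature
    }

private:
    std::string name_;
    bool available_;
};

using BackendList = std::vector<std::unique_ptr<CryptoBackend>>;

// Builds the preference order: GnuTLS first, OpenSSL as the fallback.
BackendList make_backends(bool gnutls_ok) {
    BackendList list;
    list.push_back(std::make_unique<DummyBackend>("gnutls", gnutls_ok));
    list.push_back(std::make_unique<DummyBackend>("openssl", true));
    return list;
}

// Returns the first backend in preference order that reports itself usable.
const CryptoBackend& select_backend(const BackendList& preferred) {
    for (const auto& backend : preferred) {
        assert(backend != nullptr);
        if (backend->available()) return *backend;
    }
    throw std::runtime_error("no usable crypto backend");
}
```

With this shape, switching away from a backend that has run into problems comes down to changing the preference list, while callers keep using `sign` through the interface.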

With the question of programming language answered, let’s turn to the other software design decisions. One aspect of the old system was its inherent lack of any meaningful documentation, which made the old software basically unfit for any meaningful audit and hell to maintain. Additionally, the old software has a structure which makes writing sensible tests a nightmare (if possible at all). To work around these issues in the new software we plan to maintain documentation inline as part of the code, to make updating it along with changes to the code as easy as possible. Also, most parts of the new software are already covered by a set of unit and integration tests, which simplifies the life of our testers: more than once in recent times the test instruction for the old software read “Complete Retest” because there was a lack of integration tests.

We hope these details on the old software and our plans for the future give a short introduction to what the software team is working on, and explain why we react rather allergically to people asking when things will be “in their browsers”. It’s not that we are slacking off, but that there is far too much work to do to communicate every tiny change as well. If you are interested in more detail on the new software, I’d recommend reading the follow-up parts of this little series, which will detail both the features and architecture of the web front-end and take a closer look at the new signer.
