Tag Archives: critical systems

Partially restricted operation / most services available

As a result of a chain of technical failures of old equipment already scheduled for replacement, there are currently certain limitations in the services provided to members of the CAcert community. We regret this terribly.

  • bugs.cacert.org – bug management: normal operation
  • community.cacert.org – service hub: normal operation
  • irc.cacert.org – IRC: normal operation
  • secure.cacert.org – reduced service
  • selfservice.cacert.org – password reset: normal operation
  • webmail.cacert.org – webmail: normal operation
  • wiki.cacert.org – wiki/help centre: normal operation
  • www.cacert.org – main page: reduced service
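
A status list like the one above can be cross-checked with a simple reachability probe. The following is a minimal sketch, not a CAcert tool: the names `is_reachable` and `probe_all`, the choice of port 443 and the timeout are assumptions made for illustration. An open TCP port only shows that a host answers; it cannot distinguish "normal operation" from "reduced service".

```python
import socket

# Hostnames taken from the status list; port 443 (HTTPS) is an assumption
# and will not suit every service (IRC, for instance, uses other ports).
HOSTS = [
    "bugs.cacert.org",
    "community.cacert.org",
    "irc.cacert.org",
    "secure.cacert.org",
    "selfservice.cacert.org",
    "webmail.cacert.org",
    "wiki.cacert.org",
    "www.cacert.org",
]


def is_reachable(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the function can be exercised without
    network access; by default it is socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
    conn.close()
    return True


def probe_all(hosts=HOSTS, **kwargs):
    """Map each hostname to the result of is_reachable()."""
    return {host: is_reachable(host, **kwargs) for host in hosts}
```

Calling `probe_all()` with no arguments would contact all eight hosts over the network; passing a stub as `connect` allows a dry run.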

In mid-September, we discovered that a partition contained a corrupt file system. A subsequent hardware test showed that one of the hard drives was reporting hardware errors. To be able to continue using the system, we moved this partition to a second drive.

Since the end of September, the system has no longer responded. We suspect that other partitions are defective as well. Neither web access nor SSH access works, so the error can only be analysed in more detail by a visit to our data centre.
To still be able to offer as many services as possible to the CAcert community until the repair, we redirected the connections for www.cacert.org and secure.cacert.org in the incoming firewall to the second system. Because of the ongoing hardware renewal, however, this fallback is not quite complete: there is no working signer and no up-to-date copy of the CAcert database attached to this system.

That is why, for now, the main page can only serve as a starting point that directs our users to the blog, while certificate issuing and WoT access have to be postponed until our technical volunteers have made the several-hour trip to the data centre for troubleshooting. They do this in their spare time and at their own expense, and we are very grateful to them; they will probably be able to make the trip in mid-October.

If you would like to know what you can do yourself to ensure that such interruptions occur less frequently and are resolved more quickly, read this!

DEUTSCH: Infolge einer kaskadierten technischen Störung sind zur Zeit leider nicht alle Dienstleistungen übers Netz abrufbar. Alle Fernwartungsschritte haben unsere technischen Freiwilligen bereits unternommen. Bis zu einem Vor-Ort-Einsatz im Rechenzentrum im Ausland voraussichtlich Mitte Oktober ist der Zugriff auf den Signer und die Datenbank nicht möglich. Wir bedauern dies sehr. Was Sie tun können, um solche Ausfallzeiten künftig zu verringern, lesen Sie hier!

FRANÇAIS: Suite à une panne technique en cascade, tous les services ne sont malheureusement pas accessibles en ligne pour le moment. Toutes les démarches de télémaintenance ont déjà été effectuées par nos volontaires techniques. L’accès au Signer et à la base de données est impossible jusqu’à une intervention sur place dans le centre de calcul à l’étranger, probablement mi-octobre. Nous le regrettons vivement. Vous pouvez lire ici ce que vous pouvez faire pour réduire ces temps d’arrêt à l’avenir!

PORTUGUÊS: Devido a uma falha técnica em cascata, infelizmente nem todos os serviços estão disponíveis pela rede no momento. Todas as medidas de manutenção remota já foram tomadas por nossos voluntários técnicos. O acesso ao signatário e ao banco de dados não será possível até uma visita no local ao centro de dados no exterior, provavelmente em meados de outubro. Lamentamos muito o ocorrido. Leia aqui o que você pode fazer para reduzir esses períodos de inatividade no futuro!

ESPAÑOL: Debido a un fallo técnico en cascada, lamentablemente no todos los servicios están disponibles actualmente a través de la red. Nuestros voluntarios técnicos ya han tomado todas las medidas de mantenimiento a distancia. El acceso al firmante y a la base de datos no será posible hasta una visita in situ al centro de datos en el extranjero, probablemente a mediados de octubre. Lo lamentamos mucho. Lea aquí lo que puede hacer para reducir estos tiempos de inactividad en el futuro.

New signer proves itself in use

EN: Signer is running again

DE: Signer ist wieder in Betrieb

FR: Signataire fonctionne à nouveau

ES: Firmante vuelve a funcionar

IT: Firmatario è di nuovo in funzione

The signer has been running again since yesterday, Friday, around 13:00 CEST. We then (while we were doing other work) watched the processing for about another hour… Around 0:30 CEST all outstanding certificate requests (~3000) were processed.
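
The figures above allow a rough estimate of the signer's throughput while it worked through the backlog. The calculation below is a back-of-the-envelope sketch: it assumes a constant processing rate, treats the approximate times and the ~3000 figure as exact, and the calendar date is an assumption (the post only says "Friday").

```python
from datetime import datetime

# Times from the post; the date (Friday 2021-07-16) is assumed for illustration.
started = datetime(2021, 7, 16, 13, 0)    # signer running again, ~13:00 CEST
finished = datetime(2021, 7, 17, 0, 30)   # backlog cleared, ~0:30 CEST
backlog = 3000                            # outstanding requests (approximate)

elapsed = finished - started
hours = elapsed.total_seconds() / 3600
requests_per_hour = backlog / hours
seconds_per_request = elapsed.total_seconds() / backlog

print(f"elapsed: {hours:.1f} h")
print(f"rate: about {requests_per_hour:.0f} requests/hour")
print(f"about {seconds_per_request:.1f} s per request")
```

Under these assumptions the backlog took 11.5 hours to clear, which is roughly 260 requests per hour, or about 14 seconds per request.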

Things didn’t quite go as planned in June. As soon as something cannot be done remotely (there is no remote access to critical systems, for security reasons), someone who is authorised to do so has to go to the data centre in the Netherlands, despite Corona, quarantine, floods, overtime at the company and whatever else comes up. That’s maybe two hours there, then two hours home again, with the actual work in between, all during the opening hours of the data centre, in your free time, and paying for your own train ticket or petrol. It’s not always easy to reconcile all that. On Friday afternoon, however, the time had come, and the signer has now been running smoothly again for over a day.

As can be seen from the Critical Team’s plan published yesterday, preliminary work is already underway to make the system redundant throughout and even more robust, so that failures should no longer be noticed by users, because no one is interested in such failures! We are very sorry that you had to wait so long. At the same time, we thank the small core team who have sacrificed nights and weekends over the last five weeks to get the technology back up and running for the CAcert community!

Datacenter-Visit on 2021-07-16 *UPDATE*

The activation of the signer machine was successful; all pending certificates were processed over the last few hours.

Short version: A visit to the datacenter is planned to enable the signer again (and to do some other maintenance there).

Long version:

Unfortunately, it was not possible to get the signer working again during the last visit because of a hardware issue with the hard drive.

Getting the server running on the (pre-)created backup drive failed, too.

We therefore used the last few weeks (when it was not possible to visit the datacenter for various business and personal reasons) to rebuild a test environment on spare hardware and to train ourselves.

We should now be able to carry out the steps needed to bring the signer machine back into operation.

In the background, we’re currently adjusting our processes to make it easier to visit the datacenter outside office hours (as every trip to the datacenter takes several hours in addition to the time we spend working on the servers).

In future, we plan to set up an additional configuration that can take over in case of a failure in the datacenter, but this will still take time. The exact procedure still needs to be worked out, as the machines are not to be connected to the internet but need to communicate (e.g. for CRL creation, certificate serial numbers etc.).
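
To illustrate why certificate serial numbers need coordination between two signers that are not networked: if both simply counted upwards, they would issue duplicate serials. One generic way to avoid this is to give each signer a disjoint, interleaved serial space. The sketch below is purely illustrative and is not CAcert's procedure (which, as noted, still needs to be worked out); the class name `SerialAllocator` and its parameters are hypothetical. It does not address CRL creation, which would still require revocation data to be exchanged between the machines.

```python
class SerialAllocator:
    """Hand out serial numbers from an interleaved, per-signer sequence.

    Hypothetical helper: signer `signer_id` (0-based) out of `signer_count`
    signers only ever issues serials congruent to signer_id modulo
    signer_count, so two signers can never produce the same serial.
    """

    def __init__(self, signer_id, signer_count, start=1):
        if signer_count < 1 or not 0 <= signer_id < signer_count:
            raise ValueError("signer_id must be in range(signer_count)")
        if start < 1:
            # Starting at 1 keeps every serial strictly positive.
            raise ValueError("start must be at least 1")
        self.signer_id = signer_id
        self.signer_count = signer_count
        self._counter = start

    def next_serial(self):
        """Return the next unused serial for this signer."""
        serial = self._counter * self.signer_count + self.signer_id
        self._counter += 1
        return serial


primary = SerialAllocator(signer_id=0, signer_count=2)
standby = SerialAllocator(signer_id=1, signer_count=2)

primary_serials = [primary.next_serial() for _ in range(5)]
standby_serials = [standby.next_serial() for _ in range(5)]

# The two sequences never overlap, even without any communication.
assert not set(primary_serials) & set(standby_serials)
```

The trade-off of such a scheme is that the number of signers has to be fixed in advance, and each machine must persist its own counter reliably across restarts.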

Report of visit at datacenter on 2021-04-19

After a new member was added to the access engineers team it was possible to visit the datacenter following the epidemiological guidelines for SARS-CoV-2, as well as our own security guidelines.

During this visit we applied the long-awaited patch for bug 1438 by adding the serial number to certificate revocation lists.

This visit also provided an opportunity to install a new infrastructure server, courtesy of Abil’I.T., a Luxembourg-based free software service provider. Many thanks again!

… and …

We also re-signed the Class 3 certificate during this visit. We are currently testing the new Class 3 certificate and will publish it very soon.

A new visit in the summer will be necessary to replace hardware (and maybe apply further patches on the signer).

Engineers nominated

The free certificate authority CAcert is making progress in increasing the number of its working groups. In the past few days, the committee approved the appointment of Jan to the post of Critical Engineer.

The appointment of Michaela as Access Engineer was also approved. Both have a broad range of experience and are distinguished by their specialist knowledge and sense of responsibility. We wish both engineers much success and fulfilment in their voluntary work for the CAcert community. These are challenging tasks and come with great responsibility. CAcert offers interested volunteers a variety of tasks, the opportunity to gain exciting experience and stimulating career opportunities.

Signer is working again

Today we were able to investigate the signer machine at the datacenter.

As previously assumed, the signer machine was powered off. It was not possible to power it on again, so either both PSUs or other components died.

As we had ordered a replacement machine of the same type, we were able to use the existing hard drives to bring the signer up again.

Currently the signer is catching up, which will take some hours. As soon as your certificate has been processed, you’ll get an email from our server.

The certificate of www.cacert.org is in the queue (together with your certificates and revocations), so we need to wait until it’s ready. It will get updated as soon as possible.

Update 2020-05-05: All pending certificate requests have now been processed; new requests should be processed on the fly again.

Dirk Astrath
CAcert critical admin

scheduled systems downtime – 15th June

Wytze reports on a planned outage for CAcert main systems, as the systems are moved from one rack to another:

“The move has been scheduled for Tuesday June 15, starting at 10:00 CEST, and hopefully ending before 18:00 CEST.

During a significant part of that period, all systems will be down. We will take care of providing a backup during the outage for ocsp.cacert.org (to avoid inconveniencing browser users who have OCSP enabled for CAcert, as they should!), and a placeholder for www.cacert.org which reports the downtime and the reason for it.”