Questions about the CoTech Nextcloud server

chris · 18 December 2017 15:34

Sorry for the delay in posting this. I started working on it some days ago but have only just found time to finish it today.

@chrislowis asked some questions on a thread on Loomio:

If we are compelled to use Nextcloud for commercially important collaborations on a deadline, for example, I think I’d like some more information on how it’s going to be supported and whether we can rely on it being available 24/7.

Ratify the Nextcloud recommendations from Wortley Hall

I was thinking primarily about reliability (an uptime guarantee or SLA?) and backups/disaster recovery… If we were to move important parts of our business to Nextcloud, I’d expect to have to pay (and I think someone should be paid) for those kind of guarantees.

Ratify the Nextcloud recommendations from Wortley Hall

Compulsory use of Nextcloud

I didn’t come up with the recommendations from Wortley Hall so I don’t feel that I need to defend them or explain what “should” means in this context and whether it can be equated with being “compelled” — other people are in a better position than I am to answer your first question.

Support and updates

Regarding “how it’s going to be supported”: we are supporting, and intend to continue supporting, the Nextcloud server at office.coops.tech in the same manner in which we would support a Nextcloud server provided for a client. This involves:

Applying Debian security updates as soon as they are available (I start getting alerts from the server every 5 mins on my phone when security updates are available). Of course I take heed of the nature of the updates and, depending on the application being updated and the reason for the update, take more or less care in applying them. For example, an update to a package that isn’t used on a regular basis would be applied without much testing or thought (there was a new version of rsync for Stretch and Jessie servers that came out last night after I had gone to bed, so I did the updates while in bed). A security update to PHP7.0, on the other hand, would require reading up on what exactly was changed and checking that this wouldn’t cause any problems, applying it to a test server first, then applying it to the office.coops.tech server, and afterwards testing that everything is still working by logging into the server etc. These updates are often applied outside of office hours using a Linux phone.
Applying Docker CE updates. The quarterly version updates and security point releases haven’t caused issues in the past, so these would be applied quickly, though of course these updates would be tested on a development server first.
Applying CODE updates. Since the online editor is only available to authenticated users and is behind a proxy, security updates are less critical, so updates would probably be done as needs be, or when important bugs are fixed or new features are made available. CODE updates are also a little tricky as there appear to be some unidentified bugs.
Applying Nextcloud security updates within a working day. These updates are a little more complicated, so we need a little time to deploy them on test / development servers first and then on live ones.

Nextcloud app updates can’t (last time I checked) be done using the command line, but can be done by anyone with admin rights in Nextcloud. We would generally leave these to clients to update for this reason, but in this case, if nobody else does them, I expect that I would pick this up as and when. If a more formal agreement / process is needed for app updates then perhaps we can discuss this in the thread below.

Debian version updates. The point releases to Debian versions generally contain non-urgent security updates (urgent ones are pushed out straight away), for example see the updates in the recent 9.3 release, and these are applied quickly.
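As a rough illustration of the kind of check that could sit behind security update alerts like the ones described above (this is a sketch, not the setup actually running on office.coops.tech), the output of a simulated upgrade (`apt-get -s upgrade`) can be filtered for packages coming from the Debian security archive. The function name, the regular expression and the sample lines, including the version numbers in them, are made up for the example.

```python
import re

# Matches the "Inst" lines printed by a simulated upgrade, for example:
# Inst rsync [3.1.2-1] (3.1.2-1+deb9u1 Debian-Security:9/stable [amd64])
INST_LINE = re.compile(
    r"^Inst (?P<name>\S+) (?:\[(?P<old>[^\]]+)\] )?"
    r"\((?P<new>\S+) (?P<origin>[^)]*)\)"
)


def pending_security_updates(simulated_output):
    """Return (package, new_version) pairs whose origin mentions Debian-Security."""
    found = []
    for line in simulated_output.splitlines():
        match = INST_LINE.match(line.strip())
        if match and "Debian-Security" in match.group("origin"):
            found.append((match.group("name"), match.group("new")))
    return found


# Hypothetical sample output; package versions are illustrative only.
SAMPLE = """\
Inst rsync [3.1.2-1] (3.1.2-1+deb9u1 Debian-Security:9/stable [amd64])
Inst tzdata [2017b-1] (2017c-0+deb9u1 Debian:9.3/stable [all])
Conf rsync (3.1.2-1+deb9u1 Debian-Security:9/stable [amd64])
"""

if __name__ == "__main__":
    for name, version in pending_security_updates(SAMPLE):
        print(f"security update pending: {name} -> {version}")
```

On a real server the sample text would be replaced by the captured output of the simulated upgrade, and a non-empty result would trigger whatever alerting is in use.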

Reliability, uptime and backups

We generally have very good reliability on the infrastructure we have built in Sheffield (file servers running FreeBSD and ZFS, and front facing servers running Debian and Xen). We take regular snapshots of all the virtual servers we host, back these up onto another server and keep 60 days’ worth of these snapshots.
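To make the 60 day retention window concrete, here is a minimal sketch of how the snapshots that have aged out could be selected. It assumes one snapshot per day purely for illustration (the actual snapshot frequency isn’t stated here), and the function name and dates are hypothetical rather than taken from our tooling.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60  # the retention period mentioned above


def snapshots_to_prune(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return the snapshot dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


if __name__ == "__main__":
    today = date(2017, 12, 18)
    # Hypothetical: one snapshot per day for the last 90 days.
    snapshots = [today - timedelta(days=n) for n in range(90)]
    old = snapshots_to_prune(snapshots, today)
    print(f"{len(snapshots) - len(old)} kept, {len(old)} to prune")
```

A snapshot taken exactly 60 days ago is kept in this sketch; only strictly older ones are returned for pruning.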

We did, and will again soon, back up the backups to another server in another city, but this isn’t happening right now due to a hardware failure. We have, however, ordered a pair of new backup servers in the last week (2 x 1U, 8 cores, 64GB RAM, 4 x 10TB disks and a 128GB SSD system drive) and hope to have these up and running early in the new year.

We don’t have any uptime figures from a third party (though we should get this set up) so we can’t claim 99.9% uptime or whatever, but we probably do achieve that. Before the installation of our latest front facing server, however, we were having occasional issues with disk write time to the ZFS server, and there were four incidents earlier this year where we had to reboot almost everything as a result (5th October, 24th August, 5th July and 30th April). We hope to be able to upgrade our main file server in 2018 to ensure that incidents like this never happen again.
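For a sense of scale, the arithmetic behind a figure like 99.9% can be sketched as follows. The function names are made up for the example, and since the duration of the four incidents isn’t recorded here, the per-incident figure is only the budget that a 99.9% target would allow, not a measurement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent, hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period for a given uptime percentage."""
    return hours * (100.0 - uptime_percent) / 100.0


def uptime_percent(downtime_hours, hours=HOURS_PER_YEAR):
    """Uptime percentage achieved in the period for a given total downtime."""
    return 100.0 * (hours - downtime_hours) / hours


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.9)
    print(f"99.9% uptime allows about {budget:.2f} hours of downtime a year")
    # Four incidents in a year: the average length each could have within budget.
    print(f"that is about {budget / 4:.2f} hours per incident for four incidents")
```

So 99.9% over a year works out at a little under nine hours of total downtime, which is why independent monitoring would be needed before making any claim either way.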

This year we also had a data centre room move, but we don’t expect to have another of those for many years.

In terms of restoring data from the filesystem backups, this is a fairly straightforward task and doesn’t take long; it is, however, something we don’t need to do very often. In the event of a major accident in Sheffield (for example a meteor strike destroying the rack and all the servers in it) we would have to quickly rent hardware elsewhere and restore from the Manchester backups. This is something that would take a little while, but I expect that we would be able to get everything up and running again in a matter of days.

24/7 availability

There are only two techies who work for Webarchitects (the other worker members are non-technical) and neither of us works shifts, so in the event of a problem at, for example, 4am you would have to wait until we wake up to fix things. Would this really be a problem? Are you expecting to need to use the Nextcloud server through the night? Also, given the nature of the service (all data is synced to client machines), I find it hard to imagine a situation where a brief unavailability would cause a major problem.

However, I do intend to create a CoTech shared address book on the Nextcloud server and I will put my mobile and home landline numbers in it, so if there was a problem that you considered worth waking me about (for example just before a deadline for a massive joint bid for work, with some data somehow being on the server but not on any client machines) then of course I’d be happy to be woken up in order to help.

Of course, when we have had problems with hardware and have needed to work through the night we have done so (we had some RAM fail after moving a 10 year old server some months ago and didn’t get to leave the data centre until past 4:30am). Keeping things up and running is something that we take very seriously and is a very high priority for us.

But I’m afraid to say that we don’t have a verbose, legal type document to link to that puts all these things in a language designed for lawyers to argue over.

Money

As I said to you and a few other people at Wortley Hall, I have been working on the assumption that until there is a legal CoTech entity there isn’t really an organisation to take on things like the legal ownership of the coops.tech domain name or to do things like employ people. Furthermore, a lot of the co-ops have been providing their time and services and in some cases paying for things (for example Outlandish printing flyers) to help get the network established, and I have viewed our hosting, sysadmin and devops services in this light: a contribution to help build this network.

kawaiipunk · 23 December 2017 21:53

Thanks for the great explanation and insight into your setup and practices. It makes me think we should all be sharing info like this!

Quick question… what do you use for security update alerting?

chrislowis · 23 December 2017 22:59

@chris - thank you so much for a detailed and informative write-up. As well as helping everyone to decide whether and how to migrate to Nextcloud you’ve also given us a great insight into the huge amount of “behind the scenes” work you and WA put in to making this kind of service available to everyone.

Are you expecting to need to use the Nextcloud server through the night?

Not necessarily through the night, but I know that some coops have folks who travel (as it happens I’m on GMT-5 at the moment) so it’s good to know a bit more about what would happen “out of hours”. I don’t think this is a big concern at all for the collaborative projects we’re getting involved in at the moment, but it might be a bit more of a concern if we moved all of our coops stuff away from Google to Nextcloud.

Furthermore, a lot of the co-ops have been providing their time and services and in some cases paying for things (for example Outlandish printing flyers) to help get the network established, and I have viewed our hosting, sysadmin and devops services in this light: a contribution to help build this network.

I think this is very generous of you all and very much appreciated. I also think you should feel free to ask for contributions / people and organisations to join WA / cobudget etc if and when you feel you need it.

Thanks again Chris, and have a great festive break.

chris · 24 December 2017 00:16

Cheers guys, have a great holiday