As we at Autonomic are managing quite a few servers now, I was wondering what solutions co-ops are using to monitor (and possibly apply) updates on multiple Linux servers in a way that scales well?
All our servers are Debian or Ubuntu, and I would prefer notifications not to use email if possible. We do use Ansible, so that’s a possible solution, but we don’t really have time to write stuff from scratch. It has to be free and open source software, of course.
Alerts will be sent by email or perhaps Telegram. We don’t have any longer term graphing right now.
The netdata dashboard is served on localhost:19999 on the remote server and then forwarded to the admin’s local browser via ssh using this command: ssh -f user@host.foo -L 19998:localhost:19999 -N
This is good for security: no extra ports need to be opened, since all traffic is outgoing (apart from ssh).
p.s. I should add that there is another layer of monitoring at the provider level in case the server crashes or whatnot.
We also use Munin and it is set to send emails when updates are available, however this fix and this fix are needed for it to work properly.
When we first started using Xen on CentOS 5 servers (probably about 12 years ago), I found that running yum update on multiple virtual servers at exactly the same time on the same physical server caused such a load spike that they would stop responding. Since then I have been updating servers sequentially, using this script. It is rather old and could probably do with improving, but it means that updating a server is a matter of sshing to it, running sudo -i and then a-up. This writes the changes that are to be made to a /root/Changelog file; see the logchange script.
To make life easier my (Ansible provisioned) ~/.bash_aliases file contains sections like this:
ssh-stretch() {
  ssh server1
  ssh server2
}
And my (Ansible provisioned) ~/.ssh/config file contains corresponding entries like this:
So to update all the Stretch servers I type ssh-stretch and then sudo -i, a-up, exit, exit, and then do the next one. To make this easier when out and about, I have shortcuts for all these commands in the terminal client on my Ubuntu Touch phone, which has an encrypted Debian chroot on it, so it is four button presses per server…
If all these servers were on other people’s hardware and I didn’t need to worry about the impact of updating 30 virtual servers on a physical host all at the same time, then I’d consider sorting out a quicker way of doing this. However, I’d still worry about the one in fifty or so updates that require interaction. I guess most people enable automatic updates, but I had some bad experiences with this over a decade ago and vowed to do things manually. Most updates don’t require much thought, but when one changes a key PHP, Apache, Nginx or other config file for security reasons, you do need to look at the diffs and sort things out manually. My approach is time-consuming, but it minimises the risk of an automatic update leaving a key service unable to restart.
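For anyone who wants the same one-at-a-time behaviour without typing each step, here is a rough shell sketch. It is not the script mentioned above: update_in_turn, UPDATE_CMD and RUN_ON_HOST are names made up for this example, and the default update command is only an assumption. It stops at the first host that fails, so a broken update is not rolled on to the next server.

```shell
#!/bin/sh
# Sketch only: update hosts strictly one at a time, stopping at the first failure.
# UPDATE_CMD and RUN_ON_HOST are hypothetical names introduced for this example.

# Command run on each host; replace it with your own update script.
UPDATE_CMD="${UPDATE_CMD:-sudo apt-get update && sudo apt-get dist-upgrade}"
# How to reach a host; -t allocates a terminal so interactive prompts still work.
RUN_ON_HOST="${RUN_ON_HOST:-ssh -t}"

update_in_turn() {
    for host in "$@"; do
        echo "== ${host} =="
        # RUN_ON_HOST is left unquoted on purpose so "ssh -t" splits into words.
        if ! $RUN_ON_HOST "$host" "$UPDATE_CMD"; then
            echo "update failed on ${host}, stopping" >&2
            return 1
        fi
    done
}
```

Usage would be something like update_in_turn server1 server2, with the host names matching entries in ~/.ssh/config.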
We’re using Ansible to check the status of all our servers on AWS.
All our servers are identified using their Name tag.
We have an inventory.yml that lists all our servers and puts them into groups:
tag_Name_bgv_wordpress_staging:
tag_Name_bgv_wordpress_production:
...
active:
  children:
    # The "active" group is used as the "host" value in the playbooks
    # If you don't include your project in here it will never be run.
    bgv:
bgv:
  children:
    tag_Name_bgv_wordpress_staging:
    tag_Name_bgv_wordpress_production:
    ...
We can then run playbooks against each group of servers:
ansible-playbook check.yml --limit bgv
This playbook checks for Meltdown and Spectre, and whether a reboot is required, and looks like this:
---
- name: Check for Meltdown/Spectre vulnerabilities
  hosts: active
  remote_user: ubuntu
  become: yes
  gather_facts: yes
  roles:
    - reboot-required
    - meltdown
The roles are specific to Ubuntu (perhaps Debian also) distributions. The reboot-required role looks like this:
We use Ubuntu’s unattended-upgrades service (documentation) to automatically install security updates, so we only need to check whether a reboot is required.
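The role itself isn’t shown here, but the underlying check can be sketched in plain shell. This assumes the usual Ubuntu behaviour (also seen on Debian when the relevant hooks are installed) where packages that need a reboot leave a flag file at /var/run/reboot-required. The function name and the REBOOT_FLAG variable are made up for this example.

```shell
#!/bin/sh
# Sketch only: report whether the reboot flag file exists on this host.
# REBOOT_FLAG is a hypothetical variable so the path can be overridden.
REBOOT_FLAG="${REBOOT_FLAG:-/var/run/reboot-required}"

reboot_required() {
    # Exit status 0 means a reboot is pending, 1 means it is not.
    if [ -f "$REBOOT_FLAG" ]; then
        echo "reboot required"
        return 0
    fi
    echo "no reboot required"
    return 1
}
```

Because the result is also returned as an exit status, it can be used directly in a condition, for example reboot_required && echo "schedule a reboot".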
Our code isn’t open source and freely available, as it contains references to all our servers and login users. However, we could share a sanitised version of the repo if there’s interest.
We’ve been testing the Netdata to Telegram bot alerts today and it’s working great. Should tide us over for alerts for the time being so we can focus on our clients. We could perhaps try and write a Signal plugin sometime.
Deffo agree with @chris about the unattended-upgrades. It’s only on non-critical and somewhat simplistic servers that we do that.
Interesting to see the range of solutions that folks are using. This has been super useful, thanks.
To elaborate a bit on what Finn has said, we use Icinga2 to monitor when security updates are required. I then have an Ansible script that runs through all of our servers, checks for updates, applies any required and optionally reboots the servers. I generally do this once a week or when a patch for a serious issue is released.
I’m happy to supply this Ansible script if others would find it useful. It only works on Ubuntu servers, but could easily be adapted to other flavours of Linux.
This works for about 70 servers and I can see it working on over 100, but I don’t think it will scale much beyond that.
I’d be interested to see that. You could upload it to a repo at git.coop and make it only available to people with accounts, if you just want to share with other co-operators?
Has anyone experimented with using Prometheus to track software updates on a server? The node_exporter can provide metrics on pending software updates on the server (for example through its textfile collector). These could then be displayed on a dashboard like Grafana, making it easier for people to see the state of a server. You could also use Alertmanager with Prometheus to create alarms when the number of pending package updates goes over a threshold.
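As a rough idea of how that could work, the sketch below turns the output of a simulated apt-get run into a metric in the Prometheus text format, which node_exporter’s textfile collector can pick up. The metric name apt_pending_upgrades and the function name are assumptions made for this example, not something node_exporter ships with.

```shell
#!/bin/sh
# Sketch only: count packages a simulated apt run would install or upgrade.
# Reads the output of `apt-get -s dist-upgrade` on stdin.
# The metric name apt_pending_upgrades is a hypothetical choice.
pending_upgrades_metric() {
    # Simulated runs print one "Inst <package> ..." line per package.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    count=$(grep -c '^Inst ' || true)
    echo "# HELP apt_pending_upgrades Packages apt would install or upgrade."
    echo "# TYPE apt_pending_upgrades gauge"
    echo "apt_pending_upgrades ${count}"
}
```

A cron job could then run apt-get -s dist-upgrade | pending_upgrades_metric and write the result to a .prom file in whatever directory node_exporter was given via --collector.textfile.directory. An Alertmanager rule on that gauge would cover the threshold alarm.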
I’ve been working on deploying Prometheus and Grafana using docker-compose and found it relatively easy to do and could share the Ansible scripts if people also wanted to give it a go.
The Ansible code above was failing today as the latest version of Docker also installs two new packages (containerd.io and docker-ce-cli) and apt-show-versions -b -u wasn’t listing these, so I have switched to using this command to see what is available to be upgraded and also show new packages:
- name: Check for updates and new packages using apt-get dist-upgrade -q -s
  shell: apt-get dist-upgrade -q -s | grep '^The following' -A1 | grep '^ ' | xargs
  args:
    warn: False
    executable: /bin/bash
  register: packages
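To see what that pipeline does outside Ansible, here it is wrapped in a small shell function that reads apt-get output from stdin. The function name list_upgradable is made up for this example; the pipeline itself is the one from the task above.

```shell
#!/bin/sh
# Sketch only: the same pipeline as the Ansible task, reading from stdin.
list_upgradable() {
    # Keep each "The following ..." heading plus the one line after it,
    # keep only the indented package lines, then join them onto one line.
    grep '^The following' -A1 | grep '^ ' | xargs
}
```

Running apt-get dist-upgrade -q -s | list_upgradable prints the package names separated by spaces. Note that -A1 keeps only the first line after each heading, so a package list that wraps onto further lines would be cut short.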
And I have moved this code from a private repo to a public Ansible role:
@stephen hope you can come to the Ansible session at the CoTech hack. The use of Ansible Galaxy to make roles public and shareable is one of the main things we want to discuss.
I’m now about to use this role to upgrade the server running this Discourse site so it is going to go down for a few minutes…