venam
Hello fellow nixers,
In this thread we'll discuss an old debate that we've never actually brought up on the forums yet: to reboot or not to reboot.

It is often said that rebooting to solve an issue is the equivalent of giving up. It might be the norm on some other operating systems, but on Unix-like OSes, whose users are expected to know their machines intimately, it's regarded as heresy.

Having a high uptime is regarded as an epic feat, a show of stability and robustness.

https://hardware.slashdot.org/story/13/0...-of-uptime
https://web.archive.org/web/201806232200...cs/uptimes
https://www.infoworld.com/article/262344...boxes.html

[Image: devotion_to_duty.png]

There's even a command, uprecords, for keeping track of uptime records.

However, sometimes, especially when the machine is a development box and not a server, rebooting seems mandatory: a kernel upgrade, a full system upgrade, an init system upgrade, a hardware change, or even, in some cases, a kernel module update (if the system has no way to reload modules dynamically).

Still, even when updating the kernel, some OSes, Linux for example, let you update or patch the kernel without rebooting via features such as kexec.
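As a sketch of that kexec path on Linux (root required, kexec-tools installed; the kernel and initrd paths below are examples and vary per distro):

```shell
#!/bin/sh
# Stage the freshly installed kernel, then jump straight into it,
# skipping firmware/POST. Paths are distro-dependent examples.
kexec -l /boot/vmlinuz-linux \
      --initrd=/boot/initramfs-linux.img \
      --reuse-cmdline          # keep the running kernel's command line
kexec -e                       # execute the staged kernel now
```

Note that this restarts userspace too; it skips the firmware stage, not the init sequence, so it's a faster reboot rather than true live patching.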

So here we go:
Nixers, what's your opinion on rebooting? Do you reboot often, and on which types of machines? Do you have any special tricks, or a story to share about a time you rebooted and it was worthless, or a time when you never found out why the issue was happening in the first place?



My take on the topic: I usually avoid rebooting unless there's a kernel update. I have uptimes of around 100 days in a row.
On my server I actually almost never reboot.

Code:
00:27:49 up 391 days, 10:18,  1 user,  load average: 0.00, 0.00, 0.00
Code:
00:42:28 up 359 days, 11:43,  1 user,  load average: 0.08, 0.03, 0.05
So on both of my two servers I haven't rebooted in around a year.

When the service is running fine I think there's not much to worry about unless there's an important security update to do on the kernel.

I reboot most often on my home machine and work dev machine, when I run into the kind of issue that makes me pull my hair out and everything else I've tried has failed.

My latest story was about a card reader that wasn't being recognized by the official proprietary driver. For those who have no clue about card readers on Unix: they work through a daemon called pcscd that matches the appropriate driver (from /usr/lib/pcsc/drivers/, for instance) based on the device's vendor ID and product ID. Some open drivers come from the package manager or alongside other tools such as opensc.
In any case, the reader I wanted to try wasn't recognized by pcscd even after doing everything I thought was right. So I gave up, figuring something might have been messed up in udev, or something else I hadn't caught.

But still after rebooting there was nothing.
Oh no the uptime went away!

Digging further into the issue, I checked dmesg for the vendor ID and product ID being reported and tried to match them against the Info.plist that points to the proprietary driver. I found out the IDs weren't even listed. I tried adding them, but in vain: the driver doesn't support the device. Apparently they forgot to update their Linux driver.
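The check described above can be sketched as follows; the Info.plist fragment and the IDs here are made up for illustration (real drivers list theirs under ifdVendorID/ifdProductID keys):

```shell
# Demo Info.plist fragment, in the style of the ones shipped under
# /usr/lib/pcsc/drivers/ (the IDs below are examples, not a real driver's)
cat > /tmp/Info.plist <<'EOF'
<key>ifdVendorID</key>
<array><string>0x072F</string></array>
<key>ifdProductID</key>
<array><string>0x90CC</string></array>
EOF

# IDs as reported by dmesg or lsusb for the reader in question
VID=072f
PID=90cc

# pcscd only picks the driver if both IDs are listed in its Info.plist
if grep -qi "0x$VID" /tmp/Info.plist && grep -qi "0x$PID" /tmp/Info.plist; then
    echo "driver claims this reader"
else
    echo "IDs not listed: the driver does not support this reader"
fi
```

In my case the second branch is what I hit: the dmesg IDs were nowhere in the driver's Info.plist.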

So I ended up using another card reader (acr38) that's widely supported.

Reboot lesson learned.


So what do you think about rebooting and how frequently it should be done.
biniar
(18-04-2019, 01:53 AM)venam Wrote: I reboot most often, on my home machine and work dev machine

Ditto.

(18-04-2019, 01:53 AM)venam Wrote: So what do you think about rebooting and how frequently it should be done.

Only when necessary for production, esp. for bigger groups of people that rely on $services.

If you're running a web server that doesn't host mission-critical $services or something, reboots might not be a big concern. And as far as uptime is concerned, anything in the virtualization realm shouldn't be taken seriously, in my opinion.
zge
I usually don't ever reboot my machines, except if I accidentally hit the power button and it turns off.
jkl
I reboot to apply kernel upgrades and/or - on FreeBSD - to solve the problem of a full RAM. (Weirdly, FreeBSD doesn't clean it up too well.)
z3bra
I think rebooting is pretty important in a computer's life, be it a laptop, desktop or server.

I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.
For my servers, I reboot for each OpenBSD upgrade at least, and every now and then, just to make sure that everything comes back up as expected.

This last point is very important to me, especially in the era of auto-configured servers, micro-services shit and similar stuff. Rebooting your service makes sure that it is configured in a reliable and reproducible state.
When your server boots, it MUST be able to reach 100% of its usability without any manual intervention. If it does not, it means you have a configuration problem.
When I perform "reboot tests" on my servers, I tend to reboot multiple times in a row, to make sure I'm always in good shape and everything comes back in a deterministic way.
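A minimal post-reboot smoke test in that spirit might look like this (the service names are placeholders; adapt the list per host):

```shell
#!/bin/sh
# Run after a test reboot: report which expected daemons came back up
# on their own. The service names below are examples, not a real host's.
for svc in sshd nginx unbound; do
    if pgrep -x "$svc" >/dev/null 2>&1; then
        echo "ok:   $svc"
    else
        echo "FAIL: $svc is not running"
    fi
done
```

Any FAIL line after an unattended boot points at a configuration problem rather than a one-off glitch.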

Of course, I find huge uptimes to be sexy. I used to value that a lot!
But as time went by, I realized that my fear of rebooting was growing exponentially with the uptime.

Big uptimes used to be a proof of reliability, in a time when services were bound to the servers they were running on. Nowadays, it's fairly easy to keep a box up and running for years. All you have to do is wait.

I think that today's true goal is to reach 99.9999% service uptime with servers rebooting every hour.

Now, all this said, the facts:

Code:
$ for host in doom apophis daemia lucy klesk bitterman; do
> ssh -n $host 'hostname; cat /BIRTHDAY; uptime'
> echo
> done
doom.z3bra.org
Tue Apr 18 06:46:04 UTC 2017
14:14:01 up  5:54,  2 users,  load average: 0.15, 0.11, 0.09

apophis.z3bra.org
Tue Jan 15 19:32:01 UTC 2019
2:14PM  up 26 days, 14:44, 1 user, load averages: 0.00, 0.00, 0.00

daemia.z3bra.org
Thu Jun 22 22:49:37 UTC 2017
2:14PM  up 21 days, 14:40, 0 users, load averages: 0.60, 0.64, 0.59

lucy.z3bra.org
Fri Dec 28 15:34:14 UTC 2018
2:14PM  up 107 days, 22:10, 0 users, load averages: 0.02, 0.02, 0.00

klesk.z3bra.org
Thu Apr 11 15:03:21 UTC 2019
2:14PM  up 6 days, 22:38, 0 users, load averages: 0.00, 0.00, 0.00

bitterman.z3bra.org
Wed Jan  2 12:59:19 UTC 2019
2:14PM  up 103 days, 20:13, 0 users, load averages: 0.00, 0.01, 0.00

Note: the above speech was mostly to justify my ridiculous uptimes :P
venam
(18-04-2019, 09:14 AM)z3bra Wrote: I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.
Good point, it's easy to forget that computers use resources.

(18-04-2019, 09:14 AM)z3bra Wrote: When your server boots, it MUST be able to reach 100% of it's usability without any manual intervention.
I agree, especially if your service provider can reboot machines without your knowledge. That way you know everything will come back in place without a human hand.
zge
(18-04-2019, 09:14 AM)z3bra Wrote: I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.

Now I'm not an expert when it comes to energy usage, but if you suspend or hibernate a device, don't you save some energy as compared to reinitializing the operating system from a cold state over and over again?
vain
Well, rebooting to "solve an issue" usually means that you have no idea what you're doing, so you hit the panic button, right? That's why it's being ridiculed.

On the other hand, rebooting might be *required* to indeed solve an issue, if your operating system is crap. Another reason why it's being ridiculed.

So you're either a bad admin or you're using an inferior OS. ;-)

Some people make this a strategy ("huh, what's going on ... let's reboot"). This is something that annoys me very, very much, especially at work. After the reboot, the actual error might indeed be gone, but ... me, as a sysadmin, I have nothing left to debug. I now have to wait until the error shows up again. And *then* I can try to find the real cause, assuming I get there first. Sometimes that's impossible, so I get the same error report from the user over and over: "$foo broke again, so I had to reboot!" Good lord.


All this has nothing to do with high uptimes, of course. My machines all have an uptime of just a few days or hours, but that's simply because I don't care about it. I don't try to prolong the uptime just for the sake of it. :-) I boot my computer in the morning and shut it down when I'm done. Servers, well, they keep running of course, but I reboot them on updates, too, so their uptime doesn't exceed a few days or weeks, either.

Why reboot my private servers after updates? a) As z3bra said, test-reboots save you from trouble later on, b) I'm simply not bound by strict SLAs that would force me to use something like kexec. :-) Just reboot, it's the easiest thing to do, why add any more complexity? It just takes a couple of seconds on my servers ... Yes, if you run massive applications, you might have a different opinion here.

I also rarely use standby/hibernation. It tends to introduce complexity (little annoying things like needing to re-apply hdparm after standby) with little gain, and to be honest, it happened to be a little buggy for me in the past. If I do use standby, I do it because I have to go to another office building and I don't want to re-open all my windows when I get there. :-)

Finally, as I do a loooooot of work in /tmp, rebooting has the nice effect of cleaning up stuff. This includes browser sessions, XDG caches, stuff like that. I only keep files that I really want to keep.
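One common way to get that cleanup automatically (not necessarily vain's setup) is mounting /tmp as tmpfs; a config fragment:

```shell
# /etc/fstab -- keep /tmp in RAM so every reboot starts it empty
# (the 2G size cap is an example; tune it to your workload)
tmpfs  /tmp  tmpfs  defaults,noatime,size=2G  0  0
```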
thuban
(18-04-2019, 09:14 AM)z3bra Wrote: Big uptimes used to be a proof of reliability, in a time when services were bound to the servers they were running on. Nowadays, it's fairly easy to keep a box up and running for years. All you have to do is wait.
I totally agree.
I think it's a huge mistake to consider a server with a big uptime as reliable. It doesn't mean it's functional or secure or up to date.

If it is necessary to reboot, just do it. At least for a server.
On your desktop, shut it down when you don't use it. It saves power, the atmosphere, and money.
xero
my personal machines are like booleans... either on when im using them or off when im done. none of that s{leep,uspend} bs.

as for servers i only really reboot them when i make a ton of up{dat,grad}es and need to be sure they're working as expected.



