venam
Hello fellow nixers,
In this thread we'll discuss an old debate that we've never actually brought up on the forums yet: to reboot or not to reboot.

It is often said that rebooting to solve an issue is the equivalent of giving up. It might be the norm on some other operating systems, but on Unix-like OSes, whose users are expected to know their machines intimately, it's regarded as heresy.

Having a high uptime is regarded as an epic feat, a show of stability and robustness.

https://hardware.slashdot.org/story/13/0...-of-uptime
https://web.archive.org/web/201806232200...cs/uptimes
https://www.infoworld.com/article/262344...boxes.html

[Image: devotion_to_duty.png]

There's even a command, uprecords, for keeping track of uptime records.

However, sometimes, especially when the machine is a development box and not a server, rebooting seems mandatory: a kernel upgrade, a full system upgrade, an init system upgrade, a hardware change, or even, in some cases, a kernel module update (if the system has no way to reload modules dynamically).

Still, even when updating the kernel, some OSes, Linux for example, let you update or patch the kernel without rebooting via features such as kexec.
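As a sketch of that kexec path on Linux (root required, kexec-tools installed; the kernel and initrd paths below are examples and vary per distro):

```shell
#!/bin/sh
# Stage the freshly installed kernel, then jump straight into it,
# skipping firmware/POST. Paths are distro-dependent examples.
kexec -l /boot/vmlinuz-linux \
      --initrd=/boot/initramfs-linux.img \
      --reuse-cmdline          # keep the running kernel's command line
kexec -e                       # execute the staged kernel now
```

Note that this restarts userspace too; it skips the firmware stage, not the init sequence, so it's a faster reboot rather than true live patching.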

So here we go:
Nixers, what's your opinion on rebooting? Do you reboot often, and on which types of machines? Do you have any special tricks, or a story to share about a time you rebooted and it was worthless, or a time when you never found out why the issue was happening in the first place?



My take on the topic: I usually avoid rebooting unless there's a kernel update. I have uptimes of around 100 days in a row.
On my server I actually almost never reboot.

Code:
00:27:49 up 391 days, 10:18,  1 user,  load average: 0.00, 0.00, 0.00
Code:
00:42:28 up 359 days, 11:43,  1 user,  load average: 0.08, 0.03, 0.05
So on both of my two servers I haven't rebooted in around a year.

When the service is running fine I think there's not much to worry about unless there's an important security update to do on the kernel.

I reboot most often on my home machine and work dev machine, when I run into the kind of issue that makes me pull my hair out and everything else I've tried has failed.

My latest story was about a card reader that wasn't being recognized by the official proprietary driver. For those who have no clue about card readers on Unix: they work through a daemon called pcscd that matches the appropriate driver (from /usr/lib/pcsc/drivers/, for instance) based on the device's vendor ID and product ID. Some open drivers come from the package manager or alongside other tools such as opensc.
In any case, the reader I wanted to try wasn't recognized by pcscd even after doing everything I thought was right. So I gave up, figuring something might have been messed up in udev, or something else I hadn't caught.

But still after rebooting there was nothing.
Oh no the uptime went away!

Digging further into the issue, I checked dmesg for the vendor ID and product ID being reported and tried to match them against the Info.plist that points to the proprietary driver. I found out the IDs weren't even listed. I tried adding them, but in vain: the driver doesn't support the device. Apparently they forgot to update their Linux driver.
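The check described above can be sketched as follows; the Info.plist fragment and the IDs here are made up for illustration (real drivers list theirs under ifdVendorID/ifdProductID keys):

```shell
# Demo Info.plist fragment, in the style of the ones shipped under
# /usr/lib/pcsc/drivers/ (the IDs below are examples, not a real driver's)
cat > /tmp/Info.plist <<'EOF'
<key>ifdVendorID</key>
<array><string>0x072F</string></array>
<key>ifdProductID</key>
<array><string>0x90CC</string></array>
EOF

# IDs as reported by dmesg or lsusb for the reader in question
VID=072f
PID=90cc

# pcscd only picks the driver if both IDs are listed in its Info.plist
if grep -qi "0x$VID" /tmp/Info.plist && grep -qi "0x$PID" /tmp/Info.plist; then
    echo "driver claims this reader"
else
    echo "IDs not listed: the driver does not support this reader"
fi
```

In my case the second branch is what I hit: the dmesg IDs were nowhere in the driver's Info.plist.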

So I ended up using another card reader (acr38) that's widely supported.

Reboot lesson learned.


So what do you think about rebooting and how frequently it should be done.
biniar
(18-04-2019, 01:53 AM)venam Wrote: I reboot most often, on my home machine and work dev machine

Ditto.

(18-04-2019, 01:53 AM)venam Wrote: So what do you think about rebooting and how frequently it should be done.

Only when necessary for production, esp. for bigger groups of people that rely on $services.

If you're running a web server that doesn't host mission-critical $services or something, reboots might not be a big concern. And as far as uptime is concerned, anything in the virtualization realm shouldn't be taken seriously, in my opinion.
zge
I usually don't ever reboot my machines, except if I accidentally hit the power button and it turns off.
jkl
I reboot to apply kernel upgrades and/or - on FreeBSD - to solve the problem of a full RAM. (Weirdly, FreeBSD doesn't clean it up too well.)
z3bra
I think rebooting is pretty important in a computer's life, be it a laptop, desktop or server.

I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.
For my servers, I reboot for each OpenBSD upgrade at least, and every now and then, just to make sure that everything comes back up as expected.

This last point is very important to me, especially in the era of auto-configured servers, micro-services shit and similar stuff. Rebooting your service makes sure that it is configured in a reliable and reproducible state.
When your server boots, it MUST be able to reach 100% of its usability without any manual intervention. If it does not, it means you have a configuration problem.
When I perform "reboot tests" on my servers, I tend to reboot multiple times in a row, to make sure I'm always in good shape and everything comes back in a deterministic way.
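A minimal post-reboot smoke test in that spirit might look like this (the service names are placeholders; adapt the list per host):

```shell
#!/bin/sh
# Run after a test reboot: report which expected daemons came back up
# on their own. The service names below are examples, not a real host's.
for svc in sshd nginx unbound; do
    if pgrep -x "$svc" >/dev/null 2>&1; then
        echo "ok:   $svc"
    else
        echo "FAIL: $svc is not running"
    fi
done
```

Any FAIL line after an unattended boot points at a configuration problem rather than a one-off glitch.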

Of course, I find huge uptimes to be sexy. I used to value that a lot!
But as time went by, I realized that my fear of rebooting was growing exponentially with the uptime.

Big uptimes used to be a proof of reliability, in a time when services were bound to the servers they were running on. Nowadays, it's fairly easy to keep a box up and running for years. All you have to do is wait.

I think that today's true goal is to reach 99.9999% service uptime with servers rebooting every hour.

Now, all this said, the facts:

Code:
$ for host in doom apophis daemia lucy klesk bitterman; do
> ssh -n $host 'hostname; cat /BIRTHDAY; uptime'
> echo
> done
doom.z3bra.org
Tue Apr 18 06:46:04 UTC 2017
14:14:01 up  5:54,  2 users,  load average: 0.15, 0.11, 0.09

apophis.z3bra.org
Tue Jan 15 19:32:01 UTC 2019
2:14PM  up 26 days, 14:44, 1 user, load averages: 0.00, 0.00, 0.00

daemia.z3bra.org
Thu Jun 22 22:49:37 UTC 2017
2:14PM  up 21 days, 14:40, 0 users, load averages: 0.60, 0.64, 0.59

lucy.z3bra.org
Fri Dec 28 15:34:14 UTC 2018
2:14PM  up 107 days, 22:10, 0 users, load averages: 0.02, 0.02, 0.00

klesk.z3bra.org
Thu Apr 11 15:03:21 UTC 2019
2:14PM  up 6 days, 22:38, 0 users, load averages: 0.00, 0.00, 0.00

bitterman.z3bra.org
Wed Jan  2 12:59:19 UTC 2019
2:14PM  up 103 days, 20:13, 0 users, load averages: 0.00, 0.01, 0.00

Note: the above speech was mostly to justify my ridiculous uptimes :P
venam
(18-04-2019, 09:14 AM)z3bra Wrote: I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.
Good point, it's easy to forget that computers use resources.

(18-04-2019, 09:14 AM)z3bra Wrote: When your server boots, it MUST be able to reach 100% of it's usability without any manual intervention.
I agree, especially if your service provider can reboot machines without your knowledge. That way you know everything will come back in place without a human hand.
zge
(18-04-2019, 09:14 AM)z3bra Wrote: I shut down my desktop every day to save polar bears, and turn it back on in the morning when I use it.

Now I'm not an expert when it comes to energy usage, but if you suspend or hibernate a device, don't you save some energy as compared to reinitializing the operating system from a cold state over and over again?
vain
Well, rebooting to "solve an issue" usually means that you have no idea what you're doing, so you hit the panic button, right? That's why it's being ridiculed.

On the other hand, rebooting might be *required* to indeed solve an issue, if your operating system is crap. Another reason why it's being ridiculed.

So you're either a bad admin or you're using an inferior OS. ;-)

Some people make this a strategy ("huh, what's going on ... let's reboot"). This is something that annoys me very, very much, especially at work. After the reboot, the actual error might indeed be gone, but ... me, as a sysadmin, I have nothing left to debug. I now have to wait until the error shows up again. And *then* I can try to find the real cause, assuming I get there first. Sometimes that's impossible, so I get the same error report from the user over and over: "$foo broke again, so I had to reboot!" Good lord.


All this has nothing to do with high uptimes, of course. My machines all have an uptime of just a few days or hours, but that's simply because I don't care about it. I don't try to prolong the uptime just for the sake of it. :-) I boot my computer in the morning and shut it down when I'm done. Servers, well, they keep running of course, but I reboot them on updates, too, so their uptime doesn't exceed a few days or weeks, either.

Why reboot my private servers after updates? a) As z3bra said, test-reboots save you from trouble later on, b) I'm simply not bound by strict SLAs that would force me to use something like kexec. :-) Just reboot, it's the easiest thing to do, why add any more complexity? It just takes a couple of seconds on my servers ... Yes, if you run massive applications, you might have a different opinion here.

I also rarely use standby/hibernation. It tends to introduce complexity (little annoying things like needing to re-apply hdparm after standby) with little gain, and to be honest, it happened to be a little buggy for me in the past. If I do use standby, I do it because I have to go to another office building and I don't want to re-open all my windows when I get there. :-)

Finally, as I do a loooooot of work in /tmp, rebooting has the nice effect of cleaning up stuff. This includes browser sessions, XDG caches, stuff like that. I only keep files that I really want to keep.
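One common way to get that cleanup automatically (not necessarily vain's setup) is mounting /tmp as tmpfs; a config fragment:

```shell
# /etc/fstab -- keep /tmp in RAM so every reboot starts it empty
# (the 2G size cap is an example; tune it to your workload)
tmpfs  /tmp  tmpfs  defaults,noatime,size=2G  0  0
```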
thuban
(18-04-2019, 09:14 AM)z3bra Wrote: Big uptimes used to be a proof of reliability, in a time when services were bound to the servers they were running on. Nowadays, it's fairly easy to keep a box up and running for years. All you have to do is wait.
I totally agree.
I think it's a huge mistake to consider a server with a big uptime as reliable. It doesn't mean it's functional or secure or up to date.

If it is necessary to reboot, just do it. At least for a server.
On your desktop, shut it down when you don't use it. It saves power, the atmosphere, and money.
xero
my personal machines are like booleans... either on when im using them or off when im done. none of that s{leep,uspend} bs.

as for servers i only really reboot them when i make a ton of up{dat,grad}es and need to be sure they're working as expected.



