Since so much of what we do on those pesky devices called “comp-you-tars” is becoming business-critical, and we’ve reached a point where a web company with “one server that we all use” is going nowhere, we now have piles of lovely silicon and metal, with electric pulses flowing through them, creating the world as we see it today.
I love these machines: they have extended our abilities far beyond those of a single person, they have connected us in ways our ancestors could only imagine and write about in fiction, and they form a central part of our everyday lives.
Developing complex systems has given us the challenge of building and maintaining large numbers of machines, and done correctly, a single person can easily control thousands, if not tens of thousands, of machines with a high degree of stability, confidence and grace.
Back in the olden days, systems were small, resource constraints were very much a real problem, and this gave developers the incentive, nay, the requirement, to know their system and to write efficient, clean code within those constraints.
As time goes by, each resource constraint is alleviated, for a while, by hardware manufacturers making bigger and better toys: more CPU power, more memory, bigger hard drives, faster links. Moving the bottleneck around. “Oh, your application needs more memory to run? Let’s get more memory!” “Not enough processing cycles to do your bidding? Let’s get some more powerful servers!” This is definitely one approach to achieving results.
Now, as we progress into complex systems, more often than not nobody is sitting there watching the bits and bytes flow from place to place; rather, a system is designed to do certain things and usually runs in “headless” mode, meaning there’s no monitor and no keyboard: what a lot of people would simply call “a server” today.
Because of the number of servers we have, and because there are more and more complicated processes happening on any given server, a common practice is to create log files: entries of what happened and when, commonly used in what is termed “the debug process”.
Some log files provide more use than others, such as HTTP access logs – some businesses live and die by recording the traffic that comes into their site and mining that data for interesting patterns, such as where people are visiting from, what kind of browser they use, and much more.
Now, don’t get me wrong – logs are absolutely great when you want to figure out “what happened” – but this comes at a cost.
A lot of software today has the concept of logging levels – critical, error, warning, info, debug – allowing the user (developer, system admin) to define how deep down the rabbit hole they want to go, and how much of it to keep a record of. New software packages should be written with the same kind of thought process: log critical problems at the “critical” level, and do not log debug messages in the same place.
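As a concrete illustration, here’s a minimal sketch using Python’s standard logging module (the level names are the real ones; the logger name and messages are just placeholders), showing how the configured level decides what actually gets recorded:

```python
import logging

# Record warnings and above; info and debug messages are filtered out entirely.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

log = logging.getLogger("myapp")  # "myapp" is a placeholder name

log.debug("cache lookup took 3ms")        # dropped at this level
log.info("request served")                # dropped at this level
log.warning("disk usage above 80%")       # recorded
log.critical("cannot write to data dir")  # recorded
```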
Since some systems log a lot of theoretically useful information at the info and debug levels, some people would argue that leaving those verbose levels on all the time is inherently useful: you already have a record of what happened, and do not need to try to reproduce the issue again.
I do not doubt the usefulness of this attitude, but it requires a degree of discipline to maintain a healthy server.
What discipline, you ask? The discipline to realize that the Observer Effect may come into play when dealing with log files, and to manage the resource constraints that come with them. The resources typically consumed by logging are disk space and disk I/O.
If you log too much, the disk I/O may not be able to keep up with the rate at which you write log entries, slowing down the performance of the application itself.
If you log too much, you may also run out of disk space, causing other problems for your application: hanging, crashing, exploding, or general unhappiness.
So while we get bigger and better hardware, this constraint is seldom dealt with, only postponed by getting bigger drives, faster drives, and so on, instead of figuring out the answers to the real questions: “How much logging do I need?” and “How long do I realistically need to keep it?”
The level of logging should be determined by the application developer, and should hopefully follow some sort of standard; a number of good posts have been written on the subject.
The other part, retention, should combine the developer’s realistic view of how far back they will actually use the data with the physical constraints of the real world, and land on a happy medium between the two.
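To make that happy medium concrete, a quick back-of-the-envelope calculation goes a long way; the entry size, rate, and retention below are purely illustrative assumptions, not measurements from any real system:

```python
# Rough log-volume sizing with made-up numbers -- adjust to your own system.
avg_entry_bytes = 200      # assumed average size of one log line
entries_per_second = 500   # assumed steady logging rate
retention_days = 90        # assumed retention requirement

bytes_per_day = avg_entry_bytes * entries_per_second * 86_400
total_bytes = bytes_per_day * retention_days

print(f"~{bytes_per_day / 1e9:.1f} GB per day, "
      f"~{total_bytes / 1e9:.0f} GB over {retention_days} days (before compression)")
```

Even modest-looking numbers add up to hundreds of gigabytes fast, which is why the retention question deserves an actual answer rather than a shrug.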
But do not simply let log files run amok, or you will spend countless hours determining which files to remove, and sometimes arbitrarily deleting the oldest X files because you’ve run out of space, just to keep your application running.
Enter a wonderful tool called “logrotate”. I don’t think I’ve seen a Linux system without it. It typically runs once a day and performs some simple scripted actions to move files around. Most software packages include a directive file under /etc/logrotate.d/ to handle their application-specific logs.
The structure of this file is fairly simple, the number of directives it takes is not overwhelming, and it is very well documented.
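For illustration, a directive file for a hypothetical application might look something like the sketch below; the “myapp” name and log path are made up, but the directives themselves are standard logrotate options:

```
# /etc/logrotate.d/myapp  (the application name and path are hypothetical)
/var/log/myapp/*.log {
    # rotate once per day and keep two weeks of history
    daily
    rotate 14
    # gzip rotated files, leaving the most recent rotation uncompressed
    compress
    delaycompress
    # don't complain about a missing log, and skip empty ones
    missingok
    notifempty
    # truncate the live file in place so the application keeps its open file handle
    copytruncate
}
```

The copytruncate directive is the lazy-but-effective choice when the application never reopens its log file; the cleaner alternative is a postrotate script that signals the application to reopen its logs.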
So there’s really no excuse not to have a rotation file for your application, and it makes everyone’s lives a little easier.
So do your fellow tech people a favor – if you write a log file, make sure you handle (aka rotate) that log file.
That’s not too much to ask, is it?