Context
While there is a certain intellectual satisfaction to be had by getting a system just right, the purpose of monitoring is to help you run whatever it is that you’re monitoring. That could be a website used directly by your customers, a backend only used internally, or even industrial systems. Fundamentally, your customers don’t care how well your monitoring is working; they care how well the services they’re interacting with function. Part of providing that service should involve some level of monitoring (in the broadest sense of the word) to ensure that things aren’t going badly wrong, but that’s just one part of providing the service.
Tradeoff
As with any part of providing a service, all of the usual tradeoffs apply, as time and resources aren’t infinite. Getting perfect monitoring at the cost of a service working poorly, being unmaintainable, or requiring strenuous ongoing operational effort is unlikely to be a good tradeoff. The reverse applies too, of course: having no monitoring would be unwise no matter how good a service is.
The same engineering tradeoffs apply for monitoring and have to be considered in the broader sense of the service. Is it worth having a human immediately jump on every log message that each of your applications produces? How about if the error ratio goes from 0.01% to 0.02%? What if it sometimes takes an extra 30s to get an alert notification, is that okay? Would a team spending a year developing a machine learning algorithm to automatically determine the threshold for one alert be a better use of that team than having them develop features for the service itself? Would it be worth having to spend a few hours a year maintaining an intricate alerting rule that avoids one false positive a year, versus a straightforward threshold?
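To give a sense of what a straightforward threshold can look like, here is a minimal sketch in Python. The function names and the 1% threshold are hypothetical and chosen purely for illustration; they are not taken from any particular monitoring system.

```python
# Hypothetical threshold: alert when more than 1% of requests fail.
ERROR_RATIO_THRESHOLD = 0.01


def error_ratio(errors: int, requests: int) -> float:
    """Return the fraction of requests that failed, or 0.0 with no traffic."""
    return errors / requests if requests else 0.0


def should_alert(errors: int, requests: int,
                 threshold: float = ERROR_RATIO_THRESHOLD) -> bool:
    """Return True when the error ratio is strictly above the threshold."""
    return error_ratio(errors, requests) > threshold
```

With a threshold like this, a move from 0.01% to 0.02% errors (1 versus 2 failures per 10,000 requests) would not notify anyone, and there is very little here to maintain or to get wrong.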
Complexity is never free. Every alert you create, every dashboard, every metric, every log line, every trace span, and every if statement has a cost. On one hand, in many cases the cost is going to be low enough that it’s not worth worrying about. On the other hand, more than once I’ve later regretted being fancy and making things more sophisticated than they really needed to be.
Conclusion
So when designing, improving, or even just tweaking your monitoring, always consider if a simpler solution might make more sense overall.