We recently migrated our entire product to Apple Unified Logging because of the various benefits it provides. However, we immediately started hitting the "log quarantine" problem ("QUARANTINED DUE TO HIGH LOGGING VOLUME"). This is partly because we are indeed over-logging in a few cases (which we have to work on fixing), but also partly because it's a complicated product with potentially hundreds of libraries, and some of the code can legitimately be very busy. For example, we have a system extension that's implemented as both a NetworkExtension client and an EndpointSecurity client. If we were to log decent information about each network or file system event so we can troubleshoot something, those logs are bound to be high volume.
Now, when our app is running in a normal user environment, this is not a problem. We can disable certain heavy log levels, or at least disable persistence for certain logs. (One of the benefits of Apple Unified Logging we really like is its very flexible controls: the log config command, OSLogPreferences, and configuration profiles, so we can employ whatever suits a specific case.) But ultimately, the question is: what if we end up with a troubleshooting case where we don't know exactly where the problem is, so we just need the full logs at debug level? And not only enabled, but, because we might not know when the issue will happen either, persisted in full for as long as possible? We will start hitting log quarantine again. Granted, this is a very extreme case, but if worst comes to worst, how can we even do that with Apple Unified Logging? Is there an option that allows us to override the quarantine, if but temporarily?
I've searched a few relevant forum posts, some of which described log quarantine, but none mentioned a solution for it (besides logging less from the app, but as I explained, we do have legitimate cases where log volume can still be huge). I've also read The Eskimo's "Your Friend the System Log" and browsed some of the troubleshooting config profiles provided by Apple, hoping to discover some hidden payloads, but have found none so far.
There is an OSLogRateLimit environment variable that I noticed when I run launchctl print system/<a-launch-daemon-label>, and it's usually 64. Is this relevant? And, knowing Apple, it's probably something that can't be tampered with?
Well, that was fun.
Lemme start with some disclaimers.
It’s impossible to talk about logging at this level without discussing implementation details, that is, information about how the system works today but which isn’t considered API. This stuff has changed in the past and could easily change again in the future. Don’t build knowledge of this into a product that you ship to a wide range of users.
Also, the limits imposed by the system log are not arbitrary. They represent a trade-off between convenience — when debugging problems that come in from the field, more logging is always better — and cost. There are three specific costs of concern:
- Logging consumes CPU cycles, which leaves less available for real computation and also takes energy.
- Persistent logging consumes I/O bandwidth and even more energy.
- Persistent logging can contribute to SSD wear.
The system log is a shared resource and it’s important to Apple that it remain useful for debugging a wide range of problems. I touch on this in Your Friend the System Log and I recently went into more detail in this post.
All of the above puts a limit on what I can actually talk about here. I don’t mind straying a little into the world of implementation details, but I’m not going to fully describe everything.
And with that out of the way, let’s return to the actual issue.
Lemme start with a very high-level description of how quarantine works:
- The system log regularly rotates its log files [1].
- When it rotates a file, the system calculates how fast it filled up.
- If it filled up too quickly, that’s a sign of problems, so it takes a deeper look at the cause.
- If it finds that a process logged too much, it quarantines that process.
- Once quarantined, the process stays that way until it terminates.
- While a process is quarantined, its log entries are dropped.
IMPORTANT This process is based on log entries that persist. Non-persistent log entries aren’t a factor here.
I’m being deliberately vague about what constitutes “too quickly” and “too much”. Sorry.
So, coming back to your direct question:
Is there an option that allows us to override the quarantine, if but temporarily?
AFAICT there’s no good [2] way to do this on a public release of macOS.
Now, normally this is the point where I suggest filing an enhancement request. However, your current situation is largely theoretical: You’re concerned that you might encounter a problem in the future where debugging that problem requires you to enable persistent logging for all your log points. I’m not confident that an ER based on that theoretical concern will get traction.
So I guess my advice here is:
- Structure your logging subsystems and categories to give you flexibility in how you enable and persist your log entries.
- Deploy this in the real world.
- If you hit a concrete case where you can’t get the logging you need, file an enhancement request with those details.
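To make the first point a bit more concrete, here’s a minimal sketch of one way to split a busy system extension into categories. The subsystem string, the category names, and the handleFlow function are all made up for illustration, and it assumes macOS 11 or later for the Logger type.

```swift
import os

// Hypothetical subsystem and category names, chosen for illustration only.
// One subsystem per component, one category per noisy area, so that each
// can be enabled and persisted independently.
enum Log {
    static let subsystem = "com.example.myproduct.sysext"
    static let network = Logger(subsystem: subsystem, category: "network")
    static let fileEvents = Logger(subsystem: subsystem, category: "file-events")
    static let lifecycle = Logger(subsystem: subsystem, category: "lifecycle")
}

// Hypothetical call site: a high-volume event logged at debug level.
func handleFlow(remoteHost: String, port: Int) {
    Log.network.debug("new flow to \(remoteHost, privacy: .public):\(port)")
}

// Low-volume events can go to a separate category at a higher level.
func extensionDidStart() {
    Log.lifecycle.notice("extension started")
}
```

With a layout like this you can turn on just the category you care about. For example, something along the lines of sudo log config --subsystem com.example.myproduct.sysext --category network --mode "level:debug,persist:debug" should enable and persist debug entries for the network category only. Check the log man page on your target system for the exact options it supports.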
One other thing to note is that log snapshots (log collect) capture ephemeral log entries. So, you can take a page out of DTrace’s speculative logging approach and log non-persistently until you see a problem and then trigger a log snapshot to go ‘back in time’.
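As a rough sketch of that ‘trigger a snapshot’ idea, here’s how a privileged helper might run log collect once your own code decides something has gone wrong. The function names are hypothetical, and the sketch assumes the process runs as root (log collect requires that) and is allowed to spawn /usr/bin/log, which a sandboxed process may not be. How far back the non-persistent entries reach isn’t something I can promise, so the time window here is just an example.

```swift
import Foundation

// Builds the arguments for `log collect`. The `--last` and `--output`
// options come from the log(1) man page; the values are illustrative.
func logCollectArguments(lastMinutes: Int, outputPath: String) -> [String] {
    ["collect", "--last", "\(lastMinutes)m", "--output", outputPath]
}

// Hypothetical helper: call this when your own code detects the problem.
// Returns the exit status of the `log` tool (0 on success).
func captureSnapshot(lastMinutes: Int, outputPath: String) throws -> Int32 {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/log")
    process.arguments = logCollectArguments(lastMinutes: lastMinutes,
                                            outputPath: outputPath)
    try process.run()
    process.waitUntilExit()
    return process.terminationStatus
}
```

A call like try captureSnapshot(lastMinutes: 5, outputPath: "/tmp/incident.logarchive") would then leave you with a log archive that you can open in Console or query with log show.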
Oh, and speaking of DTrace, that’s still available on macOS. You have to disable SIP [3], but that’s generally acceptable when you’re dealing with the really gnarly problems.
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
[1] This isn’t as simple as traditional Unix-y logs, where there’s a single text file that gets rotated, but the general concept of log file rotation applies.
[2] And by “good” I mean something that I’m comfortable sharing here. Honestly, I’m not sure if this qualifier is even required. The controls that I uncovered to disable quarantine are not available on public releases of macOS.
[3] At least partially (-: See https://stackoverflow.com/questions/60908765/mac-osx-using-dtruss