Recently, I was notified of an “out of disk space” error message that showed up in one of our applications while it was trying to send an email. I knew right away where the problem was, and while the short-term solution was easy (I renamed the problem directory and created a new one – these files were only for logging), we definitely needed a long-term solution.
All I found in the logs were a bunch of “Too many links” error messages, and I didn’t know exactly what that could mean. Hey, I’m a programmer, not a systems admin; this was a new one for me. I could see we hadn’t run out of disk space, but I knew “links” had to do with the filesystem.
So what is “Too many links” and how do we fix it? There was still plenty of disk space but the system was refusing to create a new subdirectory. As it turns out, there’s a limit on these.
We had recently added logging of every outgoing email. This creates an entry in the database with some details, plus we store the body and any attachments on disk. For each database log entry there is a corresponding directory to hold the email body and attachments. This was going fine until we hit 31,998 log entries (and therefore 31,998 subdirectories).
On an ext3 filesystem, 31,998 subdirectories is the limit. Each subdirectory counts as a “link” against its parent directory, and this is where the somewhat cryptic “Too many links” message comes from.
By the way, the limit is technically 32,000 links, but every directory already carries two of its own – one for its entry in its parent and one for its “.” self-reference – and each subdirectory’s “..” entry adds one more, which leaves us with 31,998 subdirectories to work with. You can easily find the number of subdirectories with one of these bash commands:
# If you’re in the directory to explore
ls -d */ | wc -l
# If not
ls -d your/path/*/ | wc -l
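Another way to check is to read the directory’s hard-link count directly, since on ext3 that count is the number of subdirectories plus two. A quick sketch, assuming GNU stat (note that some newer filesystems, such as btrfs, don’t keep this convention):

```shell
# On ext3, a directory's link count is its subdirectory count + 2:
# one link for its entry in the parent, one for its "." entry,
# plus one per subdirectory's ".." entry.
d=$(mktemp -d)
mkdir "$d/one" "$d/two"
stat -c %h "$d"
```

With two subdirectories, the count reported here is 4, so a directory bumping against the limit will show a count near 32,000.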
Now that I knew the problem, I had to find a solution. Ryan Rampersad has a good solution for this which involves hashing the file name, but I wanted to be able to find the files easily (as a human) without having to hash a value. I also wanted to apply this to the three other logs we’re writing to disk in a similar manner (and any others we may add in the future).
Each of these log systems uses primary keys, which means I could use that unique integer in place of Ryan’s hash – making the directories easier to navigate while having no chance of a collision.
There were many ways I could have divided up the directories to get around this limit, but I wanted to keep everything easy to remember and use (as a human) when necessary. In the end I went with a billion/million/thousand schema. Each subdirectory’s name is a number derived from the primary key of the log entry we’re working with.
As an example, suppose the ID in question is 5397248. This is 0 billion, 5 million, 397 thousand. So the path would look like emails/0/5/397/log_5397248 using the billion/million/thousand schema. The final directory (into which we will put our files) is the entire ID prefixed with log_ so that it is distinguished from the others and easily recognizable. Having the full ID in the final containing directory can be helpful if we need to send that one directory to someone or otherwise deal with it outside of that path.
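The path derivation above is just integer division and remainders, so it can be sketched in a few lines of shell arithmetic (the `emails/` base and `log_` prefix follow the example; the variable names are mine):

```shell
# Hypothetical sketch: derive the storage path for a given log ID
# using the billion/million/thousand schema.
id=5397248
billions=$(( id / 1000000000 ))          # 0
millions=$(( (id / 1000000) % 1000 ))    # 5
thousands=$(( (id / 1000) % 1000 ))      # 397
path="emails/${billions}/${millions}/${thousands}/log_${id}"
echo "$path"    # emails/0/5/397/log_5397248
mkdir -p "$path"
```

The same three divisions work in whatever language the application is written in; the only requirement is that every reader and writer computes the path the same way.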
With this schema there will never be more than 1,000 subdirectories in any given directory. Using the example above, only an ID like 5397XXX will use the path 0/5/397/, and XXX can only be 000–999. One thousand is far under the 32,000 limit, and the directories are organized logically.
This schema allows for 31,998 billion total log entries, which I should never see in my lifetime (based on projected use). It can easily be extended, however, with a trillions level and beyond as the application demands.