My workplace has a subscription to SpringerLink. So I’ve been having a blast looking at the different books over there. I found lots of books that are on my Amazon wish list and so i was like a child at a candy store.
The problem is that the books are provided via separate pdf files for chapters. I am not sure why they do that, I think to make it tougher to pirate the files? Although that doesn’t really make any sense. Regardless, it is a bit annoying and I’d like to have the books as one pdf to transfer over to my trusty Sony Reader(more about it later) to read them while on the go.
So i hacked up a python script to rename the individual files by their chapter number making it easier to join. I then hacked up another script that reads these files and joins them in correct order into one file. Sadly, the scripts are only tested on Linux and will probably only run there. The first version used pyPDF and was probably cross-platform, but it had some problems with the pdfs so i reverted to using pdftotext that, as far as i know, only runs under a GNU Linux system.
This is what the code looks like for the script that does the renaming.
This is what the code looks like for the script that does the joining. pyPDF is needed for this to function.
You can download the scripts from the My Projects section. The code will be licensed under the GNU GPL.
You can check the code in its updated glory at my github account.
Note to publishers, you really should step into the 21st century. Especially for the technical books, I really don’t want to pay shipping costs that are more than the price of the book, the printing costs and all the other unneeded crap…I just want the content. This is especially true for technical books that get outdated very quickly such as books about frameworks.