Joining SpringerLink pdfs

My workplace has a subscription to SpringerLink, so I’ve been having a blast looking at the different books over there. I found lots of books that are on my Amazon wish list, so I was like a child in a candy store.

The problem is that the books are provided as separate pdf files, one per chapter. I am not sure why they do that; maybe to make it tougher to pirate the files? Although that doesn’t really make any sense. Regardless, it is a bit annoying, and I’d like to have each book as one pdf to transfer over to my trusty Sony Reader (more about it later) so I can read it while on the go.

So I hacked up a Python script to rename the individual files by their chapter number, making them easier to join. I then hacked up another script that reads these files and joins them in the correct order into one file. Sadly, the scripts are only tested on Linux and will probably only run there. The first version used pyPDF and was probably cross-platform, but it had some problems with the pdfs, so I reverted to using pdftotext, which, as far as I know, only runs under GNU/Linux.

This is what the code looks like for the script that does the renaming.
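
As a rough illustration of the renaming idea (a minimal sketch, not the original script): it assumes each chapter's first page contains the words "Chapter N", that the pdftotext command-line tool is on the PATH, and that a target name like chapter_03.pdf is acceptable. The function names and the naming scheme are made up for this example.

```python
import re
import subprocess
from pathlib import Path

# Assumption: the chapter number shows up as "Chapter N" on the first page.
CHAPTER_RE = re.compile(r"\bChapter\s+(\d+)\b", re.IGNORECASE)


def chapter_number(text):
    """Return the first 'Chapter N' number found in text, or None."""
    match = CHAPTER_RE.search(text)
    return int(match.group(1)) if match else None


def first_page_text(pdf_path):
    """Extract the first page's text by calling the pdftotext CLI."""
    result = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def new_name(number):
    """Hypothetical naming scheme: zero-padded so names sort correctly."""
    return f"chapter_{number:02d}.pdf"


def rename_chapters(directory):
    """Rename every pdf in directory whose first page names a chapter."""
    for pdf in sorted(Path(directory).glob("*.pdf")):
        number = chapter_number(first_page_text(pdf))
        if number is None:
            continue  # front matter, index, etc.: leave untouched
        target = pdf.with_name(new_name(number))
        if target != pdf and not target.exists():
            pdf.rename(target)
```

Calling rename_chapters("some_book_dir") would then leave files such as chapter_01.pdf, chapter_02.pdf and so on, with anything that has no recognisable chapter number left under its old name.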

This is what the code looks like for the script that does the joining. pyPDF is needed for this to function.
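
A minimal sketch of the joining step, again not the original script: it assumes the files already carry their chapter number in the name, and it uses the modern pypdf package (a successor to pyPDF, with a different API from the one available when this was written). The helper names are hypothetical.

```python
import re
from pathlib import Path


def chapter_key(path):
    """Sort key: first integer in the file name; files without one go last."""
    match = re.search(r"\d+", Path(path).stem)
    return (0, int(match.group())) if match else (1, 0)


def ordered_chapters(paths):
    """Order files by chapter number rather than alphabetically."""
    return sorted(paths, key=lambda p: (chapter_key(p), str(p)))


def join_pdfs(paths, output):
    """Append every page of every chapter, in order, to one output pdf."""
    # Assumption: the third-party pypdf package is installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in ordered_chapters(paths):
        for page in PdfReader(str(path)).pages:
            writer.add_page(page)
    with open(output, "wb") as handle:
        writer.write(handle)
```

Sorting on the parsed number rather than the raw name means chapter 10 lands after chapter 2 even when the names are not zero-padded.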

You can download the scripts from the My Projects section. The code will be licensed under the GNU GPL.

You can check the code in its updated glory at my GitHub account.

Note to publishers: you really should step into the 21st century. I really don’t want to pay shipping costs that are more than the price of the book, the printing costs and all the other unneeded crap… I just want the content. This is especially true for technical books that get outdated very quickly, such as books about frameworks.

3 replies on “Joining SpringerLink pdfs”

  1. Hi,

    nice scripts you’ve got there – thanks for sharing.
    I’m in a comparable situation: my university has a SpringerLink subscription, and I’d like to read their books on a Sony Reader.
    The issues I’m having aren’t related to them splitting their books into multiple pdfs though.
    It’s the enormous amount of screen space wasted by the borders, and the inability of the reader to change the font size when there are diagrams present – which is every page for the books I’m reading.
    Converting the pdfs to epubs didn’t work out very well.
    Have you come across those issues, and maybe even know of a solution?

    1. Hey JMK,

      I couldn’t agree more. The pdfs have a lot of white space and, as you said, when there are diagrams or code involved, resizing doesn’t quite work. I believe the epub format is better for this sort of thing. Sadly, I haven’t found any conversion tool that does a good job at it. If you have the money, apparently a Kindle DX is much better for big pdfs.

  2. I am really moved by the mode that you write, and the subject is quality. But do you know that Kindle DRM has been cracked for about a year? It’s just standard Mobipocket DRM. I’m not sure if the Kindle 2 uses the same DRM though.
