Splitting PDFs
2018-07-07
Manipulating PDFs is quite a bit more difficult than I was expecting. Someone posted on the Mac Power Users forum asking if there was a way to programmatically split a pdf into multiple pdfs of differing lengths. In other words, taking a 10 page pdf and splitting it into 4 pdfs with 1,3,4, and 2 pages respectively.
Much of the difficulty I found with this was in conceptualizing how to keep track of the current page and which pdf it becomes a part of. This is what I used.
# For every split option
for index, split in enumerate(splits):
new_file = PyPDF2.PdfFileWriter()
# For every page that adds to
for count in range(int(split)):
page = self.original_reader.getPage(count)
new_file.addPage(page)
file = open(self.path + '/' + self.out_name + '-page' + str(index) + '.pdf', 'wb')
new_file.write(file)
print self.out_name + '-page' + str(index) + '.pdf' + ' has been made.'
file.close()
Where splits is the array generated from user input:
# Get path
path = raw_input("Which PDF would you like to split? (full path)")
# Get name of output files
name = raw_input('What would you like the output files to start with?')
# Get split of pages
# NOTE: Must add up to number of pages in PDF or script will crash.
splits = raw_input('Tell me how to split it up! i.e. (1, 3, 7)')
# Generate list of ints from split
splits = list(splits)
# Convert to list of numbers
numbers = []
for split in splits:
if str(split).isdigit():
numbers.append(int(split))
splitter = pdfSplitter(path, name, numbers)
splitter.split()