How to Generate PDFs in Python for Google App Engine

One of my last projects based on google app engine and python involved storing form data in GAE datastore and generating PDF documents that the user can download. Whilst data storing was the easier part as google’s big data API it is pretty well documented, the trickier aspect was to convert it to PDF using python.

pdf

This was especially difficult in the face of GAE not providing an easy mechanism for disk writing that most PDF generation libraries require. To share my endeavors, I’m writing this post about how to generate pdfs in python for Google app engine.

The solution I came across was, as far as I know, the only possible way of generating PDFs in python! There are about three PDF generation utilities in python, each differing in terms of their area of usage:

I figured out after researching the above three libraries that a combination of xhtml2pdf and pyPDF is what I needed. Since I already had the html document template ready, I just put placeholders for my form data like __name__ , __occupation__, etc so that I can fill these before converting to PDF.

Now, I could fill these values from my python program, but the real challenge was storing the resulting PDF to disk, which was not allowed by google app engine! Turns out, we don’t need to actually store anything to disk. By sending the CreatePDF() output to a StringIO object, which is stored in memory instead of the filesystem, I could bypass the need to actually store anything to disk!!

f=open('template.htm','r')
sourceHtml = unicode(f.read(), errors='ignore')
f.close()
sourceHtml = template.render(tvals)
sourceHtml = sourceHtml.replace('__name__',sname)
sourceHtml = sourceHtml.replace('__address__',saddress)
sourceHtml = sourceHtml.replace('__occupation__',will.occupation)
packet = StringIO.StringIO() #write to memory
pisa.CreatePDF(sourceHtml,dest=packet)

Now, it would have been simple to just self.response.write(packet) to send this pdf download to the user, but in my case, I had to merge this generated pdf with another template-pdf which contained information like symbols, images and page-numbers that for some reason, could not be placed into the html document. So, I had to create a PdfFileReader object (coutesy of PyPDF library!), and then merge each page of my generated document with this template document. Then where do I write this merged output? Any guesses? – another StringIO object!! And then finally, write this StringIO object to self.response, so the user can download it.

packet.seek(0)
new =PdfFileReader(packet) #generated pdf
template = PdfFileReader(file("template.pdf", "rb")) #template pdf
output=PdfFileWriter() #writer for the merged pdf
for i in range(new.getNumPages()):
    page=template.getPage(i)
    page.mergePage(new.getPage(i))
    output.addPage(page)

outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object

self.response.headers['Content-Type'] = 'application/pdf'
fname = (will.name if mirror=='n' else will.partner)
self.response.headers['Content-Disposition'] = 'attachment; filename=' + str(fname).replace(' ','_') + '.pdf'
self.response.write(outputStream.getvalue())

Remember to add and include the below libraries before you do this:

import StringIO
import xhtml2pdf.pisa as pisa
from pyPdf import PdfFileWriter,PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics,ttfonts

 

References:
https://developers.google.com/appengine/docs/python/datastore/
http://www.reportlab.com/software/documentation/
http://www.xhtml2pdf.com/
https://pypi.python.org/pypi/pyPdf
http://stackoverflow.com/questions/4899885/how-to-set-any-font-in-reportlab-canvas-in-python
http://stackoverflow.com/questions/13522638/paginating-in-pisa-xhtml2pdf-just-repeats-last-page
http://docs.python.org/library/stringio.html
 

10 comments

  1. Punita Ojha says:

    Nice article . I would want to say that this is one of the best articles, I encountered on a web search . I tried this but I am still facing some difficulties . Will this work without using a django app . I want to use it in Google App Engine Project .

  2. linhui718611 says:

    Hello, thanks for your post.
    I can not install xhtml2pdf package to google appengine.
    When deploy project I got following error.

    running build_ext
    The headers or library files could not be found for zlib,
    a required dependency when compiling Pillow from source.
    Please see the install instructions at:
    https://pillow.readthedocs.io/en/latest/installation.html
    Traceback (most recent call last):
    File “”, line 1, in
    File “/tmp/pip_build_dservice/Pillow/setup.py”, line 804, in
    raise RequiredDependencyException(msg)
    __main__.RequiredDependencyException:

    And this is my requirements.txt file.
    Flask==0.12.1
    Werkzeug==0.12.2
    Cerberus==1.1
    attrs==16.3.0
    SQLAlchemy==1.1.9
    python-dateutil==2.6.1
    GoogleAppEngineCloudStorageClient>=1.9.22.1,<2.0.0.0
    XlsxWriter==0.8.7
    whtml2pdf==0.21

    Then this is app.ymal file.
    libraries:
    – name: MySQLdb
    version: "1.2.5"
    – name: PIL
    version: "1.1.7"

    Expect your quick reply.
    Thanks.

    • Prahlad Yeri says:

      If you cannot perform a ‘standard install’ using pip, then just extract their repo from below github link and then copy the xhtml2pdf package folder into your project root:

      https://github.com/xhtml2pdf/xhtml2pdf

      If I remember correctly, that’s how I had used it last time. Also, google app-engine doesn’t allow a pip install of all python packages, but only those in the approved list from Google, at least that’s how it was the last time I’d used it.

  3. linhui says:

    Thanks for your reply.
    But is there any other solution?

    • Prahlad Yeri says:

      None that I’m aware of. Your problem relates to google app-engine’s restriction of python package installs to their own filtered list. I’m not aware of any way to install custom packages except the one that I suggested (copy the package folder to your own app folder).

      Try posting in appengine specific forums or google groups to find more about it.

  4. linhui says:

    Is it possible for google appengine?
    I know it’s possible only in google computer engine.
    Maybe I have to create computer engine instance?

  5. linhui says:

    Which packages I have to upload?
    There are following packages in xhtml2pdf.
    html5lib, httplib2, PIL, pip, pkg_resources, PyPDF2, reportlab, setuptools, webencodings, xhtml2pdf.

    • Prahlad Yeri says:

      When I had used it with app-engine, I had copied all the following in my source folder:

      1. xhtml2pdf
      2. reportlab
      3. PyPDF (now PyPDF2)
      4. html5lib

      But that was back in 2014 when app-engine hardly allowed you to install any official python packages at all and my app made extensive use of all of them. I don’t know about the current situation, you can still copy all four of them, or try installing some by “official” way of app-engine first and then copy the rest, its up to you! If you are in doubt, I’d recommend to copy all four of them.

  6. linhui says:

    Thank you very much.

Leave a Reply

Your email address will not be published. Required fields are marked *