Python's standard library has no module for reading PDF files, so we have to rely on a third-party tool: Apache Tika. The Apache Tika™ toolkit can extract text from over a thousand different file types, such as PPT, XLS, and PDF.
Installation
pip install tika
Tika-Python is a Python binding to the Apache Tika™ REST services, which allows Tika to be called natively from Python.
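Because those REST services are served by a Tika server, and tika-python by default downloads and starts that server locally as a Java process, a Java runtime usually needs to be on your PATH (unless you point tika-python at an already running server). A quick pre-flight check might look like the sketch below; `java_available` is a hypothetical helper name, and it only checks that a `java` executable exists, not which version it is.

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    if java_available():
        print("Java found; tika-python can start a local Tika server.")
    else:
        print("Java not found on PATH; install a Java runtime before using tika.")
```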
Read PDF file
from tika import parser

raw = parser.from_file('sample.pdf')  # returns a dict with 'metadata' and 'content' keys
print(raw['content'])  # the extracted plain text
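In practice, `raw['content']` can be `None` when Tika finds no text layer (for example, a scanned PDF made only of images), and the text it does return often starts with a run of blank lines. A minimal sketch of a post-processing step is shown below; `clean_tika_content` and `read_pdf_text` are hypothetical helper names, and dropping every blank line is just one choice of cleanup.

```python
from typing import Optional


def clean_tika_content(parsed: dict) -> str:
    """Turn a Tika parse result dict into tidy text.

    `content` may be missing or None when Tika extracts no text,
    e.g. from an image-only PDF; return an empty string in that case.
    """
    content: Optional[str] = parsed.get("content")
    if not content:
        return ""
    # Strip surrounding whitespace from each line and keep only non-empty lines.
    lines = (line.strip() for line in content.splitlines())
    return "\n".join(line for line in lines if line)


def read_pdf_text(path: str) -> str:
    """Parse `path` with Tika and return the cleaned text."""
    # Imported here so clean_tika_content can be used without tika installed.
    from tika import parser

    return clean_tika_content(parser.from_file(path))


if __name__ == "__main__":
    # A stand-in dict shaped like parser.from_file's return value.
    sample = {"metadata": {}, "content": "\n\n  Hello  \n\nWorld \n"}
    print(clean_tika_content(sample))
```

Collapsing blank lines also removes paragraph breaks, so if the document's structure matters, keep empty lines instead of filtering them out.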