Python Utilities: Convert Text Files (.txt) and Word Files (.doc/.docx) to PDF

SaurabhG
Apr 20, 2023

Hi Folks,

Not very often, but sometimes you may need to convert your text files (ending with the .txt extension) and Microsoft Word files (ending with the .doc or .docx extension) to PDF.

In today's scenario, where data can be manipulated and it is often hard to keep structured data in place, it becomes important to have data in PDF (Portable Document Format), which can be read by a PDF reader, a document scanner, or a search service such as Azure Cognitive Search.

Here I am sharing code snippets showing how to convert .txt, .doc, and .docx files to PDF.

Convert Text file to PDF

Python packages/modules used:
os module (built in, no need to install)
FPDF module (pip install fpdf)

import os
import fpdf

# function definition
def func_convert_text_to_pdf(txt_file_path, output_dir):
    """
    Create a PDF from a text file and save it in output_dir.
    """
    try:
        print(f"Input file: {txt_file_path}\n")
        file_name = os.path.splitext(os.path.basename(txt_file_path))[0]

        # read file content, skipping bytes that are not valid UTF-8
        with open(txt_file_path, "r", encoding="utf8", errors="ignore") as data:
            text_data = data.read()
        # the built-in fonts handle Latin-1 text only, so replace any other character with '?'
        text_data = text_data.encode('latin-1', 'replace').decode('latin-1')

        # declare pdf
        pdf = fpdf.FPDF(orientation='P', unit='mm', format='A4')
        pdf.add_page()
        pdf.set_auto_page_break(True, margin=4)
        pdf.set_font(family='Times', style='', size=10)

        pdf.multi_cell(w=0, h=5, txt=text_data, align="L")
        # create output file path (os.path.join works on any operating system)
        output_file_path = os.path.join(output_dir, f"{file_name}.pdf")
        pdf.output(output_file_path, 'F')
        print("PDF has been created")
    except Exception as e:
        print(f"Error occurred for {txt_file_path}: {e}")

# call function
func_convert_text_to_pdf(txt_file_path="txt file path", output_dir="output dir path where pdf to be saved")
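
The `encode('latin-1', 'replace')` step in the function above deserves a closer look. As far as I understand, the built-in fonts of the fpdf package (such as Times) only cover Latin-1 characters, so anything outside that range has to be replaced before the text is written. The sketch below shows what that round trip does to a string. The names `to_latin1` and `PUNCTUATION_MAP` are my own and not part of fpdf, and mapping typographic punctuation to ASCII first is an optional extra that the original snippet does not do.

```python
# Hypothetical helpers (not part of fpdf): illustrate the Latin-1 round trip.

# Optional: map common typographic punctuation to ASCII lookalikes first,
# so it is not turned into '?' by the replace step below.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis
})


def to_latin1(text, keep_punctuation=True):
    """Return text with every character outside Latin-1 replaced by '?'."""
    if keep_punctuation:
        text = text.translate(PUNCTUATION_MAP)
    return text.encode("latin-1", "replace").decode("latin-1")


if __name__ == "__main__":
    sample = "caf\u00e9 \u201cmenu\u201d \u2013 \u20b9100"
    print(to_latin1(sample, keep_punctuation=False))  # café ?menu? ? ?100
    print(to_latin1(sample))                          # café "menu" - ?100
```

Accented Latin-1 letters such as é survive unchanged, while symbols such as the rupee sign still become `?`. If your text files contain a lot of non-Latin text, this replacement will lose information, so check a sample of the output before converting in bulk.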

Convert Word file to PDF

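Python packages/modules used:
os module (built in, no need to install)
win32com module (pip install pywin32; requires Windows with Microsoft Word installed)

import os
import win32com.client

# Placeholder values; replace with your own paths
dirpath = "input dir path"
file = "word file name.docx"
output_dir_path = "output dir path where pdf to be saved"

new_name = os.path.splitext(file)[0] + ".pdf"  # works for both .doc and .docx
in_file = os.path.abspath(os.path.join(dirpath, file))  # pass absolute paths to Word
new_file = os.path.abspath(os.path.join(output_dir_path, new_name))

word = win32com.client.Dispatch('Word.Application')
try:
    doc = word.Documents.Open(in_file)
    doc.SaveAs(new_file, FileFormat=17)  # 17 = wdFormatPDF
    doc.Close()
finally:
    word.Quit()  # close Word even if the conversion fails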
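
The Word snippet above converts one file at a time. The names `dirpath` and `file` suggest that the paths come from a directory walk, so here is a minimal sketch of that part using only the standard library. The function name `find_word_files` and its parameters are hypothetical. It only builds the input and output paths and leaves the actual conversion to the Word code above.

```python
import os

WORD_EXTENSIONS = (".doc", ".docx")


def find_word_files(input_dir, output_dir):
    """Yield (in_file, new_file) pairs for every Word file under input_dir.

    Both paths are absolute; new_file is the PDF path inside output_dir.
    """
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        for file in sorted(filenames):
            name, ext = os.path.splitext(file)
            # Skip non-Word files and Word's temporary lock files (~$name.docx)
            if ext.lower() not in WORD_EXTENSIONS or file.startswith("~$"):
                continue
            in_file = os.path.abspath(os.path.join(dirpath, file))
            new_file = os.path.abspath(os.path.join(output_dir, name + ".pdf"))
            yield in_file, new_file


if __name__ == "__main__":
    # Placeholder directories; replace with your own
    for in_file, new_file in find_word_files("input dir path", "output dir path"):
        print(f"{in_file} -> {new_file}")
```

Each pair can then be passed to `word.Documents.Open(in_file)` and `doc.SaveAs(new_file, FileFormat=17)`, ideally with a single Word instance opened before the loop and closed after it. Note that two files with the same base name (for example report.doc and report.docx, or files in different subfolders) would map to the same PDF in this sketch.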

These are just code snippets that I put together with the help available on Stack Overflow.

SaurabhG

I am an enthusiastic learner. Always want to challenge my last learning & keep hunting for new learning. about.me/saurabh.gangrade