How To Extract Data From A PDF Document In Java

Posted By : Sanjay Saini | 31-May-2018

Java

Hi Guys,

In this blog, I'm going to show how to read and extract data from a PDF using a Java program. We often need to read a PDF and do some work with its data.

In Java, we can do this easily with Apache PDFBox, an open-source library from the Apache Software Foundation. It helps us create, modify and manipulate PDF documents in an application.

Before writing a sample program, I'll give you a brief overview of this library.

What Is PDFBox?

Apache PDFBox is a free, open-source Java library for developing and converting PDF documents. Using this library, you can write Java programs that create, convert and manipulate PDF documents. In addition, PDFBox provides command-line utilities for performing various operations on PDFs using the available JAR file.
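As a quick illustration of producing a PDF and reading its text back, here is a minimal sketch. It assumes PDFBox 2.0.x is on the classpath (the same major version as the Maven dependency used later in this post); in PDFBox 3.0, `PDDocument.load` was replaced by `Loader.loadPDF` and the `PDType1Font.HELVETICA` constant was removed. The class name `PdfRoundTrip` and its methods are hypothetical names chosen for this example.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
class PdfRoundTrip {

    // Builds a one-page PDF containing the given text and returns its bytes.
    static byte[] createPdf(String text) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // US Letter size by default
            document.addPage(page);
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(72, 700); // PDF coordinates: origin at bottom-left
                content.showText(text);
                content.endText();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    // Loads PDF bytes and returns the text of all pages.
    static String extractText(byte[] pdfBytes) throws IOException {
        try (PDDocument document = PDDocument.load(pdfBytes)) {
            return new PDFTextStripper().getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createPdf("Hello PDFBox");
        System.out.print(extractText(pdf));
    }
}
```

Working in memory keeps the sketch independent of any file on disk; in a real application you would pass a `File` to `PDDocument.load` instead.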

Features of PDFBox

The following are the important features of PDFBox −

Extract Text − With the help of PDFBox, you can extract Unicode text from PDF documents.

Split & Merge − With the help of PDFBox, you can split a single PDF document into multiple documents and merge them back into a single document.

Fill Forms − With the help of PDFBox, you can fill in the interactive form fields of a PDF document with your application's data.

Print − With the help of PDFBox, you can print a PDF file using the official Java printing API.

Save as Image − With the help of PDFBox, you can save PDFs as image files, such as PNG or JPEG.

Create PDFs − With the help of PDFBox, you can create new PDF files from Java applications, and you can also embed images and fonts in them.

Signing − With the help of PDFBox, you can add digital signatures to PDF documents.
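To make the Split & Merge feature concrete, here is a minimal sketch assuming PDFBox 2.0.x, where `Splitter` and `PDFMergerUtility` live in the `org.apache.pdfbox.multipdf` package. It splits a document into single-page documents and appends them back together in order. The class name `SplitAndMerge` and the method `splitAndMerge` are hypothetical.

```java
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
class SplitAndMerge {

    // Splits `source` into single-page documents, merges them back in order and
    // writes the merged PDF to `out`. Returns the number of parts produced.
    static int splitAndMerge(PDDocument source, OutputStream out) throws IOException {
        // Splitter creates one new document per page by default
        List<PDDocument> parts = new Splitter().split(source);
        try (PDDocument merged = new PDDocument()) {
            PDFMergerUtility merger = new PDFMergerUtility();
            for (PDDocument part : parts) {
                merger.appendDocument(merged, part);
            }
            // Save before the parts are closed, since merged pages came from them
            merged.save(out);
            return parts.size();
        } finally {
            for (PDDocument part : parts) {
                part.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument source = new PDDocument()) {
            for (int i = 0; i < 3; i++) {
                source.addPage(new PDPage()); // three blank pages
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int parts = splitAndMerge(source, out);
            try (PDDocument merged = PDDocument.load(out.toByteArray())) {
                System.out.println(parts + " parts, " + merged.getNumberOfPages() + " pages after merging");
            }
        }
    }
}
```

Real code would usually save each part to its own file instead of merging them straight back; the round trip here only shows that both operations preserve the pages.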

Components of PDFBox

The following are the four main components of PDFBox −

PDFBox − This includes the classes and interfaces related to content extraction and manipulation.

FontBox − This includes the classes and interfaces related to fonts; PDFBox uses it to read low-level information from the font files used in PDF documents.

XmpBox − This includes the classes and interfaces that manipulate XMP metadata.

Preflight − This component is used to validate PDF files against the PDF/A-1b standard.

Sample Program for Printing PDF File Data Using Java

First, add the PDFBox dependency to your Maven pom.xml:

<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.7</version>
</dependency>

The following program loads F:/Test.pdf, extracts its text with PDFTextStripper and prints it line by line:

package com.sanjay;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class PrintPdf {

    public static void main(String[] args) throws IOException {

        // Load the PDF; try-with-resources closes the document automatically
        try (PDDocument pdfDocument = PDDocument.load(new File("F:/Test.pdf"))) {

            if (!pdfDocument.isEncrypted()) {

                PDFTextStripper pdfTextStripper = new PDFTextStripper();
                // Order text by its position on the page so lines come out in reading order
                pdfTextStripper.setSortByPosition(true);

                // Extract the text of all pages as a single string
                String pdfFileInText = pdfTextStripper.getText(pdfDocument);

                // Split on Unix (\n) or Windows (\r\n) line endings and print each line
                String[] lines = pdfFileInText.split("\\r?\\n");
                for (String line : lines) {
                    System.out.println(line);
                }
            } else {
                System.out.println("The document is encrypted; text was not extracted.");
            }
        }
    }
}
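PDFBox also offers PDFTextStripperByArea, which extracts text from named rectangular regions of a page instead of the whole document. Below is a minimal sketch, assuming PDFBox 2.0.x; to stay self-contained it builds its own one-page PDF instead of reading F:/Test.pdf. The class name `RegionText`, the region name "top" and the sample strings are hypothetical. Note that region rectangles use Java/AWT orientation, where y = 0 is the top of the page, while text is drawn in PDF coordinates, where y = 0 is the bottom.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.awt.geom.Rectangle2D;
import java.io.IOException;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
class RegionText {

    // Draws one line of text at PDF coordinates (x, y), keeping existing page content.
    static void writeLine(PDDocument document, PDPage page, String text, float x, float y)
            throws IOException {
        try (PDPageContentStream content = new PDPageContentStream(
                document, page, PDPageContentStream.AppendMode.APPEND, true)) {
            content.beginText();
            content.setFont(PDType1Font.HELVETICA, 12);
            content.newLineAtOffset(x, y);
            content.showText(text);
            content.endText();
        }
    }

    // Returns only the text located in the top half of the first page.
    static String topHalfText(PDDocument document) throws IOException {
        PDPage page = document.getPage(0);
        float width = page.getMediaBox().getWidth();
        float height = page.getMediaBox().getHeight();

        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        // Region in Java/AWT coordinates: (0, 0) is the top-left corner
        stripper.addRegion("top", new Rectangle2D.Float(0, 0, width, height / 2));
        stripper.extractRegions(page);
        return stripper.getTextForRegion("top");
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // US Letter: 612 x 792 points
            document.addPage(page);
            writeLine(document, page, "Header text", 72, 700); // near the top
            writeLine(document, page, "Footer text", 72, 100); // near the bottom
            System.out.print(topHalfText(document));
        }
    }
}
```

With a real file you would load the document with PDDocument.load and choose rectangles that match the layout you want to read, such as a header block or a table column.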

Thanks

Sanjay Saini


About Author

Sanjay Saini

Sanjay has been working on web application development using Java, Groovy and Grails. He loves listening to music, playing games and going out with friends in his free time.


© Copyright 2025 Oodles Technologies Pvt Ltd. All rights reserved.