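I. Tool Selection and Preparation
Apache PDFBox: an open-source Java library for reading and producing PDF documents. It supports text extraction, document creation, and page rendering, and its API is approachable for beginners.
Project dependencies: add the following key dependencies to pom.xml: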
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.27</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
</dependencies>
II. Converting PDF to Images with Spring Boot
1. File Upload
Front-end form: used to upload the PDF file.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Convert PDF to Images</title>
</head>
<body>
    <form action="/upload" method="post" enctype="multipart/form-data">
        <input type="file" name="file" accept=".pdf"/>
        <button type="submit">Upload PDF</button>
    </form>
</body>
</html>
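Back-end endpoint: handles the file upload.
@PostMapping("/upload")
public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file) {
    try {
        // Directory where uploads are stored; createDirectories does nothing if it already exists
        Path uploadPath = Paths.get("uploads");
        Files.createDirectories(uploadPath);
        // The file name comes from the client, so reject names that are missing or resolve outside the upload directory
        String originalName = file.getOriginalFilename();
        if (originalName == null || originalName.isEmpty()) {
            return ResponseEntity.badRequest().body("Missing file name.");
        }
        Path filePath = uploadPath.resolve(originalName).normalize();
        if (!filePath.startsWith(uploadPath)) {
            return ResponseEntity.badRequest().body("Invalid file name.");
        }
        // Save the file locally and close the upload stream afterwards
        try (InputStream in = file.getInputStream()) {
            Files.copy(in, filePath, StandardCopyOption.REPLACE_EXISTING);
        }
        return ResponseEntity.ok("File uploaded successfully!");
    } catch (IOException e) {
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("File upload failed!");
    }
}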
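Because the stored file name is supplied by the client, it is worth confirming that the resolved path stays inside the upload directory before writing anything. The following is a minimal standalone sketch of that check using only java.nio; the class UploadPathGuard and the method resolveInside are hypothetical names introduced for illustration, not part of Spring or of the original project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class UploadPathGuard {

    // Hypothetical helper: resolves fileName under baseDir and rejects
    // names that would end up outside of it (for example "../secret").
    static Path resolveInside(Path baseDir, String fileName) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(fileName).normalize();
        // Path.startsWith compares whole path segments, so a sibling such as
        // "uploads-other" is not treated as being inside "uploads".
        if (candidate.equals(base) || !candidate.startsWith(base)) {
            throw new IllegalArgumentException("File name escapes the upload directory: " + fileName);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path uploads = Paths.get("uploads");
        // Accepted: prints an absolute path that ends in uploads/report.pdf
        System.out.println(resolveInside(uploads, "report.pdf"));
        try {
            resolveInside(uploads, "../application.properties");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The same check applies to any endpoint that builds a file path from request data, including a download handler that takes the file name from the URL.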
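2. Core PDF-to-Image Logic
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.springframework.stereotype.Service;

@Service
public class PdfToImageService {
    public void convertPdfToImage(String pdfPath, String outputFolder) throws IOException {
        // Make sure the output folder exists before writing images into it
        Files.createDirectories(Paths.get(outputFolder));
        // Load the PDF file; try-with-resources closes the document when rendering is done
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFRenderer renderer = new PDFRenderer(document);
            // Render each page in turn (page indexes are zero-based)
            for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
                BufferedImage image = renderer.renderImageWithDPI(pageIndex, 300, ImageType.RGB);
                // Save the page as page_1.png, page_2.png, ...
                String imageName = "page_" + (pageIndex + 1) + ".png";
                ImageIO.write(image, "PNG", new File(outputFolder, imageName));
            }
        }
    }
}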
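The DPI passed to renderImageWithDPI determines how large each rendered image is, and therefore how much memory a page needs while it is being converted. PDF page sizes are measured in points (72 per inch), so the pixel size is roughly points × dpi / 72. The sketch below works through that arithmetic for an A4 page; the class RenderSizeEstimator is a hypothetical name, the 4 bytes per pixel figure is an assumption about how an RGB image is held in memory, and PDFBox's own rounding may differ by a pixel.

```java
final class RenderSizeEstimator {

    // PDF user space uses 72 points per inch, so pixels = points * dpi / 72.
    static long pixels(double points, int dpi) {
        return Math.round(points * dpi / 72.0);
    }

    // Approximate bytes for one rendered page, assuming 4 bytes per pixel
    // (an assumption about the in-memory RGB image, not a PDFBox guarantee).
    static long estimatedBytes(double widthPt, double heightPt, int dpi) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }

    public static void main(String[] args) {
        // An A4 page is about 595 x 842 points.
        double widthPt = 595;
        double heightPt = 842;
        for (int dpi : new int[] {72, 150, 300}) {
            System.out.println(dpi + " DPI: "
                    + pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi) + " px, about "
                    + estimatedBytes(widthPt, heightPt, dpi) / 1_000_000 + " MB");
        }
    }
}
```

Under these assumptions an A4 page at 300 DPI comes to about 2479 x 3508 pixels and roughly 35 MB per page, which is why a lower DPI such as 150 can be a reasonable choice when the images are only meant for on-screen preview.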
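3. Image Download
@GetMapping("/download/image/{fileName}")
public ResponseEntity<InputStreamResource> downloadImage(@PathVariable String fileName) throws IOException {
    // Resolve the requested file inside the uploads directory and reject names that escape it or do not exist
    Path baseDir = Paths.get("uploads");
    Path filePath = baseDir.resolve(fileName).normalize();
    if (!filePath.startsWith(baseDir) || !Files.isRegularFile(filePath)) {
        return ResponseEntity.notFound().build();
    }
    InputStreamResource resource = new InputStreamResource(Files.newInputStream(filePath));
    HttpHeaders headers = new HttpHeaders();
    // Quote the file name so that names containing spaces are passed through intact
    headers.add(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filePath.getFileName() + "\"");
    return ResponseEntity.ok()
            .headers(headers)
            .contentLength(Files.size(filePath))
            .contentType(MediaType.IMAGE_PNG)
            .body(resource);
}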
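III. Converting PDF to Word with Spring Boot
(Because of article length limits, only the key code snippets are shown here.)
Add Apache Tika and Apache POI by declaring the following dependencies in pom.xml. In Tika 1.x the PDFParser class used below ships in tika-parsers rather than tika-core, so that artifact is declared as well:
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.28.3</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.28.3</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>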
private void convertPdfToWord(String pdfPath, String docPath) throws Exception {
    // Extract the PDF text; -1 disables BodyContentHandler's default 100,000-character limit
    ContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    PDFParser pdfParser = new PDFParser();
    try (InputStream input = new FileInputStream(pdfPath)) {
        pdfParser.parse(input, handler, metadata, context);
    }
    String text = handler.toString();
    // Write the text to a Word document; only plain text is carried over, not layout or images
    try (XWPFDocument document = new XWPFDocument();
         FileOutputStream out = new FileOutputStream(docPath)) {
        // One paragraph per extracted line, so line breaks survive in the .docx
        for (String line : text.split("\\R")) {
            XWPFRun run = document.createParagraph().createRun();
            run.setText(line);
        }
        // Save the Word document
        document.write(out);
    }
}
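Tika hands back the PDF text as a single string with embedded line breaks, and how that string is mapped onto Word paragraphs affects how readable the result is. One option, sketched here under the assumption that blank lines in the extracted text mark paragraph boundaries, is to split on blank lines and rejoin the wrapped lines inside each paragraph. The class ParagraphSplitter and the method splitParagraphs are hypothetical names; real PDFs vary, so treat this as a starting point rather than a general solution.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphSplitter {

    // Splits extracted text into paragraphs, treating one or more blank lines
    // as a paragraph boundary and joining wrapped lines with a single space.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String block : text.split("\\R\\s*\\R")) {
            String joined = block.trim().replaceAll("\\s*\\R\\s*", " ");
            if (!joined.isEmpty()) {
                paragraphs.add(joined);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String extracted = "Title\n\nFirst line\nsecond line\n\n\nLast paragraph";
        for (String paragraph : splitParagraphs(extracted)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```

Each returned string can then be written into its own XWPFParagraph in place of a single run that holds the whole text.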
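IV. Testing and Running
- Start the project: run the main method of the DocumentConverterApplication class.
- Upload test: submit a PDF file to the /upload endpoint, for example through the upload form.
- Conversion test: call the PDF conversion service; it writes the images or the Word file to the specified output path.
- Download test: request /download/image/{fileName} to download a converted image. As written, the endpoint reads from the uploads directory and always responds with the image/png content type, so converted files must be written there, and serving Word files would need a different content type.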
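For the conversion and download tests, a quick sanity check is to confirm that a generated or downloaded file really is a PNG by comparing its first eight bytes with the PNG signature. The sketch below uses only the JDK; the class PngSignatureCheck and the default file name page_1.png are illustrative assumptions.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

final class PngSignatureCheck {

    // Every PNG file starts with these eight bytes.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // Returns true if the file begins with the PNG signature.
    static boolean isPng(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] header = new byte[PNG_SIGNATURE.length];
            in.readFully(header);
            return Arrays.equals(header, PNG_SIGNATURE);
        } catch (EOFException e) {
            // The file is shorter than the signature, so it cannot be a PNG.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a converted page, e.g. the first image the service wrote.
        Path file = Paths.get(args.length > 0 ? args[0] : "page_1.png");
        System.out.println(file + " is PNG: " + isPng(file));
    }
}
```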