`borb` is a powerful and flexible Python library for creating and manipulating PDF files.
The table below represents a small selection of commits that highlight the progress made in developing the new borb library. These are only a handful of the contributions so far; the library is shaping up to be a powerful tool for working with PDFs.
At this stage, the core functionality of reading and writing PDF documents is nearly complete. Remaining features, such as HTML-to-PDF, Markdown-to-PDF, and OCR integration, are tangential to the core purpose and will follow once the foundational functionality is polished.
Currently, I am focused on reading PDF documents. My approach is pragmatic: I process a large corpus of PDF documents and analyze the library’s behavior when something fails. This iterative approach helps identify edge cases, improve the code, and ensure robust support for a wide variety of PDF documents.
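The corpus-driven workflow described above could be sketched roughly as follows. This is a minimal illustration, not the harness actually used for `borb`: the function name `survey_corpus` and the `read_pdf` callable are hypothetical, and the reader is passed in as a parameter so the sketch does not assume any particular `borb` API.

```python
from collections import Counter
from pathlib import Path
from typing import Callable


def survey_corpus(corpus_dir: Path, read_pdf: Callable[[Path], object]) -> Counter:
    """Try to read every PDF under corpus_dir and tally failures by exception type."""
    failures: Counter = Counter()
    for pdf_path in sorted(corpus_dir.rglob("*.pdf")):
        try:
            # read_pdf is a caller-supplied (hypothetical) function that parses one file
            read_pdf(pdf_path)
        except Exception as exc:  # deliberately broad: every failure is a data point
            failures[type(exc).__name__] += 1
    return failures
```

Grouping failures by exception type gives a quick view of which edge cases are most common in the corpus and are therefore worth fixing first.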
Date | Commit | Description |
---|---|---|
10/04/2024 | 044340ea | Initial commit |
10/21/2024 | 09ae5a49 | Add DropDownList, CountryDropDownList, GenderDropDownList and associated tests |
10/27/2024 | 2daf9ced | Add markdown files for code of conduct, license |
10/29/2024 | 83927fd3 | Re-do SmartArt |
11/04/2024 | 1e1baf8a | Start work on reading PDF documents |
11/09/2024 | b798b0af | Writing PDF documents, reading PDF documents, adding reference type |
11/28/2024 | 83b4dc8f | Get rid of most typing issues |
11/30/2024 | c4210ebf | Trying out mermaid for documentation diagramming |
12/01/2024 | d8138db6 | Apply black formatting |
12/06/2024 | 0dc509fc | Add TrueTypeFont |
12/08/2024 | 90b0ff67 | Add MarkdownParagraph |
12/11/2024 | 61dc1c42 | Add GoogleTrueTypeFont |
12/11/2024 | 12a9c3ed | Add tests for TrueTypeFont |
12/13/2024 | 0ab7270c | Enable write/read/write cycle |
12/15/2024 | 5eb03a88 | Add license mechanism, and README.md (for license) |
12/23/2024 | ???????? | Revise (server side) license mechanism, update AWS setup, document AWS setup |
The table below summarizes the latest test results for text extraction in `borb`. The first column represents the difference in character count (as a percentage) between the extracted text from a PDF and the expected “ground truth” character count. For instance, if a PDF is expected to have 200 characters but the extraction yields 180 characters, this represents a 10% difference and falls into the “10” bucket. The second column shows the fraction of documents that fall into each bucket.
Difference (%) | Fraction of Documents |
---|---|
0 | 82 |
10 | 2 |
20 | 2 |
30 | 1 |
40 | 0 |
50 | 1 |
60 | 1 |
70 | 0 |
80 | 3 |
90 | 2 |
100 | 4 |
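As a worked example of the bucketing described above, here is a minimal sketch. The function name `difference_bucket`, rounding to the nearest multiple of 10, and capping at 100 are assumptions made for illustration; the actual test harness may assign buckets differently.

```python
from collections import Counter


def difference_bucket(expected_chars: int, extracted_chars: int, width: int = 10) -> int:
    """Map a character-count difference to a percentage bucket (0, 10, ..., 100)."""
    if expected_chars <= 0:
        raise ValueError("expected_chars must be positive")
    diff_pct = abs(extracted_chars - expected_chars) * 100 / expected_chars
    # Assumption: round to the nearest bucket and cap at 100
    bucket = int(diff_pct / width + 0.5) * width
    return min(bucket, 100)


# (expected, extracted) pairs; the second pair is the example from the text
samples = [(200, 200), (200, 180), (100, 0)]
histogram = Counter(difference_bucket(exp, got) for exp, got in samples)
```

With these assumptions, the 200-versus-180 example from the text lands in the 10 bucket, and a document with no extracted text lands in the 100 bucket.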
While the results are not yet ideal, they represent an early stage of development. Future improvements will include:
`borb` provides a pure Python solution for PDF document management, allowing users to read, write, and manipulate PDFs. It models PDF files in a JSON-like structure, using nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). Created and maintained as a solo project, `borb` prioritizes common PDF use cases for practical and straightforward usage.
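To make the "JSON-like structure" idea concrete, the sketch below hand-writes a tiny document as nested dictionaries, lists, and primitives, then walks it. It is only an illustration: the `document` literal and the `count_pages` helper are hypothetical and do not reflect the actual classes `borb` uses internally. The key names (`Root`, `Pages`, `Kids`, `MediaBox`) follow the names used in the PDF specification.

```python
from typing import Any

# Hypothetical, hand-written stand-in for a parsed PDF; not borb's actual object model
document: dict[str, Any] = {
    "Trailer": {
        "Root": {
            "Type": "Catalog",
            "Pages": {
                "Type": "Pages",
                "Count": 2,
                "Kids": [
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                ],
            },
        }
    }
}


def count_pages(node: Any) -> int:
    """Recursively count dictionaries whose Type entry is 'Page'."""
    if isinstance(node, dict):
        own = 1 if node.get("Type") == "Page" else 0
        return own + sum(count_pages(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_pages(item) for item in node)
    return 0  # primitives (numbers, strings, booleans) contain no pages


page_count = count_pages(document)
```

Because everything is plain nesting of dictionaries and lists, generic traversal code like this works without any PDF-specific knowledge beyond the key names.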
Explore `borb`’s capabilities in the examples repository for practical, real-world applications, including:
- `PageLayout`
- …and much more!
Install `borb` directly via `pip`:
```bash
pip install borb
```
To ensure you have the latest version, consider the following commands:
```bash
pip uninstall borb
pip install --no-cache-dir borb
```
Create your first PDF in just a few lines of code with `borb`:
```python
from pathlib import Path
from borb.pdf import Document, Page, PageLayout, SingleColumnLayout, Paragraph, PDF

# Create an empty Document
d: Document = Document()

# Create an empty Page
p: Page = Page()
d.append_page(p)

# Create a PageLayout
l: PageLayout = SingleColumnLayout(p)

# Add a Paragraph
l.append_layout_element(Paragraph('Hello World!'))

# Write the PDF
PDF.write(what=d, where_to="assets/output.pdf")
```
`borb` is dual-licensed under AGPL and a commercial license.
The AGPL (Affero General Public License) is an open-source license, but commercial use cases require a paid license, especially if you intend to use `borb` in closed-source projects or in any closed-source product.
For more information, contact our sales team.
Special thanks to:
Your contributions and guidance have been invaluable to `borb`’s development.