Metadata-Version: 2.4
Name: pdfminer.six
Version: 20250506
Summary: PDF parser and analyzer
Author: Yusuke Shinyama, Pieter Marsman
Author-email: Philippe Guglielmetti <pdfminer@goulu.net>
License-Expression: MIT
Project-URL: Homepage, https://github.com/pdfminer/pdfminer.six
Keywords: layout analysis,pdf converter,pdf parser,text mining
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: charset-normalizer>=2.0.0
Requires-Dist: cryptography>=36.0.0
Provides-Extra: dev
Requires-Dist: atheris; python_version < "3.12" and extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy==0.931; extra == "dev"
Requires-Dist: nox; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinx-argparse; extra == "docs"
Provides-Extra: image
Requires-Dist: Pillow; extra == "image"
Dynamic: license-file

pdfminer.six
============

*We fathom PDF*

Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a tool for extracting information from PDF
documents, with a focus on getting and analyzing text data. Pdfminer.six extracts the text from a page directly from the
source code of the PDF. It can also be used to get the exact location, font, or color of the text.

It is built in a modular way so that each component of pdfminer.six can be replaced easily. You can implement your own
interpreter or rendering device that uses the power of pdfminer.six for purposes other than text analysis.

Check out the full documentation on
[Read the Docs](https://pdfminersix.readthedocs.io).

Features
--------

* Written entirely in Python.
* Parse, analyze, and convert PDF documents.
* Extract content as text, images, HTML or [hOCR](https://en.wikipedia.org/wiki/HOCR).
* Support for the PDF-1.7 specification (well, almost).
* Support for CJK languages and vertical writing.
* Support for various font types (Type1, TrueType, Type3, and CID).
* Support for extracting embedded images (JPG, PNG, TIFF, JBIG2, bitmaps).
* Support for decoding various stream filters (ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode,
  CCITTFaxDecode).
* Support for RC4 and AES encryption.
* Support for AcroForm interactive form extraction.
* Table of contents extraction.
* Tagged contents extraction.
* Automatic layout analysis.

How to use
----------

* Install Python 3.9 or newer.
* Install pdfminer.six.
  ```bash
  pip install pdfminer.six
  ```
* Optionally, install extra dependencies for extracting images.
  ```bash
  pip install 'pdfminer.six[image]'
  ```
* Use the command-line interface to extract text from a PDF.
  ```bash
  pdf2txt.py example.pdf
  ```
* Or use it with Python.
  ```python
  from pdfminer.high_level import extract_text

  text = extract_text("example.pdf")
  print(text)
  ```

Contributing
------------

Be sure to read the [contribution guidelines](https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md).

Acknowledgement
---------------

This repository includes code from `pyHanko`; the original license has been included [here](/docs/licenses/LICENSE.pyHanko).