.. documentation master file, created by
   sphinx-quickstart
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

pypdf_table_extraction (Camelot): PDF Table Extraction for Humans
=================================================================


Release v\ |version|. (:ref:`Installation <install>`)

.. image:: https://readthedocs.org/projects/pypdf-table-extraction/badge/?version=latest
    :target: https://pypdf-table-extraction.readthedocs.io/
    :alt: Documentation Status

.. image:: https://codecov.io/github/py-pdf/pypdf_table_extraction/badge.svg?branch=main&service=github
    :target: https://codecov.io/github/py-pdf/pypdf_table_extraction/?branch=main

.. image:: https://img.shields.io/pypi/v/pypdf-table-extraction.svg
    :target: https://pypi.org/project/pypdf-table-extraction/

.. image:: https://img.shields.io/pypi/l/pypdf-table-extraction.svg
    :target: https://pypi.org/project/pypdf-table-extraction/

.. image:: https://img.shields.io/pypi/pyversions/pypdf-table-extraction.svg
    :target: https://pypi.org/project/pypdf-table-extraction/


**pypdf_table_extraction**, formerly known as `Camelot`_, is a Python library that can help you extract tables from PDFs!

.. _Camelot: https://github.com/camelot-dev/camelot

.. note:: pypdf_table_extraction is the continuation of `Camelot`_ and is intended to be a compatible replacement. You can use either the old ``camelot`` name or ``pypdf_table_extraction`` for both the library and the :ref:`command-line interface <cli>`.

----

**Here's how you can extract tables from PDFs.**
You can check out the quickstart notebook.

.. image:: https://colab.research.google.com/assets/colab-badge.svg
    :target: https://colab.research.google.com/github/py-pdf/pypdf_table_extraction/blob/main/examples/pypdf_table_extraction_quick_start_notebook.ipynb

Or follow the example below.
You can find the PDF used in this example `here`_.

.. _here: _static/pdf/foo.pdf

.. code-block:: pycon

    >>> import pypdf_table_extraction
    >>> tables = pypdf_table_extraction.read_pdf('foo.pdf')
    >>> tables
    <TableList n=1>
    >>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, markdown, sqlite
    >>> tables[0]
    <Table shape=(7, 7)>
    >>> tables[0].parsing_report
    {
        'accuracy': 99.02,
        'whitespace': 12.24,
        'order': 1,
        'page': 1
    }
    >>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_markdown, to_sqlite
    >>> tables[0].df # get a pandas DataFrame!

.. csv-table::
  :file: _static/csv/foo.csv
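The ``whitespace`` figure in the parsing report above appears to be the percentage of empty cells in the extracted table. That reading is an assumption, not something stated on this page, and the library's exact formula may differ. The sketch below computes such a score for a table given as a list of rows of cell strings; ``whitespace_score`` is a hypothetical helper, not part of pypdf_table_extraction.

.. code-block:: python

    def whitespace_score(rows: list[list[str]]) -> float:
        """Return the percentage of cells that are empty or whitespace-only.

        Hypothetical helper (not part of pypdf_table_extraction). It assumes a
        table given as a list of rows, each row a list of cell strings.
        """
        total = sum(len(row) for row in rows)
        if total == 0:
            return 0.0  # an empty table has no cells to score
        empty = sum(1 for row in rows for cell in row if not cell.strip())
        return 100 * empty / total


    # 3 of these 8 cells are blank or whitespace-only, so the score is 37.5.
    table = [
        ["Name", "Qty", "Price", ""],
        ["Apple", "3", "", "  "],
    ]
    print(round(whitespace_score(table), 2))

Under that assumption, the 12.24 reported for the ``(7, 7)`` table above would correspond to 6 empty cells out of 49 (6 / 49 is about 12.24%).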

pypdf_table_extraction also comes packaged with a :ref:`command-line interface <cli>`!

.. note:: pypdf_table_extraction only works with text-based PDFs and not scanned documents. (As Tabula `explains`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

You can check out some frequently asked questions :ref:`here <faq>`.

.. _explains: https://github.com/tabulapdf/tabula#why-tabula

Why pypdf_table_extraction?
---------------------------

- **Configurability**: pypdf_table_extraction gives you control over the table extraction process with :ref:`tweakable settings <advanced>`.
- **Metrics**: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.
- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into `ETL and data analysis workflows`_. You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML, Markdown, and SQLite.
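The metrics point can be sketched as a small filter over extracted tables. It relies only on the ``parsing_report`` dictionary shown in the quickstart example (its ``accuracy`` and ``whitespace`` keys). The helper name ``keep_good_tables`` and its default thresholds (accuracy of at least 90, whitespace of at most 30) are hypothetical choices you would tune for your own documents; ``SimpleNamespace`` objects stand in for real tables so the sketch runs without a PDF.

.. code-block:: python

    from types import SimpleNamespace
    from typing import Any, Iterable, List


    def keep_good_tables(
        tables: Iterable[Any],
        min_accuracy: float = 90.0,  # hypothetical default; tune per document set
        max_whitespace: float = 30.0,  # hypothetical default; tune per document set
    ) -> List[Any]:
        """Keep tables whose parsing_report passes both metric thresholds.

        Works with any object that has a ``parsing_report`` dict containing
        ``accuracy`` and ``whitespace`` keys.
        """
        return [
            table
            for table in tables
            if table.parsing_report["accuracy"] >= min_accuracy
            and table.parsing_report["whitespace"] <= max_whitespace
        ]


    # Stand-ins for real tables, so the sketch runs without a PDF.
    clean = SimpleNamespace(parsing_report={"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1})
    sparse = SimpleNamespace(parsing_report={"accuracy": 97.5, "whitespace": 64.0, "order": 2, "page": 1})
    garbled = SimpleNamespace(parsing_report={"accuracy": 41.3, "whitespace": 8.0, "order": 1, "page": 2})

    kept = keep_good_tables([clean, sparse, garbled])
    print([t.parsing_report["page"] for t in kept])  # only the clean table survives

With the library installed, you would pass the real result instead, for example ``keep_good_tables(pypdf_table_extraction.read_pdf('foo.pdf', pages='all'))``, and then work with each surviving table's ``df``.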

.. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873

See `comparison with similar libraries and tools`_.

.. _comparison with similar libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools


The User Guide
--------------

This part of the documentation begins with some background information about why pypdf_table_extraction was created, takes you through some implementation details, and then focuses on step-by-step instructions for getting the most out of pypdf_table_extraction.

.. toctree::
   :maxdepth: 2

   user/intro
   user/install-deps
   user/install
   user/how-it-works
   user/quickstart
   user/advanced
   user/faq
   user/cli


The API Documentation/Guide
---------------------------

If you are looking for information on a specific function, class, or method, this part of the documentation is for you.

.. toctree::
   :maxdepth: 2

   api

The Contributor Guide
---------------------

If you want to contribute to the project, this part of the documentation is for you.

.. toctree::
   :maxdepth: 2

   dev/contributing
   Changelog <https://github.com/py-pdf/pypdf_table_extraction/releases>
