2022-04-29 06:38:50 Ma Daimeng

We use the pdf2htmlEX library to convert PDF to HTML, and drive it both from the command line and from Python.

pdf2htmlEX resources

pdf2htmlEX GitHub home page:

Related articles:

Mac/Docker installation

On a Mac, simply run: brew install pdf2htmlEX

To install with Docker, first search for the available images:

docker search pdf2htmlEX

This lists the Docker images you can use. We download the one with the most stars:

NAME                                      DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
bwits/pdf2htmlex                          Smallest pdf2htmlEX container and easiest wa…   27                   [OK]
bwits/pdf2htmlex-alpine                   pdf2htmlEX in alpine                            15                   [OK]
klokoy/pdf2htmlex                                                                         7                    [OK]
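Picking the most-starred image can also be scripted. The sketch below is only an illustration: it assumes the `docker` CLI is on the PATH and asks `docker search` for tab-separated output via `--format` with the `{{.Name}}` and `{{.StarCount}}` placeholders. The helper names `parse_search_output` and `most_starred_image` are hypothetical.

```python
import subprocess


def parse_search_output(text):
    """Parse lines of the form 'NAME<TAB>STARS' into (name, stars) pairs."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, stars = line.rsplit("\t", 1)
        results.append((name, int(stars)))
    return results


def most_starred_image(term="pdf2htmlEX"):
    """Run `docker search` and return the name of the image with the most stars."""
    output = subprocess.run(
        ["docker", "search", "--format", "{{.Name}}\t{{.StarCount}}", term],
        capture_output=True, text=True, check=True,
    ).stdout
    images = parse_search_output(output)
    # max() compares on the star count; None if nothing was found
    return max(images, key=lambda item: item[1])[0] if images else None
```

With search results like the ones listed above, `most_starred_image()` would pick `bwits/pdf2htmlex`; star counts change over time, so the result may differ today.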

Pull it with the following command:

docker pull bwits/pdf2htmlex

Converting PDF to HTML on the command line

First, define an alias that runs pdf2htmlEX inside the container, with the current directory mounted at /pdf:

alias pdf2htmlEX='docker run -ti --rm -v `pwd`:/pdf bwits/pdf2htmlex pdf2htmlEX'

Then run pdf2htmlEX on the test PDF file to generate the corresponding HTML file:

pdf2htmlEX sample.pdf
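When no output file name is given, pdf2htmlEX names the HTML after the input file, so `sample.pdf` becomes `sample.html`. The small sketch below mirrors that default as we understand it, so a script knows which file to look for; the function name `default_html_name` is hypothetical, and it does not model explicitly chosen output names or directories.

```python
import os


def default_html_name(pdf_path):
    """Return the HTML file name pdf2htmlEX is assumed to produce by default."""
    base = os.path.basename(pdf_path)  # the directory part is dropped
    if base.endswith(".pdf"):
        base = base[: -len(".pdf")]  # strip the .pdf extension
    return base + ".html"


print(default_html_name("sample.pdf"))  # sample.html
```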

For more details, see:

Converting PDF to HTML with Python

The code is simple as well:

import subprocess

def convert_pdf_to_html(filename):
    # Run pdf2htmlEX in the container; `pwd` (the current directory) is mounted at /pdf so the container can read the PDF and write the HTML back
    subprocess.call("docker run --rm -v `pwd`:/pdf bwits/pdf2htmlex pdf2htmlEX {}".format(filename), shell=True)

if __name__ == '__main__':
    # File name to convert; if the PDF is not in the current directory, pass a path relative to it, e.g. xx/xx/sample.pdf
    convert_pdf_to_html("sample.pdf")
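A slightly more defensive variant is sketched below as an alternative, not as the original author's code. It passes an argument list instead of using `shell=True`, uses `os.getcwd()` in place of the backtick `pwd`, lets `subprocess.run(..., check=True)` raise if the conversion fails, and rejects paths outside the current directory, since only that directory is mounted into the container. Like the commands above, it assumes the image's working directory is the `/pdf` mount point so the HTML lands in the mounted folder. `build_docker_args` is a hypothetical helper name.

```python
import os
import subprocess

IMAGE = "bwits/pdf2htmlex"  # image pulled earlier
MOUNT_POINT = "/pdf"        # container directory the host folder is mounted on


def build_docker_args(pdf_path, host_dir):
    """Build the docker command converting pdf_path, with host_dir mounted at /pdf."""
    rel = os.path.relpath(os.path.abspath(pdf_path), host_dir)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        # Files outside host_dir are not visible inside the container.
        raise ValueError("PDF must be inside the mounted directory: %s" % host_dir)
    container_path = MOUNT_POINT + "/" + rel.replace(os.sep, "/")
    return ["docker", "run", "--rm", "-v", "%s:%s" % (host_dir, MOUNT_POINT),
            IMAGE, "pdf2htmlEX", container_path]


def convert_pdf_to_html(filename):
    """Convert filename to HTML; raises CalledProcessError if docker/pdf2htmlEX fails."""
    subprocess.run(build_docker_args(filename, os.getcwd()), check=True)
```

Usage is the same as before, e.g. `convert_pdf_to_html("xx/xx/sample.pdf")` run from the directory that contains `xx/`.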

Here, for how to run Linux commands from Python, see:

