Parsing

.pdf to .csv converter or .pdf to database converter for EDAX EDS files

These Python scripts convert .pdf files to .csv format so that the data can be uploaded directly to a database such as PostgreSQL or MariaDB.

Purposes

Struggling with energy dispersive X-ray analysis (EDXA or EDAX) results that the scanning electron microscope (SEM) outputs as PDF files? These scripts convert the PDF outputs to CSV format or upload the data directly to a database.

Energy dispersive X-ray analysis (EDXA or EDAX), also known as energy-dispersive X-ray spectroscopy (EDS, EDX, EDXS or XEDS) or energy dispersive X-ray microanalysis (EDXMA), is a common analytical technique used for the elemental analysis or chemical characterization of a sample. However, some SEM-EDS setups only output data as uneditable .pdf files.

These Python scripts were written to avoid hours of transcribing data by hand.

.csv files are easily editable and compatible with most text editors and graphing software, including Excel, Origin, and Veusz.

Functionality

The scripts are written to be compatible with files produced by a Tescan Vega XMU scanning electron microscope (SEM) coupled to a 40 mm² EDAX Apollo™ energy dispersive X-ray detector (EDS) running EDAX Genesis™ software. The code can be edited to adapt it to the outputs of other SEM-EDS machines and software.

Future Work:

Generalize the import of file naming structures into metadata

Usage

This code converts EDAX data saved as .pdf to .csv format (or uploads it to a database) with a few lines of code.

To use it, specify the directory that contains one or more .pdf files.
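The directory-based workflow can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function names (find_pdfs, extract_text, parse_rows, convert_directory), the CSV column names, and above all the assumed row layout (an element label such as "SiK" followed by weight % and atomic % on one line) are hypothetical and will likely need adjusting to the text that your PDFs actually yield. Text extraction uses PyMuPDF, which is listed in the dependencies.

```python
import csv
import re
from pathlib import Path

# ASSUMPTION: each quantification row looks like "SiK 23.41 17.02"
# (element symbol, X-ray line, weight %, atomic %). Real PDFs may differ.
ROW_PATTERN = re.compile(
    r"^\s*([A-Z][a-z]?)\s*([KLM])\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$"
)


def find_pdfs(directory):
    """Return every .pdf file directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def extract_text(pdf_path):
    """Concatenate the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the other helpers work without it

    with fitz.open(str(pdf_path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def parse_rows(text):
    """Pull rows matching the assumed layout out of the extracted text."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            element, xray_line, wt, at = match.groups()
            rows.append({
                "element": element,
                "line": xray_line,
                "wt_percent": float(wt),
                "at_percent": float(at),
            })
    return rows


def convert_directory(directory, csv_path):
    """Write one CSV row per parsed element row, tagged with its source file."""
    fieldnames = ["source_file", "element", "line", "wt_percent", "at_percent"]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for pdf in find_pdfs(directory):
            for row in parse_rows(extract_text(pdf)):
                writer.writerow({"source_file": pdf.name, **row})
```

With these helpers, a call such as convert_directory("path/to/pdfs", "edax_results.csv") would process every PDF in the folder; both paths here are placeholders.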

Setting up an Environment

Pip


pip install -r requirements.txt

Conda

For a new environment

conda env update -n my_sem_edax_env --file ENV.yaml

In an existing environment (e.g., one set up with PyCharm)


conda env update --file environment.yml

Updating Dependencies

To publish updated environment configurations, export a conda environment YAML file and generate a pip requirements file.

conda env export --no-builds > environment.yml

pipreqs --mode compat --use-local --force . > requirements.txt

To fetch data from Postgres, use the following query:

 SELECT sem_chemistry.sample_id,
    sem_chemistry.sem_chem_id AS chem_id,
    sem_chemistry.sem_im_id AS im_id,
    sem_chemistry.element AS sem_element,
    sem_chemistry.wt_percent AS wt_pct,
    sample.sample_name,
    sample.tube_id,
    tube.tube_name,
    tube.depth_top AS tube_depth_top,
    tube.depth_bottom,
    sem_instrument_metadata.method AS sem_method,
    sem_instrument_metadata.quantification_method AS quant_method,
    sem_instrument_metadata.quantification_standard AS quant_standard,
    sem_instrument_metadata.sem_user,
    sem_instrument_metadata.date AS sem_date
   FROM sem_chemistry
     LEFT JOIN sem_instrument_metadata ON sem_chemistry.sem_im_id = sem_instrument_metadata.sem_im_id
     LEFT JOIN sample ON sem_chemistry.sample_id = sample.sample_id
     LEFT JOIN tube ON sample.tube_id = tube.tube_id;
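
As one possible way to run this query from Python, the sketch below executes it through any DB-API 2.0 connection (for example one opened with psycopg2, which is listed in the dependencies) and returns the rows as dictionaries keyed by the column names and aliases. The function name fetch_sem_chemistry and the constant SEM_CHEMISTRY_QUERY are illustrative and not part of the repository.

```python
# The query from above, stored as a constant so it can be reused.
SEM_CHEMISTRY_QUERY = """
SELECT sem_chemistry.sample_id,
    sem_chemistry.sem_chem_id AS chem_id,
    sem_chemistry.sem_im_id AS im_id,
    sem_chemistry.element AS sem_element,
    sem_chemistry.wt_percent AS wt_pct,
    sample.sample_name,
    sample.tube_id,
    tube.tube_name,
    tube.depth_top AS tube_depth_top,
    tube.depth_bottom,
    sem_instrument_metadata.method AS sem_method,
    sem_instrument_metadata.quantification_method AS quant_method,
    sem_instrument_metadata.quantification_standard AS quant_standard,
    sem_instrument_metadata.sem_user,
    sem_instrument_metadata.date AS sem_date
FROM sem_chemistry
    LEFT JOIN sem_instrument_metadata
        ON sem_chemistry.sem_im_id = sem_instrument_metadata.sem_im_id
    LEFT JOIN sample ON sem_chemistry.sample_id = sample.sample_id
    LEFT JOIN tube ON sample.tube_id = tube.tube_id;
"""


def fetch_sem_chemistry(connection):
    """Run the query on an open DB-API connection; return a list of dicts."""
    cursor = connection.cursor()
    try:
        cursor.execute(SEM_CHEMISTRY_QUERY)
        # cursor.description holds one entry per result column; item 0 is its name.
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, record)) for record in cursor.fetchall()]
    finally:
        cursor.close()


# Example with psycopg2 (all connection details are placeholders):
#   import psycopg2
#   conn = psycopg2.connect(dbname="my_db", user="me", host="localhost")
#   rows = fetch_sem_chemistry(conn)
#   conn.close()
```

Returning plain dictionaries keeps the sketch free of extra dependencies; the resulting list can be passed to pandas.DataFrame if a table is preferred.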

Dependencies

Python Version

Python 3.11

Python Libraries

PyMuPDF ==1.19.5
pillow ==8.4.0
python-dateutil ==2.8.2
pandas ==1.4.1
hyperspy ==1.6.5
hyperspy-base ==1.6.5
psycopg2 ==2.8.6
tqdm ==4.62.3
python-dotenv ==0.20.0
tabulate ==0.8.9

Support

If you experience issues with the code, support can be sought by emailing [email protected].

Authors and acknowledgment

Written by Hanna L Brooks and Camden G Bock. Last update: 2023.

License

The code is licensed under the MIT License. See the license section for more information.