Document Analysis and Extraction Using PowerShell
A quick trip down memory lane before we head to the AI-first future
For the past couple of months, I've been working on document automation using Power Platform and Gemini, and it's awesome.
But yesterday my brother brought me a problem that got me thinking.
How do we take all the CSV files inside a SharePoint folder and convert them to JSON, in bulk?
My first idea was to use an LLM like Gemini, especially with the recent price changes (allegedly around 1 dollar per 6,000 PDF pages with the newest version of Gemini), but then I thought, hey, hold on.
I asked ChatGPT, and its first suggestion was to use pandas and other Python libraries. I said no, I will use good old PowerShell. And it did work.
PowerShell has built-in cmdlets to list all the files in a folder, pick out the spreadsheet files (CSV in our case), and convert them to JSON.
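Here is a minimal sketch of what such a script can look like, not the exact one we ran. It assumes the SharePoint library is synced to a local folder (for example through OneDrive), so the files can be read like any local files. The function name Convert-CsvFolderToJson and its parameters are made up for illustration; Get-ChildItem, Import-Csv and ConvertTo-Json are the standard cmdlets doing the work.

```powershell
# Sketch: convert every CSV file in a folder to a JSON file with the same base name.
# Assumption: the SharePoint library is synced locally, so plain file paths work.
# The function name and parameter names are hypothetical.
function Convert-CsvFolderToJson {
    param(
        [Parameter(Mandatory)] [string] $SourcePath,
        [Parameter(Mandatory)] [string] $DestinationPath
    )

    # Create the output folder if it does not exist yet.
    if (-not (Test-Path -LiteralPath $DestinationPath)) {
        New-Item -ItemType Directory -Path $DestinationPath | Out-Null
    }

    # Find all CSV files directly inside the source folder (no recursion).
    Get-ChildItem -LiteralPath $SourcePath -Filter '*.csv' -File | ForEach-Object {
        # Wrap in @() so a one-row file still becomes a JSON array.
        $rows = @(Import-Csv -LiteralPath $_.FullName)
        $target = Join-Path $DestinationPath ($_.BaseName + '.json')

        # Passing the array through -InputObject keeps it an array in the output.
        ConvertTo-Json -InputObject $rows | Set-Content -LiteralPath $target -Encoding UTF8

        # Emit the path of each file written so the caller can log it.
        $target
    }
}
```

You would call it with something like Convert-CsvFolderToJson -SourcePath 'C:\Sync\Reports' -DestinationPath 'C:\Sync\Reports\json' (both paths are placeholders). One thing to keep in mind: Import-Csv reads every value as text, so numbers end up as quoted strings in the JSON unless you convert them yourself.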
Then we saved the script as a .ps1 file and used Windows Task Scheduler to run it at night, so the files got converted while we were not logged in. Felt amazing.
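We set this up through the Task Scheduler UI, but the same thing can be sketched in PowerShell with the built-in ScheduledTasks module. This is an illustration under assumptions, not our exact setup: the function name New-NightlyScriptTask, the 02:00 start time, and the use of the current user with the S4U logon type are all my own choices here.

```powershell
# Sketch: build a daily scheduled task definition that runs a .ps1 script.
# Assumptions: Windows with the ScheduledTasks module available; the function
# name, default time, and choice of S4U logon are hypothetical, not from our setup.
function New-NightlyScriptTask {
    param(
        [Parameter(Mandatory)] [string] $ScriptPath,
        [string] $At = '02:00'
    )

    # Run Windows PowerShell non-interactively against the script file.
    $arguments = '-NoProfile -File "{0}"' -f $ScriptPath
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $arguments

    # Fire once a day at the requested time.
    $trigger = New-ScheduledTaskTrigger -Daily -At $At

    # S4U allows the task to run while the user is not logged on,
    # without storing a password in the task.
    $principal = New-ScheduledTaskPrincipal -UserId $env:USERNAME -LogonType S4U

    # Return the task definition; nothing is registered until the caller does it.
    New-ScheduledTask -Action $action -Trigger $trigger -Principal $principal
}
```

To actually schedule it, you would pass the result to Register-ScheduledTask, for example Register-ScheduledTask -TaskName 'NightlyCsvToJson' -InputObject (New-NightlyScriptTask -ScriptPath 'C:\Scripts\Convert.ps1'), where the task name and path are placeholders. Two caveats: the machine's execution policy has to allow the script to run, and as far as I know an S4U task only gets access to local resources, so if the script needs network access you may have to register the task with a user name and password instead.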
I'll do PDF merging with PowerShell next. Let me know if there is any other scenario I should write about!