I’ve recently been configuring reporting to work with a new software in the cloud. As with much enterprise software, developers spend more time maintaining the code and slurping money than they do modernizing. Specifically, the reporting tool’s connections were all broken, and the only option was to save documents as Excel 1997 documents. The document had all data (up to 4 headers) on one page, so the data couldn’t easily be parsed. To make matters worse, there was no support for CSVs, XMLs, or XLSX files. This means that PowerShell had no ‘easy’ way to ingest the data. To get around this, I had to tap into the COM.
What is COM?
COM stands for Component Object Model, and many Microsoft programs have them upon installation. They’re essentially a cross-language method of interacting with program components. This means you can work with Excel or other programs that have COM objects autonomously or in the background.
Technically, this is a sub-optimal solution as I wouldn’t install Office on a production server. This is mainly a mental exercise to be able to parse data without the Excel PowerShell module.
Using Excel to Modify the Spreadsheet
This is a simplified version of the table. Note that Header2 is in the same ‘worksheet’ as Header1.
Header1 | |
Column1 | Column2 |
Data1 | Data2 |
Header2 | |
Column3 | Column4 |
Data3 | Data4 |
First off, we’ll want to define where our spreadsheet is, and open it in Excel.
$excelFilePath = "C:\Users\Public\Downloads\MyBook.xls"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($excelFilePath)
Next, we’ll slap the initial worksheet into a variable, and initialize a new worksheet as another variable.
$sourceWorksheet = $workbook.Worksheets.Item(1)
# The first option with 99% of scripts will be '0', as this is the first binary number. Not in this case.
$destinationWorksheet = $workbook.Worksheets.Add() # We can specify a string as a worksheet name if desired.
Afterward, we’ll copy a range to the sheet we created, then clear the cells on the original sheet.
$sourceRange = $sourceWorksheet.Range("A4:B6")
$destinationCell = $destinationWorksheet.Range("A1")
$sourceRange.Copy($destinationCell)
$sourceRange.Clear()
Alternatively, we could search for a cell and copy its row, or loop through to find other headings in a larger document. It’s an ungodly hour at the time of writing, so I only have enough braincells to show you how the search would work.
$targetCell = $worksheet.Cells.Find("Header2")
$destinationWorksheet.Name = $targetCell.Value() # This just names the tab 'Header2'
$worksheet.Rows.Item($targetCell.Row).Copy()
$newWorksheet.Rows.Item(1).PasteSpecial(-4163) # -4163 corresponds to the default paste format
After you’re done manipulating, you can save and close the workbook. To keep things tidy, release the COM objects and clear the variables.
$workbook.Save()
# If you want to save to another file, you can use SaveAs()
# $workbook.SaveAs('C:\Users\Public\Downloads\MyBook-updated.xls')
$workbook.Close()
$excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($sourceRange) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($sourceWorksheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($destinationWorksheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel) | Out-Null
Remove-Variable -Name sourceRange, sourceWorksheet, destinationWorksheet, workbook, excel