This is another simple but common problem in certain business environments:
1) Data are presented via a Microsoft SQL Server Reporting Services report, BUT
2) The user wants the data in Excel, and, further, wants to play with it (pivot, etc.) there. The problem is that the grouping column labels are not in every record, only in the one row that begins the list of records for that group (sanitized screenshot below):
I had this assignment recently from a remote request. It took about four rounds of an e-mail exchange to figure out that it really wasn't a data problem, but a formatting one that needed solving.
It is possible to do the whole thing in Python. I did the Excel part by hand in order to get a handle on the data:
1) In Excel, delete the extra rows on top of the report leaving just the headers and the data.
2) In Excel, select everything on the data page, format the cells correctly by unselecting the Merge Cells and Wraparound options.
3) In Excel, at this point you should be able to see if there are extra empty columns as space fillers; delete them. Save the worksheet as a csv file.
4) In a text editor, open your csv file, identify any empty rows, and delete them. Change column header names as desired.
Now the Python part:
#!python36
"""
Doctor csv dump from unmerged cell
dump of SSRS dump from MSSQL database.
Fill in cell gaps where merged
cells had only one grouping value
so that all rows are complete records.
"""
import pprint
COMMA = ','
EMPTY = ''
INFILE = 'rawdata.csv'
OUTFILE = 'canneddumpfixed.csv'
ERRORFLAG = 'ERROR!'
f = open(INFILE, 'r')
headerline = next(f)
numbercolumns = len(headerline.split(COMMA))
f2 = open(OUTFILE, 'w')
# Assume at least one data column on far right.
missingvalues = (numbercolumns - 1) * [ERRORFLAG]
for linex in f:
print('Processing line {:s} . . .'.format(linex))
splitrecord = linex.split(COMMA)
for slotx in range(0, numbercolumns - 1):
if splitrecord[slotx] != EMPTY:
missingvalues[slotx] = splitrecord[slotx]
else:
splitrecord[slotx] = missingvalues[slotx]
f2.write(COMMA.join(splitrecord))
f2.close()
print('Finished')
At this point you've got your data in csv format - you can open it in Excel and go to work.
There may be a free or COTS (commercial off the shelf) utility that does all this somewhere in the Microsoft "ecosystem" (I think that's their fancy enviro-friendly word for vendor-user community) but I don't know of one.
Thanks for stopping by.
1) Data are presented via a Microsoft SQL Server Reporting Services report, BUT
2) The user wants the data in Excel, and, further, wants to play with it (pivot, etc.) there. The problem is that the grouping column labels are not in every record, only in the one row that begins the list of records for that group (sanitized screenshot below):
But I don't WANT to copy and paste all those groupings for 30,000 records :*-( |
It is possible to do the whole thing in Python. I did the Excel part by hand in order to get a handle on the data:
1) In Excel, delete the extra rows on top of the report leaving just the headers and the data.
2) In Excel, select everything on the data page, format the cells correctly by unselecting the Merge Cells and Wraparound options.
3) In Excel, at this point you should be able to see if there are extra empty columns as space fillers; delete them. Save the worksheet as a csv file.
4) In a text editor, open your csv file, identify any empty rows, and delete them. Change column header names as desired.
Now the Python part:
#!python36
"""
Doctor csv dump from unmerged cell
dump of SSRS dump from MSSQL database.
Fill in cell gaps where merged
cells had only one grouping value
so that all rows are complete records.
"""
import pprint
COMMA = ','
EMPTY = ''
INFILE = 'rawdata.csv'
OUTFILE = 'canneddumpfixed.csv'
ERRORFLAG = 'ERROR!'
f = open(INFILE, 'r')
headerline = next(f)
numbercolumns = len(headerline.split(COMMA))
f2 = open(OUTFILE, 'w')
# Assume at least one data column on far right.
missingvalues = (numbercolumns - 1) * [ERRORFLAG]
for linex in f:
print('Processing line {:s} . . .'.format(linex))
splitrecord = linex.split(COMMA)
for slotx in range(0, numbercolumns - 1):
if splitrecord[slotx] != EMPTY:
missingvalues[slotx] = splitrecord[slotx]
else:
splitrecord[slotx] = missingvalues[slotx]
f2.write(COMMA.join(splitrecord))
f2.close()
print('Finished')
At this point you've got your data in csv format - you can open it in Excel and go to work.
There may be a free or COTS (commercial off the shelf) utility that does all this somewhere in the Microsoft "ecosystem" (I think that's their fancy enviro-friendly word for vendor-user community) but I don't know of one.
Thanks for stopping by.