As a freelance artist, I've found that tax season has become quite a headache. I’ve gotten into the practice of scanning every receipt onto Google Drive as soon as I get it, and naming the file something like “May 15 24.10.pdf” or “$33.66 London Drugs.pdf”. But when it comes time to actually add those numbers together, I am overcome with dread just looking at the hundreds of files in each folder.
This is a handy little Python script I wrote (instead of actually filing the tax returns, I know, I know…). Once you feed it the folder path and the naming system used in that folder, it spits out the total of the expenses it finds in the filenames, and also flags any file that doesn’t seem to match the system you specified.
Click here to see the code on GitHub Gist
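Before the full script, here is a minimal sketch of the core idea, run on the two example filenames from above (plus one that has no amount) rather than on a real folder. The `AMOUNT` pattern and the `amount_from` helper are names made up for this illustration; the script below uses three separate patterns instead. The sketch only assumes that the amount is written with a decimal point.

```python
import re
from decimal import Decimal

# Hypothetical pattern for this sketch: the first "digits.digits" run in a filename.
AMOUNT = re.compile(r'(\d+\.\d+)')

def amount_from(filename):
    """Return the dollar amount in a filename as a Decimal, or None if there isn't one."""
    match = AMOUNT.search(filename)
    return Decimal(match.group(1)) if match else None

examples = ['May 15 24.10.pdf', '$33.66 London Drugs.pdf', 'untitled scan.pdf']
total = Decimal('0')
for name in examples:
    amount = amount_from(name)
    if amount is None:
        # no "digits.digits" in the name, so flag it instead of guessing
        print('no amount found in ' + name)
    else:
        total += amount
print('total: ' + str(total))
```

Note that the "15" in "May 15" is skipped because it isn't followed by a period and more digits, so only "24.10" counts as an amount.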
# As a freelancer, I saved many receipts a day onto a cloud folder with a loose naming system
# consisting of generally dollar amount + dates or some description of the expense.
# As each folder has a slightly different system, here's a program where I could choose which
# naming system I'm using in each folder before it spits out the sum of the amounts.
import os
import re
# Decimal instead of float(), because float() is an approximation and gives weird decimal points
from decimal import Decimal
# file path for the folder which this program will iterate through filenames
path = input('paste path here: ')
os.chdir(path)
totalSum = 0
counter = 0
def resetNumber():
    # need to declare global, or these assignments would only create local variables
    global totalSum, counter
    totalSum = 0
    counter = 0
def expenseRun(regexToUse):
    # passing global variables to function
    global totalSum, counter
    resetNumber()
    if regexToUse == 'apple':
        appleExpense(totalSum, counter)
    elif regexToUse == 'banana':
        bananaExpense(totalSum, counter)
    elif regexToUse == 'cow':
        cowExpense(totalSum, counter)
    else:
        print('error! try again!')
def appleExpense(totalSum, counter):
    for filename in os.listdir():
        result = apple.search(filename)
        if result is None:
            print('error! file name is ' + filename)
        else:
            totalSum += Decimal(result.group(1))
            counter += 1
            print('receipt #' + str(counter) + '- ' + str(result.group(1)) + ' from ' + str(result.group(2)) + '. Now totaling at ' + str(totalSum))
    print('\nTOTAL SUM IS: ' + str(totalSum))
def bananaExpense(totalSum, counter):
    for filename in os.listdir():
        result = banana.search(filename)
        if result is None:
            print('error! file name is ' + filename)
        else:
            totalSum += Decimal(result.group(2))
            counter += 1
            print('receipt #' + str(counter) + '- ' + str(result.group(2)) + ' from ' + str(result.group(1)) + '. Now totaling at ' + str(totalSum))
    print('\nTOTAL SUM IS: ' + str(totalSum))
def cowExpense(totalSum, counter):
    for filename in os.listdir():
        result = cow.search(filename)
        if result is None:
            print('error! file name is ' + filename)
        else:
            totalSum += Decimal(result.group(1))
            counter += 1
            print('receipt #' + str(counter) + '- ' + str(result.group(1)) + '. Now totaling at ' + str(totalSum))
    print('\nTOTAL SUM IS: ' + str(totalSum))
# re.compile(r'expression')
# group1 = any number of digits separated by a period (ex. "12.34")
# group2 = anything that comes after the digits and a space, and before ".pdf" file extension
apple = re.compile(r'(\d+\.\d+)\s+(.*)(\.pdf$)')
# group1 = any amount of text before the dollar amount, such as a date (ex. "Mar 15")
# group2 = any number of digits separated by a period (ex. "12.34")
banana = re.compile(r'(.*)\s+(\d+\.\d+)')
# group1 = any number of digits separated by a period (ex. "12.34")
cow = re.compile(r'(\d+\.\d+)')
print('\nREGEX EXPRESSIONS! type out the one that applies---')
print('apple = $ first')
print('banana = $ later')
print('cow = $ only')
regexToUse = input('\nwhich regex? ')
expenseRun(regexToUse)
input("\nPress enter to exit;")
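The three `...Expense` functions above differ only in which pattern they use and which group holds the amount, so one possible tidy-up is to fold them into a single function. What follows is a sketch under a few assumptions, not a tested replacement: `PATTERNS` and `sum_expenses` are names invented here, the three patterns are copied from the script, and the function takes a list of filenames instead of reading the folder so it can be tried without any files on disk.

```python
import re
from decimal import Decimal

# name -> (compiled pattern, index of the group that holds the amount)
# Same three patterns as in the script above.
PATTERNS = {
    'apple': (re.compile(r'(\d+\.\d+)\s+(.*)(\.pdf$)'), 1),
    'banana': (re.compile(r'(.*)\s+(\d+\.\d+)'), 2),
    'cow': (re.compile(r'(\d+\.\d+)'), 1),
}

def sum_expenses(filenames, regex_name):
    """Add up the amounts found in filenames; return (total, list of unmatched names)."""
    pattern, amount_group = PATTERNS[regex_name]
    total = Decimal('0')
    unmatched = []
    for filename in filenames:
        result = pattern.search(filename)
        if result is None:
            unmatched.append(filename)
        else:
            total += Decimal(result.group(amount_group))
    return total, unmatched

# Made-up filenames for illustration only.
total, unmatched = sum_expenses(
    ['$33.66 London Drugs.pdf', '12.50 taxi.pdf', 'notes.pdf'], 'apple')
print('TOTAL SUM IS: ' + str(total))
print('unmatched: ' + str(unmatched))
```

To run it on a real folder, the list could come from `os.listdir(path)`. Returning the total instead of only printing it also sidesteps the global variables, since nothing needs to be reset between runs.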