06-08-2025 04:18 PM
I've been retired for a few years and am a little rusty. I've been scraping records from my local government and need an elegant way to scan dollar amounts from strings scraped from PDF files. The amounts can range from hundreds of dollars ($241) to millions ($2,123,456.78). Here are a couple of the strings I have to parse. They are building permits, with the fields in this order: site location, NEW Single Family Residence, permit #, Valuation, Fee, applicant, date.
3448 N YAVAPAI STREET NEW SFR BLD19-0212 $116,264.53 $3,951.61BIG RED CONSTRUCTION08/29/2019
5065 W CAMELBACK LOOP KGMN NEW SFR BLD19-0341 $7,152.00 $203.06Mohave Shadez08/05/2019
I made a cluster with one element for each field of the string, and I then parse the string to load the fields into the cluster. I am having trouble with the dollar values.
This only works if the values have one comma. I need a general solution for 0, 1, or 2 commas.
06-08-2025 06:15 PM - edited 06-08-2025 06:18 PM
Not sure what we can assume about the structure of the input string, but here's a quick attempt.
(Assuming the $ character occurs exactly twice and always before a number, etc.)
I am sure it can be optimized further. You talk about cluster, but I don't see a cluster.
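For anyone following along in a text language, here is a minimal Python sketch of the same idea: find each number that follows a $, drop the commas, and convert. It is not a transcription of the LabVIEW diagram. The function name `parse_amounts` and the pattern are assumptions made for illustration, and the sketch relies on every $ being followed directly by a number.

```python
import re
from decimal import Decimal

# Assumption: an amount is a "$" followed by a run of digits, commas and periods.
AMOUNT = re.compile(r"\$([0-9,.]+)")

def parse_amounts(line: str) -> list[Decimal]:
    """Return every dollar amount found in line, with the commas removed."""
    return [Decimal(text.replace(",", "")) for text in AMOUNT.findall(line)]

line = ("3448 N YAVAPAI STREET NEW SFR BLD19-0212 "
        "$116,264.53 $3,951.61BIG RED CONSTRUCTION08/29/2019")
valuation, fee = parse_amounts(line)
print(valuation, fee)  # 116264.53 3951.61
```

Because the commas are stripped after the match, the same code handles 0, 1, or 2 commas. `Decimal` is used instead of `float` so the cents survive exactly.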
06-08-2025 06:21 PM
Thanks for the fast reply. I took the easy way out and removed the commas. That made it simple.
06-08-2025 06:37 PM
Yes, my code will remove all commas for the two numbers following the dollar characters.
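Extending that to the whole permit line, a Python dataclass can stand in for the cluster. This is only a sketch under assumptions taken from the two sample strings: the description is always "NEW SFR", the permit number contains no spaces, the applicant name neither starts nor ends with a digit, and the line ends with an MM/DD/YYYY date. The names `Permit`, `PERMIT`, and `parse_permit` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

@dataclass
class Permit:
    site: str
    description: str
    permit_no: str
    valuation: Decimal
    fee: Decimal
    applicant: str
    permit_date: date

# Assumed layout: site, "NEW SFR", permit #, $valuation, $fee, applicant, date.
PERMIT = re.compile(
    r"^(?P<site>.+?)\s+(?P<description>NEW SFR)\s+(?P<permit_no>\S+)\s+"
    r"\$(?P<valuation>[0-9,.]+)\s+\$(?P<fee>[0-9,.]+)"
    r"(?P<applicant>.*?)(?P<permit_date>\d{2}/\d{2}/\d{4})$"
)

def parse_permit(line: str) -> Permit:
    """Split one scraped permit line into its fields."""
    m = PERMIT.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized permit line: {line!r}")
    return Permit(
        site=m["site"],
        description=m["description"],
        permit_no=m["permit_no"],
        # Commas are removed only from the two captured amounts.
        valuation=Decimal(m["valuation"].replace(",", "")),
        fee=Decimal(m["fee"].replace(",", "")),
        applicant=m["applicant"].strip(),
        permit_date=datetime.strptime(m["permit_date"], "%m/%d/%Y").date(),
    )

print(parse_permit(
    "5065 W CAMELBACK LOOP KGMN NEW SFR BLD19-0341 "
    "$7,152.00 $203.06Mohave Shadez08/05/2019"
))
```

Removing commas from the captured amounts rather than from the whole line leaves any commas in the site or applicant fields untouched.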
06-09-2025 04:35 AM
06-09-2025 10:04 AM - edited 06-09-2025 10:05 AM
@Viper wrote:
I need a general solution for 0, 1, or 2 commas.
My regex matches 0, 1, 2 or more commas OR decimal points! Strictly speaking, that means it doesn't "work", because it also accepts malformed numbers, but you've got other problems if the data has more than one decimal point.
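To make the trade-off concrete, here is a small Python comparison. The loose pattern is a guess at the kind of character class described here, not the regex that was posted, and the strict pattern is one possible tightening. Both the patterns and the names `is_loose_match` and `is_strict_match` are assumptions for illustration.

```python
import re

# Loose (assumed): any run of digits, commas and periods.
LOOSE = re.compile(r"[0-9,.]+")
# Strict (one possible tightening): either comma-grouped thousands or plain
# digits, followed by at most one decimal part.
STRICT = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")

def is_loose_match(text: str) -> bool:
    return LOOSE.fullmatch(text) is not None

def is_strict_match(text: str) -> bool:
    return STRICT.fullmatch(text) is not None

for text in ["241", "7,152.00", "2,123,456.78", "1.2.3", "12,34"]:
    print(f"{text!r}: loose={is_loose_match(text)}, strict={is_strict_match(text)}")
```

The loose pattern accepts all five strings, while the strict one rejects "1.2.3" and "12,34". For scraped data that is formatted consistently, the loose version is usually enough; the strict one is mainly useful as a sanity check.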