PayTrust hack
I learned that PayTrust has no easy way to download your old bills. The service is shutting down at the end of 2023, so I wrote a script to get all my data.
The Problem
PayTrust was a very cool service that would scan your bills and help you pay them on time. You used to get a CD of all the PDFs at the end of the year. They stopped doing that some years back though.
Intuit bought them some years back. Things like the CD and usability changed. Most vendors now offer some form of electronic billing and auto-pay, which also made the service a bit less useful. I’m guessing the customer base kept shrinking, so Intuit finally decided to shut the doors this year.
That left me with the problem of needing to get the bills that I had no CD for. I contacted PayTrust support to ask for a data dump. They told me to run a report, click on each bill and download them one at a time.
Analysis
I looked at the report. It had hundreds of links for bill download. I found that clicking on the link popped up another window with bill details. That had an iframe with the PDF where you could click to download.
A few hundred links, with multiple clicks each: a few clicks to load the file, click download, go to the next link. All of that made me feel like I needed to automate this.
First approach
I started down the path of just using some simple Go code to scrape the content. The PayTrust web site turns out to be really old-school HTML. Maybe they do it intentionally, but either way it is not easy to find things by ID.
That led me to look for some way to use a browser automation tool. That would let my code look like a person going through the pages.
Playwright for Go
I found a Go library for Playwright, which is a testing framework that interacts with the browser to automate it. It has lots of nice features. For instance it can find things on the page by text, etc. That made me feel like I could do this.
The first stumbling block I ran into was that https://paytrust.com doesn’t work. It used to redirect you to some Intuit page and eventually back to where PayTrust actually lives. I had to try a bunch of times before I discovered that https://login..paytrust.com/3004/ would work reliably.
Once I figured that out, I realized that I had to put in some logic to handle the login flow. The site will push you through a phone verification if you haven’t logged in for a while. They call your phone and you have to type in a code in the browser (before you enter your password). On a subsequent login, you won’t get the phone call page.
Manual steps
I learned that a specific form was displayed on the page when a phone call needed to be made. I added some logic to pause and wait for input there (not super sophisticated, just a prompt in the Go code to say “yes” once you have entered the code). This actually worked pretty well as the form doesn’t exist if the site isn’t asking you for the code, so it can go on to the password input.
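The pause itself needs nothing beyond the Go standard library. Here is a minimal sketch of that kind of prompt. The helper names `waitForCode` and `isYes` and the prompt wording are made up for illustration, and `main` feeds canned input so the sketch runs unattended, where the real script would read from `os.Stdin`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// isYes reports whether the typed answer counts as a confirmation.
func isYes(answer string) bool {
	switch strings.ToLower(strings.TrimSpace(answer)) {
	case "y", "yes":
		return true
	}
	return false
}

// waitForCode (hypothetical name) keeps prompting until the user confirms
// that the verification code has been typed into the browser. It returns
// false if the input ends before a confirmation arrives.
func waitForCode(in io.Reader, out io.Writer) bool {
	scanner := bufio.NewScanner(in)
	for {
		fmt.Fprintln(out, `Type "yes" once you have entered the code in the browser:`)
		if !scanner.Scan() {
			return false
		}
		if isYes(scanner.Text()) {
			return true
		}
	}
}

func main() {
	// Canned input for the sketch; a real run would pass os.Stdin instead.
	input := strings.NewReader("not yet\nyes\n")
	if waitForCode(input, os.Stdout) {
		fmt.Println("continuing to the password step")
	} else {
		fmt.Println("no confirmation received")
	}
}
```

Taking an `io.Reader` rather than reading `os.Stdin` directly keeps the prompt easy to exercise without a terminal.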
I forgot to mention that I use 1Password to store all my logins. On the first pass of the script, I wrote some code to pull the username and password from there. That code was written for another task, so I didn’t need to reinvent the wheel.
Later I did update the flags on the script so that somebody could use it without 1Password (or with 1Password and a different vault and tags).
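To give a feel for what such flags might look like, here is a sketch using the standard `flag` package. The flag names (`-username`, `-password`, `-vault`, `-tag`) and their defaults are assumptions for illustration, not the ones the script actually defines, and the 1Password lookup itself is left out.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the login options. All flag names and defaults here are
// illustrative assumptions, not the real script's flags.
type config struct {
	Username string
	Password string
	Vault    string // 1Password vault to search
	Tag      string // 1Password tag identifying the login item
}

// useOnePassword reports whether credentials still need to be looked up,
// i.e. no complete username/password pair was supplied directly.
func (c config) useOnePassword() bool {
	return c.Username == "" || c.Password == ""
}

// parseConfig parses command-line style arguments into a config.
func parseConfig(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("paytrust", flag.ContinueOnError)
	fs.StringVar(&c.Username, "username", "", "PayTrust username (skips 1Password)")
	fs.StringVar(&c.Password, "password", "", "PayTrust password (skips 1Password)")
	fs.StringVar(&c.Vault, "vault", "Private", "1Password vault to search")
	fs.StringVar(&c.Tag, "tag", "paytrust", "1Password tag to match")
	err := fs.Parse(args)
	return c, err
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		return // the flag package has already printed the problem
	}
	if cfg.useOnePassword() {
		fmt.Printf("would look up credentials in vault %q with tag %q\n", cfg.Vault, cfg.Tag)
		return
	}
	fmt.Printf("would log in as %s with the supplied password\n", cfg.Username)
}
```

A password passed as a flag ends up in shell history, so an environment variable or an interactive prompt may be a better fit for that one value.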
I got the login working pretty flawlessly. The next step was to walk through all the clicks to get to the report links I needed. Tediously, I had to restart every time I missed a step. I was also learning how to make Playwright behave as if it was capable of doing things serially.
Logged In
Immediately after login, I would see a popup notice. The notice asks me to assign it to a biller. Sadly though, the notice doesn’t actually exist. I added some logic to just close it.
I had to step through all the clicks to get to the reports, choose the right report, and traverse all the “bill download” links to get the PDFs.
Click flow
The steps ended up being something like this
- Look for the notification window, and click the close button if there was one.
- Click to activate the reports tab.
- Find the spending reports link and click it.
- Wait for the report options control to become visible, then click it to show the options.
- Get the options and find the one that had the right title. Then click that to load the report.
- Wait for the totals label to display in the report (this was because when I first started I wasn’t getting all the links in the report).
- Find all the links that have the bill download icon.
- Loop through all those links and get the payee name (part of the same row where the bill icon lives).
- Check if we processed the payee already, and if so go to the next link.
- Click the link from the bill icon (this pops up a new window with bill details).
- Once the popup was there, find the options for bill dates, to loop through them (turns out the popup has all the bills for the payee).
- Select each bill in turn, find the iframe that gets loaded, and use that URL to download the bill.
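The "processed the payee already" check in the steps above matters because the report has one row per bill, while a single popup already lists every bill for that payee. A sketch of that bookkeeping, with the browser parts left out: `reportRow` is a hypothetical stand-in for what gets read from the page, and the payee names are invented sample data.

```go
package main

import "fmt"

// reportRow is a hypothetical stand-in for one report row that has a bill
// download icon; in the real flow these come from the page via Playwright.
type reportRow struct {
	Payee string
}

// payeesToVisit returns each payee once, in report order, because one popup
// already lists every bill for that payee.
func payeesToVisit(rows []reportRow) []string {
	seen := make(map[string]bool)
	var payees []string
	for _, row := range rows {
		if seen[row.Payee] {
			continue // already handled through an earlier row
		}
		seen[row.Payee] = true
		payees = append(payees, row.Payee)
	}
	return payees
}

func main() {
	// Invented sample rows: the first payee appears twice in the report.
	rows := []reportRow{
		{Payee: "Electric Co"},
		{Payee: "Water Utility"},
		{Payee: "Electric Co"},
	}
	for _, payee := range payeesToVisit(rows) {
		fmt.Println("open bill popup for", payee)
	}
}
```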
Gotchas and flaws
The URL for the bill turns out to be different depending on the payee and where PayTrust decided to store them, AND includes a token that lets you download the item. I spent a bit too long trying to figure out how to get Playwright to tell the browser to download the PDF until I realized I could just do a simple HTTP GET and write to the file.
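Because the token travels in the URL, the download can be done with the standard library alone. Below is a minimal sketch of that GET-and-write step. The function names, the file name, and the `token=example` query string are placeholders, and a local test server stands in for the real bill URL so the example runs without touching PayTrust.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// downloadTo fetches url with a plain GET and streams the body to path.
func downloadTo(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo runs downloadTo against a local test server that stands in for the
// tokenized bill URL, and returns what ended up on disk.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/pdf")
		io.WriteString(w, "%PDF-1.4 placeholder")
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "bills")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "bill.pdf")
	if err := downloadTo(srv.URL+"/bill?token=example", path); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "download failed:", err)
		return
	}
	fmt.Printf("saved %d bytes\n", len(content))
}
```

Streaming with `io.Copy` avoids holding a whole PDF in memory, which hardly matters for bills but costs nothing.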
It also turned out that some of the download links brought me to a “Bill not available” page. I think that happened for anything more than a couple of years old, so I really only had two years’ worth of bills that I could get.
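One way to keep those pages from being saved as bogus `.pdf` files is to check the first bytes of the response before writing it out. This is only a sketch of the idea: it assumes the “Bill not available” page comes back as HTML rather than as a PDF, which I have not confirmed for every payee, and the sample HTML is invented.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikePDF reports whether data starts with the PDF magic bytes "%PDF-".
func looksLikePDF(data []byte) bool {
	return bytes.HasPrefix(data, []byte("%PDF-"))
}

func main() {
	realBill := []byte("%PDF-1.7\n...")
	// Invented stand-in for the error page; the real markup may differ.
	errorPage := []byte("<html><body>Bill not available</body></html>")

	fmt.Println(looksLikePDF(realBill))  // true
	fmt.Println(looksLikePDF(errorPage)) // false
}
```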
Conclusion
Ultimately this was a great learning experience on how to interact with CDP via Playwright. In terms of effort to get the files I was after, it was probably a wash: there was a lot of learning and considerable starting and stopping before it finally flowed end to end, and in the end there were fewer bills than expected that I would have had to download by hand.