
Data processing and analysis approach

Data processing

  1. I wrote Python scripts to parse five years of AGO annual audit reports from PDF into CSV format. Each audit 'case' as reported by AGO (e.g. procurement lapse by agency XYZ) constitutes a row.
  2. I used Python to extract everything that could be extracted automatically. For each case, that means AGO's description, the headers and subheaders, the agency's name, and the amounts mentioned.
  3. For the things that can't be automatically extracted (e.g. a human-readable short summary, and, of all the dollar figures mentioned, which amount is actually being flagged in the audit), I filled them in by hand. That said, the summary contains none of my own words or interpretation. I simply extracted content from AGO's reports and shortened it where necessary so that you can understand the audit incident at a glance.
  4. As I prepared this project, I felt that the summary was necessary in order to present the facts while keeping them readable and accessible. Hence, in the summary, you will find snippets that describe the gist of the audit incident. If you're interested in the full incident, I've cited the source and year. You can follow the paragraph numbers for each case and read it in full in AGO's annual audit reports.
  5. Next, I grouped each audit entry into categories based on AGO's description of the case (e.g. procurement issues, payment issues, related-party transactions). Some amount of interpretation was required here, but thanks to AGO's methodical way of writing, the category is simply a function of the case's header/subheader/description.
  6. Lastly, some discretion was also required for extracting the amount flagged by AGO. Most of the time it's straightforward, but sometimes the report doesn't state a specific figure. For example, the report can say something like "There were nine instances where the Certifying Officers (COs) had certified invoices with values ranging from $54,608 to $312,000", without specifying the total amount at fault. In ambiguous situations like this, I erred on the side of caution and reported only the amount stated by AGO (even if it may be much less than the total amount in question).
  7. For the incidents involving the Workers' Party, the data was gathered from AHTC's statement of claim and CNA news coverage of the AHTC trial.
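
To give a rough idea of what steps 2 and 5 involve, here is a minimal sketch in Python. It is not the actual script used for this project: the function names, the regular expression and the keyword lists are all assumptions made up for illustration, and only the three category names come from the description above.

```python
import re
from decimal import Decimal

# Hypothetical pattern for dollar figures such as "$54,608" or "$1.2 million".
AMOUNT_RE = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?(?:\s*(million|billion))?",
    re.IGNORECASE,
)
MULTIPLIERS = {"million": 1_000_000, "billion": 1_000_000_000}

# Hypothetical keyword rules; the real grouping relied on reading
# AGO's headers, subheaders and descriptions.
CATEGORY_KEYWORDS = {
    "procurement issues": ("procurement", "tender", "quotation"),
    "payment issues": ("payment", "invoice"),
    "related-party transactions": ("related party", "related-party"),
}


def extract_amounts(text):
    """Return every dollar amount mentioned in the text, in order of appearance."""
    amounts = []
    for match in AMOUNT_RE.finditer(text):
        whole, fraction, unit = match.groups()
        value = Decimal(whole.replace(",", "") + (fraction or ""))
        if unit:
            value *= MULTIPLIERS[unit.lower()]
        amounts.append(value)
    return amounts


def categorize(header, description=""):
    """Return the first category whose keywords appear in the header or description."""
    text = f"{header} {description}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


if __name__ == "__main__":
    sample = (
        "There were nine instances where the Certifying Officers (COs) had "
        "certified invoices with values ranging from $54,608 to $312,000"
    )
    print(extract_amounts(sample))
    print(categorize("Lapses in Procurement"))
```

Note that a sketch like this only lists the amounts mentioned. Deciding which of them is the amount actually being flagged still needs a human reader, as described in steps 3 and 6.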


This project is not intended to call out any specific government agencies. I'm sure they've done what they needed to do to be in line with AGO's recommendations. This is purely an objective comparison with the AHTC trial.

Who am I?

I'm a Singaporean, and I believe that no matter which party you support, Singapore will benefit from more rigorous debate in Parliament on the issues that are important to us: healthcare, education, social issues, etc.

I am remaining anonymous for one reason: my own safety. I've read the news and heeded the warning that you can be taken to court for criticizing public authorities on Facebook.

Nonetheless, if you wish to contact me, you can email me at