Artificial intelligence for accountants: where it really helps, where it doesn't

A tax consultancy with seven collaborators receives forty requests a day. Half of them are mechanical. The task is to figure out which half before buying a tool.

The worst risk isn't the AI's mistake. It is the collaborator who hasn't yet learned to spot it.

A colleague, the owner of an accounting practice in Ticino, opens the conversation with a question that sums up a year of failed attempts: "What is the right way to use artificial intelligence in a practice like mine?" He has tried ChatGPT on his own for a few months. He has watched two demos of specialised tools. He has worked with an agency that wanted to "integrate AI" into his workflows. None of it has convinced him.

The reason he has arrived at nothing is instructive: the question was poorly framed. There is no single "right way". There are tasks where artificial intelligence changes the conditions of work, tasks where it produces reputational risk, and an ambiguous middle zone that requires judgement. Before choosing tools, the three categories must be separated. Those who skip this step take good tools and apply them in the wrong places.

The first category: mechanical tasks

The average accounting practice, with five to ten collaborators, lives on a daily volume of repetitive operations. Routing incoming emails to the right collaborator. Pre-filling standard forms from known client information. Extracting data from documents received (invoices, bank statements, contracts) for entry into the management system. Replying to frequent client questions about deadlines, required documents, office hours. Summarising long email exchanges to archive a client file or to hand work over between colleagues.

These tasks have three characteristics in common. They are structurally repetitive, meaning the content varies but the structure of the task is identical. They do not require professional judgement, meaning a tax consultant with twenty years of experience does not add specific value to that individual task. And they generate significant volumes, meaning someone in the practice spends hours each week on them.

Here artificial intelligence changes the conditions of work in a tangible way. Not because it replaces the collaborator, but because it removes the part of his time that produces no value, either for him or for the practice. The collaborator stays; his time is redistributed toward tasks where his judgement matters.

The typical error is trying to automate these tasks with generic tools, such as an individual ChatGPT account. The tool itself is capable, but the context is missing. ChatGPT does not know who client Rossi is, which services he buys from the practice, what his communication habits are, or how a reply is written in the practice's style. Every answer requires an external briefing, and the briefing eats the time you wanted to save. Result: people stop using it after three weeks.

The preliminary work is not installing a tool. It is structuring the context that the tool must be able to consult in order to answer correctly. Enriched client records, written internal procedures, examples of past communications, defined sorting logic. When this context exists, even a simple tool works well. When it does not exist, no tool, however advanced, produces sustainable results.
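To make "structured context" less abstract, here is a minimal sketch in Python of what a client record and a defined sorting rule might look like. Everything in it is an assumption made for illustration: the ClientRecord fields, the KEYWORD_TO_SERVICE table, the route function and the collaborator names are hypothetical, and the keyword matching is deliberately naive. It is not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical enriched client record kept by the practice."""
    name: str
    assigned_to: dict[str, str]      # service -> collaborator who handles it
    preferred_tone: str = "formal"   # communication habit, used when drafting replies


# Hypothetical sorting logic, written down once instead of living in people's heads.
# Plain substring matching is naive and only serves the illustration.
KEYWORD_TO_SERVICE = {
    "vat": "vat",
    "payroll": "payroll",
    "salary": "payroll",
    "tax return": "tax_return",
}


def route(email_text: str, client: ClientRecord, fallback: str = "front_desk") -> str:
    """Return the collaborator who should receive the email."""
    text = email_text.lower()
    for keyword, service in KEYWORD_TO_SERVICE.items():
        if keyword in text and service in client.assigned_to:
            return client.assigned_to[service]
    # No rule matched: a person decides, nothing is guessed.
    return fallback


rossi = ClientRecord(
    name="Rossi",
    assigned_to={"vat": "Anna", "tax_return": "Marco"},
)

print(route("Question about the VAT deadline", rossi))
print(route("Can we move our meeting?", rossi))
```

The point of the sketch is the division of labour: the record and the rules are written by people who know the practice, and the tool only consults them. An email that matches no rule falls back to a person.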

The second category: interpretive tasks

Some tasks in the practice require reading, not only execution. A client writes a long email describing a new situation, and the collaborator must understand whether it is a request for advice, a required communication, a signal of dissatisfaction, or a combination of the three. A document arrives from a supplier and contains clauses that have non-obvious tax implications. An apparently trivial client request hides a situation which, if handled without understanding the context, will produce a problem three months later.

In this category AI does not eliminate the work, but it can shorten it, in a specific sense. It can produce a first structured reading that the senior verifies and corrects, instead of having to produce it from scratch. It can highlight passages of a long document that deserve attention. It can put preliminary questions to the client on behalf of the senior, so that the meeting starts from a question that is already better defined.

The condition for this to work is that whoever uses the output is able to evaluate it. A senior with twenty years of experience recognises immediately when a first AI reading has got the point and when it has missed it. For him, it is a tool that accelerates. A junior in the second year of practice, facing the same output, risks accepting it as good without seeing the nuances. For him, the same tool may become a liability.

This is one of the least discussed risks. The dominant narrative says that AI "democratises" access to advanced competence, allowing even juniors to produce senior-level output. In a professional practice the opposite is true: the junior without foundations becomes a blind reviewer. He lacks the taste, the memory of similar cases, the sense of anomaly. An output that is seventy per cent right looks ninety-five per cent right to him, and that twenty-five-point gap hits the client months later.

The responsible choice, in this category, is twofold. Use AI with seniors, who treat it as an accelerator. And invest in training juniors on the real work, without AI, until they have built the taste that will later let them use it without harm.

A concrete example helps to see the difference. A Ticino practice with two senior partners and three junior collaborators introduced an AI system for preparing margin notes on financial statements. The seniors adopted the tool immediately: the first AI reading saves them twenty to thirty minutes per statement, and they see instantly when the output ignores a detail that requires attention. The juniors, in the first six months, were deliberately kept away from the tool. They prepared the notes by hand, taking longer, making mistakes and receiving corrections from the seniors. After a year, two of the three juniors had built the taste necessary to use AI as support without becoming dependent on it. The third was trained for another season before being given the tool. The more obvious approach, handing them the tool straight away for the sake of speed, would have produced reviewers unable to see what is missing.

The third category: judgement tasks

Some activities in the practice are the core of the professional service. Personalised tax advice. Evaluation of strategic options. Signing documents that engage the practitioner's responsibility. Communications to clients on delicate matters (restructurings, successions, situations of stress). Interactions with tax authorities.

Here AI does not add value and almost always adds risk. Not because it cannot produce plausible texts. On the contrary: it can produce them too well. A junior who accepts a draft reply to a tax authority without the senior's judgement commits the same error as someone who hands the client tax planning done entirely by ChatGPT: responsibility remains with the practitioner, but without his actual judgement behind it.

For these tasks, the correct position is explicit: AI does not intervene, except as a purely accessory aid (for example rewording a sentence in an email already drafted by the senior, not producing the email). This boundary must be written into the practice's internal procedures, not left implicit.

A criterion many tax consultants find useful is this. Facing a task, ask yourself: "If a senior collaborator handed over the output without re-reading it, and it was wrong, would the client or the authority notice, and at what cost?" If the cost is low and recoverable (a misrouted email that can be corrected), the task is mechanical. If the cost is medium (a document with an error that has to be flagged), it is interpretive. If the cost is high (wrong advice, an imprecise official communication), it is a judgement task.
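The criterion above can be written down as a small lookup, which is sometimes a useful way to start the map. The following Python sketch is only an illustration: the names ErrorCost, Category, Task and classify are hypothetical, and the cost level of each task is assumed to be assigned by a person who knows the practice, not computed.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorCost(Enum):
    LOW = "low"        # recoverable, e.g. a misrouted email
    MEDIUM = "medium"  # an error that has to be flagged and corrected
    HIGH = "high"      # wrong advice, an imprecise official communication


class Category(Enum):
    MECHANICAL = "mechanical"      # AI enters
    INTERPRETIVE = "interpretive"  # AI enters under senior supervision
    JUDGEMENT = "judgement"        # AI does not enter


# Mapping taken directly from the criterion in the text.
COST_TO_CATEGORY = {
    ErrorCost.LOW: Category.MECHANICAL,
    ErrorCost.MEDIUM: Category.INTERPRETIVE,
    ErrorCost.HIGH: Category.JUDGEMENT,
}


@dataclass
class Task:
    name: str
    error_cost: ErrorCost  # judged by a person, not inferred by the code


def classify(task: Task) -> Category:
    """Map a task to its category from the cost of an unreviewed error."""
    return COST_TO_CATEGORY[task.error_cost]


# Illustrative entries; a real map would list the practice's own tasks.
tasks = [
    Task("sort incoming email", ErrorCost.LOW),
    Task("first reading of a supplier contract", ErrorCost.MEDIUM),
    Task("reply to the tax authority", ErrorCost.HIGH),
]

task_map = {task.name: classify(task).value for task in tasks}
for name, category in task_map.items():
    print(f"{name}: {category}")
```

The code does nothing clever, and that is deliberate: the hard part is the judgement about cost, which stays with the practitioner. The lookup only keeps the resulting map explicit and easy to revise as tools change.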

This three-part classification helps to understand where AI enters, where it enters with supervision, and where it does not enter. It is a map that changes over time, because tools change too, but the logic remains: before choosing technologies, classify the tasks.

A practice that has done this preliminary work does not work with more AI or less AI. It works with AI where it is useful. The owner recovers two or three hours a week previously consumed on sorting emails or first readings of documents. Seniors accelerate on preparatory analyses. Juniors continue working in a traditional way on the professional core, because they need to build the foundations. Clients receive faster replies on repetitive topics and the same attentive quality on topics that matter.

The step preceding all this is not a purchase. It is a diagnosis. Looking at the practice and mapping the tasks into three categories takes an hour or two of structured conversation, and produces a document that is useful in itself, even if no tool is ever purchased. It is the same logic we apply in the first conversation with a tax consultancy or accounting practice: first we understand where time is consumed, then we discuss what to do.

That this classification is simple, and that anyone who knows the practice can check it, does not mean it is common. Many practices have introduced AI tools without ever having made the map, and they live with a mixture of enthusiasm for the cases where it works and frustration at the cases where it produces unpresentable results. The difference between those who get out of that phase and those who stay in it is the time dedicated to classifying.

If you recognise in your daily practice the pattern of forty requests a day, and the sense that half of your time goes on things that do not require your judgement, that is the kind of map worth making before any other choice.
