Donut finetuning - Unable to get a good accuracy for doc parsing

Praveen1114 · August 17, 2025, 8:25pm

Hi, I am learning how to fine-tune models and wanted to try Donut. I created this Google Colab notebook for fine-tuning Donut, referring to these resources:

Document AI: Fine-tuning Donut for document-parsing using Hugging Face Transformers

(I used other blogs and posts as well, but as a new user I am only allowed to add two links.)

I used a dataset with 2,000 receipt records (it also contains invoices, but for simplicity I use only the receipts). Here is an example of the labels I use for training:

<s_receipt><s_total></s_total><s_tips></s_tips><s_time></s_time><s_telephone>703-777-5833</s_telephone><s_tax></s_tax><s_subtotal></s_subtotal><s_store_name>SAFEWAY</s_store_name><s_store_addr></s_store_addr><s_line_items><s_item_value>3.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>BCCHOCCUPCAKES</s_item_name><s_item_key></s_item_key><s_item_value>.49</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>ACNSPRYFRTSHAPE</s_item_name><s_item_key></s_item_key><s_item_value>1.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>DULCEDELECHECHE</s_item_name><s_item_key></s_item_key><s_item_value>1.50</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>MULTIGRAINCHEERIO</s_item_name><s_item_key></s_item_key><s_item_value>2.00</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PROGRESSOR&HSTK</s_item_name><s_item_key></s_item_key><s_item_value>3.50</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PLSBRYBSCTSANDWI</s_item_name><s_item_key></s_item_key><s_item_value>3.49</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>TOTINOSPZASTOFFE</s_item_name><s_item_key></s_item_key></s_line_items><s_ignore> </s_ignore><s_date></s_date></s_receipt>
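For anyone unfamiliar with this label format: in the referenced tutorial, a sequence like the one above is produced by a json2token-style helper that walks the JSON annotation, wraps each key in `<s_key>`/`</s_key>` tags (keys in reverse-sorted order, which matches the field order in the example) and joins list elements with a `<sep/>` token. Below is a simplified re-implementation under that assumption; the `receipt` dict is a hypothetical, shortened annotation reconstructed from the example label, not an actual record from the dataset.

```python
from typing import Any


def json2token(obj: Any, sort_keys: bool = True) -> str:
    """Serialize a JSON-like annotation into a Donut-style tag sequence.

    Simplified sketch: dict keys become <s_key>...</s_key> pairs,
    list elements are joined with <sep/>, everything else is str().
    """
    if isinstance(obj, dict):
        # Reverse-sorted keys reproduce the field order seen in the example label.
        keys = sorted(obj.keys(), reverse=True) if sort_keys else list(obj.keys())
        return "".join(f"<s_{k}>{json2token(obj[k], sort_keys)}</s_{k}>" for k in keys)
    if isinstance(obj, list):
        # Each list element (e.g. one line item) is separated by <sep/>.
        return "<sep/>".join(json2token(item, sort_keys) for item in obj)
    return str(obj)


# Hypothetical, shortened annotation modeled on the example label above.
receipt = {
    "store_name": "SAFEWAY",
    "telephone": "703-777-5833",
    "total": "",
    "line_items": [
        {"item_name": "BCCHOCCUPCAKES", "item_quantity": "1", "item_value": "3.99"},
        {"item_name": "ACNSPRYFRTSHAPE", "item_quantity": "1", "item_value": ".49"},
    ],
}

# Wrapping in a "receipt" key yields the outer <s_receipt>...</s_receipt> pair.
label = json2token({"receipt": receipt})
print(label)
```

In this sketch the two line items come out separated by `<sep/>`, whereas the example label above concatenates them with no separator, so the model has no explicit item boundary to learn. If the notebook builds its labels differently from the tutorial, that difference may be worth checking.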

However, the model does not seem to be learning well: the predictions are nowhere near accurate, and I cannot figure out what I am doing wrong. I have tried many things, so I am posting here for suggestions. Could someone please help me find the problem?

John6666 · August 18, 2025, 2:06am

It seems that performance suffers when the prompt format at inference time is not the one the model saw during training. In addition, there appear to be some library versions in which the saved processor becomes corrupted.
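One way to rule out a prompt-format mismatch is to check two things: that every `<s_...>` tag in the training labels is registered as an added special token (otherwise the tokenizer splits it into sub-word pieces), and that the task prompt fed as `decoder_input_ids` at inference is the same start token every training label begins with (in the question, apparently `<s_receipt>`). The pure-Python sketch below only does that bookkeeping; `collect_tags` and `check_prompt` are hypothetical helper names. The tag list it returns is what you would pass to `processor.tokenizer.add_special_tokens({"additional_special_tokens": ...})` before resizing the decoder embeddings with `resize_token_embeddings`.

```python
import re
from typing import List

# Matches opening and closing field tags such as <s_total> and </s_total>
# (but not <sep/> or </s>).
FIELD_TAG_RE = re.compile(r"</?s_[A-Za-z0-9_]+>")


def collect_tags(labels: List[str]) -> List[str]:
    """Return every distinct field tag used in the labels, in first-seen order."""
    seen: dict = {}
    for label in labels:
        for tag in FIELD_TAG_RE.findall(label):
            seen.setdefault(tag, None)
    return list(seen)


def check_prompt(labels: List[str], task_prompt: str) -> List[int]:
    """Return indices of labels that do NOT begin with the inference task prompt."""
    return [i for i, label in enumerate(labels) if not label.startswith(task_prompt)]


labels = [
    "<s_receipt><s_total></s_total><s_store_name>SAFEWAY</s_store_name></s_receipt>",
    "<s_total>4.50</s_total>",  # hypothetical label missing the task start token
]
tags = collect_tags(labels)
bad = check_prompt(labels, "<s_receipt>")
print(tags)
print(bad)  # [1]
```

If `bad` is non-empty, or if the prompt used in `generate` differs from the start token in the labels, the model is being asked to continue a sequence it never saw during training.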

There are also many reports that good performance is only reached when the labels used during training are accurate.
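Since label quality matters this much, a cheap first check is to confirm that every training label is structurally well-formed, i.e. each `<s_key>` is closed by a matching `</s_key>` in the right nesting order. Below is a minimal stack-based validator, assuming the label format from the question (field tags only, with `<sep/>` ignored as a plain separator); `find_tag_errors` is a hypothetical helper, not part of Transformers.

```python
import re
from typing import List

# Opening (<s_key>) or closing (</s_key>) field tag; group 1 is "/" for a close.
TAG_RE = re.compile(r"<(/?)s_([A-Za-z0-9_]+)>")


def find_tag_errors(label: str) -> List[str]:
    """Return human-readable problems with the tag structure of one label."""
    errors: List[str] = []
    stack: List[str] = []
    for match in TAG_RE.finditer(label):
        is_close, key = match.group(1) == "/", match.group(2)
        if not is_close:
            stack.append(key)
        elif not stack:
            errors.append(f"unexpected </s_{key}> at offset {match.start()}")
        elif stack[-1] != key:
            errors.append(
                f"</s_{key}> at offset {match.start()} does not match open <s_{stack[-1]}>"
            )
            stack.pop()  # simple recovery: treat it as closing the innermost tag
        else:
            stack.pop()
    # Anything left on the stack was opened but never closed.
    errors.extend(f"unclosed <s_{key}>" for key in reversed(stack))
    return errors


good = "<s_receipt><s_total>4.50</s_total></s_receipt>"
bad = "<s_receipt><s_total>4.50</s_tax></s_receipt"  # wrong close tag, truncated end
good_errors = find_tag_errors(good)
bad_errors = find_tag_errors(bad)
print(good_errors)  # []
print(bad_errors)
```

Running this over all 2,000 labels and printing any non-empty result would quickly surface malformed annotations before spending GPU time on training.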
