Donut fine-tuning - Unable to get good accuracy for document parsing

Hi, I am learning how to fine-tune models and wanted to try Donut. I created this Google Colab notebook for fine-tuning Donut. I referred to these resources:

  1. Document AI: Fine-tuning Donut for document-parsing using Hugging Face Transformers

(I used other blogs and posts as well, however being a new user I am only allowed to add 2 links)

I used a dataset with 2000 receipt records (it contains invoices as well, but for simplicity I use only the receipts). Here is an example of the input labels I use for training:

<s_receipt><s_total></s_total><s_tips></s_tips><s_time></s_time><s_telephone>703-777-5833</s_telephone><s_tax></s_tax><s_subtotal></s_subtotal><s_store_name>SAFEWAY</s_store_name><s_store_addr></s_store_addr><s_line_items><s_item_value>3.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>BCCHOCCUPCAKES</s_item_name><s_item_key></s_item_key><s_item_value>.49</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>ACNSPRYFRTSHAPE</s_item_name><s_item_key></s_item_key><s_item_value>1.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>DULCEDELECHECHE</s_item_name><s_item_key></s_item_key><s_item_value>1.50</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>MULTIGRAINCHEERIO</s_item_name><s_item_key></s_item_key><s_item_value>2.00</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PROGRESSOR&HSTK</s_item_name><s_item_key></s_item_key><s_item_value>3.50</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PLSBRYBSCTSANDWI</s_item_name><s_item_key></s_item_key><s_item_value>3.49</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>TOTINOSPZASTOFFE</s_item_name><s_item_key></s_item_key></s_line_items><s_ignore> </s_ignore><s_date></s_date></s_receipt>
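For comparison, here is a minimal sketch of how a target sequence like the one above is commonly produced from ground-truth JSON. It is modeled on the `json2token`-style helper used in Donut fine-tuning tutorials; the function name `json_to_token_sequence`, the sample `receipt` dict, and the use of `<sep/>` between list entries are assumptions, since the post does not show the conversion code the notebook uses.

```python
from typing import Any


def json_to_token_sequence(obj: Any) -> str:
    """Flatten nested ground-truth JSON into a Donut-style tag sequence."""
    if isinstance(obj, dict):
        # Keys are emitted in reverse-sorted order, which matches the
        # field order visible in the example label above.
        return "".join(
            f"<s_{key}>{json_to_token_sequence(obj[key])}</s_{key}>"
            for key in sorted(obj, reverse=True)
        )
    if isinstance(obj, list):
        # Assumption: list entries are joined with <sep/> so that the
        # boundary between two line items stays recoverable.
        return "<sep/>".join(json_to_token_sequence(item) for item in obj)
    return str(obj)


# Hypothetical, shortened ground truth for illustration only.
receipt = {
    "store_name": "SAFEWAY",
    "line_items": [
        {"item_name": "BCCHOCCUPCAKES", "item_quantity": "1", "item_value": "3.99"},
        {"item_name": "ACNSPRYFRTSHAPE", "item_quantity": "1", "item_value": ".49"},
    ],
}

target = "<s_receipt>" + json_to_token_sequence(receipt) + "</s_receipt>"
print(target)
```

In the label shown above, the line items follow one another with no separator between them, so it may be worth checking whether the conversion step in the notebook drops a separator that the decoding step expects.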

However, the model does not seem to be learning well enough. The predictions are nowhere close to being accurate, and I cannot figure out what I am doing wrong. After trying many things, I am here looking for suggestions. Could someone please help me figure out what I am doing wrong?

It seems that performance suffers when the prompt format is not the one the model expects. In addition, there seem to be versions in which saving the processor produces a corrupted copy.
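One way to check the prompt-format point is to confirm that the prompt given to the decoder at inference is exactly the prefix every training target starts with. The sketch below is a plain string check under that assumption; `find_prompt_mismatches` and the sample targets are hypothetical and not taken from the notebook.

```python
from typing import List


def find_prompt_mismatches(targets: List[str], inference_prompt: str) -> List[int]:
    """Return indices of training targets that do not begin with the inference prompt."""
    return [
        index
        for index, target in enumerate(targets)
        if not target.startswith(inference_prompt)
    ]


# Hypothetical training targets: the second one starts with a different
# task token than the prompt used at inference.
training_targets = [
    "<s_receipt><s_store_name>SAFEWAY</s_store_name></s_receipt>",
    "<s><s_store_name>SAFEWAY</s_store_name></s>",
]

mismatches = find_prompt_mismatches(training_targets, "<s_receipt>")
print(mismatches)  # indices of targets whose prefix differs from the prompt
```

With Hugging Face Transformers, the same prompt string is typically tokenized with `add_special_tokens=False` and passed to `model.generate` as `decoder_input_ids`, so a mismatch found here would carry over to generation.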

There also seem to be many reports that good performance cannot be achieved unless the labels used during training are accurate.
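A quick audit of the training labels can help here. The following is a minimal sketch, assuming labels use the `<s_key>...</s_key>` tag format shown in the question; `audit_label` is a hypothetical helper that reports unbalanced tags and lists fields whose value is empty, so that labels with many missing values (such as an empty total or date) can be reviewed by hand.

```python
import re
from typing import List, Tuple

# Matches opening tags like <s_total> and closing tags like </s_total>.
TAG = re.compile(r"<(/?)s_([^<>]+)>")


def audit_label(label: str) -> Tuple[List[str], List[str]]:
    """Return (problems, empty_fields) for one target sequence."""
    problems: List[str] = []
    empty_fields: List[str] = []
    stack: List[Tuple[str, int]] = []  # (field name, offset where its content starts)

    for match in TAG.finditer(label):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            stack.append((name, match.end()))
        elif not stack or stack[-1][0] != name:
            problems.append(f"unexpected </s_{name}>")
        else:
            _, content_start = stack.pop()
            # A field counts as empty when it holds only whitespace.
            if label[content_start:match.start()].strip() == "":
                empty_fields.append(name)

    problems.extend(f"unclosed <s_{name}>" for name, _ in stack)
    return problems, empty_fields


sample = (
    "<s_receipt><s_total></s_total><s_store_name>SAFEWAY</s_store_name>"
    "<s_ignore> </s_ignore></s_receipt>"
)
problems, empty_fields = audit_label(sample)
print(problems, empty_fields)
```

Running this over the whole dataset and counting how often each field is empty gives a rough picture of label quality before any further training runs.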