Change metadata of parquet files

maliced · August 8, 2025, 2:17pm

I preprocessed and uploaded the entirety of the gilkeyio/librispeech-alignments dataset, which is huge. However, I set the wrong dataset._info.features for one column. Now, the key_value_metadata.0.value of every parquet file in my dataset has "feats": {"shape": [null, 80], "dtype": "float32", "_type": "Array2D"} when I want it to be "feats": {"shape": [null, 39], "dtype": "float32", "_type": "Array2D"}. Changing the README metadata doesn’t solve the problem: I still get the following error when loading the dataset:

ValueError: cannot reshape array of size 8931 into shape (229,80).
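
The numbers in that error are consistent with the data itself being 39 values wide: 8931 values split over 229 rows gives exactly 39 per row, whereas the declared shape (229, 80) would need 18320 values. A minimal check of that arithmetic, where `frame_width` is a hypothetical helper written only for this illustration:

```python
def frame_width(total_values: int, num_rows: int) -> int:
    """Return the per-row width implied by a flat buffer, or raise if it does not divide evenly."""
    width, remainder = divmod(total_values, num_rows)
    if remainder:
        raise ValueError(f"{total_values} values cannot be split evenly over {num_rows} rows")
    return width


# Numbers taken from the error message above.
print(frame_width(8931, 229))  # 39, the width the data actually has
print(229 * 80)                # 18320, the size the declared shape would require
```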

How can I change the parquet metadata without processing the whole dataset again?

severo · August 8, 2025, 3:30pm

cc @lhoestq might know

lhoestq · September 2, 2025, 10:27am

I think you have to reprocess the data, unfortunately.
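
For reference, the sketch below shows what a lighter-weight rewrite could look like with pyarrow. It is untested against this dataset. It assumes that the features JSON lives under the `huggingface` key of the Arrow schema metadata with an `info` → `features` layout, and that the stored values are already 39 wide. Each file is still read, rewritten and re-uploaded, so this only skips the preprocessing step, not the I/O. The helper names `patch_feature_shape` and `rewrite_parquet_metadata` are hypothetical.

```python
import json


def patch_feature_shape(raw: bytes, column: str, new_shape: list) -> bytes:
    """Return the features JSON with the shape of `column` replaced.

    Assumes the layout {"info": {"features": {column: {"shape": ...}}}}.
    """
    meta = json.loads(raw)
    meta["info"]["features"][column]["shape"] = list(new_shape)
    return json.dumps(meta).encode("utf-8")


def rewrite_parquet_metadata(src, dst, column, new_shape, key=b"huggingface"):
    """Copy a parquet file, changing only the schema-level features metadata."""
    # Imported lazily so the JSON helper above can be used without pyarrow.
    import pyarrow.parquet as pq

    table = pq.read_table(src)  # loads the whole file into memory
    metadata = dict(table.schema.metadata or {})
    # Assumption: the features JSON is stored under `key`; a KeyError here
    # means the file does not follow that layout.
    metadata[key] = patch_feature_shape(metadata[key], column, new_shape)
    pq.write_table(table.replace_schema_metadata(metadata), dst)
```

A call such as `rewrite_parquet_metadata("in.parquet", "out.parquet", "feats", [None, 39])` (file names are placeholders) would be run per shard. It is worth loading one rewritten shard to confirm it works before touching the rest.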

system · September 2, 2025, 10:27pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
