Change metadata of parquet files

I preprocessed and uploaded the entire gilkeyio/librispeech-alignments dataset, which is huge. However, I set the wrong dataset._info.features for one column. Now the key_value_metadata.0.value of every parquet file in my dataset has "feats": {"shape": [null, 80], "dtype": "float32", "_type": "Array2D"} when I want it to be "feats": {"shape": [null, 39], "dtype": "float32", "_type": "Array2D"}. Changing the README metadata doesn't solve the problem: I get the following error when loading the dataset:

ValueError: cannot reshape array of size 8931 into shape (229,80).
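
The numbers in that error are consistent with the data itself holding 39 features per frame: 8931 values do not fit 229 rows of 80, but they divide evenly into 229 rows of 39. A quick check, using only the two numbers from the error message (the variable names are just for illustration):

```python
# Numbers taken from the error message above.
size = 8931    # total number of values in the stored array
frames = 229   # number of rows the loader tried to build

# The declared width of 80 cannot work: 229 * 80 = 18320, not 8931.
assert frames * 80 != size

# The stored values split evenly into rows of 39.
width, remainder = divmod(size, frames)
print(width, remainder)  # 39 0
```

So the stored arrays appear to be fine, and only the declared shape is wrong.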

How can I change the parquet metadata without processing the whole dataset again?


cc @lhoestq might know


I think you have to reprocess the data, unfortunately.

