I tried various transformers.js models, and the ones that advertise `past_key_values` support do not actually handle it. I ran into several issues:

1. The default `past_key_values` are GPU-buffer tensors, while ONNX Runtime requires CPU tensors as input.
2. After downloading the `past_key_values` to the CPU via the tensor's `downloader` method and running again, I hit dimension-inconsistency errors. Basically, `input_ids`, `attention_mask`, and `position_ids` need to be fed into `model.generate()`, and every shape I tried failed.

Assume `past_key_value.dims[2] = past_length` and `input_ids.dims[1] = full_length`. I tried every combination of each input being `past_length`, `full_length`, `full_length - past_length`, or simply 1 (one token). None worked.

Please share a working example of transformers.js with `past_key_values` enabled.

Here is my code:
```js
async function convertToCPUTensor(ortTensor) {
  if (!ortTensor || typeof ortTensor.downloader !== 'function') {
    throw new Error('Invalid ort_tensor: missing downloader method');
  }
  // Download the data from GPU (usually a Float16Array or Float32Array)
  const rawData = await ortTensor.downloader();
  // Check the tensor type; float16 data is kept as float16
  let data = rawData;
  let dtype = ortTensor.type;
  if (dtype === 'float16') {
    data = Float16Array.from(rawData); // ensure data remains float16
    dtype = 'float16';
  }
  return new Tensor(dtype, data, ortTensor.dims);
}
```
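One caveat on the GPU-to-CPU step: `Float16Array` only exists in very recent JavaScript engines, and CPU execution providers often want `float32` input anyway. When the downloaded float16 payload arrives as a `Uint16Array` of raw half-precision bits, it can be widened manually. The helper below is my own standalone sketch, not a transformers.js or ONNX Runtime API:

```javascript
// Widen raw IEEE-754 half-precision bits (Uint16Array) to a Float32Array.
// Hypothetical helper for illustration; not part of any library.
function float16BitsToFloat32(bits /* Uint16Array */) {
  const out = new Float32Array(bits.length);
  for (let i = 0; i < bits.length; i++) {
    const h = bits[i];
    const sign = (h & 0x8000) >> 15;
    const exp = (h & 0x7c00) >> 10;
    const frac = h & 0x03ff;
    let val;
    if (exp === 0) {
      val = frac * 2 ** -24;       // subnormal (or zero)
    } else if (exp === 0x1f) {
      val = frac ? NaN : Infinity; // NaN / infinity
    } else {
      val = (1 + frac / 1024) * 2 ** (exp - 15); // normal numbers
    }
    out[i] = sign ? -val : val;
  }
  return out;
}

console.log(float16BitsToFloat32(Uint16Array.of(0x3c00, 0xc000))); // [1, -2]
```

The widened `Float32Array` can then be wrapped in a `float32` `Tensor` with the original `dims`.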
```js
function buildInputsForGenerate(full_inputs, past_key_values_cache, modelKey) {
  const input_ids_tensor = full_inputs.input_ids;
  if (!past_key_values_cache[modelKey]) {
    return full_inputs;
  }
  const seq_len = input_ids_tensor.dims[1];
  if (seq_len === 0) {
    throw new Error("input_ids is empty — can't slice last token.");
  }
  // Use past key dims to get cached length
  const past = past_key_values_cache[modelKey];
  const past_len = past['past_key_values.0.key'].dims[2];
  const new_len = seq_len - past_len;
  const input_ids = input_ids_tensor.slice([0, 1], [seq_len - 1, seq_len]);
  const attention_mask_length = seq_len + 1;
  const attention_mask = new Tensor(
    "int64",
    BigInt64Array.from([
      // ...Array(past_len).fill(BigInt(0)), // Mask out past tokens
      ...Array(attention_mask_length).fill(BigInt(1)), // Attend to all tokens
    ]),
    [1, attention_mask_length]
  );
  const position_ids = new Tensor(
    "int64",
    BigInt64Array.from([...Array(new_len).keys()].map(i => BigInt(past_len + i))),
    [1, new_len]
  );
  return {
    input_ids,
    attention_mask,
    position_ids,
  };
}
```
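For reference, the shape convention that ONNX decoder-with-past exports usually expect is: `input_ids` holds only the tokens the cache has not yet seen (shape `[1, new_len]`, typically `new_len = 1`), `attention_mask` covers both cached and new tokens (shape `[1, past_len + new_len]`, all ones), and `position_ids` continues counting from `past_len`. Here is a minimal sketch of that convention using plain objects in place of library tensors; the helper name and tensor-literal layout are my own, not a transformers.js API:

```javascript
// Build decode-step inputs under the usual decoder-with-past convention.
// fullIds: all token ids so far; pastLen: cached sequence length.
function buildDecoderWithPastInputs(fullIds, pastLen) {
  const newIds = fullIds.slice(pastLen); // only the uncached tokens
  const newLen = newIds.length;
  const totalLen = pastLen + newLen;
  return {
    // input_ids: [1, new_len] — just the tokens the cache has not seen
    input_ids: { data: BigInt64Array.from(newIds.map(BigInt)), dims: [1, newLen] },
    // attention_mask: [1, past_len + new_len] — ones over cache AND new tokens
    attention_mask: { data: new BigInt64Array(totalLen).fill(1n), dims: [1, totalLen] },
    // position_ids: [1, new_len] — continue counting from past_len
    position_ids: {
      data: BigInt64Array.from({ length: newLen }, (_, i) => BigInt(pastLen + i)),
      dims: [1, newLen],
    },
  };
}

// Example: 3 tokens cached, 1 new token to decode.
const inputs = buildDecoderWithPastInputs([101, 42, 7, 9], 3);
console.log(inputs.input_ids.dims);      // [1, 1]
console.log(inputs.attention_mask.dims); // [1, 4]
console.log(inputs.position_ids.data);   // BigInt64Array [ 3n ]
```

Compared with the code above, the key differences are that the attention mask spans `past_len + new_len` (not `seq_len + 1`) and `input_ids` is sliced to the new tokens only.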