Reading and Writing of long String Variables from SPSS #119
Comments
Very similar to #118. Reported to ReadStat for them to take a look. |
@ofajardo Is there any news regarding this bug? I just stumbled across this problem again when reading SPSS data with long strings. Some standard code suddenly stopped working, and it took me ages to realise that it was down to this problem again (columns being split without any warning). |
no news, sorry |
the issue can be replicated in pure C: WizardMac/ReadStat#260 |
@ofajardo Since I keep encountering this issue, I spent some time creating data to reproduce it, in case that is of any help in finding the bug (sadly I don't have the skills to actually help solve it). There are a lot of variations in how the error shows up when opening the file in SPSS; I tried to find a few examples.
|
I also tried the other examples and all of them seem good now. Closing this. |
Hi @ofajardo and thanks for testing! I just installed the newest version (1.2.1) but the problems from this issue haven't changed. Did you open the file in spss or how did you check whether it worked? When reading the same files back into python after writing them with pyreadstat the split columns don't appear. But when opened in spss they are being split. When creating the file directly in spss and then reading with pyreadstat, the variables were kept the way they should be. |
I see, I was checking by reading them with pyreadstat only. I re-open this issue then. Now I realize the issue was always that pyreadstat was reading it correctly but SPSS was not. |
I'm also experiencing a similar issue:
In the SPSS file that's created the length is set at 255. If I set it as 255 or less it will work, but anything higher than that and it will default to 255. |
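A quick way to see which columns would be affected before writing is to measure the encoded width of each string value. The sketch below is only an illustration: the helper name `columns_over_limit` and the plain-dict input are hypothetical, and it assumes the file is written as UTF-8, so widths are counted in bytes rather than characters.

```python
# Hypothetical helper, not part of pyreadstat: flag columns whose widest
# string value exceeds 255 bytes once encoded as UTF-8 (assumed encoding).
SHORT_STRING_LIMIT = 255  # bytes; the width above which the problem appears


def columns_over_limit(columns, limit=SHORT_STRING_LIMIT):
    """Return {column name: widest value in bytes} for columns above `limit`.

    `columns` maps a column name to a list of values; non-string values
    (numbers, None) are ignored.
    """
    over = {}
    for name, values in columns.items():
        widest = max(
            (len(v.encode("utf-8")) for v in values if isinstance(v, str)),
            default=0,
        )
        if widest > limit:
            over[name] = widest
    return over


data = {
    "short": ["x" * 255],
    "long": ["x" * 256],
    "accented": ["é" * 128],  # 128 characters, but 256 bytes in UTF-8
}
print(columns_over_limit(data))
```

Note that a value can stay under 255 characters and still exceed 255 bytes if it contains non-ASCII characters.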
I am running into the same issue. The names of the variables that are being created seem quite unpredictable, which makes writing a hacky quick fix difficult. Hopefully our friends at ReadStat can look into it! |
Same issue here. Using I found that
I would try changing it to a higher value, but I have no idea how to compile it. The documentation says that it is straightforward, but I have no clue. I know it might not be the only place to change, but it is a start. I hope this issue gets fixed soon. |
I don't think simply changing the number 255 to a larger value would work. Since the library is written in C, there's a fundamental limitation with the char type which has a maximum of 255 characters. |
Thank you for your answer @gulchitai However, I couldn't understand what you wrote; probably because of my lack of knowledge in C. A char type is 1 byte limited, isn't it? I believe it is possible to create a variable like I tried to understand this I still have a hunch that it would be a good start. |
I think this subject matter has been referenced in related threads here or in https://github.com/WizardMac/ReadStat, but you might find the PSPP documentation on long strings relevant: https://www.gnu.org/software/pspp/pspp-dev/pspp-dev.html#Very-Long-String-Record. Apparently, long strings are broken up into segments of width 255 and stored as separate variables internally. I haven't looked at the code you've explored, but I believe that's what this length is referring to. |
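To make the segmenting concrete, here is a minimal sketch of the arithmetic as I read it in the PSPP developer documentation linked above: a string of width w above 255 gets one segment per 252 bytes of width (rounded up), every segment except the last is declared with width 255, and the last one takes the remainder. The function name `segment_widths` is hypothetical, and this is not taken from the ReadStat source.

```python
# Sketch of the "very long string" segment layout described in the PSPP
# developer documentation. Hypothetical helper; not ReadStat code.
def segment_widths(width):
    """Return the declared widths of the segments for a string variable."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if width <= 255:
        # Short enough to be stored as a single variable.
        return [width]
    n_segments = (width + 251) // 252  # one segment per 252 bytes, rounded up
    last = width - (n_segments - 1) * 252
    return [255] * (n_segments - 1) + [last]


for w in (255, 256, 1000):
    print(w, segment_widths(w))
```

If a reader does not stitch these segments back together using the very long string record, each extra segment would show up as its own variable, which looks consistent with the extra columns reported in this issue.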
Unfortunately, as of today the latest updates to the ReadStat source do not solve the issue (the file is still read incorrectly in SPSS). |
When reading and writing SPSS files with long string variables, the affected variable is split into several variables.
Reproducing writing issue:
When this file is opened in SPSS, instead of 2 variables it contains 5 ("LongString2" is followed by "V2_A1", "V2_A2", "V2_A3").
When read back into Python with pyreadstat it only shows the 2 created variables.
Strangely, when only "LongString2" is created and written, or when its variable name is shorter ("LongStr"), the splitting does not occur.
Reproducing Reading Issue
Unfortunately, I can't offer a file to reproduce the reading issue. The one that causes the problem for me can't be shared due to data protection.
And I didn't succeed in creating a sample file that produces the same problem.
Setup Information: