csv files and text delimiter

Discuss the spreadsheet application

csv files and text delimiter

Postby abielo » Mon Apr 14, 2008 4:16 pm

Hi. This is a different question, but since the topic is CSV files, it should be relevant. If not please, let me know.

The question is, when saving an XLS file in CSV format using Microsoft Excel, if there are 'commas' in the some of the XLS cells, you will end up with 'double quotes' around the string of text in those cells in the CSV format. If I save the same file using OpenOffice, I can only see two options. One, if I use 'double quotes' as a text delimiter, the double-quotes show up around every text string separated by the field delimiter (commas in my case). Two, if I don't use any text delimiter, then I don't get any double-quotes, but the strings of text that originally had 'commas' from the XLS format are now impossible to make out because my field delimiter is 'commas'. I can't change my field delimiter to something else beside commas and I can't have double quotes all over the place. I need OO to behave like Excel because I have scripts that run of the "Excel-CSV" format. So, if there anyways to get OO to behave like MS Excel in this case?

Thanks in advance for your reply,

-abielo.
abielo
 
Posts: 4
Joined: Mon Apr 14, 2008 3:48 pm

Re: csv files and text delimiter

Postby Hagar Delest » Mon Apr 14, 2008 4:56 pm

Even if the topic was about CSV files, it's not the same issue. So the thread had to be split.

Your explanation is not very clear for me (but I'm not a big user of Calc). Can you upload a sample file or screenshots to make it easier to understand? Are you talking about commas or single quotes?
AOO 4.1.7 on Xubuntu 20.04 and 4.1.5 on Windows 10 (with winPenPack port).
User avatar
Hagar Delest
Moderator
 
Posts: 28996
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: csv files and text delimiter

Postby abielo » Mon Apr 14, 2008 6:56 pm

Hi, thanks for the reply. Let me see if I can explain this better.

1- Here an example of the entries in the Spreadsheet:
data1|data2|data3.1,data3.2|data4|data5

As you can see the 3rd cell has a comma in the text string (data3.1,data3.2).

2- Here is the Microsoft Excell CSV output:
data1,data2,"data3.1,data3.2",data4,data5

As you can see, the output has double quotes around the text string that contains the comma ("data3.1,data3.2").

3- Here is the OpenOffice output using double quote as a Text Delimiter:
"data1","data2","data3.1,data3.2","data4","data5"

As you can see, there are double quotes around all text strings including the original text that had a comma in it.

4- Here is the OpenOffice output with no Text Delimiter:
data1,data2,data3.1,data3.2,data4,data5

As you can see, you can't distinguish the original text string that had a comma in it.

====================
The last output (output number 4), is totally useless. You can't tell what the original text strings were.
Output number 3 is almost useful if it weren't for double quotes been on every text string.

So the point is: when the field separator (in this case comma) is part of your text string, Excel puts double quotes around those text strings. With OO seemsto be all or nothing. Is there a way around this? I have scripts that expect the CSV format similar to what Excel is producing.

Thanks in advance for your time,

-abielo.
abielo
 
Posts: 4
Joined: Mon Apr 14, 2008 3:48 pm

Re: csv files and text delimiter

Postby Hagar Delest » Mon Apr 14, 2008 9:11 pm

OK, that's much more clear now.

But OOo behavior is rather logical: if it's a text, then put the text delimiter and else, don't put it. I can't understand why MS Excel doesn't do that. It may be related to the fact that MS Excel also handles text as number.

I understand you've been used to that behavior but for me it's an MS Excel bug: if another application than Excel tries to import such a file, how would it handle that???
AOO 4.1.7 on Xubuntu 20.04 and 4.1.5 on Windows 10 (with winPenPack port).
User avatar
Hagar Delest
Moderator
 
Posts: 28996
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: csv files and text delimiter

Postby abielo » Mon Apr 14, 2008 9:45 pm

I don't disagree with you. The OO behavior is logical, and Excel is doing something buggy (surprise!). The issue is that some applications I'm running can't parse the OO CSV output, but they parse the Excel CSV output. I don't know how they are parsing the CSV files, and I don't have access to the source code to fix the issue. That's why I was trying to see if OO could replicate the Excel output. Not a big deal. I wrote a simple script to convert the OO CSV output (the one with double quotes) to the "Excel" CSV format and now I'm good to go.

Thanks for your help anyways, it's good to know that I was not the only one this issue was not making a lot of sense. Perhaps there is a way in Excel to get it right.

Regards,

-abielo.
abielo
 
Posts: 4
Joined: Mon Apr 14, 2008 3:48 pm

Re: csv files and text delimiter

Postby Villeroy » Mon Apr 14, 2008 9:59 pm

I'm not shure about the problem. Just want to mention that you can edit the filter settings. When you set file type "Text (*.csv)" in the save-as dialog the option gets enabled. It will raise an additional dialog before the file gets written to disk.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
User avatar
Villeroy
Volunteer
 
Posts: 28644
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: csv files and text delimiter

Postby acknak » Mon Apr 14, 2008 10:24 pm

I think the core of the problem is that there is no standard CSV format, and different applications behave differently. Also, using the actual comma character as the delimiter leads to all kinds of ambiguities when commas can appear in the data.

If I export from Calc: 123 | ABC, I get
123,"ABC"

The first field is numeric, so it doesn't get any text delimiter.

If I format the numeric cell to show thousands, and export 123,456 | ABC, I get
"123,456", "ABC"

I guess that now Calc is backed into a corner, and must include a text delimiter because the number contains a comma. Ugh.

Which is why--when I have a choice--I use some other delimiter character that never appears in the data, like a tab.
AOO4/LO5 • Linux • Fedora 23
User avatar
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: csv files and text delimiter

Postby abielo » Mon Apr 14, 2008 10:47 pm

Thanks for the replies.

Villeroy, yes I saw the dialog box and played with it. Didn't help.

Acknak, agree with you, but I'm not even the one entering the data into the Spreadsheet. We have various users and we have deal with anything they can throw into the Spreadsheet we are going to be converting to CSV.

Regards,

-abielo.
abielo
 
Posts: 4
Joined: Mon Apr 14, 2008 3:48 pm

Re: csv files and text delimiter

Postby Villeroy » Mon Apr 14, 2008 11:00 pm

Yes, use unambiguous delimiters. Since version 2.3(?) Calc has another option "Save as shown". It seems to me that this option is turned on by default, which is a petty. This way floating point numbers may lose precision. When I unset the option, all numbers are stored with full precision. Dates get the locale specific default format for dates.
German: 31.12.99
English(GB): 31/12/99
English(US): 12/31/99
Swedish: 99-12-31
[I don't like 2-digit years]

If you want full precision decimals with ISO-dates, keep "Save as shown" together with number format "#.000000000000000" or set Tools>Options...Calc>Calculation:Decimals places [15]
The decimal separator (dot or comma) is set automatically according to the number format locale, independent from "Save as shown".

Calc is a little bit complicated but much better than Excel, where you have to convert everything to quoted text in order to save with max. precision. Possibly this is the reason why Excel converts text to number on the fly. It could not handle it's own csv export otherwise.

Villeroy, yes I saw the dialog box and played with it. Didn't help.

Keep "Save as shown", format exactly as you like it to see in the csv and choose a field-delimiter such as semicolon, tab, pipe.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
User avatar
Villeroy
Volunteer
 
Posts: 28644
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: csv files and text delimiter

Postby huw » Tue Apr 15, 2008 10:41 am

Here's another Calc CSV quirk: Double quotes exported to CSV are multiplied on import, although I cannot reproduce this behaviour.
huw
Volunteer
 
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: csv files and text delimiter

Postby Villeroy » Tue Apr 15, 2008 11:12 am

Quotes within quoted strings are duplicated instead of introducing an escape char. This seems to be a common convention with all csv processors.

A spreadsheet with ISO formatted =NOW(), max. precision =PI(), a text with double quotes and the same text with „German“ and “English” typographic quotes. Comma is decimal separator, semicolon is field separator.
Code: Select all   Expand viewCollapse view
2008-04-15;3,14159265358979;"The Boy Named ""Sue""";"The Boy Named „Sue“";"The Boy Named “Sue”"

You get simple double-quotes instead of typographic ones when you set the font language [None].

For me, this document describes the csv "standard": http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
Yes, of course. Microsoft Excel is the trouble maker. It does even undermine pseudo-standards.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
User avatar
Villeroy
Volunteer
 
Posts: 28644
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: csv files and text delimiter

Postby huw » Tue Apr 15, 2008 11:55 am

Villeroy wrote:Quotes within quoted strings are duplicated instead of introducing an escape char. This seems to be a common convention with all csv processors.

The problem the person there was having was that the duplication was not removed on reimport:
Gdonwallace wrote:During an export / import from OOo to a csv, OOo does not remove the extra quotes. When I open the file in Calc, the extra quotes are still there. I'm running XP. My counter part is running a Linux box, and it seems to do the same thing to him.
huw
Volunteer
 
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: csv files and text delimiter

Postby Villeroy » Tue Apr 15, 2008 12:36 pm

huw wrote:The problem the person there was having was that the duplication was not removed on reimport

That person does not even tell us the office version, so you and me can not reproduce the behaviour.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
User avatar
Villeroy
Volunteer
 
Posts: 28644
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: csv files and text delimiter

Postby acknak » Tue Apr 15, 2008 6:12 pm

Hey, thanks for that link Villeroy, that's a nice description of CSV.

Somehow I knew that someone would dig up a standard to prove me wrong. ;-)
AOO4/LO5 • Linux • Fedora 23
User avatar
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3


Return to Calc

Who is online

Users browsing this forum: No registered users and 30 guests