[Solved] Convert multi-string value to a single-string one

Discuss the spreadsheet application

[Solved] Convert multi-string value to a single-string one

Postby maxon » Sun Nov 24, 2019 6:13 pm

Hello,

I've got this CSV file containing product information which I need to edit:

1) Each row corresponds to one specific product

2) Column A lists categories which this product belongs to, presented in the following way (vertical bar is used as a delimiter for a string):

Code: Select all   Expand viewCollapse view
Mount type|Standalone
Purpose|Functional|Watches
Color|White
Style|Classical


If a particular product has more than one color, cells in column A allow listing numerous strings with the same attribute and different values, for example, if an item has three colors to it, it would be presented in the following way:

Code: Select all   Expand viewCollapse view
Mount type|Standalone
Purpose|Functional|Watches
Color|White
Color|Blue
Color|Gray
Style|Classical


3) Column B contains product attributes, which look like this:

Code: Select all   Expand viewCollapse view
Attribute|Model|1-0001
Attribute|Material:|Resin
Attribute|Type:|Watch
Attribute|Mount type:|Standalone
Attribute|Color:|White
Attribute|Style:|Classical
Attribute|Height:|10
Attribute|Width:|5
Attribute|Weight:|4


Cells in column B, however, do not allow to have multiple strings with the same attribute and different values. So one could not have

Code: Select all   Expand viewCollapse view
Attribute|Model|1-0001
Attribute|Material:|Resin
Attribute|Type:|Watch
Attribute|Mount type:|Standalone
Attribute|Color:|White
Attribute|Color:|Blue
Attribute|Color:|Gray
Attribute|Style:|Classical
Attribute|Height:|10
Attribute|Width:|5
Attribute|Weight:|4


because following the unique attribute rule, cell's 'color' value will be collapsed to topmost (first) string of said attribute, - 'Attribute|Color:|White' in this example, - deleting other strings with attribute 'color', resulting in loss of data.

And this is exactly what happened, as I've had mistakenly listed Color attribute for each product using different lines, instead of (as I now know) separating them with a comma and a blank space. Thus I now have a file with close to 1900 rows of product characteristics with their color attributes mangled: all the single-colored ones didn't suffer any damage, but all the multi-colored ones have only one color listed, with others destroyed.

Could you please help me edit data in this file in such a way so that Color attribute in column B becomes a single line attribute with values separated by a comma and a blank space, based on column A's value (also leaving all other values in this cell unchanged):

Current text in A and B:

Code: Select all   Expand viewCollapse view
  Column A                                               Column B
Mount type|Standalone                                Attribute|Model|1-0001
Purpose|Functional|Watches                           Attribute|Material:|Resin
Color|White                                          Attribute|Type:|Watch
Color|Blue                                           Attribute|Mount type:|Standalone
Color|Gray                                           Attribute|Color:|White
Style|Classical                                      Attribute|Style:|Classical
                                                     Attribute|Height:|10
                                                     Attribute|Width:|5
                                                     Attribute|Weight:|4


Becomes:

Code: Select all   Expand viewCollapse view
  Column A                                               Column B
Mount type|Standalone                                Attribute|Model|1-0001
Purpose|Functional|Watches                           Attribute|Material:|Resin
Color|White                                          Attribute|Type:|Watch
Color|Blue                                           Attribute|Mount type:|Standalone
Color|Gray                                           Attribute|Color:|White, Blue, Gray
Style|Classical                                      Attribute|Style:|Classical
                                                     Attribute|Height:|10
                                                     Attribute|Width:|5
                                                     Attribute|Weight:|4


Tackling this is surely beyond my capability, so I really hope you could help out.
Thank you for your time and effort.
Last edited by maxon on Mon Nov 25, 2019 8:36 pm, edited 1 time in total.
OpenOffice 4.1.6 on Win10
maxon
 
Posts: 2
Joined: Sun Nov 24, 2019 5:59 pm

Re: How to convert cell A's multi-string value to a single-s

Postby RusselB » Sun Nov 24, 2019 9:55 pm

I'm out of town for the day, but I'm going to suggest you start by importing your data into multiple columns using the Text import dialog or Data->Text to columns
OpenOffice 4.1.7 and LibreOffice 6.3.3.2 on Windows 7 Pro & Ultimate
If you believe your problem has been resolved, please go to your first post in this topic, click the Edit button and add [Solved] to the beginning of the Subject line.
User avatar
RusselB
Moderator
 
Posts: 5794
Joined: Fri Jan 03, 2014 7:31 am
Location: Sarnia, ON

Re: How to convert cell A's multi-string value to a single-s

Postby MrProgrammer » Sun Nov 24, 2019 10:42 pm

Hi, and welcome to the forum. Perhaps someone has a great idea, but I have two thoughts about this project, neither of which you will like. I worked for more than three decades as a professional data processing analyst/programmer, so consider my advice carefully.

maxon wrote:
Current text in A and B:
Column A Column B
Mount type|Standalone Attribute|Model|1-0001
Purpose|Functional|Watches Attribute|Material:|Resin
Color|White Attribute|Type:|Watch
Color|Blue Attribute|Mount type:|Standalone
Color|Gray Attribute|Color:|White
Style|Classical Attribute|Style:|Classical
Attribute|Height:|10
Attribute|Width:|5
Attribute|Weight:|4

Becomes:
Column A Column B
Mount type|Standalone Attribute|Model|1-0001
Purpose|Functional|Watches Attribute|Material:|Resin
Color|White Attribute|Type:|Watch
Color|Blue Attribute|Mount type:|Standalone
Color|Gray Attribute|Color:|White, Blue, Gray
Style|Classical Attribute|Style:|Classical
Attribute|Height:|10
Attribute|Width:|5
Attribute|Weight:|4

Tackling this is surely beyond my capability, so I really hope you could help out.
First, this work will be much easier in a imperative language like Perl instead of a functional language like Calc formulas. Perl can easily read/write CSV files. I realize you may not know Perl, however you don't know how to accomplish this with Calc either, and it will be harder with Calc. This is particularly true if all this text (below) is contained in a single cell.
   Mount type|Standalone
   Purpose|Functional|Watches
   Color|White
   Color|Blue
   Color|Gray
   Style|Classical
Calc has very limited text processing capabilities to begin with, and it is designed to operate on separate data in separate cells, not joined in a single cell. You can split the data in A and B with Text to Columns but it will be a challenge to match the rows and join them together again.

Second, even though 1900 entries will take time, you will probably fix your data fastest if you just edit it by hand, perhaps using Edit → Find&Replace to help. For example, you can filter the first column to display only the cells which contain White AND Blue AND Gray, then replace Color:|White with Color:|White, Blue, Gray in the second column, perhaps fixing a dozen rows at once. If, say, a quarter of the items have multiple attributes, that's 475 to edit. If you can use Find&Replace to fix even five at a time, that's about 100. If each edit takes 30 seconds, that's 3000 seconds, or less than an hour. It's unlikely you can put together an automated solution in an hour, especially if you're not a programmer.

[Tutorial] Ten concepts that every Calc user should know, section 6. Saving your data and making backups

Good luck with the project. I am done helping here.
Mr. Programmer
AOO 4.1.7 Build 9800 on MacOS 10.14.6.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
MrProgrammer
Moderator
 
Posts: 3941
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: How to convert cell A's multi-string value to a single-s

Postby maxon » Mon Nov 25, 2019 8:34 pm

RusselB, MrProgammer - Thanks for clarifying that there was no quick and easy solution for this.

Considering what MrProgammer said:

MrProgrammer wrote:Second, even though 1900 entries will take time, you will probably fix your data fastest if you just edit it by hand

MrProgrammer wrote: It's unlikely you can put together an automated solution in an hour, especially if you're not a programmer.


and the fact that I am indeed no programmer, I had to monkey my way around this file, manually correcting everything :) Took about 7 hrs in total. On the bright side I've fished out a couple dozens of data entry inaccuracies and typos along the way.

RusselB, MrProgrammer - hat's off to you, sirs, for quick replies and friendliness

All in all heart-warming welcome to the forum :)
OpenOffice 4.1.6 on Win10
maxon
 
Posts: 2
Joined: Sun Nov 24, 2019 5:59 pm


Return to Calc

Who is online

Users browsing this forum: No registered users and 28 guests