Page 1 of 1

[Calc] Split string into an array of single words

PostPosted: Fri Aug 13, 2010 4:46 am
by jari0471
Split string into an array of single words (similar to separation of "arguments" in shell command line)

Code: Select all   Expand viewCollapse view
Sub test
  Dim p As String
  p1="aa bb  cc   dd"
  a1=Split(p1," ")
  Print a1(0)
  Print a1(1)
  Print a1(2)
  Print a1(3)
End Sub

Displayed is "aa", "bb", " " and "cc".
Displayed should be "aa", "bb", "cc" and "dd".

There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1.
Is there another command or different option for this command that can do separation of words (arguments)?
If a large file is read to calc sheet, it will take a long time if the splitting function is slow. Thus fast (and simple) operation is requested.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 5:39 am
by Charlie Young
I get aa bb cc dd running the code copied exactly from above, but where are the strings coming from? If they are in, or could be gotten into spreadsheet cells, you could use Data > Text to Columns.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 6:55 am
by jari0471
The forum seemed to remove additional "spaces" between the quotes.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 6:59 am
by jari0471
There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1 (the forum posting seems to remove the additional spaces automatically).

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 7:15 am
by RoryOF
jari0471 wrote:There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1 (the forum posting seems to remove the additional spaces automatically).


In that case, and in all cases where the layout/spacing is important, you should post using the "Code" option in the Full Editor window. This option is not available in the Quick Reply window - you have to switch to Full Editor.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 7:27 am
by FJCC
Here is a not-very-elegant solution for the example given
Code: Select all   Expand viewCollapse view
Sub test
p1="aa bb  cc   dd"
a1=Split(p1," ")
count = 0
Dim a2(ubound(a1))
For Each Word in a1
   If Word <> "" then
      a2(count) = Word
      count = count+1
   end if
next Word
Redim Preserve a2(count)
Print a2(0)
Print a2(1)
Print a2(2)
Print a2(3)
End Sub

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 8:20 am
by jari0471
The solution you give can count the number of "words", but can be come "painfully" slow do to the for-loop, when there is 1000's of strings to work with... (Thus I try find internal function, java -code or any other solution that gives very fast speed also).

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 9:08 am
by RoryOF
If you take your strings into a word processor, such as Writer, its Find and Replace can quickly simplify multiple spaces to single spaces, then change them to new lines or other delimiters, so that the file can be reopened in Calc using the CSV import mechanism.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 9:47 am
by jari0471
The idea using writer is possible. However the text is read from a file "iFile" with Line Input (that gives single string for a line).
The original file(s) should not be modified, since they are data files.
Code: Select all   Expand viewCollapse view
 
  iFile = Freefile
  Open sPath+"/"+sFile For Input As iFile

  While (not eof(iFile))

    Line Input #iFile,sValue
    sValue=Trim(sValue)
    If sValue<>"" Then

Maybe there is another way reading the files that can do the splitting the words.
Some of the lines (documentation, etc. other data) in the original file are not intended to be used, thus the read-routine is complicated.

Re: Split string into an array of single words

PostPosted: Fri Aug 13, 2010 10:06 am
by Hagar Delest
jari0471 wrote:The forum seemed to remove additional "spaces" between the quotes.

Use BBCode (I've edited your first post).

Re: [Calc] Split string into an array of single words

PostPosted: Fri Aug 13, 2010 11:45 am
by RoryOF
The idea using writer is possible. However the text is read from a file "iFile" with Line Input (that gives single string for a line).
The original file(s) should not be modified, since they are data files


In that case, you should be working on copies of the files, not on the originals. The Writer path then seems most practical.

Re: [Calc] Split string into an array of single words

PostPosted: Fri Aug 13, 2010 12:45 pm
by RPG
Hello

Maybe you can use the function replace. It is not document but I believe it is normal function. I'm not sure if it is only in the go-oo version or in all version of OOo.
As far I knew it can have six parameters
a) the string
b) The new part
c) the old part
d) where to start
e) How often to change a part
f) Case sensitive

Code: Select all   Expand viewCollapse view
Sub test
  Dim p As String
  p1="aa bb    cc   dd"
  p1=replace(p1,"  "," ")
  a1=Split(p1," ")
  msgbox  a1(0) & chr(13) & _
   a1(1) & chr(13) & _
   a1(2) & chr(13) & _
   a1(3)
End Sub



Romke

Re: [Calc] Split string into an array of single words

PostPosted: Fri Aug 13, 2010 2:37 pm
by B Marcelly
Hi,
I took this as a recreation challenge :P

Here is a general solution, it works with words separated by any length of space.
It tries to replace a long series of spaces by one space, then tries with a shorter series, etc.
See how powerful is split/join 8-)
Code: Select all   Expand viewCollapse view
Dim phrase As String
phrase = "hello OOoBasic programmers, this   is       a  solution"

Dim nbSpaces As Long, v As Variant

nbSpaces = 5  ' initial value, greater or equal to 2
Do While nbSpaces > 1
  v = split(phrase, space(nbSpaces))
  if UBound(v()) = 0  then
    nbSpaces = nbSpaces -1
  else
    phrase = join(v, " ")
  end if
Loop
v = split(phrase, " ")
' *** display of vector v() ***
MsgBox(join(v, chr(13)), 0, "List of words")

The initial value of nbSpaces can optimize the number of loops, if you know what length of spaces you may usually have. But even a start value of 2 can do the job.

Re: [Calc] Split string into an array of single words

PostPosted: Fri Aug 13, 2010 8:10 pm
by Charlie Young
OK, here's my attempt, which doesn't use the Basic Split function. I, um, borrowed Bernard's trick for displaying the results.

Code: Select all   Expand viewCollapse view
Sub TestStrSplit
   Dim a
   Dim s As String
   Dim i As Integer
   
   s = "hello OOoBasic programmers, this   is       another  solution"

   a = StrSplit(s," ")
   MsgBox(join(a,chr(13))
End Sub

Function StrSplit(s As String, SplitChar As String)
   Dim i As Integer
   Dim n As Integer
   Dim m As String
   Dim t As String
   Dim a(0) As String
   Dim InWord As Boolean
      
   t =""
   InWord = False
   i = 1
   n = 0
      
   do while i   <=   len(s)
      m = mid(s,i,1)
      if m <> SplitChar then
         InWord = True
         t = t & m
      else
         if InWord then
            Redim Preserve a(n) As String
            a(n) = t
            n = n + 1
            t = ""
            InWord = False
         endif
      endif
      i = i + 1
   loop
   if InWord then
      Redim Preserve a(n) As String
      a(n) = t
   endif
   
   StrSplit = a
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Sat Aug 14, 2010 6:07 am
by jari0471
Since the original data is in text file (and used line input to place it to a string line by line),
it could be also possible to read the original text file word by word and no split is required at all.
But I could not find how to do this either.

Re: [Calc] Split string into an array of single words

PostPosted: Tue May 30, 2017 8:46 am
by eddyparkinson
Bug fix - handle blank cells
Code: Select all   Expand viewCollapse view
REM  *****  BASIC  *****
Sub TestStrSplit_clone
   Dim a
   Dim s As String
   Dim i As Integer
   
   s = "hello OOoBasic programmers, this   is       another  solution"

   a = StrSplit(s," ")
   MsgBox(join(a,chr(13))
   a = StrSplit(Nothing," ")
   MsgBox(join(a,chr(13))
End Sub

Function StrSpliter(s, SplitChar As String)
  StrSpliter = "a"
End Function


Function StrSplit(SplitThis, SplitChar As String)
   Dim s As String

   if IsNull(SplitThis) or IsEmpty(SplitThis) then s = ""  else s =  SplitThis
   Dim i As Integer
   Dim n As Integer
   Dim m As String
   Dim t As String
   Dim a(0) As String
   Dim InWord As Boolean
     
   t =""
   InWord = False
   i = 1
   n = 0
     
   do while i   <=   len(s)
      m = mid(s,i,1)
      if m <> SplitChar then
         InWord = True
         t = t & m
      else
         if InWord then
            Redim Preserve a(n) As String
            a(n) = t
            n = n + 1
            t = ""
            InWord = False
         endif
      endif
      i = i + 1
   loop
   if InWord then
      Redim Preserve a(n) As String
      a(n) = t
   endif
   
   StrSplit = a
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Wed Jun 14, 2017 10:47 am
by RPG
Hello

It seems to me that in the last post is not understand that B Marcelly and Charlie Young did play a little with some commands. I think it is also good to make a real short example of how to solve the problem.

In starBASIC there are two commands JOIN and SPLIT who are real powerful as also B Marcelly pointed out. Most of the time those two commands are used with a single character to join or split values. But it is also possible to use more characters then one character, B Marcelly did use it in his example in the split part.

I do prefer the command replace. In the time when this thread started the replace command was not in all version of OOo. It is now in all new version of OpenOffice. I am not sure if it works in all cases but I have test it as much I can. It is also only an example to learn to use REPLACE and SPLIT.

It is me not clear what people want do with the StrSplit function. I want only place here how to use replace and split. When you knew the function replace, split, join then it is more easy to make short and easy code.
Code: Select all   Expand viewCollapse view
Function StrSplit(SplitThis, SplitChar As String)
dim tempory  ' we define a tempory variable
' Be aware that we use the variabele first for the string value.
' We set it as an empty string
tempory="" ' Make sure we have an empty string.
if typename(SplitThis)="String" then
' The parameter is a string in this case we need a string   
   tempory=replace(SplitThis,"  ", " ") ' erase first all double spaces
' Be aware that that tempory is change from a single variabele to an array variabele
   tempory =SPLIT(tempory,SplitChar) ' Split string to array
end if
StrSplit=tempory ' Give back to the function an empty string or an array as the original function was
End Function


Romke

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 9:52 am
by jarkky
At the end (after 9 years) the below is the latest solution:

Code: Select all   Expand viewCollapse view
Sub Test_string
  Print parse_string("a b c")(2)
  Print parse_string("")(0)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr(0) as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim ssvr(n-1) as String
  End If
  For j=0 To n-1
    ssvr(j)=ssv(j)
  Next j
  parse_string=ssvr
 
End Function




It looks like the "parse_string" command as this is missing in the openoffice macro commands.

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 10:26 am
by jarkky
Below is possibly the right method.
To test if string is empty it looks like requiring using ubound&lbound.
Not sure if there is any test command for empty arrays, such as "If v=[] Then ... End If"

Code: Select all   Expand viewCollapse view
Sub Test_string
  Print parse_string("a b c")(2)
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim ssvr(n-1) as String
  End If
  For j=0 To n-1
    ssvr(j)=ssv(j)
  Next j
  parse_string=ssvr
 
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 10:54 am
by jarkky
Using "Redim preserve" at the last part will then help avoid to copy the vector contents into another vector:

Code: Select all   Expand viewCollapse view
Sub Test_string
  Print parse_string("a b c")(2)
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
 
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 12:25 pm
by jarkky
I have tried to check more and thinking if function error return should occur with global variables (to return full error descriptions).
This case if openoffice had global global variable for error return for each command I could use those same variable here. But this may require more checking...
Also if the Mid()-function index starts at index=1 rather than at index=0 can be problem as well as that redim can not make empty strings.


Code: Select all   Expand viewCollapse view
Sub Test_string
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM erf : Flag TRUE/FALSE (error occur or not)
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global ers as String
Global erm as String

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      REM The Mid()-command requires i+1 argument when i starts from zero. Can this be problem?
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>-1) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erm=""
  ers=""
  erf=FALSE
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 1:30 pm
by jarkky
The error handling in the user functions would be probably using different variables that as "erl" & "err" by openoffice own functions:

Code: Select all   Expand viewCollapse view
Sub Test_String
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=1 : n=0
  Do While i<=lst
    Do While i<=lst
      f=0
      If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        f=0
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        f=0
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i,1)="'" Then
            erc="parse_string()"
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function

Re: [Calc] Split string into an array of single words

PostPosted: Tue Nov 12, 2019 3:38 pm
by jarkky
It looks like the "split string" could work as below:


Code: Select all   Expand viewCollapse view
Sub Test_String
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=1 : n=0
  Do While i<=lst
    Do While (Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9))
      i=i+1
      If i>lst Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        If Not(Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9)) Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          i=i+1
          Exit Do
        End If
        If Mid(stri,i,1)="'" Then
          erc="parse_string()"
          erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
          ers="@wrong_quotation_placement"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
    End If
    n=n+1
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function



Not sure of the function error handling and also it looks like the "Redim" can not make empty array...

Re: [Calc] Split string into an array of single words

PostPosted: Wed Nov 13, 2019 1:52 am
by JeJe
Getting rid of suplus spaces in the initial post can be done easily. Turning a string into a byte array *should* make things faster than using mid...

Code: Select all   Expand viewCollapse view


Sub test
   Dim p As String
   p1="aa bb  cc   dd"

   dim b() as byte, c as long,start as boolean
   b() = p1: c =-2

   for i = 0 to ubound(b) step 2
      if b(i) = 32 and b(i+1) = 0 then
         if start = true then
            c= c+2
            b(c) =b(i)
            b(c+1) =0
            start = false
         end if
      else
         c= c+2
         b(c) =b(i)
         b(c+1) =b(i+1)
         start = true
      end if
   next
   redim preserve b(c)
   p1= b

   a1=Split(p1," ")
   Print a1(0)
   Print a1(1)
   Print a1(2)
   Print a1(3)

End Sub



Re: [Calc] Split string into an array of single words

PostPosted: Wed Nov 13, 2019 12:40 pm
by jarkky
That looks possible, but I extended to remove the "tabular" or "\t" also:


Code: Select all   Expand viewCollapse view
Sub Test_String
  Print parse_string("""a b c"" ""\"""" c")(1)
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array
  If Not (TypeName(stri)="String") Then
    erc="parse_string()"
    erm="Input argument can be only string type!"
    ers="@wrong_argument_type"
    erf=TRUE
    parse_string=""
    Print erm
    Exit Function   
  End If
  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String
  i=1 : n=0
  Do While i<=lst
    Do While (Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9))
      i=i+1
      If i>lst Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="""" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="""" Then
          If Mid(stri,i-1,1)="\" Then
            ssv(n)=Left(ssv(n),Len(ssv(n))-1)
          Else
            f=1 : i=i+1
            Exit Do
          End If
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        If Not(Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9)) Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          i=i+1
          Exit Do
        End If
        If Mid(stri,i,1)="""" Then
          If Mid(stri,i-1,1)="\" Then
            ssv(n)=Left(ssv(n),Len(ssv(n))-1)
          Else
            erc="parse_string()"
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
    End If
    n=n+1
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function



Here also even input arguments are defined as "String" -type the User should check that they come as "string"-type into the function...

 Edit: Please use the CODE tags around program code for readability and easy copying 

Re: [Calc] Split string into an array of single words

PostPosted: Tue Dec 03, 2019 7:03 pm
by KaduLeite
jari0471 wrote:Split string into an array of single words (similar to separation of "arguments" in shell command line)

Code: Select all   Expand viewCollapse view
Sub test
  Dim p As String
  p1="aa bb  cc   dd"
  a1=Split(p1," ")
  Print a1(0)
  Print a1(1)
  Print a1(2)
  Print a1(3)
End Sub

Displayed is "aa", "bb", " " and "cc".
Displayed should be "aa", "bb", "cc" and "dd".

There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1.
Is there another command or different option for this command that can do separation of words (arguments)?
If a large file is read to calc sheet, it will take a long time if the splitting function is slow. Thus fast (and simple) operation is requested.



As an exercise, I think this solve the problem with a smaller and simpler code.
To ignore other elements, like points or commas, just add them to the test with AND.

Code: Select all   Expand viewCollapse view
Function SpaceSplit(p1 as String) as Array
  Dim i as Integer, a1(), a2()
  a1=Split(Trim(p1)," ")
  For i = 0 to Ubound(a1) 'For each element in a1
    If a1(i)<>"" Then 'If element is different to ""
       Redim Preserve a2(UBound(a2)+1) 'Increases the size of a2 in 1 element
       a2(UBound(a2))=a1(i) 'Add the non "" element to a2
    End If
  Next
  SpaceSplit = a2 ' Function returns a2, an array with each word
End Function


sub test
   dim s1()
   s1 = spacesplit("aa      bb   cc    dd    ee")
end sub


Kadu Leite

Re: [Calc] Split string into an array of single words

PostPosted: Wed Dec 04, 2019 12:43 pm
by JeJe
Redim Preserve is an expensive operation to have in a loop... that doesn't matter with just a few items of course...

Re: [Calc] Split string into an array of single words

PostPosted: Thu Dec 05, 2019 7:55 pm
by KaduLeite
JeJe wrote:Redim Preserve is an expensive operation to have in a loop... that doesn't matter with just a few items of course...

Je Je,

You're absolutely right.
What about that?

Code: Select all   Expand viewCollapse view
Function SpaceSplit(p1 as String) as Array
  Dim i as Integer, j as Integer, a1()
  a1=Split(Trim(p1)," ")
  Dim a2(Ubound(a1))
  For i = 0 to Ubound(a1)
    If a1(i)<>"" Then
       a2(j)=a1(i)
       j = j+1
    End If
  Next
  Redim Preserve a2(j-1)
  SpaceSplit = a2
End Function


It uses the same logic, but writes to an array with the same size as a1 and just when a2 is OK, it redim preserve with the correct size.

Kadu Leite

Re: [Calc] Split string into an array of single words

PostPosted: Thu Dec 05, 2019 10:43 pm
by JeJe
Yeah, its much better to redim outside the loop once at the end.

If anyone's interested in these things... OOBasic is very similar to Microsoft's now obselete Visual Basic 6 (in the non-office suite aspects)... and there is an old but very good (and still existing!) site which looked at the fastest ways to code various functions in Visual Basic...

http://xbeat.net/vbspeed/