[Solved] Creating text search formula

Discuss the spreadsheet application
Post Reply
Enlance
Posts: 2
Joined: Sun Sep 20, 2009 12:56 pm

[Solved] Creating text search formula

Post by Enlance »

I've been trying to create a search that counts all words in my search. I have different sentences in each cell, but want to count all cells including one specific word.

Can anyone help me with this. I've tried COUNTIF, DCOUNTA and all sorts of combinations. But Im still not making it work as I want to.

Any help would be of interest.

Thank you.
Last edited by Enlance on Mon Sep 21, 2009 4:33 pm, edited 1 time in total.
Open Office.org 3.1.1 (Build:9420)
On Windows 7 RC1
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Creating text search formula

Post by Villeroy »

Tools>Options...Calc>Calculate: Turn OFF all the check boxes on that options page.

A1: I've been trying to create a search that counts all words in my search. (first sentence of your request)
B1: SEARCH (a word in A1)
C1: =MATCH($B1;$A$1:$A$99;0) => 1 (the position of a cell within A1:A99 where this row's value in B ($B1) is matched for the first time.
Turn on the "match whole cell" option and the formula returns #NA which is a special error. It indicates that you did everything correctly, but a matching cell value is simply not there.
Turn on the above plus "regular expressions" and
=MATCH(".*" & $B1 & ".*";$A$1:$A$99;0) => 1 (position of a cell within A1:A99 where the regular expression (search pattern) *.SEARCH.* matches within the whole cell value of this row's value in A ($A1).

Lookup functions match whole cell values in search vectors (rows or columns of cells), SEARCH and FIND match parts within single cells values. The D-functions you mention match entire rows in a list.

D1: =FIND($B1;$A1) => #VALUE the position where this row's value in B ($B1) is found within this row's value in A ($A1). This returns #VALUE for no match since FIND always works case-sensitively. Put "search" into B1 and the result turns to 30, the position of the "s" in "search".

E1: =SEARCH($B1;$A1) => 30 the position where this row's value in B ($B1) is found within this row's value in A ($A1). This returns #VALUE in case of no match. SEARCH always is case-insensitive while supporting pattern matching.

No, all this does not count words as you can evaluate easily when searchin for the word "a". Put an a in B1.
Both, FIND and SEARCH return 24, which is the position of the "a" within the word "search".
Since SEARCH supports pattern matching:
F1: =SEARCH("\<"&$B1&"\>";$A1) with "regular expressions enabled" returns 28, which is the position of the word "a" in this row's column A ($A1). The search value is concatenated from \< (start of word), the value in B and \> (end of word), giving \<a\>
Regexes can "see" word boundaries. Lookup "regular expressions" in the online help and refer to http://wiki.services.openoffice.org/wik ... ns_in_Calc

But still we do not count how many words "a" are in the sentence.
Put another word "a" in A1 or simply copy down the first row and continue with
A2: I've been trying to create a search that counts a word in my search.
B2:F2 copied down from B1:F1
Copy F2 to G2 and modify it like this:
G2 =SEARCH("\<" & B2 & "\>";$A2;F2+1) the additional F2+1 adds 1 to the position retrieved in the left neigbour(F2) and searches this row's value in A ($A1) for regex "\<a\>" behind the F1 position. The result is 49.
http://wiki.services.openoffice.org/wik ... H_function

Knowing how to SEARCH words in one cell within another cell's value, we could put rather long text with many words "a" into a cell, drag the SEARCH formula to the right until it returns #VALUE (no more matches) and count the retrieved position numbers, which would be the word count.
Z2 =COUNTIF(ISNUMBER($F2:Y2);1) [entered as array formula with Ctrl+Shift+Enter rather than Enter] counts numbers from this row's cell in F ($F2) until the left neigbour (Y2).

There is a rather nifty technique to count some letter, say "a", by means of a single , sophisticated array formula. But I can not find out yet how to perform a similar trick in conjunction with a regex for the word "a".
C1 =COUNTIF(MID($A1;ROW(OFFSET($A$1;0;0;LEN($A1)));1);$B1) [entered as array formula with Ctrl+Shift+Enter rather than Enter] counts the ocurrences of a single letter in this row's column B ($B1) within the value of this row's column A ($A1).
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Creating text search formula

Post by JohnV »

Or at Tools > Options > Calc > Calculate, check "Enable regular expressions ..." FWIW, "Case Sensitive" doesn't seem to matter here.

=COUNTIF(B1:C9;".*SomeWord.*")

This will not count multiple occurrences of SomeWord in a cell but you didn't seem to ask for that.
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Creating text search formula

Post by Villeroy »

JohnV wrote:This will not count multiple occurrences of SomeWord in a cell but you didn't seem to ask for that.
:ouch:
OK, I missed the point (again).
:oops:
Let's talk about COUNTIF and regexes then:
=COUNTIF(B1:C9;".*SomeWord.*") with regex option, but option "whole cell"=off, works in the same way as
=COUNTIF(B1:C9;"SomeWord") since regexes match anywhere within a string unless the position is specified more explicitly.
With "whole cell"=on, ".*SomeWord.*" can be used to override that option.

I prefer =COUNTIF(B1:C9;"SomeWord") with both options, regex and "whole cell" being turned off.
Reason: If you are not so familiar with regexes and you search for anything with braces and other special chars you lose, particularly if the search criteria are calculated values (match leftmost x chars from some cell in some range).
In OOo regexes all types of braces, backslash, and any of ^?.+*$ are characters with a special meaning.
Some people search for things like "+++special offer+++" or "(by the way)" including the literal "+" and braces.

Neither "SomeWord" nor ".*SomeWord.*" match words as I demonstrated with SEARCH matching the word "a" rather than the letter "a".
In order to match strings within word boundaries, you've got to use regexes with \< and \>
Partial comparison of "the" (or whole-cell comparison of regex .*the.*) matches "the" within "thermal" as well as in "cathedral". Regex \<the\> matches only "the" as a word.

And since we are using regexes with word boundaries, we can add the jokers to override the "whole cell" option:
.*\<the\>.*
Full translation: Match any single character (the dot), any count of the preceeding item or none (the asterisk), followed by the start-of-word boundary \<, letters t h e, followed by a end-of-word boundary \> and any count of any character (again the .*).
In formula context: =COUNTIF(B1:C9;".*\<the\>.*")
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Creating text search formula

Post by Villeroy »

A simple userdefined STRINGCOUNT function in plain Basic. It calls FIND or SEARCH recursively in order to return the count of a string within another string.
[Tutorial] How to install a code snippet. Cell functions have to reside somewhere in library "Standard".
Count FINDs of A1 in B1: =STRINGCOUNT(A1;B1;0)
Count SEARCHes of A1 in B1: =STRINGCOUNT(A1;B1;1)

Code: Select all

REM  *****  BASIC  *****
Global g_FunctionAccess

Function STRINGCOUNT(swhat,swhere,bRegex)
REM count how many times swhat is in swhere using SEARCH with regex support
REM or case sensitive FIND
REM "aa" is matched 2 times in "aaa" (at pos 1 and pos 2)
on error goto ExitErr
	s = cStr(swhat) : v = cStr(swhere) : b =cBool(bRegex)
	if b then
		sF = "SEARCH" 'case insensitive regex pattern
	else
		sF = "FIND" 'case sensitive literal
	endif
	oFA = getFunctionAccess()
	oFA.RegularExpressions = b
	m = 0
	STRINGCOUNT = 0
	do
		m = oFA.callFunction(sF, Array(s, v, m +1))
		STRINGCOUNT = StringCount +1
	loop
ExitErr:
End Function

Function getFunctionAccess()
if isObject(g_FunctionAccess) then
	getFunctionAccess = g_FunctionAccess
else
	g_FunctionAccess = createUnoService("com.sun.star.sheet.FunctionAccess")
	getFunctionAccess = g_FunctionAccess
endif
End Function
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Creating text search formula

Post by JohnV »

OK, I missed the point (again).
But not very often!
Enlance
Posts: 2
Joined: Sun Sep 20, 2009 12:56 pm

Re: [Solved] Creating text search formula

Post by Enlance »

Thank you so much for all your generous help with this issue.

Turns out it was as easy as JohnV explained.. Thank you.

When would MS give this service with Excel issues? Hahahaa
Open Office.org 3.1.1 (Build:9420)
On Windows 7 RC1
Post Reply