[Solved] CSV : counting number of lines

Keyboard macros or custom scripts

[Solved] CSV : counting number of lines

Postby Mr.Dandy » Sun Aug 30, 2020 3:53 pm

Hello forum,

I have a huge CSV file with more 1 million of lines.
If I use a loop to read it, this take a long time.

Do you know a way to count total of lines instantanely?
Last edited by Mr.Dandy on Wed Sep 02, 2020 11:02 am, edited 1 time in total.
OpenOffice 4.1.10 - Windows 10
User avatar
Mr.Dandy
 
Posts: 339
Joined: Tue Dec 11, 2012 4:22 pm

Re: CSV : counting number of lines

Postby JeJe » Sun Aug 30, 2020 5:22 pm

Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
Openoffice 4.1.6
Windows 8
JeJe
Volunteer
 
Posts: 1567
Joined: Wed Mar 09, 2016 2:40 pm

Re: CSV : counting number of lines

Postby RoryOF » Sun Aug 30, 2020 5:45 pm

In this posting
https://www.codespeedy.com/count-the-number-of-lines-in-a-text-file-in-python/
you will find code for line-counting in files.
Apache OpenOffice 4.1.10 on Xubuntu 20.04.2 (mostly 64 bit version) and very infrequently on Win2K/XP
User avatar
RoryOF
Moderator
 
Posts: 32697
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: CSV : counting number of lines

Postby JeJe » Sun Aug 30, 2020 5:55 pm

You could get an instant answer if you used fixed width fields instead of CSV - simple calculation from the size of the file.
Openoffice 4.1.6
Windows 8
JeJe
Volunteer
 
Posts: 1567
Joined: Wed Mar 09, 2016 2:40 pm

Re: CSV : counting number of lines

Postby Villeroy » Sun Aug 30, 2020 6:25 pm

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: CSV : counting number of lines

Postby Mr.Dandy » Wed Sep 02, 2020 11:01 am

Thanks folks
I try python solution.
Powershell is only Windows-compatible.
OpenOffice 4.1.10 - Windows 10
User avatar
Mr.Dandy
 
Posts: 339
Joined: Tue Dec 11, 2012 4:22 pm

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Wed Sep 02, 2020 11:47 am

On Mac, Linux and WIndows with Linux tools:
Code: Select all   Expand viewCollapse view
wc -l file.csv
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Wed Sep 02, 2020 5:12 pm

Just in case you access database data with the database component:
Code: Select all   Expand viewCollapse view
Sub test_RecordCount()
   print getRecordCount(sDB:="Bibliography", sTBL:="biblio", sUSR:="", sPWD:="")
End Sub

Function getRecordCount(sDB, sTBL, sUSR, sPWD) AS Long
on error goto exitErr
   dbc = CreateUnoService("com.sun.star.sdb.DatabaseContext")
   db = dbc.getByName(sDB)
   s = "SELECT COUNT(*) AS ""C"" FROM """& sTBL &""""
   con = db.getConnection(sUSR, sPWD)
   stm = con.prepareStatement(s)
   q = stm.executeQuery()
   q.next()
   n = q.getLong(1)
exitErr:
   con.close()
   getRecordCount = n
End Function

I believe this works with any table in any type of database connection. Tested with Bibliography of type dBase and HSQL2.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Wed Sep 02, 2020 6:41 pm

OK, the above method is certainly the most complicated method to count lines of a csv file. However, it would work with any table that belongs to any type of database connection.

This is how to read lines with any descendant of (Visual) Basic since the early 90ies:
Code: Select all   Expand viewCollapse view
Sub test_RecordCount()
   Msgbox getLineCount("file:///tmp/Untitled.csv")
End Sub

Function getLineCount(sURL As String) As Long
Dim n As Long, s As String
   FN = Freefile()
   Open sURL For Input As #FN
   while not eof(#FN)
      n = n+1
      line input #FN, s
   wend
   Close #FN
   getLineCount = n
End Function

Why does nobody read any books before writing programs?
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Postby JeJe » Wed Sep 02, 2020 6:58 pm

Do you know a way to count total of lines instantanely?


I took the issue to be the last word in the question. The answer is, for a million line CSV file, there isn't one. Get a supercomputer (maybe...)

Edit:
My other thought was a hybrid of a CSV file/fixed width record file... where you make all the rows of equal size by adding a last record that pads the row size to a fixed width. You could then find the count instantly from the file size and load it as a CSV... but you'd need to adjust the last record to any other record changes and so on and that would be some work...
Openoffice 4.1.6
Windows 8
JeJe
Volunteer
 
Posts: 1567
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Wed Sep 02, 2020 8:56 pm

On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Postby JeJe » Wed Sep 02, 2020 9:46 pm

Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.


That's meaningless without the line/file size (?)

(I do suspect I'm going to have to concede here... I ran some tests with my laptop and got fed up with the wait... but mine's a Pentium 2020M)
Openoffice 4.1.6
Windows 8
JeJe
Volunteer
 
Posts: 1567
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Postby JeJe » Wed Sep 02, 2020 10:04 pm

MrProgrammer wrote:
$time perl -nE '}{say " $. $ARGV"' Million.txt


Don't know perl - I'm guessing that creates a file? With a million unrealistically small 1 record ("say ") size lines?

Edit, no say looks to be a command? Anyway - a million lines is meaningless without knowing the file size.
Openoffice 4.1.6
Windows 8
JeJe
Volunteer
 
Posts: 1567
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Wed Sep 02, 2020 10:35 pm

Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.

My file is 97MB in size. About 100 chars per line.
wc -l takes 0.14 seconds
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Postby MrProgrammer » Wed Sep 02, 2020 10:46 pm

Mr.Dandy wrote:Do you know a way to count total of lines instantanely?
One cannot accomplish a task in zero time. How quickly is "instantly"? Asking in an OpenOffice forum about how to count the lines in a file suggests to me that this is a case of XY Problem.

JeJe wrote:Don't know perl
Perl is a general-purpose Unix scripting language originally developed to make report processing easier which is also available on Linux, MacOS, and Windows. Many tasks can be accomplished quickly and easily with short programs. Perl is included with MacOS, and is included with many Linux distributions.

JeJe wrote:Anyway - a million lines is meaningless without knowing the file size.
Yes, I should have given the file size. Million0004.txt has 4-byte lines. Million1024.txt has 1K lines. On my machine, the larger file takes about a tenth of a second with wc and less than a second with perl.

$ ls -ln Million*.txt
-rw-r--r-- 1 503 20 4000000 Sep 2 15:25 Million0004.txt
-rw-r--r-- 1 503 20 1024000000 Sep 2 15:22 Million1024.txt
$ for f in Million*.txt; do time wc -l $f; done
1000000 Million0004.txt

real 0m0.006s
user 0m0.003s
sys 0m0.002s
1000000 Million1024.txt

real 0m0.713s
user 0m0.598s
sys 0m0.114s
$ for f in Million*.txt; do time perl -nE '}{say " $. $ARGV"' $f; done
1000000 Million0004.txt

real 0m0.086s
user 0m0.080s
sys 0m0.004s
1000000 Million1024.txt

real 0m0.902s
user 0m0.727s
sys 0m0.171s

JeJe wrote:Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
Or, if speed is of the essence, keep the count in a separate file which is created (using one of the methods above) by the process which generates the CSV file. Then the lines only need to be counted once and any other times the count can be accessed quickly.
wc -l <Million1024.txt >Million1024_Count.txt
Mr. Programmer
AOO 4.1.7 Build 9800 on MacOS 10.14.6.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
MrProgrammer
Moderator
 
Posts: 4038
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: [Solved] CSV : counting number of lines

Postby Villeroy » Thu Sep 03, 2020 12:12 am

Code: Select all   Expand viewCollapse view
$ time python countlines.py
1051178

real   0m0,286s
user   0m0,216s
sys   0m0,069s

countlines.py:
Code: Select all   Expand viewCollapse view
with open('/tmp/test/test.csv') as my_file:
    print(sum(1 for _ in my_file))
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
 
Posts: 29909
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany


Return to OpenOffice Basic, Python, BeanShell, JavaScript

Who is online

Users browsing this forum: No registered users and 0 guests