Josip Zohil, Koper, Slovenija,
February, 2015
Asynchronous reading of a file in FoxPro
Visual FoxPro (VFP) code usually runs synchronously. VFP offers several functions for reading a file; typically we open the file and read it line by line or in chunks of bytes. Why would we want to solve this problem asynchronously? Synchronous execution keeps the CPU busy with a single job and freezes the VFP screen. A long-running task such as reading a file can, however, be performed asynchronously, without blocking. In this article we shall read a file using a collection of timers and hide the complexity of the asynchronous read behind a simple interface.
Reading a file
You can read a file in VFP in various ways:
· The entire file,
· Line by line,
· In chunks of bytes.
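As a rough illustration, the three approaches map onto the built-in functions Filetostr(), Fgets() and Fread(). The following is a minimal synchronous sketch, not part of the original program: the file name small.txt and all variable names are assumptions, and the file is assumed to be small enough to fit in a single VFP string.

```foxpro
* Three synchronous ways to read a file (sketch; small.txt is a hypothetical file)
Local lcAll, lnHandle, lcLine, lcChunk, lnLines, lnBytes

* 1. The entire file in one call
lcAll = Filetostr("small.txt")
? "Whole file:", Len(lcAll), "bytes"

* 2. Line by line with the low-level file functions
lnLines = 0
lnHandle = Fopen("small.txt")            && read-only, buffered (the default)
If lnHandle >= 0
    Do While !Feof(lnHandle)
        lcLine = Fgets(lnHandle, 8192)   && up to 8192 bytes or up to the next line end
        lnLines = lnLines + 1
    Enddo
    Fclose(lnHandle)
Endif
? "Lines:", lnLines

* 3. Chunk by chunk
lnBytes = 0
lnHandle = Fopen("small.txt")
If lnHandle >= 0
    Do While !Feof(lnHandle)
        lcChunk = Fread(lnHandle, 64000) && up to 64000 bytes per call
        lnBytes = lnBytes + Len(lcChunk)
    Enddo
    Fclose(lnHandle)
Endif
? "Bytes read in chunks:", lnBytes
```

All three variants block VFP until they finish; the rest of the article is about avoiding exactly that.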
Reading a file synchronously with VFP functions blocks the application: the screen freezes, VFP does not respond to events and cannot render to the screen, and timer events wait in the event loop. Reading a 170 MB file and processing its content blocks the application for two (2) minutes or more!
Optimizing the reading and processing speed may improve this by, for example, 100%, but even in the optimized solution the screen remains blocked for thirty (30) seconds!
We obtain similar results by improving the hardware (processor, disk speed...).
The main problem lies in the way our programs are written: synchronous execution. On the other hand, writing asynchronous code may result in relatively complicated and error-prone code. In dynamic programming languages and event-driven systems (like VFP) it is relatively easy to create user-defined functions (UDFs) and other constructions that run asynchronously. The users (programmers) of such a UDF write asynchronous code in much the same way as synchronous code (only with another interface). The asynchronous complexity is hidden behind the UDF interface.
The benefits of these asynchronous constructions are a non-blocking screen (a reactive VFP) and, sometimes, faster execution. There may also be a penalty, a slower execution: VFP also executes other jobs (mouse move, key press...) or the operating system (OS) gives other work to the CPU. In synchronous (blocking) execution VFP executes (mostly) one job and the OS »respects« this VFP »priority«!
As said, under ideal conditions asynchronous code is sometimes faster than synchronous code. That means the hardware is not the bottleneck. We can improve the speed of reading and processing a file and also unblock the VFP application. The cause of the blocking is the way our programs are written. The screen is frozen for a minute or more because of badly implemented VFP programs!
Event driven reading of a file
Let us observe the process of reading a file chunk by chunk (a similar explanation applies to reading line by line).
VFP execution is event driven: VFP »iterates« over its event loop (similar to a do while loop iterating over the records of a DBF file). It is a »never ending« loop (a stream of event handlers). Each loop iteration is a »tick«. A tick executes the functions (event handlers) waiting in the event queue. These functions were pushed in the event loop by event emitters (input/output operations, rendering to the screen, timers...). Each event pushes its corresponding event handler (a function) in the event loop.
A read of a chunk of a file is an event. We have a sequence of such events (possibly never ending). The event of reading a chunk has its event handler.
Note. Theoretically, we can read some chunks of the file, then write lines to it, read more lines and so on.
When VFP finds our code to »read a chunk«, it puts the handler (usually a function) in the event loop. Note that it does not execute this code immediately: the code is only pushed in the event loop. Later, when VFP is not busy (after all other code has executed), it executes this function. The event handlers in the event loop are executed serially. The same function (e.g. a UDF) may be pushed in the event loop in various ways: by a VFP program, by a timer...
For example, suppose we have a program:
Function blockMe()
    Fun()
    Gun()
    Iun()
Endfunc
VFP pushes the blockMe function in the event loop. When it starts executing, it runs Fun and blocks VFP; then it executes Gun, and VFP is still blocked; then it executes Iun, and VFP is still blocked. VFP remains blocked during the execution of these three functions: the function blockMe is on the VFP call stack the whole time and VFP is unable to perform other actions.
Suppose now that each of these functions is fired by a timer (three timers). When the first timer fires, Fun is put in the event loop (first phase); when the second fires, Gun enters the event loop, and so on. When Fun executes (second phase) it enters the call stack; when it terminates, it leaves the call stack, and after that Gun enters the call stack, and so on. Between two entries on the call stack, VFP is able (under certain conditions) to react to other events (user actions, timer events...).
Let us conclude:
The function blockMe iterates over three functions; during the whole execution it is on the call stack and blocks VFP.
The execution with the timers makes three visits to the call stack. The program »iterates« over a collection of three timers; VFP is blocked three times, each time for a shorter period.
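The timer variant of blockMe could look roughly like the sketch below. It is only an illustration under stated assumptions: the class oneShot, the public variables goT1, goT2, goT3 and gnDone, and the bodies of Fun, Gun and Iun are hypothetical and do not appear in the program developed later in this article.

```foxpro
* Sketch: run Fun, Gun and Iun from three one-shot timers instead of one blocking call
Public goT1, goT2, goT3, gnDone
gnDone = 0
goT1 = Createobject("oneShot", "Fun")
goT2 = Createobject("oneShot", "Gun")
goT3 = Createobject("oneShot", "Iun")
Read Events                          && enter the event loop; the timers fire from here
Release goT1, goT2, goT3, gnDone
Return

Define Class oneShot As Timer
    Interval = 1                     && fire as soon as possible
    cJob = ""                        && name of the function to run
    Procedure Init(tcJob)
        This.cJob = tcJob
    Endproc
    Procedure Timer
        This.Enabled = .F.           && one-shot: do not fire again
        = Evaluate(This.cJob + "()") && the job is on the call stack only while it runs
        gnDone = gnDone + 1
        If gnDone = 3                && all three jobs have run: leave the event loop
            Clear Events
        Endif
    Endproc
Enddefine

Function Fun
    ? "Fun"
Endfunc

Function Gun
    ? "Gun"
Endfunc

Function Iun
    ? "Iun"
Endfunc
```

Between two Timer events the call stack is empty, so VFP can process a mouse click or a repaint before the next job starts.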
From the point of view of the CPU, reading a line (of a file) from a disk is a waiting operation: the CPU waits for the disk operation to terminate (seek the line position, read the data from the disk...). During this wait the CPU may perform other operations.
Reading a file using a collection of timers
In the preceding section we saw that we have two iterations, over a collection of:
· Timers and their events (each timer may fire multiple events),
· The file's chunks.
There is a one-to-one correspondence between the timer events and the chunk reads.
We are going to write a program that:
· Creates a file named test.txt of about 174 MB with generated test content (see, W.I.P.),
· Creates a cursor to store the results,
· Spawns a collection of 254 deferred objects (deferedARF) with the corresponding timers (mt),
· Creates a file object (rf) to maintain state. It has a public interface: a function nextChunk and get-functions,
· Optionally stops the iteration (stops reading the file).
We can look at the timer events as a sequence that we iterate over. At each »iteration« the function nextChunk is called: it reads a chunk, processes the data and, where a line matches, inserts it in a cursor. The file object (ofile) is manipulated by 254 timers in an ordered way (they access the ofile object one after the other, FIFO).
Asynchronous read of a file
Clear
Set Escape On
Close Databases All
Local lni,lcs,lcs2
Private loCol As Collection,mt,notimers,tcFileName,ttt,buffSize
tcFileName = 'test.txt'
* Generate a 174 MB text file
lcs = Replicate("Coco jambo",80) + Chr(13) + Replicate("tatu rata",50) ;
    + Chr(13) + "tatu rata coco " + Replicate("papa ",80) + Chr(13)
lcs2 = Replicate(lcs,10000)
Strtofile(lcs2,m.tcFileName)
For lni = 1 To 2 && use 10 instead of 2 for 174 MB (more than 170 MB)
    Strtofile(lcs2,m.tcFileName,1) && append to the file
Next
loCol = Createobject("Collection") && a collection of deferred objects (timers)
notimers=254 && number of timers (8, 16, 128...)
buffSize=64000
*tickms=1 && 1 ms interval
mt=0
ttt=Seconds()
OFORM=Createobject("mFORM")
OFORM.Show()
Read Events
Return
In the first part of the program we have:
· Some private variables,
· A group of commands that creates a text file test.txt,
· A collection object. Later we shall populate it with deferred objects (and timers),
· A form with two command buttons: to start and to stop the asynchronous file read.
From the file class rf we create an object with a read-only interface:
Define Class rf As Custom
    Protected f_h,tcSearchText,curName,Lineno,endOfFile,fthen,Thisf
    *buffSize=64000
    Thisf=Null
    fthen=""
    f_h=""
    tcSearchText=""
    curName=""
    fileSize=0
    charReaded=0
    Lineno=0
    endOfFile=.F. && resolved

    Procedure Init(f_h,tcSearchText,curName,Thisf,fthen)
        This.f_h=f_h
        This.fileSize=Fseek(This.f_h, 0, 2) && Move pointer to EOF
        Fseek(This.f_h, 0) && Move pointer to BOF
        If This.fileSize <= 0 && Is the file empty?
            =Fclose(This.f_h) && Close the file
            ?"Error in file-open."
            Throw "Error in file-open. Empty file."
        Endif
        This.tcSearchText=tcSearchText
        This.curName=curName
        If Vartype(fthen)="C"
            This.fthen=fthen
        Endif
        If Vartype(Thisf)="O"
            This.Thisf=Thisf
        Endif
    Endproc

    Procedure getEOF
        Return This.endOfFile
    Endproc

    Procedure getF_h
        Return This.f_h
    Endproc

    Function setThen(F)
        If This.endOfFile && don't allow changes after the read is resolved
            Return
        Endif
        This.fthen=F
    Endfunc

    Function getThen()
        Return This.fthen && fthen is a property holding the function name
    Endfunc

    * Iterates over the chunks "data structure"; it is called from a timer
    * (each timer event puts "the next function" in the event loop)
    Procedure nextChunk && read the current chunk
        If This.endOfFile
            Return
        Endif
        Activate Screen
        Local al,rl,gcString,lenGc,Back
        gcString=Fread(This.f_h,buffSize) && read a chunk of the file
        lenGc=Len(gcString)
        al=Alines(aMyArray,gcString) && split the string into an array of lines
        al=Alen(aMyArray)
        * ?"len",al,lenGc,This.charReaded,this.fileSize
        * ii=0
        *If This.charReaded+lenGc<This.fileSize && not EOF
        If !Feof(This.f_h)
            rl=al-1 && not at end of file: ignore the last (partial) line
        Else
            rl=al && at end of file: include the last line as well
        Endif
        For i=1 To rl
            *AT(cSearchExpression, cExpressionSearched [, nOccurrence])
            This.Lineno = This.Lineno + 1
            content=Upper(aMyArray[i])
            *IF AT("COCO",content)>0
            If This.tcSearchText $ Upper(m.content)
                m.lineno=This.Lineno
                Insert Into (This.curName) From Memvar
            Endif
        Endfor
        If !Feof(This.f_h) && not end of file
            * Note: this assumes that every line is shorter than buffSize
            Back=Len(aMyArray[rl+1]) && the last "line" is dropped
            This.charReaded=This.charReaded+buffSize-Back && update state
            *fs=Fseek(This.f_h,-Back,1) && move the pointer to the end of the last line read
            fs=Fseek(This.f_h,This.charReaded) && move the pointer to the end of the last line read
            *?"Back",back,fs
        Else
            This.charReaded=This.charReaded+lenGc && update state
        Endif
        If Feof(This.f_h) And Not This.endOfFile && when the end of file is reached
            This.endOfFile=.T. && update state
            Fclose(This.f_h) && close the file
            Activate Screen
            ?"End loop:",Seconds()-ttt,Recno()
            If Len(Trim(This.fthen))>0 And Vartype(This.Thisf)<>"X" && optional then function
                Select (This.curName) && to bind to a grid recordsource
                Go Top
                Try
                    With This.Thisf && bind the object and the function
                        Evaluate(This.fthen+"()") && bind to thisform
                    Endwith
                Catch To ex
                    ?ex.Message
                Endtry
            Endif
        Endif
        DoEvents && pause and pass control to the VFP event loop
    Endproc
Enddefine
This class has the state variables Lineno, charReaded, fileSize and endOfFile. The first is the iteration index (the line counter), the second registers the number of characters read, the third is a constant (the file size) and the fourth registers the end of file (and terminates the iteration).
The initial values are a file handle number (f_h), a string to search for (tcSearchText) and the name of the cursor (curName) in which we insert the results. The second pair of initial values (Thisf and fthen) are the function to execute when the read of the file terminates and the object of which this function is a member. This pair is optional.
The procedure nextChunk is called on each timer event until the end of file is reached (endOfFile=.T.).
The command Fread(This.f_h,buffSize) reads a chunk of bytes (64000 bytes) and puts the content in the variable gcString. The command Alines(aMyArray, gcString) splits the string gcString into the array aMyArray. Each array element holds a line, except the last element, which usually holds only part of a line. We count the number of characters in this element, Back=Len(aMyArray[rl+1]), and reduce the number of characters read accordingly: This.charReaded=This.charReaded+buffSize-Back. We simply ignore the characters of the last line; they are read again with the next chunk. The exception is the last chunk read (end of file).
We iterate over the array elements. If tcSearchText occurs in the content of a line, the line number and the content are inserted in the cursor. Then we move the file pointer back to the end of the last complete line (Fseek(This.f_h,This.charReaded)).
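The arithmetic of the dropped last line can be checked on a small in-memory string. The following sketch is only an illustration: the variable names are assumptions and the string lcChunk stands in for one Fread() result that ends in the middle of a line.

```foxpro
* Sketch of the chunk arithmetic on an in-memory string (no file involved)
Local lcChunk, lnLines, lnComplete, lnBack, lnConsumed
Local Array laLines[1]
lcChunk = "alpha" + Chr(13) + "beta" + Chr(13) + "gam"  && "gam" is a partial line
lnLines = Alines(laLines, lcChunk)      && 3 elements: alpha, beta, gam
lnComplete = lnLines - 1                && 2 complete lines are processed
lnBack = Len(laLines[lnLines])          && 3 characters must be read again
lnConsumed = Len(lcChunk) - lnBack      && 14 - 3 = 11: offset where the next chunk starts
? lnLines, lnComplete, lnBack, lnConsumed
```

In nextChunk the same roles are played by al, rl, Back and the increment of This.charReaded; the offset 11 corresponds to the position passed to Fseek().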
If the end of file is reached (Feof(This.f_h)), which is detected by the first timer event that gets there, we close the file. Then we check whether the fthen function is present; if so, we evaluate it.
After that Doevents is called: it suspends the execution of the current function and passes control to the VFP event loop. When VFP returns, execution resumes immediately after this command. As described above, when the end of file is reached (Feof(This.f_h)=.T.), the procedure closes the file and reports the end of the computation (the end of the iteration).
The other public functions of this class are accessors.
The timer class is a usual timer with a very small interval of 1 ms. Its distinctive feature is the mrun function, in which we can insert a filter.
Define Class mt As Timer
    Interval=1
    oparent=Null

    Procedure Init(oparent)
        This.oparent=oparent
        Activate Screen
        This.Enabled=.T.
        mt=mt+1
        ?"START",mt
    Endproc

    Procedure mrun(x) && insert code here for an optional filter of events
        If This.oparent.getOfileEOF()
            This.Enabled=.F.
            This.Interval=0 && stop the timer
        Endif
        Return x
    Endproc

    Procedure Timer
        This.Enabled=.F.
        This.mrun()
        This.Enabled=.T.
    Endproc
Enddefine
If the iteration reaches the end of the file (This.oparent.getOfileEOF()), we stop the timer (Interval=0).
Note. We could pass other filter conditions to the function mrun in the form of a function.
When the timer event happens, we call the function mrun. Later we shall bind this function to the nextChunk function of the file object.
We run the timer with the smallest possible interval, 1 ms. We can »split« this interval into smaller pieces with the following technique: we push on the event loop multiple timer event handlers, each with an interval of 1 ms. For example, we start a first timer T1 (for example, using the command T1.Enabled=.T.) and, in the same way, a second timer T2. The time between the initialization of the two timers is very small (measured in ns, nanoseconds). After 1 ms the T1 timer event fires and its event handler (function) is pushed in the event loop. Immediately after that the timer T2 fires and its event handler (callback function) is pushed in the event loop. VFP executes the event handlers (callback functions) in the event loop serially. For example, when it is not busy it executes the T1 callback function, then any other functions waiting in the event loop, and then the T2 callback function. The time between the execution of the two event handlers may be very small (measured in ns). In one millisecond two events can fire. Using sixty-four (64) timers, sixty-four events may fire in one millisecond.
This pushing and execution of functions in the event loop are relatively cheap operations.
As you see, this is not a strictly imperative style of programming: using the timers and Doevents we only partially »dictate« the order of execution. We give the VFP engine the possibility to insert other event handlers in the event loop, in »parallel« with our timer events.
* This class composes a file object and a timer; we shall spawn a collection of "deferred" objects
Define Class deferedARF As Custom
    Protected resolved,ofile
    *Add Object mt As mt
    mt=Null
    resolved=.F.
    ofile=Null

    Procedure Init(ofile,tickms) && tick interval in ms
        This.ofile=ofile
        This.mt=Createobject("mt",This)
        This.mt.Interval=tickms
        Bindevent(This.mt,"mrun",This.ofile,"nextChunk",1) && nextChunk runs after mrun
    Endproc

    Procedure getOfileEOF
        Return This.ofile.getEOF()
    Endproc

    Function getOfileF_h()
        Return This.ofile.getF_h()
    Endfunc

    Function getResolved()
        Return This.resolved
    Endfunc
Enddefine
An object of type deferedARF receives the file state object (rf) and creates its own timer (mt). We bind the two objects using Bindevent: when a timer event fires, it calls the mrun function and after that nextChunk is executed. With this construction nextChunk is called on each timer event: we iterate over the groups of the file's lines. (Using the timers we iterate over the file.)
The role of this object is also to stop the timer when the end of the file (test.txt) is reached. This filter is inside the function mrun.
The mrun function is not called when we initialize the object but later (in certain cases never), when the timer event happens; hence the name deferred. When mrun fires, nothing useful happens by itself. When the bound function nextChunk executes, we may obtain a useful »result« (a record in a cursor). In case of errors we have no meaningful result (this article does not consider that case). A record that may be inserted in the cursor curName is also called a future or a promise value.
* all the complexity is inside this function
Function readAsync(filename,tcSearchText,curName,notimers,oformR,fthen)
    tickms=1 && 1 ms
    Activate Screen
    *?filename
    ttt=Seconds()
    Select (curName)
    Delete All
    Local f_h
    Try
        f_h=Fopen(filename,10) && 10 = read-only, unbuffered
        ofile=Createobject("rf",f_h,tcSearchText,curName,oformR,fthen)
    Catch To ex
        * ?ex.Message
        Throw ex
    Endtry
    Local i
    i=loCol.Count
    Do While i>0 && empty the collection left over from a previous run
        loCol.Remove(i)
        i=i-1
    Enddo
    *** Start (Add) a collection of deferred objects (timers)
    For i=1 To notimers && spawn the timers
        loCol.Add(Createobject("deferedARF",ofile,tickms))
        loCol(i).Name="T"+Ltrim(Str(i))
    Endfor
    Return ofile
Endfunc
The function readAsync is the interface function for reading a file asynchronously and processing its content. It creates a file object, ofile, and passes it the parameters: a file handle (a reference to a file), a text to search for and a cursor name. We also have a pair of optional parameters: oformR and fthen. Fthen is a function that will be executed after the file has been read. Fthen is a member of the object oformR.
Note. We can enrich this construction and pass to the function also a function to catch possible errors, in addition to the function to execute when the iteration terminates (a then function or a promise function), for example: readAsync(filename,tcSearchText,curName,notimers,oformR,fthen,ferror). This example (ferror) is not presented in this article.
The parameter notimers is the number of timers we spawn to iterate over the file. I obtained good performance with 4-128 timers (it seems the number of timers doesn't influence the speed of reading a file). In the example in this article we have a collection with a relatively large number of timers (254).
In the last part of this function, we add the timers to the loCol object created in the first part of the program.
Define Class mform As Form
    Add Object cmdStart As CommandButton With Left=10, Caption="Start"

    Procedure cmdStart.Click
        Local ofr,curName
        curName=Sys(2015)
        Create Cursor (curName) (Lineno i, content m)
        Local oformR
        oformR=Thisform
        Try
            * read a file, filter using "COCO", put the results in curName and use notimers timers
            ofr=readAsync(tcFileName,"COCO",curName,notimers,oformR,"fthen")
            *ofr=readAsync(tcFileName,"COCO",curName,notimers) && without a then function
        Catch To ex
            Activate Screen
            ?ex && bad error management!!! Better: pass an error function
            Throw ex
        Endtry
        Return ofr
    Endproc

    Add Object cmdStop As CommandButton With Left=200, Caption="Stop"

    Procedure cmdStop.Click
        Activate Screen
        ?"stopped",Recno(),loCol.Count,loCol(1).getOfileF_h()
        For i=1 To loCol.Count && stop the timers
            loCol(i).mt.Interval=0
        Endfor
        If loCol(1).getOfileF_h()>0
            Fclose(loCol(1).getOfileF_h())
        Endif
        *Thisform.Release()
    Endproc
Enddefine
The mform class creates an object of type form with two buttons: one to start reading a file asynchronously and one to stop the iteration.
Function fthen()
    * Called inside the With This.Thisf block of nextChunk, so .grid1 refers to the form
    *SELECT (.curName)
    Activate Screen
    If Vartype(.grid1)="U"
        .AddObject("grid1","GRID")
        .grid1.Move(2,20)
    Endif
    .grid1.RecordSource=Alias()
    .grid1.Visible=.T.
Endfunc
This function is executed, if supplied, when the process of reading the file terminates. It adds a grid to the form object and sets its RecordSource. The RecordSource »value« is a promise. This function is optional: we can read the file without this »promise« function.
Tests
When we run the program presented at the beginning of this article, we obtain execution times of 11-12 seconds. Reading the same file with the usual synchronous code takes 400-500 seconds (a ratio of about 1:40), or 18 seconds (about 1:1.5) if the file is opened in buffered mode (FOPEN(FileName)). The importance of these measurements is very limited, as our main goal was to unblock VFP. They show that asynchronous execution unblocks the VFP application and at the same time significantly reduces the execution time.
Note. The faster execution is also a consequence of the larger buffer size (64000 characters).
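The synchronous code used for the comparison is not listed in this article. A baseline of that kind could look roughly like the sketch below; the function name readSync and its parameters are assumptions, and the cursor is assumed to have the fields Lineno and content, as created in cmdStart.Click.

```foxpro
* Synchronous baseline (sketch): the same search in one blocking loop
* readSync is a hypothetical name; it returns the elapsed time in seconds
Function readSync(tcFileName, tcSearchText, tcCurName)
    Local lnHandle, lnStart, lineno, content
    lnStart = Seconds()
    m.lineno = 0
    lnHandle = Fopen(tcFileName)              && read-only, buffered
    If lnHandle < 0
        Error "Cannot open " + tcFileName
    Endif
    Do While !Feof(lnHandle)                  && VFP is blocked until this loop ends
        m.content = Fgets(lnHandle, 8192)     && one line (up to 8192 bytes)
        m.lineno = m.lineno + 1
        If tcSearchText $ Upper(m.content)
            Insert Into (tcCurName) From Memvar
        Endif
    Enddo
    Fclose(lnHandle)
    Return Seconds() - lnStart
Endfunc
```

It could be called, for example, as ? readSync("test.txt", "COCO", curName) after the cursor has been created. Passing 10 as the second argument of Fopen() would open the file unbuffered instead.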
Conclusion
The asynchronous version doesn't block VFP: it remains reactive during the twelve (12) seconds of the read. In »parallel« it reads a large file and executes other user actions.
The asynchronous read of a file has these characteristics:
· The function that starts the asynchronous execution is simple (simple interface): readAsync(tcFileName, "COCO", curName, notimers).
· We use this function in much the same way as the synchronous version.
· There are no multithreading problems and no DLL registration.
· The asynchronous version may execute faster than the synchronous one.
VFP is event driven; its event loop executes in an asynchronous way (by default). During a synchronous read of a file VFP blocks because of badly implemented programs. VFP programmers have the means to write programs that do not freeze a VFP application for ten seconds or more. The example in this article demonstrates this.