Saturday, March 28, 2009

C# WebBrowser Control - Synchronization for Page Navigation/Loading


When we use the .NET WebBrowser control to scrape or crawl web pages, we will run into many difficulties and errors unless page synchronization is handled properly.
That is, our code must start any further activity only after the page navigation has completely finished.

We can use the function "waitTillLoad()" below for this synchronization.
It waits until the browser's ReadyState becomes "Complete".
However, the ReadyState is already "Complete" before the new navigation starts, so a naive wait could return even before the new page has started loading.

To avoid this, we have enhanced the function to first wait for a non-Complete state and only then wait for the Complete state.
That is, we wait for the page to finish loading only after the navigation has actually started.

We need to specify a timeout (waittime); otherwise the function may fall into an infinite loop if it is called twice without any new page navigation being started in between. Note that waittime is a number of loop iterations, not a duration in milliseconds.

The same function can be used in VB.NET with small modifications.
We have been using it in many tools and applications for a long time, so I hope you will find it useful and reliable.
I would say it is essential if you are using the WebBrowser control for any page scraping or web crawling.

You can visit our website at www.qualitypointtech.com
and you can email me (rajamanickam.a@gmail.com) for any of your software development needs or assistance.



private void waitTillLoad()
{
    WebBrowserReadyState loadStatus;
    // Wait till the next page starts loading.
    int waittime = 100000;
    int counter = 0;
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents();

        if ((counter > waittime)
            || (loadStatus == WebBrowserReadyState.Uninitialized)
            || (loadStatus == WebBrowserReadyState.Loading)
            || (loadStatus == WebBrowserReadyState.Interactive))
        {
            break;
        }
        counter++;
    }

    // Wait till the page gets loaded.
    counter = 0;
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents();

        if (loadStatus == WebBrowserReadyState.Complete)
        {
            break;
        }
        counter++;
    }
}
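The waitTillLoad() function above measures its timeout in loop iterations, so the actual wait depends on CPU speed, and its second loop has no timeout at all (its counter is incremented but never checked). That may be one reason for the hangs some readers report below. The following is a minimal sketch, not the method used in our tools, of the same two-phase wait with a time-based timeout on both phases. PageLoadWaiter, WaitForPageLoad, startTimeout and loadTimeout are hypothetical names. The ReadyState check and Application.DoEvents are passed in as delegates so the logic can be exercised without a real browser. One assumption differs slightly from the original: this sketch treats any state other than Complete as "navigation started", whereas the original does not count the Loaded state.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: the two-phase wait of waitTillLoad(), decoupled from
// the WebBrowser control. The "is the page complete?" check and the message
// pump (Application.DoEvents in a WinForms app) are supplied by the caller.
public static class PageLoadWaiter
{
    // Returns true if the page reached the complete state within loadTimeout,
    // false if it was still loading when the timeout expired.
    public static bool WaitForPageLoad(
        Func<bool> isComplete,   // e.g. () => webBrowser1.ReadyState == WebBrowserReadyState.Complete
        Action pumpMessages,     // e.g. Application.DoEvents
        TimeSpan startTimeout,   // how long to wait for the navigation to begin
        TimeSpan loadTimeout)    // how long to wait for loading to finish
    {
        // Phase 1: wait until the browser leaves the complete state, i.e. the
        // new navigation has started. If no navigation was triggered, give up
        // after startTimeout instead of looping forever.
        var clock = Stopwatch.StartNew();
        while (isComplete() && clock.Elapsed < startTimeout)
        {
            pumpMessages();
        }

        // Phase 2: wait until the page reports complete again, but never
        // longer than loadTimeout (the original second loop has no limit).
        clock.Restart();
        while (!isComplete())
        {
            if (clock.Elapsed >= loadTimeout)
            {
                return false;
            }
            pumpMessages();
        }
        return true;
    }
}
```

On the form, it could be called right after `webBrowser1.Navigate(...)` as `PageLoadWaiter.WaitForPageLoad(() => webBrowser1.ReadyState == WebBrowserReadyState.Complete, Application.DoEvents, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(60))`; the timeout values here are only illustrative. A short `Thread.Sleep` inside the loops could also be added to reduce CPU usage while polling.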


10 comments:

Bobby Lundqvist said...

Almost took a year for the first comment ;-), but here it comes!

Thanks a lot for a very useful bit of code. I've been looking around the web like a maniac, but without any luck, until now!

Cheers again, mate, you saved me A LOT of work!

Best regards,
Bobby Lundqvist

Rajamanickam Antonimuthu said...

Bobby,
Thanks for your comments

Anonymous said...

I've been looking at the same issue for a day or so now and didn't really find the answer. I thought I tried what you have here already so I was skeptical but it worked like a champ. Thanks a lot!

Clay

Anonymous said...

Not working. If there is Flash on the website you are loading, this does not work.

Kristian Hildebrandt said...

First of all, thanks for your code, it works perfectly. However, sometimes my WebBrowser gets stuck after submitting a form (invokemember.click()) and the ReadyState never seems to become Complete. Do you have any solution for this?

Anonymous said...

Hey, lots of thanks! It just works ;)

Anonymous said...

1 million thank yous, sir!

neetha said...

It really works, thank you so much

Anonymous said...

Is this a correct translation to vb.net?


Public Sub waitTillLoad()
    Dim waittime As Integer = 100000
    Dim counter As Integer = 0
    Dim loadStatus As WebBrowserReadyState

    'wait till beginning of loading next page
    While (True)

        loadStatus = browser.ReadyState
        Application.DoEvents()

        If ((counter > waittime) _
            Or (loadStatus = WebBrowserReadyState.Uninitialized) _
            Or (loadStatus = WebBrowserReadyState.Loading) _
            Or (loadStatus = WebBrowserReadyState.Interactive)) _
            Then Exit While

        counter += 1
    End While

    'wait till the page get loaded.
    counter = 0
    While (True)

        loadStatus = browser.ReadyState
        Application.DoEvents()

        If (loadStatus = WebBrowserReadyState.Complete) _
            And (browser.IsBusy <> True) _
            Then Exit While

        counter += 1
    End While

End Sub

Vertigo said...

Synchronization should not be performed in a loop: that is a spin-cycle solution that burns CPU. Instead, the browser events should be caught (onDocumentCompleted, onNavigationStarted events), and a separate thread should wait on a synchronization event before it starts consuming the browser's content. The event is set in onDocumentCompleted and reset to non-signalled when navigation is invoked:

// This stuff should run on the background thread:
// 1. Start navigation:
protected void NavigateOnTheBrowser(string URL)
{
    // Reset the event to non-signalled BEFORE navigating, so consumers cannot
    // act on content that is not loaded yet and a fast DocumentCompleted is not lost.
    m_SyncEvent.Reset();
    if (m_WebBrowser.InvokeRequired)
    {
        // Navigate must run on the UI thread that owns the control.
        m_WebBrowser.Invoke((MethodInvoker)(() => m_WebBrowser.Navigate(URL)));
    }
    else
    {
        m_WebBrowser.Navigate(URL);
    }
}

// 2. Get the text from the browser:
protected string GetTextFromBrowser()
{
    string response;
    // Block until DocumentCompleted signals that the page is loaded.
    m_SyncEvent.WaitOne();
    if (m_WebBrowser.InvokeRequired)
    {
        response = (string)m_WebBrowser.Invoke(new Func<string>(() => m_WebBrowser.DocumentText));
    }
    else
    {
        response = m_WebBrowser.DocumentText;
    }
    // Re-signal so that further reads of the same page do not block.
    m_SyncEvent.Set();
    return response;
}

// This stuff is an event handler on the form where the WebBrowser is hosted:
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // DocumentCompleted also fires for each frame; wait until the whole page is complete.
    if (((WebBrowser)sender).ReadyState != WebBrowserReadyState.Complete)
        return;
    // Allow the worker thread to consume the loaded content:
    m_SyncEvent.Set();
}


Here m_SyncEvent is an AutoResetEvent
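For readers who want to try the event-based idea from this comment, here is a minimal sketch of the synchronization part on its own. NavigationSync, BeginNavigation, OnDocumentCompleted and WaitForPage are hypothetical names, and it swaps in a ManualResetEventSlim (instead of the AutoResetEvent mentioned above) so that any number of reads after a page completes return immediately. Two points are the core of the technique: the event is reset before navigation starts, so a very fast DocumentCompleted cannot be missed, and the worker waits with a timeout instead of spinning.

```csharp
using System;
using System.Threading;

// Hypothetical helper: the UI thread signals when a document is complete,
// and a worker thread blocks on that signal (with a timeout) instead of
// polling ReadyState in a loop.
public sealed class NavigationSync : IDisposable
{
    // Starts signalled: no navigation is pending yet.
    private readonly ManualResetEventSlim _pageReady = new ManualResetEventSlim(true);

    // Call on the worker thread. The event is reset BEFORE navigation starts,
    // so a DocumentCompleted that fires quickly cannot be lost. `navigate`
    // is expected to marshal Navigate to the UI thread (e.g. Control.Invoke).
    public void BeginNavigation(Action navigate)
    {
        _pageReady.Reset();
        navigate();
    }

    // Call from the DocumentCompleted handler once ReadyState is Complete.
    public void OnDocumentCompleted()
    {
        _pageReady.Set();
    }

    // Call on the worker thread. Returns false if the page did not finish
    // loading within the timeout.
    public bool WaitForPage(TimeSpan timeout)
    {
        return _pageReady.Wait(timeout);
    }

    public void Dispose() => _pageReady.Dispose();
}
```

In a form, the DocumentCompleted handler would call `sync.OnDocumentCompleted()` only when `webBrowser1.ReadyState == WebBrowserReadyState.Complete`, and the worker thread would call `sync.BeginNavigation(() => webBrowser1.Invoke((MethodInvoker)(() => webBrowser1.Navigate(url))))` followed by `sync.WaitForPage(...)` before reading `DocumentText`. As with the comment's code, the wait must not run on the UI thread itself, or DocumentCompleted can never be raised.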
