Tuesday, March 31, 2009

Finding broken links using HttpWebRequest/HttpWebResponse in C#


HttpWebRequest/HttpWebResponse can be used to find broken links in a website. Refer to the function isBrokenLink below.
The StatusCode in the response is used to decide whether the link is broken. In practice, though, GetResponse usually throws an exception when the link is broken (for example on a 404 response or when the request times out), so the catch block also reports the link as broken. This makes the Timeout property of the request important: (i.e.) a large timeout value makes the total run take longer, while a small timeout value
may cause a valid but slow link to be declared broken. If anyone knows how to handle this trade-off appropriately, please mention it in the comments.

private bool isBrokenLink(string url)
{
    try
    {
        WebRequest http = WebRequest.Create(url);
        http.Timeout = 5000; // milliseconds

        // Dispose the response so the connection is released for the next check.
        using (HttpWebResponse httpresponse = (HttpWebResponse)http.GetResponse())
        {
            // Any status other than 200 OK is treated as a broken link.
            return httpresponse.StatusCode != HttpStatusCode.OK;
        }
    }
    catch (Exception)
    {
        // GetResponse throws for HTTP error responses, timeouts and DNS failures;
        // an invalid URL throws as well. All of these count as broken.
        return true;
    }
}


The updates below were added on April 21.

Making the two changes below in the above code may improve performance.

HttpWebRequest http = (HttpWebRequest) WebRequest.Create(url);
http.UserAgent = "Mozilla/9.0 (compatible; MSIE 6.0; Windows 98)";
http.Method = "HEAD";



The HEAD method verifies the link without downloading the entire content, so each check finishes faster. The improvement is most significant when checking for missing images. The request is declared as HttpWebRequest here because the UserAgent property is not available on the base WebRequest class. Note that a few servers reject HEAD requests, so a link that fails with HEAD may still work with GET.
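
On the timeout question raised above, one possible approach is to report a timeout as "unknown" instead of "broken", and to retry only those links later with a longer timeout. The sketch below is an illustration under that assumption, not a tested solution; the names LinkStatus, LinkChecker, Classify and Check are hypothetical and do not appear in the original code.

```csharp
using System;
using System.Net;

public enum LinkStatus { Ok, Broken, TimedOut }

public static class LinkChecker
{
    // Maps the outcome of a request to a link status.
    // httpStatus is null when no HTTP response was received at all.
    public static LinkStatus Classify(WebExceptionStatus status, HttpStatusCode? httpStatus)
    {
        if (status == WebExceptionStatus.Timeout)
            return LinkStatus.TimedOut;   // slow server: retry later instead of reporting it
        if (status == WebExceptionStatus.Success && httpStatus == HttpStatusCode.OK)
            return LinkStatus.Ok;
        return LinkStatus.Broken;         // HTTP error, DNS failure, refused connection, ...
    }

    public static LinkStatus Check(string url, int timeoutMs)
    {
        try
        {
            HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
            http.Method = "HEAD";
            http.Timeout = timeoutMs;
            using (HttpWebResponse response = (HttpWebResponse)http.GetResponse())
            {
                return Classify(WebExceptionStatus.Success, response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            // ex.Status tells a timeout apart from an HTTP error or a network failure.
            HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
            HttpStatusCode? code = errorResponse == null
                ? (HttpStatusCode?)null
                : errorResponse.StatusCode;
            return Classify(ex.Status, code);
        }
        catch (Exception)
        {
            return LinkStatus.Broken;     // malformed URL or non-HTTP scheme
        }
    }
}
```

A caller could collect the links that come back as TimedOut and run Check on them again with a larger timeout, so that the short timeout applies to most links and the long one only to the slow ones.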

Finding Time duration in C#


We can use TimeSpan to find a time duration in C#.

(e.g.)
//Get the start time
DateTime starttime = DateTime.Now;
//Do some task (e.g. browse www.qualitypointtech.com)
............
webBrowser1.Navigate("www.qualitypointtech.com");
............
//Get the end time
DateTime endtime = DateTime.Now;

//Find the time gap between starttime and endtime.
TimeSpan duration = endtime - starttime;
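
As an alternative sketch, the System.Diagnostics.Stopwatch class can measure the same duration without reading the system clock, which helps if the clock is adjusted while the task runs. The class name DurationExample and the method Measure are hypothetical helpers used only for illustration.

```csharp
using System;
using System.Diagnostics;

public static class DurationExample
{
    // Runs the given task and returns how long it took.
    public static TimeSpan Measure(Action task)
    {
        Stopwatch watch = Stopwatch.StartNew();
        task();
        watch.Stop();
        return watch.Elapsed;
    }
}

// Example call, with Thread.Sleep standing in for the real task:
// TimeSpan duration = DurationExample.Measure(() => System.Threading.Thread.Sleep(50));
// Console.WriteLine("Task took " + duration.TotalMilliseconds + " ms");
```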

C# webbrowser control - Javascript error suppression


The WebBrowser control in C#/.NET has a property to enable/disable JavaScript error messages during navigation/crawling/scraping.

We can suppress/hide JavaScript error messages using the code below.

webBrowser1.ScriptErrorsSuppressed=true;



Saturday, March 28, 2009

C# webbrowser control - Synchronization for Page navigation/loading


We will face a lot of difficulties/errors if we do not handle page synchronization properly when using the .NET WebBrowser control for scraping/crawling web pages.
(i.e.) We need to write code that starts other activities only when page navigation is completely done.

We can use the function "waitTillLoad()" below for this synchronization.
It waits until the browser ReadyState becomes "Complete".
Since the ReadyState is already "Complete" before the new navigation begins, a simple check could exit the function even before the new page starts loading.

So, to avoid this issue, we have enhanced the function to wait for a non-complete status before waiting for the complete status.
(i.e.) The wait for page loading should begin only after page navigation has started.

We need to specify a timeout limit (waittime) for the first wait, as the function would otherwise fall into an infinite loop if we call it twice without initiating any further page navigation. Note that waittime counts loop iterations, not milliseconds.

We can use the same function in VB.NET with small modifications.
We have been using it in many tools and applications for a long time, so I hope it will prove useful and reliable.
I consider it essential if you are using the WebBrowser control for page scraping or web crawling.

You can visit our website at www.qualitypointtech.com
and you can send mail to me (rajamanickam.a@gmail.com) for any of your software development needs/assistance.



private void waitTillLoad()
{
    WebBrowserReadyState loadStatus;

    // Wait till the next page begins to load.
    // waittime is a maximum number of loop iterations, not a time in milliseconds.
    int waittime = 100000;
    int counter = 0;
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents(); // keep processing window messages while we wait

        if ((counter > waittime) || (loadStatus == WebBrowserReadyState.Uninitialized) || (loadStatus == WebBrowserReadyState.Loading) || (loadStatus == WebBrowserReadyState.Interactive))
        {
            break;
        }
        counter++;
    }

    // Wait till the page gets loaded.
    // This loop has no iteration limit: it ends only when ReadyState is Complete.
    counter = 0;
    while (true)
    {
        loadStatus = webBrowser1.ReadyState;
        Application.DoEvents();

        if (loadStatus == WebBrowserReadyState.Complete)
        {
            break;
        }
        counter++;
    }
}
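
Because waittime in the function above counts loop iterations, the real length of the timeout depends on how fast the machine runs. A minimal sketch of a time-based variant follows, assuming a hypothetical helper named WaitUntil that polls a condition until it holds or a deadline passes. In a Windows Forms application the pump argument would typically be Application.DoEvents and the condition would test webBrowser1.ReadyState.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WaitHelper
{
    // Polls condition until it returns true or timeout elapses.
    // pump is called on every iteration (e.g. Application.DoEvents); it may be null.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, Action pump)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (!condition())
        {
            if (watch.Elapsed > timeout)
                return false;
            if (pump != null)
                pump();
            Thread.Sleep(10); // short pause so the loop does not use a full CPU core
        }
        return true;
    }
}

// Hypothetical usage inside the form, mirroring the two loops of waitTillLoad():
// WaitHelper.WaitUntil(() => webBrowser1.ReadyState != WebBrowserReadyState.Complete,
//                      TimeSpan.FromSeconds(10), Application.DoEvents);
// bool loaded = WaitHelper.WaitUntil(() => webBrowser1.ReadyState == WebBrowserReadyState.Complete,
//                                    TimeSpan.FromSeconds(60), Application.DoEvents);
```

Unlike the original second loop, this variant also gives up on a page that never finishes loading, and the return value lets the caller decide what to do in that case.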


Friday, March 27, 2009

MySQL - Is select query case sensitive?


A SELECT query in MySQL can be either case sensitive or case insensitive by default.

It depends on the collation of the column, which comes from the CHARSET/COLLATE defined while creating the table. A binary collation (e.g. utf8_bin) makes string comparisons in a SELECT query case sensitive, while a collation ending in _ci (e.g. utf8_general_ci) makes them case insensitive. Either way, we can force a case-insensitive comparison by using UCASE in the query.

e.g. select * from companys where ucase(name)=ucase('QualityPoint')

Tuesday, March 17, 2009

AJAX- Some tags (TABLE,TR,TD) will be readonly in IE


Internet Explorer does not allow assigning an innerHTML value to some HTML tags (TABLE, TR, TD). The innerHTML property is read-only for these tags in IE.

As a workaround, we can wrap the content in a div tag and update that div when refreshing values using AJAX in IE. The innerHTML of a div tag is writable in IE.

Saturday, March 14, 2009

FFmpeg - Media file conversion tool


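FFmpeg is a useful tool for converting media files.

We can run this tool from PHP using the exec command. Please find below example code that calls ffmpeg on a media file. Note that the options shown here (-vcodec mjpeg, -vframes 1, -an, -s 320x240) write a single 320x240 JPEG frame, which is the usual way to grab a thumbnail; a conversion to a .flv file would instead name a .flv output file and leave out the single-frame options.

exec("ffmpeg -itsoffset -4 -i '$sourcefilepath' -vcodec mjpeg -vframes 1 -an -f rawvideo -s 320x240 '$desinationfile'");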

sIFR - The new way of Displaying desired Font in web browser


sIFR (Scalable Inman Flash Replacement) is an open source technology that uses Flash to display text in a desired font in the web browser, while still allowing search engines to read the text.

JavaScript and Flash Player must be enabled in the viewing browser for it to work. Otherwise the page automatically falls back to traditional CSS styling without user intervention.