Title: Dolt-Dist Crawling of Rich Internet Applications


Abstract:
Crawling web applications is important for indexing, accessibility, and security assessment. Crawling traditional web applications is a well-studied problem, as old as the web itself. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, remains an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates make crawling a RIA more time-consuming for the web crawler. To reduce the time needed to crawl a RIA, we present a new distributed algorithm, called Dolt-Dist RIA Crawler, that crawls a RIA in parallel on multiple computers. Dolt-Dist RIA Crawler uses the JavaScript events in the DOM structure to partition the search space.
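
As an illustration of partitioning the search space by JavaScript events, the following minimal sketch divides the events found in one DOM state among several crawler nodes. The abstract does not specify the partitioning rule, so the round-robin assignment, the function name partition_events, and the event identifiers are all assumptions made for this example, not the algorithm's actual design.

```python
from typing import List


def partition_events(event_ids: List[str], num_workers: int) -> List[List[str]]:
    """Split the events of one DOM state into disjoint shares, one per worker.

    Hypothetical rule (an assumption, not taken from the paper): sort the
    event identifiers so every worker derives the same ordering, then deal
    them out round-robin.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    ordered = sorted(event_ids)
    # Worker w takes every num_workers-th event, starting at index w.
    return [ordered[w::num_workers] for w in range(num_workers)]


# Hypothetical event identifiers, written as "element:event-type".
events = [
    "btn-next:click",
    "menu:mouseover",
    "btn-prev:click",
    "tab-2:click",
    "search:submit",
]
shares = partition_events(events, 2)
for worker, share in enumerate(shares):
    print(worker, share)
```

Because the assignment depends only on the sorted event list and the number of workers, each node can compute its own share of a state without consulting the others; the shares are disjoint and together cover every event.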