Title:
Dolt-Dist Crawling of Rich Internet Applications
Abstract:
Crawling web applications is important for indexing, accessibility and
security assessment. Crawling traditional web applications is an old
problem, as old as the web itself. Crawling Rich Internet Applications (RIA)
quickly and
efficiently, however, is an open problem. Technologies such as AJAX and
partial Document Object Model (DOM) updates only make crawling a RIA more
time consuming for the web crawler. To reduce the time needed to crawl a RIA,
this paper presents a new distributed algorithm, called Dolt-Dist RIA Crawler,
that crawls a RIA in parallel on multiple computers. Dolt-Dist RIA Crawler
uses the JavaScript events in the DOM structure to partition the search space.
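
As a rough illustration of what partitioning the search space by JavaScript events might look like, the sketch below splits the events found in one DOM state among crawler nodes in round-robin order. The abstract does not specify the assignment scheme, so the round-robin rule, the function name `assign_events`, and the event labels are all assumptions made for this example, not the algorithm's actual definition.

```python
from typing import Dict, List


def assign_events(events: List[str], num_nodes: int) -> Dict[int, List[str]]:
    """Split the events of one DOM state among crawler nodes (assumed round-robin)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    partition: Dict[int, List[str]] = {node: [] for node in range(num_nodes)}
    # Sort first so every node derives the same partition independently,
    # without exchanging messages about who owns which event.
    for index, event in enumerate(sorted(events)):
        partition[index % num_nodes].append(event)
    return partition


# Hypothetical events discovered in a single DOM state.
events = ["click#menu", "click#next", "mouseover#help", "click#login", "submit#search"]
partition = assign_events(events, num_nodes=2)
for node, assigned in partition.items():
    print(node, assigned)
```

Under this assumption each node executes only its own share of the events in a given state, so the work per state shrinks roughly in proportion to the number of nodes.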