Abstract: The key issue in mining data on the Web is how to design an intelligent and effective spider. This paper analyzes in detail the workflow and key technologies of a URL-oriented spider. It proposes managing the URL list with several queues, sorting URLs by document relevance so that HTML files can be downloaded at high speed. Moreover, it introduces the idea of an iterative threshold into the computation of document relevance, which avoids having to adjust the threshold arbitrarily by hand.
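
The relevance-ordered URL list described above can be sketched as a simple priority frontier, where the spider always fetches the highest-scoring URL next. This is only an illustrative sketch: the class name, URLs, and relevance scores are hypothetical, and the paper's multi-queue design and iterative-threshold computation are not reproduced here.

```python
import heapq

class URLFrontier:
    """URL frontier ordered by document relevance: higher-scoring
    URLs are popped (and thus downloaded) first. Illustrative only."""

    def __init__(self):
        self._heap = []   # entries: (negated relevance, insertion order, url)
        self._count = 0   # tie-breaker preserving insertion order

    def push(self, url, relevance):
        # heapq is a min-heap, so negate the relevance score to make
        # the most relevant URL come out first.
        heapq.heappush(self._heap, (-relevance, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In practice a crawler would push each newly discovered link with its computed relevance score and repeatedly pop the frontier, so that download bandwidth is spent on the most promising pages first.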