""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt
"""

import collections
import urllib.error
import urllib.parse
import urllib.request

__all__ = ["RobotFileParser"]

RequestRate = collections.namedtuple("RequestRate", "requests seconds")


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.sitemaps = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urllib.parse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
            err.close()
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """Parse the input lines from a robots.txt file.

        We allow that a user-agent: line is not preceded by
        one or more blank lines.
        """
        # states:
        #   0: start state
        #   1: saw user-agent line
        #   2: saw an allow or disallow line
        state = 0
        entry = Entry()

        self.modified()
        for line in lines:
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.parse.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
                elif line[0] == "crawl-delay":
                    if state != 0:
                        # before trying to convert to int we need to make
                        # sure that robots.txt has valid syntax otherwise
                        # it will crash
                        if line[1].strip().isdigit():
                            entry.delay = int(line[1])
                        state = 2
                elif line[0] == "request-rate":
                    if state != 0:
                        numbers = line[1].split('/')
                        # check if all values are sane
                        if (len(numbers) == 2 and numbers[0].strip().isdigit()
                                and numbers[1].strip().isdigit()):
                            entry.req_rate = RequestRate(int(numbers[0]),
                                                         int(numbers[1]))
                        state = 2
                elif line[0] == "sitemap":
                    # According to http://www.sitemaps.org/protocol.html
                    # the Sitemap directive is independent of the
                    # user-agent line, so we do not change the state
                    # of the parser.
                    self.sitemaps.append(line[1])
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # Until the robots.txt file has been read or found not
        # to exist, we must assume that no url is allowed.
        # This prevents false positives when a user erroneously
        # calls can_fetch() before calling read().
        if not self.last_checked:
            return False
        # search for given user agent matches
        # the first match counts
        parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
        url = urllib.parse.urlunparse(('', '', parsed_url.path,
            parsed_url.params, parsed_url.query, parsed_url.fragment))
        url = urllib.parse.quote(url)
        if not url:
            url = "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def crawl_delay(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        if self.default_entry:
            return self.default_entry.delay
        return None

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        if self.default_entry:
            return self.default_entry.req_rate
        return None

    def site_maps(self):
        if not self.sitemaps:
            return None
        return self.sitemaps

    def __str__(self):
        entries = self.entries
        if self.default_entry is not None:
            entries = entries + [self.default_entry]
        return '\n\n'.join(map(str, entries))


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""

    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
        self.path = urllib.parse.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""

    def __init__(self):
        self.useragents = []
        self.rulelines = []
        self.delay = None
        self.req_rate = None

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")
        if self.delay is not None:
            ret.append(f"Crawl-delay: {self.delay}")
        if self.req_rate is not None:
            rate = self.req_rate
            ret.append(f"Request-rate: {rate.requests}/{rate.seconds}")
        ret.extend(map(str, self.rulelines))
        return '\n'.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True
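

# Minimal usage sketch (illustrative only, not part of the module's API):
# it feeds parse() a small, hypothetical rule set and queries it the way a
# crawler would.  The agent name "ExampleBot", the example.com URLs, and the
# robots.txt lines below are all made up for demonstration.
if __name__ == '__main__':
    parser = RobotFileParser()
    parser.parse([
        "User-agent: *",
        "Disallow: /private/",
        "Crawl-delay: 2",
        "Request-rate: 3/10",
        "Sitemap: http://www.example.com/sitemap.xml",
    ])
    print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))  # True
    print(parser.can_fetch("ExampleBot", "http://www.example.com/private/x"))   # False
    print(parser.crawl_delay("ExampleBot"))    # 2
    print(parser.request_rate("ExampleBot"))   # RequestRate(requests=3, seconds=10)
    print(parser.site_maps())                  # ['http://www.example.com/sitemap.xml']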