The crawler grabs dynamically loaded long content and directly generates an HTML report
2021-08-27 03:00:37 【Lei Xuewei】
This is day 23 of my participation in the August Update Challenge. For event details, see: August Update Challenge
The School Committee previously wrote two articles: "Hot list long list crawler screenshot" and "Fast and elegant HTML report development".
This time, let's play a little bigger: we crawl the hot list and save it as a report for viewing.
First, let's look at the result:
[Screenshot: the generated hot-list report]
Step 1: Generate the report
That's right, leave the crawler aside for now. Make up some data out of thin air and generate the report first.
Save the following code as report.py; step 2 imports it by this name.
```python
from dominate.tags import a, body, div, h3, head, html, meta, table, tbody, td, title, tr


def generate_html(tuples):
    """Lei Xuewei's function for generating the HTML report."""
    _html = html()
    _head = head()
    _head.add(title("CSDN Hot List Report compiled by Lei Xuewei"))
    _head.add(meta(charset="utf-8"))
    _html.add(_head)
    _body = _html.add(body())
    _table = table(border=1)
    # Inside this `with` block, each tr() created is appended to the tbody.
    with _table.add(tbody()):
        for index, (link, text) in enumerate(tuples, start=1):
            leiXW = tr()
            leiXW += td(str(index))
            leiXW += td(a(text, href=link))
    with _body.add(div(cls="leixuewei")):
        h3("CSDN Hot List compiled by Lei Xuewei")
    _body.add(_table)
    return _html.render()


def lei_report(leixuewei_tuples, path):
    """Lei Xuewei's function for generating the report and saving it directly."""
    data = generate_html(leixuewei_tuples)
    # Write as UTF-8 to match the <meta charset="utf-8"> declared in the head.
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)


if __name__ == "__main__":
    lxw_tuples = []
    lxw_tuples.append(("https://blog.csdn.net/geeklevin/article/details/119594295", "Lei Xuewei: Generating HTML reports with Python"))
    lxw_tuples.append(("https://blog.csdn.net/geeklevin/article/details/116771659", "Tired of Docker? Try Vagrant"))
    path = "./csdn_rank.html"
    lei_report(lxw_tuples, path)
```
Code walkthrough
The code above generates an HTML page and saves it to the path specified by the path variable:
- Prepare a list of 2-tuples, each holding (link, title)
- Pass it to generate_html, which builds a document with a head and a body; inside the body it iterates over the input list to produce a table
- Write the rendered HTML to the file
The result looks like this:
[Screenshot: the report generated from the sample data]
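generate_html relies on dominate to assemble the tags and to escape the text it is given. As a hedged illustration of what that amounts to, here is a stdlib-only sketch that builds roughly the same head/body/table layout by hand. The function name generate_html_stdlib is hypothetical and not part of report.py; because nothing escapes text automatically here, titles and links go through html.escape explicitly.

```python
import html


def generate_html_stdlib(rows):
    """Hypothetical stdlib-only stand-in for report.generate_html.

    rows is a list of (link, title) tuples; returns the report as a string.
    """
    lines = [
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        "<title>CSDN Hot List Report</title>",
        "</head>",
        "<body>",
        '<div class="leixuewei"><h3>CSDN Hot List</h3></div>',
        '<table border="1">',
        "<tbody>",
    ]
    # One row per tuple: a 1-based index column and a linked title column.
    for index, (link, title) in enumerate(rows, start=1):
        href = html.escape(link, quote=True)
        lines.append(f'<tr><td>{index}</td><td><a href="{href}">{html.escape(title)}</a></td></tr>')
    lines += ["</tbody>", "</table>", "</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("https://blog.csdn.net/geeklevin/article/details/119594295", "Python & HTML <reports>")]
    print(generate_html_stdlib(sample))
```

Doing this by hand shows why a builder like dominate is convenient: forgetting a single html.escape call would let a title containing `<` or `&` break the page.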
Step 2: Adapt the earlier crawler code
This is the core code from the earlier article "Hot list long list crawler screenshot"; let's adapt it directly.
```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

# Reference the report function from step 1 (report.py)
from report import lei_report

# Assumption: the previous article's driver setup is not repeated here.
# This plain Chrome driver, the assumed hot-list URL and img_path are placeholders.
driver = webdriver.Chrome()
driver.get("https://blog.csdn.net/rank/list")
img_path = "./leixuewei_rank_start.png"


def resolve_height(driver, pageh_factor=5):
    """Core code of the earlier screenshot crawler: scroll a streaming page so it loads."""
    js = "return document.body.scrollHeight"
    height = 0
    page_height = driver.execute_script(js)
    ref_pageh = int(page_height * pageh_factor)
    step = 150
    max_count = 15
    count = 0
    while count < max_count and height < page_height:
        # Scroll down towards the page bottom in small steps
        for i in range(height, ref_pageh, step):
            count += 1
            vh = i
            slowjs = 'window.scrollTo(0, {})'.format(vh)
            print('[Lei Xuewei Demo] exec js: %s' % slowjs)
            driver.execute_script(slowjs)
            sleep(0.3)
            if i >= ref_pageh - step:
                print('[Lei Xuewei Demo] not fully read')
                break
        height = page_height
        sleep(2)
        page_height = driver.execute_script(js)
    print("finish scroll")
    return page_height


# Get the actual height of the window
page_height = resolve_height(driver)
print("[Lei Xuewei Demo] page height: %s" % page_height)
sleep(5)
driver.execute_script('document.documentElement.scrollTop=0')
sleep(1)
driver.save_screenshot(img_path)
page_height = driver.execute_script('return document.documentElement.scrollHeight')  # Page height
print("get accurate height: %s" % page_height)
# The code above comes from the previous article

# Scroll to the bottom of the page
driver.execute_script(f'document.documentElement.scrollTop={page_height};')
sleep(1)
driver.save_screenshot('./leixuewei_rank_end.png')
# find_elements_by_xpath was deprecated in Selenium 4 and later removed
blogs = driver.find_elements(By.XPATH, "//div[@class='hosetitem-title']/a")
# Build the list of (link, title) tuples
articles = []
for blog in blogs:
    link = blog.get_attribute("href")
    title = blog.text
    articles.append((link, title))
print('get %s articles' % len(articles))
print('articles: %s' % str(articles))
# Given a path, generate the HTML report
path = "./leixuewei_csdn_rank.html"
lei_report(articles, path)
print("Saved hot list to path: %s" % path)
driver.quit()
# LeiXueWei demo code. You've read this far for free, so please follow, like and bookmark to show your support!
```
Code walkthrough
Compared with the streaming-page crawler from the previous article, the code segment that stitched screenshots together has been removed.
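The scrolling kept from the previous article exists because a streaming page only loads more entries as you approach the bottom. The sketch below is a simplified restatement of that idea, not the original resolve_height: it keeps scrolling in small steps until the scroll height stops growing, with a round limit as a safety net. scroll_until_stable and FakePage are hypothetical names, and FakePage stands in for a browser so the logic can run on its own.

```python
import time


def scroll_until_stable(get_height, scroll_to, step=150, pause=0.0, max_rounds=15):
    """Scroll a lazy-loading page until its height stops growing.

    get_height() returns the current scroll height in pixels and
    scroll_to(y) scrolls the window to y. Returns the last height seen.
    """
    position = 0
    height = get_height()
    for _ in range(max_rounds):
        # Walk down to the current bottom in small steps so lazy content can load.
        while position < height:
            position = min(position + step, height)
            scroll_to(position)
            time.sleep(pause)
        new_height = get_height()
        if new_height == height:
            # Reaching the bottom loaded nothing new, so the page is complete.
            return height
        height = new_height
    # Give up after max_rounds so an endless feed cannot loop forever.
    return height


class FakePage:
    """Hypothetical page: reaching the bottom loads `chunk` more pixels, up to `limit`."""

    def __init__(self, height=1000, chunk=800, limit=3400):
        self.height = height
        self.chunk = chunk
        self.limit = limit

    def scroll_height(self):
        return self.height

    def scroll_to(self, y):
        # Simulate lazy loading: hitting the bottom appends another chunk of entries.
        if y >= self.height and self.height < self.limit:
            self.height = min(self.height + self.chunk, self.limit)


if __name__ == "__main__":
    page = FakePage()
    print(scroll_until_stable(page.scroll_height, page.scroll_to))  # prints 3400
```

With Selenium, get_height would wrap driver.execute_script("return document.body.scrollHeight") and scroll_to would wrap driver.execute_script(f"window.scrollTo(0, {y})"), with a pause of a few hundred milliseconds so the page has time to load each batch.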
Now for the key part. The steps are:
- The crawler scrolls straight to the bottom, collects the links and builds the list of tuples
- It then takes a screenshot at the bottom of the page, which you can keep as a souvenir
- Finally it imports and calls the lei_report function to generate the page
It's simple enough that there's no need to go through it line by line.
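The link-collecting step boils down to the XPath query //div[@class='hosetitem-title']/a followed by turning each match into a (link, title) tuple. To show what that query selects without a browser, here is a sketch using the standard library's ElementTree, which supports this small XPath subset. The SAMPLE markup and extract_articles are hypothetical, and real pages are HTML rather than well-formed XML, so this only illustrates the selection logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one div per hot-list entry, plus an unrelated link.
SAMPLE = """
<html><body>
  <div class="hosetitem-title"><a href="https://example.com/1">First post</a></div>
  <div class="hosetitem-title"><a href="https://example.com/2">Second post</a></div>
  <div class="other"><a href="https://example.com/ad">Not an article</a></div>
</body></html>
"""


def extract_articles(markup):
    """Return (link, title) tuples for every <a> directly under div.hosetitem-title."""
    root = ET.fromstring(markup.strip())
    # Same shape as the crawler's XPath //div[@class='hosetitem-title']/a
    matches = root.findall(".//div[@class='hosetitem-title']/a")
    return [(a.get("href"), a.text) for a in matches]


if __name__ == "__main__":
    for link, title in extract_articles(SAMPLE):
        print(link, title)
```

Note that @class='…' compares the whole attribute value, both here and in Selenium's XPath, so an element whose class attribute lists extra classes would not be selected; in Selenium, contains(@class, 'hosetitem-title') is the usual workaround.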
The result is below. The report is long, so the screenshot shows only its beginning and end.
[Screenshot: beginning and end of the generated hot-list report]
Summary
This article is for demonstration purposes only. If the demonstrated site has any objection, please let us know and we will modify it.
Finally, use crawlers with caution. Don't treat them as a toy for crawling institutional websites, and don't hammer legitimate sites with traffic while you are learning; that kind of behavior can land you in jail!
By the way, you can also follow the School Committee's long-running series => Lei Xuewei's Interesting Programming Stories collection
or => Lei Xuewei NodeJS series
Keep learning and keep developing. I'm Lei Xuewei!
Programming is fun; the key is to understand the technology thoroughly.
Creating content isn't easy, so please support the School Committee with a like and a bookmark!
Copyright notice: author [Lei Xuewei]. Please include the original link when reprinting, thank you.
https://en.qdmana.com/2021/08/20210827030035077s.html