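Uploaded 2021/04/18
Updated 2021/04/21: changed how N is entered, added support for downloading png images, and added code that automatically handles several error cases and prints the download progress and error messages as tagged log lines.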
N = input('Input the number you want to download:');
URL = 'https://www.nasa.gov/api/2/ubernode/_search';
mainURL = 'https://www.nasa.gov/sites/default/files/';
opt = weboptions('Timeout',10);
% Request the metadata of the N most recent images once, before the loop.
data = webread(URL,'size',num2str(N),'from','0','sort','promo-date-time:desc', ...
    'q','((ubernode-type:image) AND (routes:1446))', ...
    '_source_include','promo-date-time,master-image,nid,title,topics,missions,collections,other-tags,ubernode-type,primary-tag,secondary-tag,cardfeed-title,type,collection-asset-link,link-or-attachment,pr-leader-sentence,image-feature-caption,attachments,uri', ...
    opt);
for i = 1:N
    % Skip the first 9 characters of uri (presumably a 'public://' prefix).
    imgURL = append(mainURL,data.hits.hits(i).x_source.master_image.uri(10:end));
    img = webread(imgURL,opt);
    filename = append('Img_',num2str(i),'_',data.hits.hits(i).x_source.master_image.title,'.jpg');
    imwrite(img,filename);
    disp(append('FINISHED:',num2str(i),'/',num2str(N)));
end
disp('Completed!');
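Add the folder containing the .m script file to the MATLAB path and run the script. The command line prompts Input the number you want to download: and, once you enter the number of images you want, the crawler starts automatically and displays its progress. When the progress reaches the end it displays Completed! and the images are saved in MATLAB's current folder (the script's folder, if that is where you run it from).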
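This crawler only works for NASA's Image of the Day gallery, but once you have obtained the image links, the same method can be used to crawl many other websites.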
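On https://www.nasa.gov/multimedia/imagegallery/iotd.html, the Network tool in the F12 developer tools captures the API endpoint the page uses to fetch image information, stored here as URL. Its query is made up of several parameters. Among them, size is the number of images returned by one request, so changing size changes how many images you get.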
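As a small illustration of the size parameter, the name-value arguments can be collected in a cell array and passed to webread() in one go. This is only a sketch: nWanted and queryArgs are names made up for this example, and the long _source_include argument from the script is left out for brevity, so whether the endpoint accepts this shorter query is an assumption.

```matlab
% Build the name-value query arguments for a chosen number of images.
nWanted = 3;                        % hypothetical value for this example
queryArgs = { ...
    'size', num2str(nWanted), ...   % number of records returned per request
    'from', '0', ...
    'sort', 'promo-date-time:desc', ...
    'q',    '((ubernode-type:image) AND (routes:1446))'};
disp(queryArgs(1:2));

% The request itself needs network access, so it is left commented out:
% URL  = 'https://www.nasa.gov/api/2/ubernode/_search';
% data = webread(URL, queryArgs{:}, weboptions('Timeout',10));
```

Changing nWanted is then the only edit needed to request a different number of records.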
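The response from URL contains part of the image link we are after, namely uri.
Combining mainURL with uri(10:end) gives the link to each numbered image, which can then be read in with the webread() function.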
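The string handling in this step can be tried without any network access. In the sketch below the uri value is invented for illustration (real values come from the response), and it assumes that the 9 characters being skipped are a 'public://' prefix.

```matlab
mainURL = 'https://www.nasa.gov/sites/default/files/';
% Hypothetical uri value, made up for this example.
uri = 'public://thumbnails/image/example.jpg';
relPath = uri(10:end);               % drop the first 9 characters ('public://')
imgURL = append(mainURL, relPath);   % char inputs give a char result
disp(imgURL);
```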
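The weboptions() function sets the request options, here the response timeout Timeout. The access method is left at its default, which for webread() is Get.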
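For readers who prefer to state the access method explicitly rather than rely on the default, a minimal sketch follows. The script itself only sets Timeout; the RequestMethod argument here is an addition for illustration.

```matlab
% Set the request method and the timeout (in seconds) explicitly.
opt = weboptions('RequestMethod','get','Timeout',10);
disp(opt.RequestMethod);
disp(opt.Timeout);
```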
append() concatenates strings, and imwrite() writes the image to the specified file, which gives it its new name.
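Because the image title becomes part of the file name, a title containing characters such as : or ? could make imwrite() fail on some systems. The sketch below shows one possible way to guard against that. The title, the synthetic image, and the names imgTitle, safeTitle, and outFile are all invented for this example, and the file is written to tempdir rather than the script's folder.

```matlab
% Hypothetical title containing characters that are invalid in Windows file names.
imgTitle = 'Hubble: A "Cosmic" View?';
% Replace the characters \ / : * ? " < > | with underscores.
safeTitle = regexprep(imgTitle, '[\\/:*?"<>|]', '_');
filename = append('Img_', num2str(1), '_', safeTitle, '.jpg');

% Synthetic 8x8 black RGB image standing in for a downloaded picture.
img = zeros(8, 8, 3, 'uint8');
outFile = fullfile(tempdir, filename);
imwrite(img, outFile);
disp(outFile);
```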
via nasa.gov
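Updated 2021/04/21: changed how N is entered, added downloading of png images, and added code that automatically handles several error cases and prints the download progress and error messages as tagged log lines.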
disp('Input the number you want to download:[N1-N2]');
N1 = input('N1:');
N2 = input('N2:');
nFirst = min(N1,N2);
nLast = max(N1,N2);
nTotal = nLast-nFirst+1;
disp(append('From ',num2str(nFirst),' to ',num2str(nLast),' There are ',num2str(nTotal),' pictures.'));
URL = 'https://www.nasa.gov/api/2/ubernode/_search';
mainURL = 'https://www.nasa.gov/sites/default/files/';
opt = weboptions('Timeout',10);
% Output folder; it must already exist. Named outDir so that it does not
% shadow MATLAB's built-in path function.
outDir = 'F:\PictureDownload\PictureDownload';
% Request the metadata of the newest nLast images once, not once per image.
try
    data = webread(URL,'size',num2str(nLast),'from','0','sort','promo-date-time:desc', ...
        'q','((ubernode-type:image) AND (routes:1446))', ...
        '_source_include','promo-date-time,master-image,nid,title,topics,missions,collections,other-tags,ubernode-type,primary-tag,secondary-tag,cardfeed-title,type,collection-asset-link,link-or-attachment,pr-leader-sentence,image-feature-caption,attachments,uri', ...
        opt);
catch
    disp('[ERROR]Failed to connect to the website. Check your web connection.');
    return
end
for i = nFirst:nLast
    item = data.hits.hits(i).x_source.master_image;
    imgURL = append(mainURL,item.uri(10:end));
    try
        img = webread(imgURL,opt);
    catch
        % Changing i inside a for loop does not affect the next iteration,
        % so continue alone is enough to skip this picture.
        disp(append('[WARN]Failed to download the ',num2str(i),'th picture. It has been skipped.'));
        disp(append('[LINK]',imgURL));
        continue
    end
    % fullfile inserts the separator between the folder and the file name.
    baseName = fullfile(outDir,append('Img_',num2str(i),'_',item.title));
    try
        imwrite(img,append(baseName,'.jpg'));
        disp(append('[',num2str(i),']FINISHED:',num2str(i-nFirst+1),'/',num2str(nTotal)));
    catch
        % Writing as jpg failed, so try png before giving up on this picture.
        try
            imwrite(img,append(baseName,'.png'));
            disp(append('[WARN]The ',num2str(i),'th picture is in png format; it has been downloaded successfully.'));
        catch
            disp(append('[WARN]Failed to write the image file. The No.',num2str(i),' picture has been skipped.'));
            disp(append('[LINK]:',imgURL));
        end
    end
end
disp('Completed!');
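The script reports its progress and errors with disp(), which only prints to the command window. One possible way to also keep those messages in a file is MATLAB's diary command, sketched below. The log file name and location are assumptions made for this example, and the message is a stand-in for the script's real output.

```matlab
% Hypothetical log file location, chosen only for this example.
logFile = fullfile(tempdir, 'nasa_download.log');
if exist(logFile, 'file') == 2
    delete(logFile);                 % start from an empty log
end

diary(logFile);                      % start copying command window output to the file
disp('[WARN]Example message written while the diary is on.');
diary off                            % stop logging and flush the file

logText = fileread(logFile);
disp(logText);
```

Placing diary(logFile) before the download loop and diary off after disp('Completed!') would capture every [ERROR], [WARN], and [LINK] line of a run.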
Reposted from: http://fjugf.baihongyu.com/