Today I'm sharing what I know about fetching web page information in Java, along with an explanation of how to retrieve website content in Java. If it happens to solve the problem you're facing right now, don't forget to follow this site. Let's begin!
Contents of this article:
Getting HTML in Java
Accessing a network URL in Java and getting the page's HTML code
There are two approaches.
The first is the URL class's openStream() method:
openStream() opens a connection to the specified URL and returns an InputStream object for reading data from that connection;
openStream() can only read from the network resource.
The second is the URL class's openConnection() method:
openConnection() creates a URLConnection object, which establishes an HTTP data channel between the local machine and the remote node specified by the URL, supporting two-way data transfer. The URLConnection class provides many methods for setting and reading connection parameters; the most commonly used are getInputStream() and getOutputStream().
openConnection() can both read and send data; a sketch of this appears after the first example below.
For example:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
public class UrlStreamDemo {
public static void main(String[] args) {
try {
// enter the target URL here
URL url = new URL("your URL here");
InputStream in = url.openStream();
InputStreamReader isr = new InputStreamReader(in);
BufferedReader bufr = new BufferedReader(isr);
String str;
while ((str = bufr.readLine()) != null) {
System.out.println(str);
}
bufr.close();
isr.close();
in.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
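For the openConnection() approach, a minimal sketch follows (my own illustration, not from the original answer; the address http://example.com/form and the form body name=value are placeholders): getOutputStream() sends data to the server and getInputStream() reads the response back over the same connection.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;
public class UrlConnectionDemo {
public static void main(String[] args) {
try {
URL url = new URL("http://example.com/form"); // placeholder address
URLConnection conn = url.openConnection();
conn.setDoOutput(true); // allow writing, so data can be sent
// send data through the output stream (a POST for HTTP URLs)
OutputStreamWriter out = new OutputStreamWriter(conn.getOutputStream());
out.write("name=value"); // placeholder form body
out.flush();
out.close();
// read the response through the input stream
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
br.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}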
How to get the text of a web page in Java
package test;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URL;
public class URLTest {
// a public method that returns the page as a string, or "error open url" on failure
public static String getContent(String strUrl) {
try {
URL url = new URL(strUrl);
BufferedReader br = new BufferedReader(new InputStreamReader(url
.openStream()));
String s = "";
StringBuffer sb = new StringBuffer("");
while ((s = br.readLine()) != null) {
sb.append(s + "\r\n");
}
br.close();
return sb.toString();
} catch (Exception e) {
return "error open url:" + strUrl;
}
}
public static void initProxy(String host, int port, final String username,
final String password) {
Authenticator.setDefault(new Authenticator() {
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication(username, password.toCharArray());
}
});
System.setProperty("http.proxyType", "4");
System.setProperty("http.proxyPort", Integer.toString(port));
System.setProperty("http.proxyHost", host);
System.setProperty("http.proxySet", "true");
}
public static void main(String[] args) throws IOException {
String url = "";
String proxy = "";
int port = 80;
String username = "username";
String password = "password";
String curLine = "";
String content = "";
URL server = new URL(url);
initProxy(proxy, port, username, password);
HttpURLConnection connection = (HttpURLConnection) server
.openConnection();
connection.connect();
InputStream is = connection.getInputStream();
BufferedReader reader = new BufferedReader(new
InputStreamReader(is));
while ((curLine = reader.readLine()) != null) {
content = content + curLine + "\r\n";
}
System.out.println("content= " + content);
is.close();
System.out.println(getContent(url));
}
}
How to get page content in Java code
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;
public class Test
{
public static void main(String[] args) throws Exception
{
PrintWriter pw = new PrintWriter("d:\\test.xml"); // d:\\test.xml is the path of your output file
pw.println(getHtmlContentByUrl(" ")); // the blank argument is where the page you want to visit goes (the URL was elided in the original)
pw.flush();
pw.close();
}
public static String getHtmlContentByUrl(String ssourl) {
try {
URL url = new URL(ssourl);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setInstanceFollowRedirects(false);
con.setUseCaches(false);
con.setAllowUserInteraction(false);
con.connect();
StringBuffer sb = new StringBuffer();
String line = "";
BufferedReader URLinput = new BufferedReader(new InputStreamReader(con.getInputStream()));
while ((line = URLinput.readLine()) != null) {
sb.append(line);
}
con.disconnect();
return sb.toString().toLowerCase();
} catch (Exception e) {
return null;
}
}
}
The page content you get back is a string, and there are two ways to parse it. One is to convert the string into a DOM with dom4j and parse that, which is the cleanest, but the other side's page may not be well-formed enough to fit a DOM structure. The other is to filter the content you want out of the string directly; this is more tedious and takes some skill. What I use is the second.
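As a hedged illustration of that second approach, the sketch below filters content out of a string with java.util.regex; the sample HTML and the <td> pattern are my own assumptions, not from the original answer.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FilterDemo {
public static void main(String[] args) {
// sample input (assumed for illustration)
String html = "<table><tr><td>one</td><td>two</td></tr></table>";
// (?s) lets . match line breaks; .*? is non-greedy so each cell matches separately
Pattern p = Pattern.compile("(?s)<td>(.*?)</td>");
Matcher m = p.matcher(html);
while (m.find()) {
System.out.println(m.group(1)); // prints each cell's text
}
}
}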
Getting the content inside a page's tags in Java
That weather value on the Sina page is loaded dynamically by JavaScript; the raw HTML page only contains <div id="SI_Weather_Wrap" class="now-wea-wrap clearfix"></div>.
jsoup only parses the HTML itself, so it cannot find information that is generated dynamically by JavaScript.
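To see this concretely, here is a minimal jsoup sketch (my own illustration; the URL is a placeholder): fetching the raw page and selecting the div by its id finds the element, but its text is empty, because the weather data is only filled in when a browser executes the page's JavaScript.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class JsoupDemo {
public static void main(String[] args) throws Exception {
// fetch the raw, pre-JavaScript HTML (placeholder URL)
Document doc = Jsoup.connect("http://weather.sina.com.cn/").get();
Element wrap = doc.getElementById("SI_Weather_Wrap");
// the div exists in the static page, but its text stays empty
// until a browser runs the page's scripts
System.out.println(wrap == null ? "not found" : wrap.text());
}
}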
Two ways to read website content in Java
HttpClient
Fetch the content of an address with Apache's commons-httpclient package (the legacy HttpClient 3.x API; its successor is Apache HttpComponents):
import java.io.UnsupportedEncodingException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;
public class catchMain {
/** *//**
* @param args
*/
public static void main(String[] args) {
String url = ""; // target address (left blank in the original)
String keyword = "食";
String response = createClient(url, keyword);
}
public static String createClient(String url, String param) {
HttpClient client = new HttpClient();
String response = null;
String keyword = null;
PostMethod postMethod = new PostMethod(url);
try {
if (param != null)
// re-encode the keyword for the request; the charset names were
// garbled in the original, gb2312/ISO-8859-1 is the usual pair here
keyword = new String(param.getBytes("gb2312"), "ISO-8859-1");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
// put the form values into the postMethod
NameValuePair[] data = { new NameValuePair("keyword", keyword) };
postMethod.setRequestBody(data);
try {
int statusCode = client.executeMethod(postMethod);
response = new String(postMethod.getResponseBodyAsString()
.getBytes("ISO-8859-1"), "GBK");
} catch (Exception e) {
e.printStackTrace();
}
return response;
}
}
Java's built-in HttpURLConnection
public static String getPageContent(String strUrl, String strPostRequest,
int maxLength) {
// read the result page
StringBuffer buffer = new StringBuffer();
// connect/read timeouts; the millisecond values were lost in the original, 5000 is an assumption
System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
System.setProperty("sun.net.client.defaultReadTimeout", "5000");
try {
URL newUrl = new URL(strUrl);
HttpURLConnection hConnect = (HttpURLConnection) newUrl
.openConnection();
// extra data for a POST request
if (strPostRequest.length() > 0) {
hConnect.setDoOutput(true);
OutputStreamWriter out = new OutputStreamWriter(hConnect
.getOutputStream());
out.write(strPostRequest);
out.flush();
out.close();
}
// read the content
BufferedReader rd = new BufferedReader(new InputStreamReader(
hConnect.getInputStream()));
int ch;
for (int length = 0; (ch = rd.read()) > -1
&& (maxLength <= 0 || length < maxLength); length++)
buffer.append((char) ch);
rd.close();
hConnect.disconnect();
return buffer.toString().trim();
} catch (Exception e) {
// return "Error: failed to read the page!";
return null;
}
}
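A hedged usage sketch for the method above (the address and the form body are placeholders): an empty strPostRequest makes it issue a plain GET, and a maxLength of 0 or less removes the length cap.
String html = getPageContent("http://example.com", "", 0); // plain GET, read the whole page
String first4k = getPageContent("http://example.com", "q=java", 4096); // POST body, keep the first 4096 characters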
Fetching a web page in Java
A StringBuffer must be initialized before use, e.g.: StringBuffer sb = new StringBuffer();
StringBuffer document = new StringBuffer();
String line; // read the page line by line
while ((line = reader.readLine()) != null){
document.append(line+"\n");
}
String title = document.toString();
title = title.substring(title.indexOf("<title>") + 7, // 7 = the length of "<title>"
title.indexOf("</title>"));
System.out.println(title);
That concludes this introduction to fetching web page information in Java and retrieving website content in Java. I hope you found the information you needed; if you'd like to learn more about this topic, remember to bookmark and follow this site.