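Fetching Web Page Information with Java (Getting Website Content with Java)

This article collects several answers on how to fetch web page information in Java, including how to retrieve a site's content.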

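Fetching HTML in Java

Accessing a URL from Java and retrieving the page's HTML source

There are two ways to do this.

The first is the openStream() method of the URL class:

openStream() connects to the specified URL and returns an InputStream from which the response data can be read.

openStream() can only read from the resource; it cannot send data.

The second is the openConnection() method of the URL class:

openConnection() creates a URLConnection object that represents a communication link between the local machine and the remote resource named by the URL. For an http URL this is an HTTP connection that can carry data in both directions. URLConnection provides many methods for setting and reading connection parameters; the most commonly used are getInputStream() and getOutputStream().

openConnection() can both read and send data.

For example: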

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// The class name is illustrative; the original snippet showed only main().
public class OpenStreamExample {
    public static void main(String[] args) {
        try {
            // Put the URL of the page here
            URL url = new URL("url path");
            // try-with-resources closes the reader and the underlying stream
            try (BufferedReader bufr = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String str;
                while ((str = bufr.readLine()) != null) {
                    System.out.println(str);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
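The openConnection() route described above can be sketched in the same way. This is only a minimal illustration: the class name ConnectionSketch, the 5-second timeouts, the User-Agent string and the UTF-8 charset are assumptions made for the example, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ConnectionSketch {

    // Fetches a page through openConnection(); names and values are illustrative.
    static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        con.setConnectTimeout(5000); // assumed value, in milliseconds
        con.setReadTimeout(5000);    // assumed value, in milliseconds
        con.setRequestProperty("User-Agent", "ConnectionSketch/0.1"); // hypothetical agent string
        try {
            int status = con.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            StringBuilder sb = new StringBuilder();
            // Assumes a UTF-8 page; a fuller client would read the charset from Content-Type.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        } finally {
            con.disconnect();
        }
    }
}
```

Unlike openStream(), this form gives access to the status code and request headers before the body is read, which is usually what makes openConnection() worth the extra lines.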

How to get the text of a web page in Java

package test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URL;

public class URLTest {

    // Returns the page content as a string, or "error open url:<url>" on failure.
    public static String getContent(String strUrl) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new URL(strUrl).openStream(), "UTF-8"))) { // assumes a UTF-8 page
            StringBuilder sb = new StringBuilder();
            String s;
            while ((s = br.readLine()) != null) {
                sb.append(s).append("\r\n"); // CRLF line terminator
            }
            return sb.toString();
        } catch (Exception e) {
            return "error open url:" + strUrl;
        }
    }

    // Configures a JVM-wide HTTP proxy and the credentials used to authenticate against it.
    public static void initProxy(String host, int port, final String username,
            final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(username, password.toCharArray());
            }
        });
        // http.proxyHost and http.proxyPort are the standard JDK networking properties
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    public static void main(String[] args) throws IOException {
        String url = "";   // set to the page URL; an empty string makes new URL(url) throw MalformedURLException
        String proxy = ""; // set to the proxy host
        int port = 80;
        String username = "username";
        String password = "password";

        URL server = new URL(url);
        initProxy(proxy, port, username, password);
        HttpURLConnection connection = (HttpURLConnection) server.openConnection();
        connection.connect();

        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) { // assumes a UTF-8 page
            String curLine;
            while ((curLine = reader.readLine()) != null) {
                content.append(curLine).append("\r\n");
            }
        }
        System.out.println("content= " + content);

        // Fetch the same page again through the helper above
        System.out.println(getContent(url));
    }
}
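The proxy setup above changes system properties for the whole JVM. As an alternative, java.net.Proxy can be passed to a single connection. The following is a minimal sketch of that idea; the class name ProxySketch and its method names are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

class ProxySketch {

    // Describes an HTTP proxy at the given host and port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Creates a connection object routed through the proxy.
    // No network traffic happens until connect() or getInputStream() is called.
    static HttpURLConnection open(String address, Proxy proxy) throws IOException {
        return (HttpURLConnection) new URL(address).openConnection(proxy);
    }
}
```

Only the connection opened this way goes through the proxy, so other code in the same JVM is not affected.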

How to fetch page content from Java code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class Test {

    public static void main(String[] args) throws Exception {
        // d:\\test.xml is the path of your xml file
        try (PrintWriter pw = new PrintWriter("d:\\test.xml")) {
            pw.println(getHtmlConentByUrl(" ")); // pass the URL of the page you want to fetch
        }
    }

    // Returns the page body, lower-cased and with line breaks removed, or null on failure.
    public static String getHtmlConentByUrl(String ssourl) {
        try {
            URL url = new URL(ssourl);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // do not follow 3xx redirects
            con.setUseCaches(false);
            con.setAllowUserInteraction(false);
            con.connect();

            StringBuilder sb = new StringBuilder();
            try (BufferedReader urlInput = new BufferedReader(
                    new InputStreamReader(con.getInputStream()))) {
                String line;
                while ((line = urlInput.readLine()) != null) {
                    sb.append(line);
                }
            } finally {
                con.disconnect();
            }
            return sb.toString().toLowerCase();
        } catch (Exception e) {
            return null;
        }
    }
}

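The page content comes back as a string, and there are two ways to parse it. One is to convert the string into a DOM with dom4j and parse that; this is the better option, but the remote page may not be well-formed enough to fit a DOM structure. The other is to filter the string for the content you want, which is more tedious and takes some skill. I use the second approach.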
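As a rough illustration of the second approach, the sketch below filters link targets out of an HTML string with a regular expression. The class name LinkFilterSketch and the choice of href attributes as the target are assumptions made for the example; a simple pattern like this is not a full HTML parser and will miss unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFilterSketch {

    // Matches href="..." or href='...'; deliberately simple.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the value of every href attribute found in the HTML string, in order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

For example, extractLinks("<a href=\"/a\">A</a>") returns a list containing the single entry /a.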

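Getting the content of a page element in Java

The weather value on Sina is loaded dynamically by JavaScript; in the original HTML the element is just <div id="SI_Weather_Wrap" class="now-wea-wrap clearfix"></div>.

jsoup only parses the HTML it is given, so it cannot find information that JavaScript generates afterwards.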
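The point can be checked without any library: the raw markup simply has nothing inside the element. The sketch below uses a plain regular expression rather than jsoup, and the class and method names are hypothetical; it does not handle nested div elements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticHtmlSketch {

    // Returns the raw text between <div id="..."> and the next </div>, or null if absent.
    static String divContent(String html, String id) {
        Pattern p = Pattern.compile(
                "<div[^>]*\\bid=\"" + Pattern.quote(id) + "\"[^>]*>(.*?)</div>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String raw = "<div id=\"SI_Weather_Wrap\" class=\"now-wea-wrap clearfix\"></div>";
        // The element exists but is empty until the page's JavaScript fills it in.
        System.out.println("[" + divContent(raw, "SI_Weather_Wrap") + "]"); // prints []
    }
}
```

Any parser that works on the downloaded HTML alone sees the same empty element, which is why the value has to come from whatever request the page's script makes, or from a tool that actually executes JavaScript.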

Two ways to read website content in Java

HttpClient

Use Apache's HttpClient package (Commons HttpClient) to fetch the content of a given address:

import java.io.UnsupportedEncodingException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

public class catchMain {

    /**
     * @param args
     */
    public static void main(String[] args) {
        String url = ""; // the target URL is missing from the source text
        String keyword = "食";
        String response = createClient(url, keyword);
    }

    public static String createClient(String url, String param) {
        HttpClient client = new HttpClient();
        String response = null;
        String keyword = null;
        PostMethod postMethod = new PostMethod(url);
        try {
            if (param != null)
                // Carry the GB2312 bytes of the keyword inside an ISO-8859-1 string
                keyword = new String(param.getBytes("gb2312"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        NameValuePair[] data = { new NameValuePair("keyword", keyword) };
        // Put the form values into postMethod
        postMethod.setRequestBody(data);
        try {
            int statusCode = client.executeMethod(postMethod);
            // Recover the raw bytes from the ISO-8859-1 string and decode them as GBK
            response = new String(postMethod.getResponseBodyAsString()
                    .getBytes("ISO-8859-1"), "GBK");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            postMethod.releaseConnection();
        }
        return response;
    }
}
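The getBytes/new String pairs in the example above rely on the fact that ISO-8859-1 maps every byte to exactly one character, so text decoded with it can be turned back into the original bytes and decoded again with the real charset. The sketch below isolates that step; the class and method names are hypothetical, and it assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTripSketch {

    // Re-interprets text that was decoded as ISO-8859-1 using the charset it was really in.
    static String redecode(String misdecoded, Charset actual) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, actual);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String keyword = "\u98DF";                  // U+98DF, the keyword used in the example above
        byte[] wire = keyword.getBytes(gbk);        // bytes as a GBK server would send them
        String garbled = new String(wire, StandardCharsets.ISO_8859_1); // result of a Latin-1 decode
        System.out.println(redecode(garbled, gbk).equals(keyword));     // prints true
    }
}
```

The trick only works when the intermediate charset is lossless for arbitrary bytes, as ISO-8859-1 is; reading the response as bytes and decoding once with the right charset avoids the detour altogether.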

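Java's built-in HttpURLConnection

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;

    // Place this method inside a class.
    public static String getPageContent(String strUrl, String strPostRequest,
            int maxLength) {
        // Holds the resulting page
        StringBuffer buffer = new StringBuffer();
        // The timeout values are missing from the source text; 5000 ms is an assumed placeholder
        System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
        System.setProperty("sun.net.client.defaultReadTimeout", "5000");
        try {
            URL newUrl = new URL(strUrl);
            HttpURLConnection hConnect = (HttpURLConnection) newUrl
                    .openConnection();
            // Extra data for a POST request
            if (strPostRequest.length() > 0) {
                hConnect.setDoOutput(true);
                OutputStreamWriter out = new OutputStreamWriter(hConnect
                        .getOutputStream());
                out.write(strPostRequest);
                out.flush();
                out.close();
            }
            // Read the content, up to maxLength characters (no limit if maxLength <= 0)
            BufferedReader rd = new BufferedReader(new InputStreamReader(
                    hConnect.getInputStream()));
            int ch;
            for (int length = 0; (ch = rd.read()) > -1
                    && (maxLength <= 0 || length < maxLength); length++)
                buffer.append((char) ch);
            rd.close();
            hConnect.disconnect();
            return buffer.toString().trim();
        } catch (Exception e) {
            // return "Error: failed to read the page!";
            return null;
        }
    }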
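The strPostRequest argument is written to the connection as-is, so the caller has to supply an already encoded form body. A minimal sketch of building one follows; the class name FormBodySketch, the field names and the UTF-8 encoding are assumptions for illustration (a server that expects GBK would need that charset instead).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBodySketch {

    // Encodes key/value pairs as application/x-www-form-urlencoded text.
    // Uses the Charset overload of URLEncoder.encode, available since Java 10.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("keyword", "\u98DF"); // hypothetical field carrying U+98DF
        fields.put("page", "1");         // hypothetical field
        System.out.println(encode(fields)); // prints keyword=%E9%A3%9F&page=1
    }
}
```

The resulting string can be passed as strPostRequest to a method like getPageContent above.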

Fetching a web page in Java

A StringBuffer must be initialized before use, e.g. StringBuffer sb = new StringBuffer();

StringBuffer document = new StringBuffer();
String line; // read the page content
while ((line = reader.readLine()) != null) {
    document.append(line + "\n");
}
String title = document.toString();
// "<title>" is 7 characters long, so the title text starts 7 characters after its index
title = title.substring(title.indexOf("<title>") + 7,
        title.indexOf("</title>"));
System.out.println(title);
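The substring call above throws StringIndexOutOfBoundsException when the page has no title tags, and it misses tags written in upper case. A slightly more defensive variant is sketched below; the class and method names are hypothetical, and the regular expression is a simplification rather than a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleSketch {

    // Matches <title>...</title> in any letter case, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null when the page has no title element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Calling extractTitle(document.toString()) would replace the indexOf arithmetic, with a null check for pages that have no title.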
