How can I use a regular expression to strip all tags from a page except <p></p> and <img>?

Something like this keeps the p tags:
str.replaceAll("(<(\\/?[^Pp].*?)>)", "").replaceAll("[\\s*]+"," ").replaceAll(" ", "");
Now I want to keep both the p and img tags.

Recommended answer 2016-07-19

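This is genuinely not easy to do. Keeping only p or only img works, but putting the two conditions together did not work for me. So I took a different approach and did it with a function. Take a look; the code is in Python: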

import re

t = '<html>asdfasdf<head>1111111111<body>asdfasdfasdf <img herf="fff">'

def replace_two(m):
    """
    Strip every tag from m except <p>, </p> and <img>.
    """
    # Every tag found in the text.
    all_tags = re.findall(r'</?.*?>', m)
    # Tags to keep: <img ...>, <p ...> and </p>, in any letter case.
    save = re.findall(r'<img\b.*?>|</?p\b.*?>', m, re.I)

    for e in all_tags:
        if e not in save:
            m = m.replace(e, '')
    return m

print(replace_two(t))
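
For comparison, the two conditions can also be combined in a single pattern by using a negative lookahead instead of a negated character class. This is only a minimal sketch and not part of the original answer; the names KEEP_P_IMG and strip_tags_except_p_img are made up for illustration, and it assumes simple markup (no `>` inside attribute values, no comments or scripts).

```python
import re

# Hypothetical pattern name. Matches any tag whose name is NOT p or img.
# (?!/?(?:p|img)\b) rejects <p ...>, </p> and <img ...>; \b keeps <pre> removable.
KEEP_P_IMG = re.compile(r'<(?!/?(?:p|img)\b)[^>]*>', re.I)

def strip_tags_except_p_img(html):
    """Remove every tag except <p>, </p> and <img> (illustrative helper)."""
    return KEEP_P_IMG.sub('', html)

sample = '<div><p class="a">text</p><img src="x.png"><pre>code</pre></div>'
print(strip_tags_except_p_img(sample))
# prints: <p class="a">text</p><img src="x.png">code
```

For real-world pages an HTML parser is usually more robust than a regular expression.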

Follow-up answer

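[] matches any single character from the set inside the brackets, and a leading ^ negates that set. It works on individual characters, not on whole words, so the way you wrote it, every tag that starts with i, m, or g is caught by it, not only img.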
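
The difference between a negated character class and a whole-word exclusion can be seen in a small Python sketch. The sample string and variable names here are made up for illustration and are not from the original thread.

```python
import re

tags = '<img><input><map><div>'

# [^img] excludes any tag whose FIRST character is i, m, or g,
# so <input> and <map> are skipped along with <img>.
by_char_class = re.findall(r'<[^img][^>]*>', tags)
print(by_char_class)   # ['<div>']

# A negative lookahead excludes only the whole word "img".
by_lookahead = re.findall(r'<(?!img\b)[^>]*>', tags)
print(by_lookahead)    # ['<input>', '<map>', '<div>']
```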
