I need help writing a regex query to extract all the website addresses in a log file. Each line of the log file contains a bunch of information (IP address, protocol, bytes, requested website, etc.). In particular, I want to capture everything between "http: //" and a specific "ENDING", where I specify "ending = com, biz, net, tv, info". I do not want the full URL (i.e. http: //www.google.com/bla/page2=bubblebla, bus). The hard part of this regex query is that I want it to cope with .com, .info or .biz appearing inside a subdomain (i.e. http: //www.google.com.MaliciousWebsite.com). Is there any way to catch the full domain in this situation instead of reducing it to google.com?
I have never written a regex query before, so I have been using an online reference chart (http: //www.addedbytes.com/cheat-sheets-regular-expressions-cheat-sheet/), but I am struggling. This is as far as I have got:
"\A[http: //]\Z[\.][com,info,biz,tv,net]"
* Sorry for the spaces in the URLs, but StackOverflow is flagging them and, because I am a new user, I can post a maximum of 2.
Thanks for the help.
Edit: Based on the excellent feedback from everyone, I think it would be better to write this rule so that it captures everything between (http or https) and the first non-valid URL character: ?, !, @, #, $, %, ^, &, *, (, ), [, {, }, ], |, /, ', ", ;, <, >. This will ensure that all TLDs are captured and that websites like google.com.bad.website.com are also caught. This is my mockup so far:
"\A[https?: //]'?!(!@#$%^&;*()-=[]{}|"
Thanks for all the help.
You can try this expression:
\b((?:http://)(?:.)*(?:\.)(?:com|info|biz|tv|net))
and here is a detailed explanation of how it works :)
r"""
\b                 # Assert position at a word boundary
(                  # Match the regular expression below and capture its match into backreference number 1
   (?:             # Match the regular expression below
      http://      # Match the characters "http://" literally
   )
   (?:             # Match the regular expression below
      .            # Match any single character that is not a line break character
   )*              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:             # Match the regular expression below
      \.           # Match the character "." literally
   )
   (?:             # Match the regular expression below
                   # Match either the regular expression below (attempting the next alternative only if this one fails)
         com       # Match the characters "com" literally
      |            # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
         info      # Match the characters "info" literally
      |            # Or match regular expression number 3 below (attempting the next alternative only if this one fails)
         biz       # Match the characters "biz" literally
      |            # Or match regular expression number 4 below (attempting the next alternative only if this one fails)
         tv        # Match the characters "tv" literally
      |            # Or match regular expression number 5 below (the entire group fails if this one fails to match)
         net       # Match the characters "net" literally
   )
)
"""
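To see how the expression behaves on real input, here is a minimal Python sketch. The log lines, the `first_match` helper and the `TIGHT_PATTERN` variant are my own illustrations, not part of the answer above. The variant assumes that a host name consists only of letters, digits, dots and hyphens, and it also accepts https, as the question's edit asks.

```python
import re

# The expression from the answer, compiled in verbose mode so it can carry comments.
ANSWER_PATTERN = re.compile(r"""
    \b
    (
        (?:http://)
        (?:.)*                    # greedy: runs to the end of the line, then backtracks
        (?:\.)
        (?:com|info|biz|tv|net)
    )
""", re.VERBOSE)

# Hypothetical tightened variant (an assumption, not the answer's pattern):
# only host-name characters may follow the scheme, and the TLD must not be
# followed by another host-name character.
TIGHT_PATTERN = re.compile(
    r"\b(https?://[A-Za-z0-9.-]+\.(?:com|info|biz|tv|net))(?![A-Za-z0-9.-])"
)


def first_match(pattern, line):
    """Return the first captured address in the line, or None if there is none."""
    match = pattern.search(line)
    return match.group(1) if match else None


# Made-up log lines for illustration.
simple = "10.0.0.1 GET http://www.google.com/bla/page2 200"
tricky = "10.0.0.1 GET http://www.google.com/a.net.html 200"
nested = "10.0.0.1 GET http://www.google.com.MaliciousWebsite.com/x 200"

print(first_match(ANSWER_PATTERN, simple))  # http://www.google.com
print(first_match(ANSWER_PATTERN, tricky))  # http://www.google.com/a.net
print(first_match(TIGHT_PATTERN, tricky))   # http://www.google.com
print(first_match(TIGHT_PATTERN, nested))   # http://www.google.com.MaliciousWebsite.com
```

Because `(?:.)*` is greedy, the answer's expression stops at the last ".com", ".info", ".biz", ".tv" or ".net" on the line, which can fall inside the path, as the second call shows. Restricting the repeated part to host-name characters keeps the match inside the domain while still catching a nested name such as www.google.com.MaliciousWebsite.com in full.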