#1
Registered User
Join Date: Oct 2007
Posts: 45
OS: Windows Vista Business
awk script help
Hello. I'm not sure if this is the right place to ask, but I want to write an awk script that scans an HTML file and prints every link in it (e.g. .html, .htm, .jpg, .doc, .pdf, etc.) along with how many times each one occurs in the file. If anyone knows how, please help!
__________________
"Football is not just a game; it is a weapon of the Revolution."

#2
Registered User
Join Date: Oct 2007
Location: Littleton, Colorado USA
Posts: 339
OS: xp 64 sp2 Fedora Core 8 (vmware xp core 8 x32) Minix
Re: awk script help
It will be in the form of:
awk '/.html/ {HTML=HTML+1} /.htm/ {HTM=HTM+1} /.doc/ {DOC=DOC+1} END { printf("HTML=%s; HTM=%s; DOC=%s\n", HTML, HTM, DOC);}'

It might be better to use Perl. Perl has a lot of libraries that understand correct HTML syntax; they can parse the document and return various pieces of it.

The code snippet above came from a shell script called "dvddup.sh". Google for it; it has a lot of awk in it. Also go to the directory "/usr/bin" and run "grep awk *". A lot of the files distributed with my Fedora Core 8 have awk scripts in them. Hope this helps.

#3
Registered User
Join Date: Oct 2007
Location: Littleton, Colorado USA
Posts: 339
OS: xp 64 sp2 Fedora Core 8 (vmware xp core 8 x32) Minix
Re: awk script help
Another place to look for awk examples is the /etc directory: open a text command window there and type "grep -R awk *". (The -R tells grep to descend recursively through the underlying directories.) Lots of startup config scripts use awk (and everything else). This command also prints a lot of garbage.
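
One way to cut down on the garbage is to ask grep for file names only and to skip binary files. The sketch below is only an illustration: the function name files_using_awk is made up here, and the -r and -I options are assumed to be available (they are in GNU and BSD grep, but are not required by POSIX).

```shell
# Hypothetical helper: list the files under a directory that mention awk.
files_using_awk() {
    # -r: recurse into subdirectories
    # -I: treat binary files as non-matching (assumed GNU/BSD grep option)
    # -l: print only the names of matching files, not the matching lines
    # Permission errors are discarded so they do not clutter the list.
    grep -rIl -- 'awk' "$1" 2>/dev/null
}
```

For example, "files_using_awk /etc | head" shows the first few scripts worth opening.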

#4
Registered User
Join Date: Oct 2007
Posts: 45
OS: Windows Vista Business
Re: awk script help
I want to do specifically what I described above. It's part of an assignment for my university and I don't know much about awk, which is why I'm asking, and I don't have much time...!
I think your solution would match all occurrences of /.html/, but HTML code contains both links with .html and plain text with .html. So it would have to be something like /href=/ or /src=/, with RS="<", and then splitting each record into fields and "cleaning" the link. But I don't know exactly how to do it! Is it clearer now?
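
The approach described in post #4 (split records on "<", look only at href= and src= attributes, strip the attribute down to the bare link, then count) could be sketched roughly as follows. This is a minimal sketch, not a full HTML parser: the function name count_links is made up, and it assumes attribute names are lowercase and values are wrapped in double quotes. Attributes whose names merely end in href or src (for example data-href) would be counted as well.

```shell
# Hypothetical helper: count how often each href/src link occurs.
# Reads the files given as arguments, or standard input if there are none.
count_links() {
    awk '
    BEGIN { RS = "<" }                  # one record per tag plus trailing text
    {
        rest = $0
        # Pull out every href="..." or src="..." value in this record.
        while (match(rest, /(href|src)="[^"]*"/)) {
            link = substr(rest, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", link)    # strip the attribute name and opening quote
            sub(/"$/, "", link)         # strip the closing quote
            if (link != "") count[link]++
            rest = substr(rest, RSTART + RLENGTH)
        }
    }
    END { for (link in count) print count[link], link }
    ' "$@" | sort -k1,1nr -k2           # most frequent first, then by name
}

# Sample run on two made-up lines of HTML.
printf '%s\n' '<a href="a.html">a.html</a>' \
              '<img src="b.jpg"><a href="a.html">x</a>' | count_links
```

Because only the attribute values are counted, the plain text "a.html" between the tags in the sample is ignored, which is the problem raised about matching /.html/ anywhere in the file.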

#5
Registered User
Join Date: Jun 2008
Posts: 4
OS: linux
Re: awk script help
Can anyone please help me with this awk script?
rsh sim1 vmstat -m | awk 'NR == 12 {print "Sim1","\t", $6}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt && rup -l | egrep sim1 | awk '{print "\t","\t", $8}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt

The above output will be:

Sim1    2981
            1.08

I wish to get the output on one line, as follows:

Sim1    2981    1.08

Also, how do I print the field immediately after the regex?

awk '/regexp/{getline;print}'

This does not work for me, as I need to print the field, not the line. Please help.

Best Regards,
Aung Phyoe

Last edited by Saosin1984 : 06-09-2008 at 11:46 PM.
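
For the second question, one possible approach is to loop over the fields of each line and print the field that follows the first one matching the regex. The sketch below assumes whitespace-separated fields; the function name field_after is made up for illustration. Note that awk interprets backslash escapes in values passed with -v, so a pattern containing backslashes would need them doubled.

```shell
# Hypothetical helper: print the field that follows any field matching a regex.
# Usage: field_after REGEX [FILE...]
field_after() {
    pattern=$1
    shift
    awk -v pat="$pattern" '{
        # Stop one field short of NF: the last field has nothing after it.
        for (i = 1; i < NF; i++)
            if ($i ~ pat) print $(i + 1)
    }' "$@"
}

# Sample run on a made-up line: the field after "average:" is printed.
printf 'load average: 2.5 up\n' | field_after 'average'
```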

#6
Registered User
Join Date: Oct 2007
Location: Littleton, Colorado USA
Posts: 339
OS: xp 64 sp2 Fedora Core 8 (vmware xp core 8 x32) Minix
Re: awk script help
I think (and I'm not very good at this) that you are sending two commands to machine sim1:

1) rsh sim1 vmstat -m | awk 'NR == 12 {print "Sim1","\t", $6}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt

and

2) && rup -l | egrep sim1 | awk '{print "\t","\t", $8}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt

I don't know what the rup command is; it is not in my man pages. Why don't you try something like:

rsh sim1 vmstat -m | awk 'NR == 12 {print "Sim1","\t", $6,$8}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt

That way the vmstat output gets piped through awk, and for the 12th line of input (NR == 12) the 6th and the 8th fields get printed.

A suggestion: substitute ssh for rsh. The security is better, and you don't have to play with those pesky .rhosts files. ssh will let you set up the transparent login privileges using a long script without having to actually log in to the remote machine. (And you can log in remotely as root.)

#7
Registered User
Join Date: Jun 2008
Posts: 4
OS: linux
Re: awk script help
Thank you very much for the reply.

rup is a command that shows the uptime of servers and also their load average.

"you are sending two commands to machine sim1" <-- No, sim1 is the server, and I am just printing the word "Sim1".

What I am trying to do is get the load average of sim1 (from the rup command) and the memory usage of sim1 (from the vmstat command). I use awk to pick the right field from each of the two commands and print it into a file, but I am having some trouble.

rsh sim1 vmstat -m | awk 'NR == 12 {print "Sim1","\t", $6}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt && rup -l | egrep sim1 | awk '{print "\t","\t", $8}' >> /net/home/linux_users/Phyoe/scripts/man_u.txt

This is what I wrote in a file, and the result is:

Sim1    2764
            2.6

The result that I wish to get is on one single line:

Sim1    2764    2.6

I hope this is easier to understand now. :)

#9
Registered User
Join Date: Jun 2008
Posts: 4
OS: linux
Re: awk script help
I managed to solve it!
rsh sim1 vmstat -m | awk -v ORS='' 'NR == 12 {print "Sim1","\t", $6}' >> filename && rup -l | egrep sim1 | awk '{print "\t", $8}' >> filename

The above command will give me:

Sim1    2856    2.5

Thank you so much!
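
To see why the ORS='' trick works, and one alternative to it, here is a small sketch that can be run without rsh or rup. The two sample lines are made-up stand-ins for the vmstat and rup output (only the field positions $6 and $8 are taken from the thread), the NR == 12 filter is left out, and the function names are hypothetical.

```shell
# Made-up sample lines standing in for the vmstat and rup output.
mem_line='f1 f2 f3 f4 f5 2856'
load_line='f1 f2 f3 f4 f5 f6 f7 2.5'

# Variant 1: the ORS approach from the post. With ORS set to the empty
# string the first awk prints no trailing newline, so the output of the
# second awk continues on the same line.
one_line_ors() {
    printf '%s\n' "$mem_line"  | awk -v ORS='' '{ print "Sim1", "\t", $6 }'
    printf '%s\n' "$load_line" | awk '{ print "\t", $8 }'
}

# Variant 2: capture each field in a shell variable first, then write the
# whole line with a single printf.
one_line_printf() {
    mem=$(printf '%s\n' "$mem_line"  | awk '{ print $6 }')
    load=$(printf '%s\n' "$load_line" | awk '{ print $8 }')
    printf 'Sim1\t%s\t%s\n' "$mem" "$load"
}

one_line_ors
one_line_printf
```

The second variant has the advantage that the line is written in one step, so a failure in the second command cannot leave a half-finished line in the file.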

#10
Registered User
Join Date: Oct 2007
Location: Littleton, Colorado USA
Posts: 339
OS: xp 64 sp2 Fedora Core 8 (vmware xp core 8 x32) Minix
Re: awk script help
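You changed ORS from its default newline to an empty string (the '' is two single quotes, not a double quote). The first awk therefore no longer ends its output with a newline, and the output of the second awk lands on the same line.
Clever.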