New to python, who knows how to use regex?

Started by Melchoire, Oct 22 2010 08:12 PM

17 replies to this topic

#1 Melchoire

5284 posts

Posted 22 October 2010 - 08:12 PM

So I'm just trying to get a handle on Python 'cuz I need a couple of different programs in different languages to build up my programming portfolio so I can get a job fast.

Anyways suppose I have an IRC chat log where each time someone says something a line in this form is added:

[hh:mm] <'handle'> 'this is what they said'

What I want to do is extract each person's name and what they typed, so I'm using regex like so:

m = re.search(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+?)\n", dtext)

dtext is where the chat log is stored.

The pattern captures at most 2 groups per line, so the number of groups is always even. The first one is the username and the second one is whatever they said. What I want to do is count each person's words per line. And by line I mean everything from when they start typing to when they submit, so it's not a literal "line".

I figured the most efficient way to do this is to make a "type" (I don't know what the equivalent of this is in Python) called "Line", for example, with the attributes "Name" and "Words". I can iterate through the chat log text file line by line; each line that matches will have a name and whatever-they-said to go with it, which I can add to an overall array and do the math on later.

So really the part I need help with is:
1. creating a "type"
2. reading a file line by line (I already know how to open one)
3. matching a regex pattern against that line
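For step 3, here is a minimal sketch of what the pattern from this post captures on a single line. It assumes Python 3, and the sample line is made up for illustration; only the pattern comes from the post.

```python
import re

# Pattern from the post, as a raw string so the backslashes reach the regex engine unchanged.
LINE_RE = re.compile(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+?)\n")

# Hypothetical sample line in the "[hh:mm] <handle> message" format.
sample = "[20:12] <Melchoire> so how do I use regex in python?\n"

m = LINE_RE.search(sample)
name = m.group(1)               # first group: the handle
said = m.group(2)               # second group: what they typed
word_count = len(said.split())  # words are split on whitespace

# The pattern ends in \n, so a final line without a trailing newline does not match.
no_newline = LINE_RE.search("[20:13] <iargue> hi")

print(name, word_count, no_newline)
```

Note that re.search returns None when nothing matches, so real code would check m before calling group().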

#2 iargue

10048 posts

Posted 22 October 2010 - 08:14 PM

Read a file by lines. http://docs.python.o...nputoutput.html

I dunno what you mean by type :|. You can make an array, or a list if you want?

And you can match regex against an entire string instead of against the array.

#3 Pyro699

1543 posts

Posted 22 October 2010 - 08:21 PM

Make sure to check out http://gskinner.com/RegExr/ to test your regular expressions...

Then what I would do is:

import re

# findall returns a list with one entry per match (an empty list if nothing matched)
allMatches = re.findall("<your expression>", stringToSearch)

if not allMatches:
    print("No matches found")
else:
    for match in allMatches:
        print("This is one of your matches: %s" % str(match))
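One detail that matters for the chat log case: when the pattern contains two capture groups, re.findall returns each match as a tuple of the two groups rather than a plain string. A small sketch, assuming Python 3 and a made-up two-line log (the contents of dtext here are hypothetical):

```python
import re

# Hypothetical two-line log standing in for the whole chat log held in one string.
dtext = (
    "[20:12] <Melchoire> who knows regex?\n"
    "[20:14] <iargue> read the docs\n"
)

# Two capture groups, so every match is a (name, said) tuple.
pairs = re.findall(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+?)\n", dtext)

for name, said in pairs:
    print("%s said: %s" % (name, said))
```

Because each entry is a tuple, it can be unpacked directly in the for loop, which gives the "two groups at a time" handling without reading the file line by line.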

As for reading the file line by line...

# the with block closes the file automatically, even if an error occurs
with open('FileName', 'r') as f:
    for line in f:
        print(line)

Happy programming

~Cody

Edited by Pyro699, 22 October 2010 - 08:24 PM.

#4 Melchoire

5284 posts

Posted 22 October 2010 - 08:29 PM

For example, in Visual Basic you could make a type like this:

Public Type Person
	Name As String
	Salary As Double
End Type

I'm not sure that's even proper syntax (I haven't worked with it in years) but you get what I'm trying to do, right? It's like a class but with only properties.

Well I think it'll be a little easier if I just read it line by line instead, because I'd rather handle 2 groups at a time.

Edit: didn't see your edit, thanks ^_^

that helped
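A rough sketch of the line-by-line approach, assuming Python 3. The function name parse_log and the sample log are hypothetical, and io.StringIO stands in for a real open file so the example runs on its own. The pattern is a slight variation of the one from the first post: the trailing \n is dropped and each line is stripped first, so a last line without a newline still matches.

```python
import io
import re

# Variation of the original pattern without the trailing \n.
LINE_RE = re.compile(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+)")

def parse_log(lines):
    """Return a list of (name, text) pairs, one per line that matches."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:  # lines in any other format (joins, quits) are skipped
            results.append((m.group(1), m.group(2)))
    return results

# io.StringIO behaves like an open text file; swap in open("yourlog.txt") for real use.
fake_file = io.StringIO(
    "[20:12] <Melchoire> who knows regex?\n"
    "* iargue has joined\n"
    "[20:14] <iargue> read the docs"
)
parsed = parse_log(fake_file)
print(parsed)
```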

#5 Pyro699

1543 posts

Posted 22 October 2010 - 08:35 PM

You mean like this?

class Person(object):
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def getName(self):
        return self.name

    def getSalary(self):
        return self.salary

a = Person("My Name", 23000)
print(a.getName())
print(a.getSalary())

Like that?

#6 Melchoire

5284 posts

Posted 22 October 2010 - 08:39 PM

Well in that case you're creating a class which is sorta overkill for something as simple as what I want to make. I mean it works but is there another alternative?

#7 Pyro699

1543 posts

Posted 22 October 2010 - 08:42 PM

You could just create a 2-dimensional list...

a = [ ["Person A's Name", 23000], ["Person B's Name", 42000] ]

for name, salary in a:
    print("The person's name is %s\nTheir salary is %i\n\n" % (name, salary))

~Cody
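Another option that sits between a full class and a bare list is collections.namedtuple from the standard library, which gives a record with named fields and no methods to write. This is not something suggested in the thread, just a sketch of an alternative; the field names follow the "Line" idea from the first post and the values are made up.

```python
from collections import namedtuple

# A record type with two named fields; no __init__ or getters needed.
Line = namedtuple("Line", ["name", "words"])

entry = Line(name="Melchoire", words=5)

# Fields are readable by name...
print(entry.name, entry.words)

# ...and the record still unpacks like an ordinary tuple.
name, words = entry
```

A namedtuple is immutable, so it suits records that are filled in once and only read afterwards.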

#8 Melchoire

5284 posts

Posted 22 October 2010 - 08:46 PM

I think I'll just use a class for now. I don't wanna get into arrays just yet. Thanks for the help, I'll come back to this thread if I have more questions.

#9 Noitidart

Neocodex Co-Founder

23214 posts

Posted 22 October 2010 - 09:56 PM

Regex is awesome. I want to make all our programmers use it but I haven't been successful.

Regex totally removes the need for GSB or any other string manipulation. It's the most amazing thing!

#10 Pyro699

1543 posts

Posted 22 October 2010 - 09:59 PM

It totally is

it makes searching for text within a webpage a breeze :3 Like getting all of the links, all images, tables, unique entries... etc

#11 Noitidart

Neocodex Co-Founder

23214 posts

Posted 23 October 2010 - 01:10 AM

Oh definitely the global flag is awsommmmme! I try to tell my guys but they won't listen :'(

Oh mel the other most awesome thing is arrays. You gotta learn that man arrays are king.

#12 iargue

10048 posts

Posted 23 October 2010 - 05:14 AM

"Well in that case you're creating a class which is sorta overkill for something as simple as what I want to make. I mean it works but is there another alternative?"

Actually, a class is a type in this case. It's exactly what you want, and it's the proper way to do it in Python. He didn't show everything though.

Here it is with everything added (he used a as the instance name):

class Person():  # Defines the class (the "type")
    def __init__(self):  # Sets up each new instance
        self.name = ""
        self.salary = ""

peopleDatabase = []

a = Person()
a.name = "My Name"
a.salary = "23000"

peopleDatabase.append(a)

That's the proper way to create objects of a class, or "type" as you call it, and add them to a list. The list is just a container that holds each object, and each object carries its own name and salary.

You can then just sift through all of them as you need. I'll be on IRC to help more today if you need any.
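To show what sifting through such a list could look like for the word count from the first post, here is a sketch assuming Python 3. The class name Line comes from the first post; the function average_words and the sample numbers are hypothetical.

```python
class Line(object):
    """One chat line: who said it and how many words it had."""
    def __init__(self, name, words):
        self.name = name
        self.words = words

def average_words(lines):
    """Map each name to that person's average number of words per line."""
    totals = {}  # name -> (number of lines, total words)
    for entry in lines:
        count, words = totals.get(entry.name, (0, 0))
        totals[entry.name] = (count + 1, words + entry.words)
    return {name: words / float(count) for name, (count, words) in totals.items()}

# Hypothetical data: in practice each Line would be built from a regex match.
chat = [Line("Melchoire", 4), Line("iargue", 3), Line("Melchoire", 8)]
averages = average_words(chat)
print(averages)
```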

Right now I use GSB because if the page changes, I would know. How would I know if I was using Regex?

#13 Pyro699

1543 posts

Posted 23 October 2010 - 10:09 AM

I'm not too sure what you meant by "GSB", but why wouldn't you just check?

Define a variable as the current URL, ask the script what the last URL that got loaded was, and if it doesn't match the previous one, the page changed.

#14 iargue

10048 posts

Posted 23 October 2010 - 10:12 AM

Get String Between.

And I mean, whenever a program loads, say the Kad Feeder, GSB will only grab the data if it matches perfectly, whereas regex would still grab the data. So with GSB, if TNT decided to change the page silently, the program wouldn't work and no one would get frozen.

#15 Pyro699

1543 posts

Posted 23 October 2010 - 10:16 AM

Ah, ok, that makes sense. I use a mixture of both. In card games, for example, when there is a board full of cards, it's much easier to store them all in a list generated from a regular expression.


# zStr = String to dissect
# zStart = Starting delimiter (str)
# zEnd = Ending delimiter (str)
#
# Returns only the data between the two delimiters,
# or an empty string if either one is not found

def GetBetween(zStr, zStart, zEnd):
    z1 = zStr.find(zStart)
    z2 = zStr.find(zEnd, z1 + len(zStart))

    if z2 > z1 and z1 > -1:
        return zStr[z1 + len(zStart):z2]
    else:
        return ""

That's the method he is talking about for anyone who was interested.

~Cody
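For comparison, here is a sketch of the same get-string-between idea next to a regex version, assuming Python 3. The names get_between and get_between_re and the HTML snippet are hypothetical; the first function repeats the logic of GetBetween so the example runs on its own. For ordinary non-empty delimiters the two should return the same result.

```python
import re

def get_between(text, start, end):
    """Same logic as GetBetween: text between the first start and the next end."""
    i = text.find(start)
    j = text.find(end, i + len(start))
    if i > -1 and j > i:
        return text[i + len(start):j]
    return ""

def get_between_re(text, start, end):
    """Regex version; re.escape makes both delimiters match literally."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else ""

# Hypothetical page snippet.
html = "<td>Name</td><td>Value</td>"

print(get_between(html, "<td>", "</td>"))
print(get_between_re(html, "<td>", "</td>"))
```

The lazy (.*?) stops at the first end delimiter, which mirrors how find picks the nearest one.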

#16 Noitidart

Neocodex Co-Founder

23214 posts

Posted 23 October 2010 - 11:37 AM

You can easily check with regex whether the page changed: make the regex strict, using the exact parameters you used in the GSB. Regex returns null on no match.
But wouldn't it be nice not to have to update even when they change the page? For things like this I always throw in extra checks to make sure the data and everything looks the way it did when I originally programmed it, and stop otherwise if something may have changed that could have a major effect.
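In Python terms, the "null on no match" check could be sketched like this (Python 3 assumed). Python's re.search returns None when nothing matches. The markup, the pattern, PageChangedError and extract_count are all made up for illustration.

```python
import re

class PageChangedError(Exception):
    """Raised when the page no longer looks the way the script expects."""

# Hypothetical strict pattern: the surrounding markup is spelled out exactly,
# much like the start and end strings passed to a GSB call.
COUNT_RE = re.compile(r'<span id="count">(\d+)</span>')

def extract_count(page):
    m = COUNT_RE.search(page)
    if m is None:  # no match means the layout is not what was programmed for
        raise PageChangedError("expected markup not found, stopping")
    return int(m.group(1))

print(extract_count('<p><span id="count">17</span></p>'))
```

Raising an exception makes the script stop loudly instead of carrying on with missing data.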

#17 iargue

10048 posts

Posted 23 October 2010 - 11:43 AM

"regex strict using the exact parameters you used in the GSB. ".

Then it's the same thing?

#18 Noitidart

Neocodex Co-Founder

23214 posts

Posted 23 October 2010 - 11:46 AM

Haha, yeah, true. But regex is faster, you were right; I tested it, but the speed difference is negligible. The cool thing is you don't have to find the last index and offset your next GSB. Regex remembers it and gets them all if you use the global flag.
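Python's re module has no global flag as such; re.findall (or re.finditer) plays that role and returns every match in one call. The sketch below, assuming Python 3, contrasts it with the manual approach of tracking the offset after each hit. The function all_between and the page snippet are hypothetical.

```python
import re

# Hypothetical page snippet with three links.
page = '<a href="/one">1</a> <a href="/two">2</a> <a href="/three">3</a>'

def all_between(text, start, end):
    """Manual approach: remember where the last hit ended and search on from there."""
    found, pos = [], 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        j = text.find(end, i + len(start))
        if j == -1:
            break
        found.append(text[i + len(start):j])
        pos = j + len(end)  # offset for the next search
    return found

manual = all_between(page, 'href="', '"')

# Regex approach: findall keeps track of the position internally.
with_regex = re.findall(r'href="(.*?)"', page)

print(manual)
print(with_regex)
```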
