New to python, who knows how to use regex?


17 replies to this topic

#1 Melchoire

Melchoire
  • 5284 posts



Posted 22 October 2010 - 08:12 PM

So I'm just trying to get a handle on Python 'cuz I need a couple of different programs in different languages to build up my programming portfolio so I can get a job fast.

Anyways suppose I have an IRC chat log where each time someone says something a line in this form is added:

[hh:mm] <'handle'> 'this is what they said'


What I want to do is extract each person's name and what they typed, so I'm using regex like so:

m = re.search(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+?)\n", dtext)

dtext is where the chat log is stored.

The pattern captures exactly 2 groups per matching line, so the total number of groups is always even. The first one is the username and the second one is whatever they said. What I want to do is count each person's words per line. By "line" I mean everything from when they start typing to when they submit, so it's not a literal line.

I figured the most efficient way to do this is to make a "type" (I don't know what the equivalent of this is in Python) called "Line", for example, with the attributes "Name" and "Words". I can iterate through the chat log text file line by line; each line that matches will have a name and whatever-they-said to go with it, which I can add to an overall array and do the math on later.

So really the part I need help with is
1. creating a "type"
2. reading a file line by line (I already know how to open one)
3. matching regex pattern against that line

#2 iargue

iargue
  • 10048 posts



Posted 22 October 2010 - 08:14 PM

Read a file by lines. http://docs.python.o...nputoutput.html

I dunno what you mean by type :|. You can make an array, or a list if you want?

And you can match regex against an entire string instead of against the array.

#3 Pyro699

Pyro699
  • 1543 posts



Posted 22 October 2010 - 08:21 PM

Make sure to check out http://gskinner.com/RegExr/ to test your regular expressions...

Then what I would do is:

import re

# findall returns a list of every match found in the string
allMatches = re.findall("<your expression>", stringToSearch)

if len(allMatches) == 0:
    print("No matches found")
else:
    for match in allMatches:
        print("This is one of your matches: %s" % str(match))


As for reading the file line by line...

# 'with' closes the file automatically when the block ends
with open('FileName', 'r') as f:
    lines = f.readlines()

for line in lines:
    print(line)


Happy programming :)
~Cody

Edited by Pyro699, 22 October 2010 - 08:24 PM.
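
For the chat-log case from the first post, `re.findall` with a two-group pattern returns a list of `(name, text)` tuples rather than plain strings. A minimal sketch, assuming the `[hh:mm] <handle> message` format described there; the sample log, and the use of `^`/`$` with `re.MULTILINE` so the last line matches even without a trailing newline, are assumptions for illustration:

```python
import re

# Hypothetical sample log in the format described in the first post
dtext = (
    "[20:12] <Melchoire> hello there everyone\n"
    "[20:13] <Pyro699> hi\n"
    "*** iargue has joined the channel\n"
    "[20:14] <Melchoire> anyone know regex?"
)

# Two groups: the handle and whatever was said.
# re.MULTILINE makes ^ and $ match at every line boundary.
pattern = r"^\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+?)$"

# With more than one group, findall returns a list of tuples
allMatches = re.findall(pattern, dtext, re.MULTILINE)

for name, text in allMatches:
    print("%s said: %s" % (name, text))
```

Lines that don't fit the format (such as the join notice) are simply skipped.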


#4 Melchoire

Melchoire
  • 5284 posts



Posted 22 October 2010 - 08:29 PM

For example, in Visual Basic you could make a type like this:

Public Type Person
	Name As String
	Salary As Double
End Type

I'm not sure that's even proper syntax (I haven't worked with it in years), but you get what I'm trying to do, right? It's like a class but with only properties.

Well I think it'll be a little easier if I just read it line by line instead, because I'd rather handle 2 groups at a time.

Edit: didn't see your edit, thanks ^_^ that helped
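
A rough sketch of the line-by-line approach, handling the two groups one line at a time. `io.StringIO` stands in for the opened log file here, and the sample lines are made up; a real file object from `open()` can be looped over the same way:

```python
import io
import re

# Compile once, reuse for every line
LINE_RE = re.compile(r"\[\d{2}:\d{2}\]\s+<(.+?)>\s(.+)")

# io.StringIO stands in for an opened log file (an assumption for the demo)
log = io.StringIO(
    "[20:12] <Melchoire> hello there everyone\n"
    "-- server notice --\n"
    "[20:13] <Pyro699> hi\n"
)

results = []
for line in log:
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        continue  # not a chat line, skip it
    name, text = m.groups()
    # Store the handle together with the word count for that line
    results.append((name, len(text.split())))

print(results)
```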

#5 Pyro699

Pyro699
  • 1543 posts



Posted 22 October 2010 - 08:35 PM

You mean like this?


class Person(object):
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def getName(self):
        return self.name

    def getSalary(self):
        return self.salary

a = Person("My Name", 23000)
print(a.getName())
print(a.getSalary())


Like that?

#6 Melchoire

Melchoire
  • 5284 posts



Posted 22 October 2010 - 08:39 PM

Well in that case you're creating a class which is sorta overkill for something as simple as what I want to make. I mean it works but is there another alternative?
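
One lighter-weight option in the standard library is `collections.namedtuple`, which gives a record with named fields and no hand-written methods. A sketch only; the `Line` name and its `name`/`words` fields follow the description in the first post and are otherwise arbitrary:

```python
from collections import namedtuple

# A record type with two named fields and nothing else
Line = namedtuple("Line", ["name", "words"])

entry = Line(name="Melchoire", words=3)

print(entry.name)   # fields are read by name...
print(entry[1])     # ...or by position, like a normal tuple
```

Note that a namedtuple is immutable, so it suits records that are built once and then only read.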



#7 Pyro699

Pyro699
  • 1543 posts



Posted 22 October 2010 - 08:42 PM

You could just create a 2 dimensional list...


a = [ ["Person A's Name", 23000], ["Person B's Name", 42000] ]

# Each inner list unpacks into name and salary
for name, salary in a:
    print("The person's name is %s\nTheir salary is %i\n\n" % (name, salary))


~Cody

#8 Melchoire

Melchoire
  • 5284 posts



Posted 22 October 2010 - 08:46 PM

I think I'll just use a class for now. I don't wanna get into arrays just yet. Thanks for the help, I'll come back to this thread if I have more questions.
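
For the "do the math later" step from the first post, here is one possible sketch of averaging words per line for each person. The `(name, text)` pairs are made-up sample data standing in for whatever the regex extracts:

```python
from collections import defaultdict

# Hypothetical (name, text) pairs, as a regex match per line would give
lines = [
    ("Melchoire", "hello there everyone"),
    ("Pyro699", "hi"),
    ("Melchoire", "anyone know regex"),
    ("Pyro699", "a little bit"),
]

# name -> [total words, total lines]
totals = defaultdict(lambda: [0, 0])
for name, text in lines:
    totals[name][0] += len(text.split())
    totals[name][1] += 1

averages = {}
for name, (words, count) in totals.items():
    averages[name] = float(words) / count
    print("%s: %.2f words per line" % (name, averages[name]))
```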

#9 Noitidart

Noitidart
  • Neocodex Co-Founder

  • 23214 posts



Posted 22 October 2010 - 09:56 PM

REGEX is awesome. I want to make all our programmers use it but I haven't been successful. :( Regex totally removes the need for GSB or any other string manipulation. It's the most amazing thing!

#10 Pyro699

Pyro699
  • 1543 posts



Posted 22 October 2010 - 09:59 PM

It totally is :D it makes searching for text within a webpage a breeze :3 Like getting all of the links, all images, tables, unique entries... etc :)

#11 Noitidart

Noitidart
  • Neocodex Co-Founder

  • 23214 posts



Posted 23 October 2010 - 01:10 AM

Oh definitely, the global flag is awesome! I try to tell my guys but they won't listen :'(

Oh Mel, the other most awesome thing is arrays. You gotta learn that, man. Arrays are king.

#12 iargue

iargue
  • 10048 posts



Posted 23 October 2010 - 05:14 AM

Actually, a class is a type in this case. It's exactly what you want, and it's the proper way to do it in Python. He didn't show everything, though.

Here it is after you add everything (he used a as the instance):

class Person(object):  # Defines the type
    def __init__(self):  # Runs whenever a new Person is created
        self.name = ""
        self.salary = ""

peopleDatabase = []  # A plain list that will hold Person objects

a = Person()
a.name = "My Name"
a.salary = "23000"

peopleDatabase.append(a)


That's the proper way to create objects of a class, or "type" as you call it, and store them. peopleDatabase is just a list that holds each Person object, and each object keeps its own name and salary.

You can then just sift through all of them as you need. I'll be on IRC to help more today if you need any.
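
Sifting through the list afterwards could look something like this. It is a sketch under assumptions: the constructor arguments with defaults and the numeric salaries are additions for brevity, not part of the code above:

```python
class Person(object):
    # Default arguments (an addition here) let you fill in fields at creation
    def __init__(self, name="", salary=0):
        self.name = name
        self.salary = salary

peopleDatabase = [
    Person("Person A", 23000),
    Person("Person B", 42000),
]

# Loop over every object in the list
for person in peopleDatabase:
    print("%s earns %d" % (person.name, person.salary))

# Or pick out only the ones you need
wellPaid = [p.name for p in peopleDatabase if p.salary > 30000]
print(wellPaid)
```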

Right now I use GSB because if the page changes, I would know. How would I know if I was using Regex?

#13 Pyro699

Pyro699
  • 1543 posts



Posted 23 October 2010 - 10:09 AM

I'm not too sure what you meant by "GSB", but why wouldn't you just check :p Define a variable as the current URL... ask the script what the last URL that got loaded was... if it doesn't match the previous one... page change :D

#14 iargue

iargue
  • 10048 posts



Posted 23 October 2010 - 10:12 AM

Get String Between.

And I mean whenever a program loads, say, the Kad Feeder. GSB will only grab the data if the page matches perfectly, whereas regex would still grab the data. This way, if TNT decided to change the page silently, the program would stop working and no one would get frozen.

#15 Pyro699

Pyro699
  • 1543 posts



Posted 23 October 2010 - 10:16 AM

Ah, OK, that makes sense. I use a mixture of both. In card games, for example, when there is a board full of cards, it's much easier to store them all in a list generated from a regular expression.


# zStr   = String to dissect
# zStart = Starting delimiter (str)
# zEnd   = Ending delimiter (str)
#
# Returns only the text between the two delimiters, or "" if either is missing

def GetBetween(zStr, zStart, zEnd):
    z1 = zStr.find(zStart)
    z2 = zStr.find(zEnd, z1 + len(zStart))

    if z2 > z1 and z1 > -1:
        return zStr[z1 + len(zStart):z2]
    else:
        return ""


That's the method he is talking about for anyone who was interested.

~Cody
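
A quick usage sketch of that function, repeated here so the snippet runs on its own. The page fragments are made up; the point is only that an exact-delimiter lookup comes back empty once the markup changes:

```python
def GetBetween(zStr, zStart, zEnd):
    z1 = zStr.find(zStart)
    z2 = zStr.find(zEnd, z1 + len(zStart))
    if z2 > z1 and z1 > -1:
        return zStr[z1 + len(zStart):z2]
    return ""

# Made-up page fragments for illustration
oldPage = '<td class="price">1500 NP</td>'
newPage = '<td class="cost">1500 NP</td>'   # markup changed silently

found = GetBetween(oldPage, '<td class="price">', '</td>')
missing = GetBetween(newPage, '<td class="price">', '</td>')

print(repr(found))
print(repr(missing))
```

An empty result is the signal that the page no longer looks the way the program expects.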

#16 Noitidart

Noitidart
  • Neocodex Co-Founder

  • 23214 posts



Posted 23 October 2010 - 11:37 AM

You can easily check with regex whether the page changed: make the regex strict, using the exact parameters you used in the GSB. Regex returns null on no match (None in Python).
But wouldn't it be nice not to have to update even when they change the page? For things like this I always throw in extra checks to make sure the data looks the way it did when I originally programmed it, and stop if something that could have a major effect may have changed.
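
A strict pattern with a None check might look roughly like this. The page fragments and the `extract_price` helper are hypothetical, made up purely to illustrate the idea:

```python
import re

# Made-up page fragments for illustration
oldPage = '<td class="price">1500 NP</td>'
newPage = '<td class="cost">1500 NP</td>'

# Strict pattern: the surrounding markup must match exactly,
# and the value itself must be digits followed by " NP"
strict = re.compile(r'<td class="price">(\d+) NP</td>')

def extract_price(page):
    m = strict.search(page)
    if m is None:
        return None  # layout changed: signal the caller to stop
    return int(m.group(1))

print(extract_price(oldPage))
print(extract_price(newPage))
```

The `\d+` part doubles as one of the extra checks: even if the markup still matches, a value that isn't a number is rejected.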

#17 iargue

iargue
  • 10048 posts



Posted 23 October 2010 - 11:43 AM

"regex strict using the exact parameters you used in the GSB. ".



Then it's the same thing? :p

#18 Noitidart

Noitidart
  • Neocodex Co-Founder

  • 23214 posts



Posted 23 October 2010 - 11:46 AM

Haha, yeah, true. But regex is faster; you were right, I tested it, though the speed difference is negligible. The cool thing is you don't have to find the last index and offset your next GSB. Regex remembers it and gets them all if you use the global flag.
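
Python's `re` module has no global flag as such; `re.findall` (or `re.finditer`) returns every match in one call. A small sketch comparing that with tracking the offset by hand, using a made-up page fragment:

```python
import re

# Made-up page fragment for illustration
page = "<li>apple</li><li>pear</li><li>plum</li>"

# Regex: one call returns every match
viaRegex = re.findall(r"<li>(.*?)</li>", page)

# Manual approach: track the offset yourself after each hit
viaFind = []
pos = 0
while True:
    start = page.find("<li>", pos)
    if start == -1:
        break
    start += len("<li>")
    end = page.find("</li>", start)
    if end == -1:
        break
    viaFind.append(page[start:end])
    pos = end + len("</li>")  # continue searching after this item

print(viaRegex)
print(viaFind)
```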

