test if string contains a valid number

Posted on
Fri Mar 21, 2014 12:26 am
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

test if string contains a valid number

The attached code returns a float if possible:
number = self.getNumber(string/number .. to test)
# test if a val contains a valid number, if not return ""
# return the number if there is any meaningful number (with letters before and after, return that number)
# u"the values is -123.5 Amps" returns -123.5
# -1.3e5 returns -130000.0
# -1.3e-5 returns -0.000013
# u"-1.3e-5" returns -0.000013
# u"1.3e-5x" returns "" (the - sign is not in first position; this case still needs to be handled)
# True, u"truE", u"on", "ON" .. return 1.0; False, u"faLse", u"off" return 0.0
# u"1 2 3" returns ""
# u"1.2.3" returns ""
# u"12-5" returns ""
Code: Select all
   def getNumber(self, val):
      x = ""
      try:
         x = float(val)
         return x
      except:
         pass
      try:
         xx = ''.join([c for c in val if c in '-1234567890.'])  # drop everything except number characters
         lenXX = len(xx)
         if lenXX > 0:  # found number characters
            if len(''.join([c for c in xx if c in '.'])) > 1: return ""  # reject strings with 2 or more dots, e.g. " 5.5 6.6"
            if len(''.join([c for c in xx if c in '-'])) > 1: return ""  # reject strings with 2 or more "-", e.g. " 5-5 6-6"
            if len(''.join([c for c in xx if c in '1234567890'])) == 0: return ""  # reject strings with no digits, just "." and "-", e.g. "abc.xyz- hij"
            if lenXX == 1: return float(xx)  # just one digit
            if xx.find("-") > 0: return ""  # reject if "-" is not in first position
            valList = list(val)  # make it a list
            count = 0  # count of number characters seen so far
            for i in range(len(val)-1):  # reject "-0 1 2.3 4": number characters must be consecutive
               if (len(''.join([c for c in valList[i] if c in '-1234567890.'])) == 1):  # this character is a number character
                  count += 1
                  if count >= lenXX: break  # all number characters accounted for, end of test: it is a number
                  if (len(''.join([c for c in valList[i+1] if c in '-1234567890.']))) == 0: return ""  # next is not a number character and not all are accounted for, so it is numberXnumber
            return float(xx)  # must be a real number, everything else is excluded
         else:  # only text left, no number in this string
            val = str(val).upper()  # if this fails on unicode, return "" (--> except:)
            if val == "TRUE" or val == "ON": return 1.0  # true/on   --> 1
            if val == "FALSE" or val == "OFF": return 0.0  # false/off --> 0
            return ""  # all tests failed, nothing there, return ""
      except:
         return ""  # something failed, e.g. unicode only ==> return ""
      return ""  # should not happen, just for safety


Copied into an editor, the formatting is better.


Karl

Posted on
Fri Mar 21, 2014 7:56 am
RogueProeliator offline
Posts: 2501
Joined: Nov 13, 2012
Location: Baton Rouge, LA

Re: test if string contains a valid number

If it works and you don't want to tinker, no problem, but you can probably reduce the length of your code significantly, and potentially the processing time as well, by using regular expressions to parse the string.

A properly formatted regular expression would be able to both check the format of the input and return the numerical portion of it in nearly a single line of code. I can show you an example if you are interested...

Adam

Posted on
Fri Mar 21, 2014 8:42 am
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

This code grew by experience and exceptions. It takes care of MANY combinations.
I just wanted to show the many combinations.
YES, a regex could do a better job, but I am not yet on that level.

Posted on
Fri Mar 21, 2014 8:57 am
jay (support) offline
Site Admin
Posts: 18220
Joined: Mar 19, 2008
Location: Austin, Texas

Re: test if string contains a valid number

regex is great once you get the hang of it. Here's a good start to what you're looking for.

Jay (Indigo Support)

Posted on
Fri Mar 21, 2014 9:10 am
RogueProeliator offline
Posts: 2501
Joined: Nov 13, 2012
Location: Baton Rouge, LA

Re: test if string contains a valid number

One principle to keep in mind that might help, and this is what is causing your script to expand I think, is that it is easier to identify the good input (or good parts of input) than to kick out bad input. Your script mostly does the latter, and while it seems to do a good job with everything that has been thrown at it, it will have to be modified for the next "unexpected" example that shows up.

This thought became more popular/mainstream when dealing with SQL injection attacks... at first programmers started just scrubbing for characters that could cause problems (such as quotes), only to realize that hackers would code around that using unicode or hex values or the like.

In your case, for example, you may be able to look for a valid number format that starts the string of text, capturing that and ignoring the rest (units/suffixes). Please do NOT take this as criticism, not meaning it like that at all; I just see that your function could potentially be improved and, especially, that you may have to continue growing it if you keep using this pattern. One example: the way I am reading your script, it could fail when using European numbering localization, as you are eliminating the comma.

Adam

Posted on
Fri Mar 21, 2014 10:28 am
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

Yes, the comment on the European comma is correct. As I am originally from Europe, I am painfully aware of it.
(Also, I am using CSV in the data files; I should switch to ";".)

Not covered: "xxx 10.5e-9 xx". "10.5e-9" on its own is.

BUT 90% is covered by the first statement: float(u"10.5e-9") works. The rest only matters if there is a mixture of text and numbers. It was used just for variables; all valid device states are either a float or true/false.
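
For the uncovered case of a number with an exponent embedded in text, a minimal sketch could look like the one below. The names NUMBER_ANYWHERE and find_first_number are made up for illustration and are not part of the code in this thread. It uses re.search to find the first number anywhere in the string, so it deliberately does not reject inputs that contain several numbers.

```python
import re

# Hypothetical pattern (an assumption, not the thread's code):
# optional sign, digits, optional fraction, optional exponent.
NUMBER_ANYWHERE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")

def find_first_number(text):
    """Return the first number found anywhere in text as a float, or "" if none."""
    match = NUMBER_ANYWHERE.search(text)
    if match is None:
        return ""
    return float(match.group(0))
```

With this sketch, find_first_number("xxx 10.5e-9 xx") yields the same float as float("10.5e-9"), and a string without digits yields "". A string like "12-5" would still return 12.0, so the multi-number checks from the original function would have to be added separately.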

Anyway, glad to get the discussion going. Maybe someone can take a shot at the regex. Happy to learn.

Karl

Posted on
Fri Mar 21, 2014 1:07 pm
RogueProeliator offline
Posts: 2501
Joined: Nov 13, 2012
Location: Baton Rouge, LA

Re: test if string contains a valid number

Here is essentially a drop-in replacement for your getNumber function...
Code: Select all
import re

getValueNumRegEx = re.compile("^(?P<number>\-{0,1}\d+(?:(?:\,|\.)\d+){0,1}(?:e\-{0,1}\d+){0,1})(?:.*)$",re.IGNORECASE)
getValueBoolRegEx = re.compile("^(?P<trueVal>(?:true)|(?:on)|(?:yes))|(?P<falseVal>(?:false)|(?:off)|(?:no))$",re.IGNORECASE)
def getNumberViaRegEx(value):
   try:
      if type(value) is float or type(value) is int:
         return value
      elif type(value) is bool:
         return 1.0 if value == True else 0.0
      elif type(value) is str:
         numberMatch = getValueNumRegEx.match(value)
         if numberMatch:
            return float(numberMatch.group(1))
      
         boolMatch = getValueBoolRegEx.match(value)
         if boolMatch:
            return 1.0 if boolMatch.group("trueVal") else 0.0
      return "" # whatever default includes "bad"
   except:
      return ""
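
One thing to watch with the number pattern above: it accepts a decimal comma ("123,4"), but float() does not, so such input raises ValueError and ends up in the except branch. A minimal sketch of one way around that is below. It assumes the comma is always a decimal separator and never a thousands separator, and the names LEADING_NUMBER and leading_number are hypothetical.

```python
import re

# Hypothetical simplified pattern: a number at the start of the string,
# with either "." or "," as the decimal separator and an optional exponent.
LEADING_NUMBER = re.compile(r"^(-?\d+(?:[.,]\d+)?(?:e-?\d+)?)", re.IGNORECASE)

def leading_number(text):
    """Return the number that starts text as a float, or "" if there is none."""
    match = LEADING_NUMBER.match(text)
    if match is None:
        return ""
    # float() only understands ".", so normalise a decimal comma first.
    return float(match.group(1).replace(",", "."))
```

Under that assumption, leading_number("123,4") and leading_number("123.4") give the same result. Input such as "1,234.5" would be misread as 1.234, so this only fits data where thousands separators never appear.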


Here is the long version - copy/paste this into an editor to save a python script that will show you all of the different inputs and the results using both your old code and the new code...

Code: Select all
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import re   

# Karl's original getNumber routine
def getNumber(val):
      x = ""
      try:
         x = float(val)
         return x
      except:
         pass
      try:
         xx = ''.join([c for c in val if c in '-1234567890.'])                        # remove non numbers
         lenXX= len(xx)
         if lenXX > 0:                                                      # found numbers..
            if len( ''.join([c for c in xx if c in '.']) )           >1: return ""         # remove strings that have 2 or more dots " 5.5 6.6"
            if len( ''.join([c for c in xx if c in '-']) )           >1: return ""         # remove strings that have 2 or more -    " 5-5 6-6"
            if len( ''.join([c for c in xx if c in '1234567890']) ) ==0: return ""         # remove strings that just no numbers, just . amd - eg "abc.xyz- hij"
            if lenXX ==1                                    : return float(xx)   # just one number
            if xx.find("-") > 0                                 : return ""         # reject if "-" is not in first position
            valList = list(val)                                                # make it a list
            count = 0                                                      # count number of numbers
            for i in range(len(val)-1):                                          # reject -0 1 2.3 4  not consecutive numbers:..
               if (len(''.join([c for c in valList[i] if c in '-1234567890.'])) ==1 ):      # check if this character is a number, if yes:
                  count +=1                                                #
                  if count >= lenXX                           : break            # end of # of numbers, end of test: break, its a number
                  if (len(''.join([c for c in valList[i+1] if c in '-1234567890.'])) )== 0: return "" #  next is not a number and not all numbers accounted for, so it is numberXnumber
            return                                           float(xx)         # must be a real number, everything else is excluded
         else:                                                            # only text left,  no number in this string
            val = str(val).upper()                                             # if unicode return ""   (-->except:)
            if val== "TRUE"  or val =="ON"                        :  return 1.0      # true/on   --> 1
            if val== "FALSE" or val =="OFF"                        :  return 0.0      # false/off --> 0
            return ""                                                      # all tests failed ... nothing there, return "
      except:
         return ""                                                         # something failed eg unicode only ==> return ""
      return ""                                                            # should not happen just for safety
      

# new getNumberViaRegEx routine, using global variable just to avoid the
# recompile each time, but not a big deal if not
getValueNumRegEx = re.compile("^(?P<number>\-{0,1}\d+(?:(?:\,|\.)\d+){0,1}(?:e\-{0,1}\d+){0,1})(?:.*)$",re.IGNORECASE)
getValueBoolRegEx = re.compile("^(?P<trueVal>(?:true)|(?:on)|(?:yes))|(?P<falseVal>(?:false)|(?:off)|(?:no))$",re.IGNORECASE)
def getNumberViaRegEx(value):
   try:
      if type(value) is float or type(value) is int:
         return value
      elif type(value) is bool:
         return 1.0 if value == True else 0.0
      elif type(value) is str:
         numberMatch = getValueNumRegEx.match(value)
         if numberMatch:
            return float(numberMatch.group(1))
      
         boolMatch = getValueBoolRegEx.match(value)
         if boolMatch:
            return 1.0 if boolMatch.group("trueVal") else 0.0
      return "" # whatever default includes "bad"
   except:
      return ""


if __name__ == '__main__':
   try:
      numTester = re.compile("^(?P<number>\-{0,1}\d+(?:(?:\,|\.)\d+){0,1}(?:e\-{0,1}\d+){0,1})(?:.*)$",re.IGNORECASE)
      boolConvert = re.compile("^(?P<trueVal>(?:true)|(?:on)|(?:yes))|(?P<falseVal>(?:false)|(?:off)|(?:no))$",re.IGNORECASE)
      
      inputTests = ("123", "-123", "123.4", "-123.4", "123,4", "-123,4", "12 Amps", "-12 Amps", "1.4V", "-1.4V", "1.3e4", "-1.3e5", "1.3E4", "14.0e-5", "-14.0e-5", "1.2.3", "abc", 34, 34.5, -12, -13.5)         
      boolInputTests = ("true", "TrUe", "yes", "YES", "on", "On", "Off", "OFF", "no", "NO", "false", "False", "crap", "notMe", True, False)
         
      # test the two routines against each other over all inputs
      for inputStr in inputTests:
         print "TEST:      " + str(inputStr)
         print "VALUE OLD: " + str(getNumber(inputStr))
         print "VALUE NEW: " + str(getNumberViaRegEx(inputStr))
         print ""
         
      for inputStr in boolInputTests:
         print "TEST:      " + str(inputStr)
         print "VALUE OLD: " + str(getNumber(inputStr))
         print "VALUE NEW: " + str(getNumberViaRegEx(inputStr))
         print ""
   
   except Exception as e:
      print "Exception: " + str(e)


Adam

Posted on
Fri Mar 21, 2014 4:45 pm
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

Does the function str() also handle unicode? Most of what's returned from Indigo is unicode.

Posted on
Fri Mar 21, 2014 7:37 pm
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

This is fun... the unicode part needs to be improved.

Code: Select all
      inputTests = (u"123", u"-123", u"123.4", u"-123.4", u"123,4", u"-123,4", u"12 Amps", u"-12 Amps", u"1.4V", u"-1.4V", "u1.3e4", u"-1.3e5", u"1.3E4", u"14.0e-5", u"-14.0e-5", "1.2.3", "abc", 34, 34.5, -12, -13.5)         
      boolInputTests = ("true", "TrUe", "yes", "YES", "on", "On", "Off", "OFF", "no", "NO", "false", "False", "crap", "notMe", True, False)



Code: Select all
TEST:      123
VALUE OLD: 123.0
VALUE NEW:

TEST:      -123
VALUE OLD: -123.0
VALUE NEW:

TEST:      123.4
VALUE OLD: 123.4
VALUE NEW:

TEST:      -123.4
VALUE OLD: -123.4
VALUE NEW:

TEST:      123,4
VALUE OLD:
VALUE NEW:

TEST:      -123,4
VALUE OLD:
VALUE NEW:

TEST:      12 Amps
VALUE OLD: 12.0
VALUE NEW:

TEST:      -12 Amps
VALUE OLD: -12.0
VALUE NEW:

TEST:      1.4V
VALUE OLD: 1.4
VALUE NEW:

TEST:      -1.4V
VALUE OLD: -1.4
VALUE NEW:

TEST:      u1.3e4
VALUE OLD:
VALUE NEW:

TEST:      -1.3e5
VALUE OLD: -130000.0
VALUE NEW:

TEST:      1.3E4
VALUE OLD: 13000.0
VALUE NEW:

TEST:      14.0e-5
VALUE OLD: 0.00014
VALUE NEW:

TEST:      -14.0e-5
VALUE OLD: -0.00014
VALUE NEW:

TEST:      1.2.3
VALUE OLD:
VALUE NEW: 1.2

TEST:      abc
VALUE OLD:
VALUE NEW:

TEST:      34
VALUE OLD: 34.0
VALUE NEW: 34

TEST:      34.5
VALUE OLD: 34.5
VALUE NEW: 34.5

TEST:      -12
VALUE OLD: -12.0
VALUE NEW: -12

TEST:      -13.5
VALUE OLD: -13.5
VALUE NEW: -13.5

TEST:      true
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      TrUe
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      yes
VALUE OLD:
VALUE NEW: 1.0

TEST:      YES
VALUE OLD:
VALUE NEW: 1.0

TEST:      on
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      On
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      Off
VALUE OLD: 0.0
VALUE NEW: 0.0

TEST:      OFF
VALUE OLD: 0.0
VALUE NEW: 0.0

TEST:      no
VALUE OLD:
VALUE NEW: 0.0

TEST:      NO
VALUE OLD:
VALUE NEW: 0.0

TEST:      false
VALUE OLD: 0.0
VALUE NEW: 0.0

TEST:      False
VALUE OLD: 0.0
VALUE NEW: 0.0

TEST:      crap
VALUE OLD:
VALUE NEW:

TEST:      notMe
VALUE OLD:
VALUE NEW:

TEST:      True
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      False
VALUE OLD: 0.0
VALUE NEW: 0.0


Posted on
Fri Mar 21, 2014 8:45 pm
RogueProeliator offline
Posts: 2501
Joined: Nov 13, 2012
Location: Baton Rouge, LA

Re: test if string contains a valid number

Does the function str() also handle unicode? Most of what's returned from Indigo is unicode.

Oh, unicode is a good point... the only place in the conversion that cares is when it checks for a type of str (elif type(value) is str:). IIRC you can also check for unicode there with a type(value) is unicode... the regular expressions should be okay with unicode.

Posted on
Fri Mar 21, 2014 9:11 pm
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

With str --> unicode it seems to work.

Karl


Code: Select all
# new getNumberViaRegEx routine, using global variable just to avoid the
# recompile each time, but not a big deal if not
getValueNumRegEx = re.compile("^(?P<number>\-{0,1}\d+(?:(?:\,|\.)\d+){0,1}(?:e\-{0,1}\d+){0,1})(?:.*)$",re.IGNORECASE)
getValueBoolRegEx = re.compile("^(?P<trueVal>(?:true)|(?:on)|(?:yes))|(?P<falseVal>(?:false)|(?:off)|(?:no))$",re.IGNORECASE)
def getNumberViaRegEx(value):
   try:
      if type(value) is float or type(value) is int:
         return value
      elif type(value) is bool:
         return 1.0 if value == True else 0.0
      elif type(value) is unicode:
         numberMatch = getValueNumRegEx.match(value)
         if numberMatch:
            return float(numberMatch.group(1))
     
         boolMatch = getValueBoolRegEx.match(value)
         if boolMatch:
            return 1.0 if boolMatch.group("trueVal") else 0.0
      return "" # whatever default includes "bad"
   except:
      return ""

Code: Select all
TEST:      123
VALUE OLD: 123.0
VALUE NEW: 123.0

TEST:      -123
VALUE OLD: -123.0
VALUE NEW: -123.0

TEST:      123.4
VALUE OLD: 123.4
VALUE NEW: 123.4

TEST:      -123.4
VALUE OLD: -123.4
VALUE NEW: -123.4

TEST:      123,4
VALUE OLD:
VALUE NEW:

TEST:      -123,4
VALUE OLD:
VALUE NEW:

TEST:      12 Amps
VALUE OLD: 12.0
VALUE NEW: 12.0

TEST:      -12 Amps
VALUE OLD: -12.0
VALUE NEW: -12.0

TEST:      1.4V
VALUE OLD: 1.4
VALUE NEW: 1.4

TEST:      -1.4V
VALUE OLD: -1.4
VALUE NEW: -1.4

TEST:      u1.3e4
VALUE OLD:
VALUE NEW:

TEST:      -1.3e5
VALUE OLD: -130000.0
VALUE NEW: -130000.0

TEST:      1.3E4
VALUE OLD: 13000.0
VALUE NEW: 13000.0

TEST:      14.0e-5
VALUE OLD: 0.00014
VALUE NEW: 0.00014

TEST:      -14.0e-5
VALUE OLD: -0.00014
VALUE NEW: -0.00014

TEST:      1.2.3
VALUE OLD:
VALUE NEW:

TEST:      abc
VALUE OLD:
VALUE NEW:

TEST:      34
VALUE OLD: 34.0
VALUE NEW: 34

TEST:      34.5
VALUE OLD: 34.5
VALUE NEW: 34.5

TEST:      -12
VALUE OLD: -12.0
VALUE NEW: -12

TEST:      -13.5
VALUE OLD: -13.5
VALUE NEW: -13.5

TEST:      true
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      TrUe
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      yes
VALUE OLD:
VALUE NEW: 1.0

TEST:      YES
VALUE OLD:
VALUE NEW:

TEST:      on
VALUE OLD: 1.0
VALUE NEW:

TEST:      On
VALUE OLD: 1.0
VALUE NEW:

TEST:      Off
VALUE OLD: 0.0
VALUE NEW:

TEST:      OFF
VALUE OLD: 0.0
VALUE NEW:

TEST:      no
VALUE OLD:
VALUE NEW:

TEST:      NO
VALUE OLD:
VALUE NEW: 0.0

TEST:      false
VALUE OLD: 0.0
VALUE NEW:

TEST:      False
VALUE OLD: 0.0
VALUE NEW:

TEST:      crap
VALUE OLD:
VALUE NEW:

TEST:      notMe
VALUE OLD:
VALUE NEW:

TEST:      True
VALUE OLD: 1.0
VALUE NEW: 1.0

TEST:      False
VALUE OLD: 0.0
VALUE NEW: 0.0

Posted on
Fri Mar 21, 2014 9:38 pm
RogueProeliator offline
Posts: 2501
Joined: Nov 13, 2012
Location: Baton Rouge, LA

Re: test if string contains a valid number

That output did not seem to be working... there were not any values (meaning it skipped the parse) for a few of your values in the boolean test. I think you need to look for both str and unicode in that line.

Example - look at the Value New for "false", "Off", etc.
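
A minimal sketch of checking for both string types in one test is below. The names STRING_TYPES and is_text are made up for illustration. It is written so that it also runs on Python 3, where the unicode type no longer exists; on Python 2 alone, isinstance(value, basestring) would do the same job.

```python
try:
    STRING_TYPES = (str, unicode)  # Python 2: byte strings and unicode strings
except NameError:
    STRING_TYPES = (str,)          # Python 3: str is already unicode

def is_text(value):
    """Return True for any string type, False for numbers, bools, None, etc."""
    return isinstance(value, STRING_TYPES)
```

In the regex function, the line elif type(value) is str: could then become elif is_text(value): so that "off" and u"off" both reach the pattern matching.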

Posted on
Fri Mar 21, 2014 9:52 pm
kw123 offline
Posts: 8366
Joined: May 12, 2013
Location: Dallas, TX

Re: test if string contains a valid number

This one does not work: a string like "volts are 123 and amps are 345" should not be accepted, but the new code returns the first number:

TEST: 12 3.4
VALUE OLD:
VALUE NEW: 12.0
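
One possible way to reject such input, sketched under the assumption that a string is only valid if it contains exactly one number: collect all numbers with re.findall and refuse anything but a single hit. The names NUMBER and single_number are hypothetical and not part of the code posted above.

```python
import re

# Hypothetical pattern: optional minus sign, digits, optional fraction, optional exponent.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE]-?\d+)?")

def single_number(text):
    """Return the number in text as a float only if there is exactly one, else ""."""
    found = NUMBER.findall(text)  # no capturing groups, so this is a list of full matches
    if len(found) != 1:
        return ""
    return float(found[0])
```

With this sketch, "12 3.4", "1.2.3", "12-5" and "volts are 123 and amps are 345" are all rejected, while "12 Amps" and "-1.4V" are still accepted. The trade-off is that text containing an unrelated digit, such as a device name with a number in it, is rejected too.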
