pandas

from __future__ import annotations

from typing import Literal

import numpy as np

from pandas._libs import lib
from pandas._typing import (
    DateTimeErrorChoices,
    npt,
)

from pandas.core.dtypes.cast import maybe_downcast_numeric
from pandas.core.dtypes.common import (
    ensure_object,
    is_datetime_or_timedelta_dtype,
    is_decimal,
    is_integer_dtype,
    is_number,
    is_numeric_dtype,
    is_scalar,
    needs_i8_conversion,
)
from pandas.core.dtypes.generic import (
    ABCIndex,
    ABCSeries,
)

import pandas as pd
from pandas.core.arrays.numeric import NumericArray


def to_numeric(
    arg,
    errors: DateTimeErrorChoices = "raise",
    downcast: Literal["integer", "signed", "unsigned", "float"] | None = None,
):
    """
    Convert argument to a numeric type.

    The default return dtype is `float64` or `int64`
    depending on the data supplied. Use the `downcast` parameter
    to obtain other dtypes.

    Please note that precision loss may occur if really large numbers
    are passed in. Due to the internal limitations of `ndarray`, if
    numbers smaller than `-9223372036854775808` (np.iinfo(np.int64).min)
    or larger than `18446744073709551615` (np.iinfo(np.uint64).max) are
    passed in, it is very likely they will be converted to float so that
    they can be stored in an `ndarray`. These warnings apply similarly to
    `Series` since it internally leverages `ndarray`.

    Parameters
    ----------
    arg : scalar, list, tuple, 1-d array, or Series
        Argument to be converted.
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception.
        - If 'coerce', then invalid parsing will be set as NaN.
        - If 'ignore', then invalid parsing will return the input.
    downcast : str, default None
        Can be 'integer', 'signed', 'unsigned', or 'float'.
        If not None, and if the data has been successfully cast to a
        numerical dtype (or if the data was numeric to begin with),
        downcast that resulting data to the smallest numerical dtype
        possible according to the following rules:

        - 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
        - 'unsigned': smallest unsigned int dtype (min.: np.uint8)
        - 'float': smallest float dtype (min.: np.float32)

        As this behaviour is separate from the core conversion to
        numeric values, any errors raised during the downcasting
        will be surfaced regardless of the value of the 'errors' input.

        In addition, downcasting will only occur if the size
        of the resulting data's dtype is strictly larger than
        the dtype it is to be cast to, so if none of the dtypes
        checked satisfy that specification, no downcasting will be
        performed on the data.

    Returns
    -------
    ret
        Numeric if parsing succeeded.
        Return type depends on input. Series if Series, otherwise ndarray.

    See Also
    --------
    DataFrame.astype : Cast argument to a specified dtype.
    to_datetime : Convert argument to datetime.
    to_timedelta : Convert argument to timedelta.
    numpy.ndarray.astype : Cast a numpy array to a specified type.
    DataFrame.convert_dtypes : Convert dtypes.

    Examples
    --------
    Take separate series and convert to numeric, coercing when told to

    >>> s = pd.Series(['1.0', '2', -3])
    >>> pd.to_numeric(s)
    0    1.0
    1    2.0
    2   -3.0
    dtype: float64
    >>> pd.to_numeric(s, downcast='float')
    0    1.0
    1    2.0
    2   -3.0
    dtype: float32
    >>> pd.to_numeric(s, downcast='signed')
    0    1
    1    2
    2   -3
    dtype: int8
    >>> s = pd.Series(['apple', '1.0', '2', -3])
    >>> pd.to_numeric(s, errors='ignore')
    0    apple
    1      1.0
    2        2
    3       -3
    dtype: object
    >>> pd.to_numeric(s, errors='coerce')
    0    NaN
    1    1.0
    2    2.0
    3   -3.0
    dtype: float64

    Downcasting of nullable integer and floating dtypes is supported:

    >>> s = pd.Series([1, 2, 3], dtype="Int64")
    >>> pd.to_numeric(s, downcast="integer")
    0    1
    1    2
    2    3
    dtype: Int8
    >>> s = pd.Series([1.0, 2.1, 3.0], dtype="Float64")
    >>> pd.to_numeric(s, downcast="float")
    0    1.0
    1    2.1
    2    3.0
    dtype: Float32
    """
    if downcast not in (None, "integer", "signed", "unsigned", "float"):
        raise ValueError("invalid downcasting method provided")

    if errors not in ("ignore", "raise", "coerce"):
        raise ValueError("invalid error value specified")

    is_series = False
    is_index = False
    is_scalars = False

    if isinstance(arg, ABCSeries):
        is_series = True
        values = arg.values
    elif isinstance(arg, ABCIndex):
        is_index = True
        if needs_i8_conversion(arg.dtype):
            values = arg.asi8
        else:
            values = arg.values
    elif isinstance(arg, (list, tuple)):
        values = np.array(arg, dtype="O")
    elif is_scalar(arg):
        if is_decimal(arg):
            return float(arg)
        if is_number(arg):
            return arg
        is_scalars = True
        values = np.array([arg], dtype="O")
    elif getattr(arg, "ndim", 1) > 1:
        raise TypeError("arg must be a list, tuple, 1-d array, or Series")
    else:
        values = arg

    # GH33013: for IntegerArray & FloatingArray extract non-null values for casting
    # save mask to reconstruct the full array after casting
    mask: npt.NDArray[np.bool_] | None = None
    if isinstance(values, NumericArray):
        mask = values._mask
        values = values._data[~mask]

    values_dtype = getattr(values, "dtype", None)
    if is_numeric_dtype(values_dtype):
        pass
    elif is_datetime_or_timedelta_dtype(values_dtype):
        values = values.view(np.int64)
    else:
        values = ensure_object(values)
        coerce_numeric = errors not in ("ignore", "raise")
        try:
            values, _ = lib.maybe_convert_numeric(
                values, set(), coerce_numeric=coerce_numeric
            )
        except (ValueError, TypeError):
            if errors == "raise":
                raise

    # attempt downcast only if the data has been successfully converted
    # to a numerical dtype and if a downcast method has been specified
    if downcast is not None and is_numeric_dtype(values.dtype):
        typecodes: str | None = None

        if downcast in ("integer", "signed"):
            typecodes = np.typecodes["Integer"]
        elif downcast == "unsigned" and (not len(values) or np.min(values) >= 0):
            typecodes = np.typecodes["UnsignedInteger"]
        elif downcast == "float":
            typecodes = np.typecodes["Float"]

            # pandas support goes only to np.float32,
            # as float dtypes smaller than that are
            # extremely rare and not well supported
            float_32_char = np.dtype(np.float32).char
            float_32_ind = typecodes.index(float_32_char)
            typecodes = typecodes[float_32_ind:]

        if typecodes is not None:
            # from smallest to largest
            for typecode in typecodes:
                dtype = np.dtype(typecode)
                if dtype.itemsize <= values.dtype.itemsize:
                    values = maybe_downcast_numeric(values, dtype)

                    # successful conversion
                    if values.dtype == dtype:
                        break

    # GH33013: for IntegerArray & FloatingArray need to reconstruct masked array
    if mask is not None:
        data = np.zeros(mask.shape, dtype=values.dtype)
        data[~mask] = values

        from pandas.core.arrays import (
            FloatingArray,
            IntegerArray,
        )

        klass = IntegerArray if is_integer_dtype(data.dtype) else FloatingArray
        values = klass(data, mask.copy())

    if is_series:
        return arg._constructor(values, index=arg.index, name=arg.name)
    elif is_index:
        # because we want to coerce to numeric if possible,
        # do not use _shallow_copy
        return pd.Index(values, name=arg.name)
    elif is_scalars:
        return values[0]
    else:
        return values
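
The downcast loop in the source above walks NumPy's type codes from smallest to largest and stops at the first dtype that can hold the data. A minimal standalone sketch of that idea for signed integers follows. Note that smallest_signed_dtype is a hypothetical helper written only for illustration: it checks value ranges with np.iinfo instead of calling pandas' internal maybe_downcast_numeric, so treat it as an approximation of the behaviour rather than the library's code.

```python
import numpy as np


def smallest_signed_dtype(values):
    """Hypothetical helper: smallest signed int dtype whose range covers values."""
    arr = np.asarray(values, dtype=np.int64)
    # np.typecodes["Integer"] lists the signed integer codes, smallest first
    for code in np.typecodes["Integer"]:
        candidate = np.dtype(code)
        info = np.iinfo(candidate)
        # An empty array fits anywhere; otherwise compare against the dtype's range
        if arr.size == 0 or (info.min <= arr.min() and arr.max() <= info.max):
            return candidate
    return arr.dtype


print(smallest_signed_dtype([1, 2, -3]))
print(smallest_signed_dtype([22000, 3, 1, 9]))
```

The first call should pick an 8-bit type and the second a 16-bit type, which matches the int8 and int16 results that the to_numeric examples in this document report for the same data.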

Using errors='raise'

With errors='raise' (the default), to_numeric raises an exception as soon as it meets a value it cannot parse. See the following code.

import pandas as pd

ser = pd.Series(['Eleven', 11, 21, 19])
num = pd.to_numeric(ser, errors='raise')
print(num)

Output

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 1926, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "Eleven"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 4, in <module>
    num = pd.to_numeric(ser, errors='raise')
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/tools/numeric.py", line 149, in to_numeric
    values = lib.maybe_convert_numeric(
  File "pandas/_libs/lib.pyx", line 1963, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "Eleven" at position 0

We get ValueError: Unable to parse string "Eleven" at position 0.

If you pass errors='ignore' instead, no error is raised. See the section "Using errors='ignore'".
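
If you would rather handle the failure than let the program stop, you can wrap the call in try/except. The snippet below is a small sketch that reuses the same Series. The variable name message is only for illustration.

```python
import pandas as pd

ser = pd.Series(["Eleven", 11, 21, 19])

try:
    num = pd.to_numeric(ser, errors="raise")
except ValueError as exc:
    # The exception text names the value that could not be parsed
    message = str(exc)
    print(f"Conversion failed: {message}")
```

Catching ValueError keeps the strict behaviour of errors='raise' while letting you log or report the offending value.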

3. Convert Multiple Columns to Float

You can also convert multiple columns to float by passing a dict of column name -> data type to the astype() method. The example below converts the Fee and Discount columns to float dtype.

# Convert multiple columns
df = df.astype({'Fee': 'float', 'Discount': 'float'})
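
The snippet above assumes a DataFrame named df already exists. A self-contained version might look like the following, where the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: both columns hold numbers stored as strings
df = pd.DataFrame({
    "Fee": ["22000.30", "25000.40", "23000.20"],
    "Discount": ["1000.10", "2300.15", "1500.00"],
})

# Map each column name to its target dtype
df = df.astype({"Fee": "float", "Discount": "float"})
print(df.dtypes)
```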

to_numeric (or to_datetime or to_timedelta)

pandas offers better options for converting one-dimensional data (i.e. one Series at a time). These functions give more control over error handling than astype through the optional errors parameter, and to_numeric additionally accepts a downcast parameter. Take a look at how to_numeric deals with the Series s = pd.Series([1.0, 'N/A', 2]). Using coerce for errors turns any value that fails to convert into NaN. Passing ignore gives the same behavior we had available in astype: the original input is returned. Likewise, passing raise raises an exception.

>>> pd.to_numeric(s, errors='coerce')
0    1.0
1    NaN
2    2.0
dtype: float64

And if we want to save some space, we can safely downcast to the minimum size that will hold our data without errors (getting int16 instead of the int64 we would get without downcasting). Here s4 is pd.Series([22000, 3, 1, 9]).

>>> pd.to_numeric(s4, downcast='integer')
0    22000
1        3
2        1
3        9
dtype: int16
>>> pd.to_numeric(s4).dtype
dtype('int64')

The to_datetime and to_timedelta functions behave similarly, but for dates and timedeltas.

>>> pd.to_timedelta(['2 days', '5 min', '-3s', '4M', '1 parsec'], errors='coerce')
TimedeltaIndex([  '2 days 00:00:00',   '0 days 00:05:00', '-1 days +23:59:57',
                  '0 days 00:04:00',                 NaT],
               dtype='timedelta64[ns]', freq=None)
>>> pd.to_datetime(['11/1/2020', 'Jan 4th 1919', '20200930 08:00:31'])
DatetimeIndex(['2020-11-01 00:00:00', '1919-01-04 00:00:00',
               '2020-09-30 08:00:31'],
              dtype='datetime64[ns]', freq=None)
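
To see the coerce behaviour of both functions side by side, here is a short sketch with deliberately invalid entries. The input strings are invented for the example.

```python
import pandas as pd

# One valid and one invalid entry for each function
dates = pd.to_datetime(["2020-11-01", "not a date"], errors="coerce")
deltas = pd.to_timedelta(["2 days", "1 parsec"], errors="coerce")

print(dates)
print(deltas)
```

In both results the entry that cannot be parsed becomes NaT, the datetime counterpart of NaN.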

Since these functions are all for 1-dimensional data, you will need to use apply on a DataFrame. For instance, to downcast all the values to the smallest possible floating point size, use the downcast parameter.

>>> from functools import partial
>>> df.apply(partial(pd.to_numeric, downcast='float')).dtypes
a    float32
b    float32
c    float32
dtype: object


Conclusion

That's all for now. These are the main cases and examples for applying the pandas to_numeric() function to a pandas DataFrame. I hope this tutorial was clear. If you have any questions, you can contact us for more information.

Source:

Pandas Official Documentation


The astype() method for converting one type to any other

The astype() method lets us state explicitly which type we want to convert to. We can go from one data type to another by passing the target type as an argument to astype().

Consider the following code:

# python 3.x
import pandas as pd

c = [['x', '1.23', '14.2'],
     ['y', '20', '0.11'],
     ['z', '3', '10']]
df = pd.DataFrame(
    c,
    columns=['first', 'second', 'third'])
print(df)
df[['second', 'third']] = df[['second', 'third']].astype(float)
print('Converting..................')
print('.........................')
print(df)

Output:

  first second third
0     x   1.23  14.2
1     y     20  0.11
2     z      3    10
Converting..................
.........................
  first  second  third
0     x    1.23  14.20
1     y   20.00   0.11
2     z    3.00  10.00

2. pandas Convert String to Float

Use the pandas DataFrame.astype() function to convert a column from string/int to float. You can apply it to a specific column or to an entire DataFrame. To cast to a 64-bit float, pass numpy.float64, float, or 'float64' as the dtype (numpy.float_ is an alias that was removed in NumPy 2.0). To cast to a 32-bit float, use numpy.float32 or 'float32'.

The example below converts the Fee column from string dtype to float64.

# Convert "Fee" from string to float
df = df.astype({'Fee': 'float'})
print(df.dtypes)

Yields below output.

Fee         float64
Discount     object
dtype: object

You can also use Series.astype() to convert a specific column. Since each column of a DataFrame is a pandas Series, you can select the column as a Series and call astype() on it. In the example below, df.Fee or df['Fee'] returns a Series object.

# Convert "Fee" from string to float
df['Fee'] = df['Fee'].astype(float)
print(df.dtypes)

Yields the same output as above.

6. Handling Non-numeric Values

When some cells in a column you want to convert to float contain non-numeric characters, the conversion raises an error. To suppress the error and turn those values into NaN, pass the errors='coerce' argument to pd.to_numeric().

# Convert the column to numeric; values that cannot be parsed become NaN
df['Discount'] = pd.to_numeric(df['Discount'], errors='coerce')
print(df.dtypes)

This yields the same output as above.
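
Because errors='coerce' silently replaces bad values, it can be useful to check which entries were affected. One possible approach, sketched below with made-up data, is to compare the missing-value masks before and after conversion. The names raw, converted and failed are illustrative.

```python
import pandas as pd

# Hypothetical column: two valid numbers, two unparseable strings, one missing value
raw = pd.Series(["1000.10", "N/A", "2500.20", None, "12a"])
converted = pd.to_numeric(raw, errors="coerce")

# True where the original held a value but the conversion produced NaN
failed = converted.isna() & raw.notna()
print(raw[failed].tolist())
```

Entries that were already missing are excluded, so the list contains only the values that to_numeric could not turn into numbers.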

>>> s4 = pd.Series([22000, 3, 1, 9])
>>> s4.memory_usage()
160
>>> s4.astype('int8').memory_usage()
132

But note there is a problem above! astype will happily convert numbers that don't fit in the new type without reporting an error. Here 22000 does not fit in int8 and silently wraps around to -16.

>>> s4.astype('int8')
0   -16
1     3
2     1
3     9
dtype: int8
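
A safer way to shrink the same Series is to let to_numeric choose the dtype. The sketch below checks that the downcast result still holds the original values. The names safe and values_preserved are illustrative.

```python
import pandas as pd

s4 = pd.Series([22000, 3, 1, 9])

# to_numeric only downcasts to a dtype that can hold every value
safe = pd.to_numeric(s4, downcast="integer")
values_preserved = bool((safe == s4).all())

print(safe.dtype)
print(values_preserved)
```

Since 22000 does not fit in int8, the result here should be int16, with every value intact.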

Note that you can also use astype on DataFrames, even specifying a different type for each column.

>>> df = pd.DataFrame({'a': [1, 2, 3.3, 4], 'b': [4, 5, 2, 3], 'c': ["4", 5.5, "7.09", 1]})
>>> df.astype('float')
     a    b     c
0  1.0  4.0  4.00
1  2.0  5.0  5.50
2  3.3  2.0  7.09
3  4.0  3.0  1.00
>>> df.astype({'a': 'uint', 'b': 'float16'})
   a    b     c
0  1  4.0     4
1  2  5.0   5.5
2  3  2.0  7.09
3  4  3.0     1

4. Convert All Columns to Float Type

When given a single dtype, astype() converts every column to that type. The example below converts all DataFrame columns to float. If any column contains values that cannot be parsed as numbers (for example alphanumeric strings), you will get an error.

# Convert all DataFrame columns from string to float
df = df.astype(float)
print(df.dtypes)

Yields below output.

Fee float64
Discount float64
dtype: object
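
To illustrate the error case mentioned above, here is a small sketch with a hypothetical Code column that holds alphanumeric strings. Converting the whole frame fails, while converting only the numeric column works.

```python
import pandas as pd

# Hypothetical data: "Code" cannot be interpreted as a number
df = pd.DataFrame({"Fee": ["22000.30", "25000.40"], "Code": ["A1", "B2"]})

try:
    df.astype(float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Restricting the conversion to the numeric column avoids the error
fee_only = df.astype({"Fee": float})
print(fee_only.dtypes)
```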

7. Replace the NaN Values with Zeros

Use df.replace(np.nan, 0, regex=True) to replace the NaN values with 0.

# Use df.replace() to replace NaN values with 0
df['Discount'] = pd.to_numeric(df['Discount'], errors='coerce')
df = df.replace(np.nan, 0, regex=True)
print(df)
print(df.dtypes)

Yields below output.

Fee Discount
0 22000.30 1000.1
1 25000.40 0.0
2 23000.20 1000.5
3 24000.50 0.0
4 26000.10 2500.2
Fee object
Discount float64
dtype: object
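
Another common way to get the same result is Series.fillna(), which targets missing values directly. The sketch below uses a shortened, made-up version of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing discount
df = pd.DataFrame({
    "Fee": ["22000.30", "25000.40"],
    "Discount": ["1000.10", np.nan],
})

# Convert to numeric first, then replace the remaining NaN values with 0
df["Discount"] = pd.to_numeric(df["Discount"], errors="coerce").fillna(0)
print(df)
print(df.dtypes)
```

Chaining fillna(0) onto the conversion keeps the change limited to the Discount column, whereas df.replace() is applied to the whole DataFrame.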

But what types?

The data type can be a core NumPy datatype, which means it could be a numerical type or Python object. But the type can also be a pandas extension type, known as an ExtensionDtype. Without getting into too much detail, just know that two very common examples are CategoricalDtype and, in pandas 1.0+, StringDtype. For now, what's important to remember is that all elements in a Series share the same type.

What's important to realize is that when constructing a Series or a DataFrame, pandas will pick the datatype that can represent all values in the Series (or DataFrame column). Let's look at an example to make this more clear. Note, this example was run using pandas version 1.1.4.

>>> import pandas as pd
>>> s = pd.Series([1.0, 'N/A', 2])
>>> s
0      1
1    N/A
2      2
dtype: object

As you can see, pandas has chosen the object type for my Series since it can represent values that are floating point numbers, strings, and integers. The individual items in this Series are all of a different type in this case, but can be represented as objects.

>>> print(type(s[0]))
<class 'float'>
>>> print(type(s[1]))
<class 'str'>
>>> print(type(s[2]))
<class 'int'>

8. Replace Empty Strings before Converting

If a string column contains empty strings, replace each empty string ('') with np.nan before converting the column to float.

import pandas as pd
import numpy as np
technologies = {
    'Fee': ['22000.30', '25000.40', '23000.20', '24000.50', '26000.10', '21000'],
    'Discount': ['1000.10', np.nan, "", np.nan, '2500.20', ""]}
df = pd.DataFrame(technologies)
# Replace empty string ('') with np.nan
df['Discount'] = df.Discount.replace('', np.nan).astype(float)
print(df)
print(df.dtypes)

Yields below output.

Fee Discount
0 22000.30 1000.1
1 25000.40 NaN
2 23000.20 NaN
3 24000.50 NaN
4 26000.10 2500.2
5 21000 NaN
Fee object
Discount float64
dtype: object
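
As an alternative to replacing empty strings by hand, pd.to_numeric with errors='coerce' can produce the same kind of result in one step. The sketch below uses a shortened, illustrative version of the Discount column.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing numbers-as-strings, NaN and an empty string
discount = pd.Series(["1000.10", np.nan, "", "2500.20"])

# Anything that is not a valid number ends up as NaN
as_float = pd.to_numeric(discount, errors="coerce")
print(as_float)
```

Keep in mind that coerce also hides genuinely malformed values, so the explicit replace('', np.nan) approach may be preferable when you want other bad input to raise an error.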

Using errors='ignore'

With errors='ignore', to_numeric does not raise when a value cannot be parsed. It returns the input unchanged.

import pandas as pd

ser = pd.Series(['Eleven', 11, 21, 19])
num = pd.to_numeric(ser, errors='ignore')
print(num)

In this example, we created a Series with one string and three numbers.

Because we pass errors='ignore', we do not get an error: we are explicitly telling to_numeric to ignore conversion failures.

See the output.

0    Eleven
1        11
2        21
3        19
dtype: object

We did not get any error, thanks to the errors='ignore' argument, but note that the result still has dtype object: nothing was converted.
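
According to the pandas release notes, recent versions (2.2 and later) deprecate errors='ignore' and recommend catching the exception explicitly instead. A minimal sketch of that pattern follows. The function to_numeric_or_original is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def to_numeric_or_original(obj):
    """Hypothetical helper: return pd.to_numeric(obj), or obj unchanged on failure."""
    try:
        return pd.to_numeric(obj)
    except (ValueError, TypeError):
        return obj


ser = pd.Series(["Eleven", 11, 21, 19])
same = to_numeric_or_original(ser)  # conversion fails, input comes back
converted = to_numeric_or_original(pd.Series(["11", 21]))  # conversion succeeds

print(same.dtype)
print(converted.dtype)
```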

infer_objects

If you happen to have a pandas object that consists of objects that haven't been converted yet, both Series and DataFrame have a method, infer_objects, that attempts to convert those objects to the most sensible type. To see this, you need a somewhat contrived example, because pandas already attempts to convert objects when you create them. For example:

>>> pd.Series([1, 2, 3, 4], dtype='object').infer_objects().dtype
dtype('int64')
>>> pd.Series([1, 2, 3, '4'], dtype='object').infer_objects().dtype
dtype('O')
>>> pd.Series([1, 2, 3, 4]).dtype
dtype('int64')

You can see here that if the Series happens to have all numerical types (in this case integers) but they are stored as objects, it can figure out how to convert these to integers. But it doesn't know how to convert the '4' to an integer. For that, you need to use one of the techniques from above.
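
The difference between the two approaches can be seen by running both on the same mixed Series. This is a small sketch, and the variable names are illustrative.

```python
import pandas as pd

mixed = pd.Series([1, 2, 3, "4"], dtype="object")

inferred = mixed.infer_objects()   # leaves the string "4" alone
converted = pd.to_numeric(mixed)   # parses "4" as a number

print(inferred.dtype)
print(converted.dtype)
```

infer_objects only recognises values that already have a numeric Python type, while to_numeric also parses numeric strings.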

Other Examples

Suppose you have numeric values stored as strings. If you apply a method that only accepts numerical values, you will get an error (such as a TypeError or ValueError). To avoid it, first convert the string values to numeric with the pd.to_numeric() method. Just run the following code.

import pandas as pd
data = {"Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020"],
        "Open": [1, 2, 3, 4], "Close": ["5", 6, "7", 8], "Volume": [100, 200, 300, 400]}
df = pd.DataFrame(data=data)
df

Output

[Figure: Sample DataFrame with the numerical value as string]

In the above code, 5 and 7 are strings in the Close column. If I apply the to_numeric() method to df["Close"], I get the following output.

pd.to_numeric(df["Close"])

Output

[Figure: Applying the to_numeric method on a column with numeric values as strings]

You can see that the dtype is int64 for the values of the Close column.
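
Since the output screenshots are not reproduced here, the following sketch rebuilds the same DataFrame and prints the dtype of Close before and after the conversion. The names before and after are illustrative.

```python
import pandas as pd

data = {
    "Date": ["12/11/2020", "13/11/2020", "14/11/2020", "15/11/2020"],
    "Open": [1, 2, 3, 4],
    "Close": ["5", 6, "7", 8],
    "Volume": [100, 200, 300, 400],
}
df = pd.DataFrame(data=data)

before = df["Close"].dtype
# Assign the result back so the DataFrame holds the converted column
df["Close"] = pd.to_numeric(df["Close"])
after = df["Close"].dtype

print(before, "->", after)
print(df["Close"].sum())
```

Note that pd.to_numeric returns a new Series, so the result has to be assigned back to the column for the change to persist.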



So, what’s the problem?

The problem with using object for everything is that you rarely want to work with your data this way. Looking at this first example, if you had imported this data from a text file you’d most likely want it to be treated as numerical, and perhaps calculate some statistical values from it.

>>> try:
...     s.mean()
... except Exception as ex:
...     print(ex)
...
unsupported operand type(s) for +: 'float' and 'str'

It’s clear here that the mean function fails because it’s trying to add up the values in the Series and cannot add the ‘N/A’ to the running sum of values.

So how do we fix this?

Well, we could inspect the values and convert them by hand or using some other logic, but luckily pandas gives us a few options to do this in a sensible way. Let’s go through them all.
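
As a quick preview of one such option, the sketch below converts the same Series with pd.to_numeric and errors='coerce', after which mean() works because the unparseable entry has become NaN.

```python
import pandas as pd

s = pd.Series([1.0, "N/A", 2])

# 'N/A' cannot be parsed, so it becomes NaN and is skipped by mean()
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.dtype)
print(numeric.mean())
```

The mean is computed over the two remaining values, 1.0 and 2.0.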

Sources

  • https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/numeric.py
  • https://appdividend.com/2020/06/18/pandas-to_numeric-function-in-python-example/
  • https://sparkbyexamples.com/pandas/pandas-convert-string-to-float-type-dataframe/
  • https://www.wrighters.io/converting-types-in-pandas/
  • https://www.datasciencelearner.com/pd-to_numeric-method-pandas-dataframe/
  • https://www.delftstack.com/ru/howto/python-pandas/how-to-change-data-type-of-columns-in-pandas/