Channel: Planet Python

Andre Roberge: pandas' SettingWithCopyWarning: did I get it right?


I am just beginning to learn pandas and am looking to provide some automated help. From what I have read, SettingWithCopyWarning is something that confuses many people. Is the following correct?

In [2]:
import pandas as pd

# Column "y" holds a float, so row "b" mixes integer and float values.
df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"),
                  columns=list("xyz"))
In [3]:
df.loc["b"]["x"]=99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [4]:
# What is SettingWithCopyWarning ?
what()
Pandas occasionally emits a SettingWithCopyWarning when you use
'chained indexing', either directly or indirectly, and you then attempt
to assign a value to the result. By 'direct chained indexing', we mean
that your code contains something like:

...[index_1][index_2]=...

During the first extraction using [index_1], pandas found that the
series to be created contained values of different types. It
automatically created a new series converting all values to a common
type. The second indexing, [index_2], was then done on this copy instead
of the original dataframe. Thus, the assignment was not done on the
original dataframe, which caused Pandas to emit this warning.

An 'indirect chained indexing' essentially amounts to the same problem
except that the second indexing is not done on the same line as that
which was done to extract the first series.
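
To check the explanation above against the example dataframe, here is a minimal sketch. It assumes the same df as in code block [2]; the names row_dtype and value_after are only illustrative, and the warning is silenced purely to keep the output short.

```python
import warnings
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"), columns=list("xyz"))

# Columns x and z hold integers while y holds floats, so extracting
# row "b" has to build a new Series with a single common dtype.
row_dtype = df.loc["b"].dtype

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    df.loc["b"]["x"] = 99            # direct chained indexing

# The assignment went to the temporary Series, not to df.
value_after = df.loc["b", "x"]
print(row_dtype, value_after)
```

With this mixed-type dataframe, the value stored in df stays at 40 after the chained assignment.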
In [5]:
# Can I get more specific information for what I just did?
why()
You used direct chained indexing of a dataframe which made a copy of  
the original content of the dataframe. If you try to assign a value to
that copy, the original dataframe will not be modified. Instead of
using direct chained indexing

df.loc["b"]["x"]...

try:

df.loc["b","x"]...
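
As a quick check of the suggestion above, this sketch assigns through a single .loc call on the same example dataframe. The name updated is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"), columns=list("xyz"))

# One indexing step: row label first, then column label.
df.loc["b", "x"] = 99

updated = df.loc["b", "x"]
print(updated)
```

Because only one indexing operation is involved, pandas writes directly into df and no intermediate Series is created.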
In [6]:
# What about if I tried to use indirect chaining. 
# There are two possibilities
series=df.loc["b"]
series["x"]=99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [7]:
where()
Warning issued on line 4 of code block [6].

1| # What about if I tried to use indirect chaining.
2| # There are two possibilities
3| series = df.loc["b"]
> 4| series["x"] = 99
In [8]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:

series=df.loc[...]

This made a copy of the data contained in the dataframe. Next, you
indexed that copy

series["x"]

This had no effect on the original dataframe. If your goal is to
modify the value of the original dataframe, try something like the
following instead:

df.loc[...,"x"]
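
The indirect case can be sketched the same way. The example below assumes the dataframe from code block [2]; the names series_value, frame_value and independent are illustrative, and taking an explicit .copy() is a commonly used pandas idiom rather than something the output above proposes.

```python
import warnings
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"), columns=list("xyz"))

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    series = df.loc["b"]             # new Series: mixed dtypes force a copy
    series["x"] = 99                 # changes the extracted Series only

series_value = series["x"]
frame_value = df.loc["b", "x"]       # still the original value

# If an independent Series is really what you want, say so explicitly.
independent = df.loc["b"].copy()
independent["x"] = 99

# If the dataframe itself should change, write through it in one step.
df.loc["b", "x"] = 99
print(series_value, frame_value, df.loc["b", "x"])
```

The first half reproduces the situation described above; the second half shows the two ways of making the intent unambiguous.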
In [9]:
# What if I do things in a different order
series_1=df["x"]
series_1.loc["b"]=99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [10]:
where()
Warning issued on line 3 of code block [9].

1| # What if I do things in a different order
2| series_1 = df["x"]
> 3| series_1.loc["b"] = 99
In [11]:
why()
I suspect that you used indirect chained indexing of a dataframe.     
First, you likely created a series using something like:

series_1=df[...]

This made a copy of the data contained in the dataframe. Next, you
indexed that copy

series_1.loc["b"]

This had no effect on the original dataframe. If your goal is to
modify the value of the original dataframe, try something like the
following instead:

df.loc[...,"b"]
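
For this column-first case, note that "b" is a row label and "x" is the column, so a single-step assignment would put the row label first. Whether the chained form reaches the original dataframe here can depend on the pandas version and on view-versus-copy details, so this sketch only shows the single-step form on the example dataframe.

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"), columns=list("xyz"))

# Equivalent intent to: series_1 = df["x"]; series_1.loc["b"] = 99
# but done in one indexing step: row label "b", column label "x".
df.loc["b", "x"] = 99

column_values = df["x"].tolist()
print(column_values)
```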
In [12]:
# What if I had multiple dataframes?
df2=df.copy()
series=df.loc["b"]
series["x"]=99
`SettingWithCopyWarning`: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
In [13]:
where()
Warning issued on line 4 of code block [12].

2| df2 = df.copy()
3| series = df.loc["b"]
> 4| series["x"] = 99
In [14]:
why()
In your code, you have the following dataframes: {'df2','df'}. I do  
not know which one is causing the problem here; I will use the name
df2 as an example.

I suspect that you used indirect chained indexing of a dataframe.
First, you likely created a series using something like:

series=df2.loc[...]

This made a copy of the data contained in the dataframe. Next, you
indexed that copy

series["x"]

This had no effect on the original dataframe. If your goal is to
modify the value of the original dataframe, try something like the
following instead:

df2.loc[...,"x"]
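
In code block [12], the series was extracted from df, so df is the dataframe involved; df2 is an independent copy. The sketch below assumes the default behaviour of DataFrame.copy() (a deep copy) and shows that a single-step assignment on df leaves df2 alone.

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50., 60]],
                  index=list("ab"), columns=list("xyz"))
df2 = df.copy()          # independent copy of the data

# Write through the dataframe the series actually came from.
df.loc["b", "x"] = 99

print(df.loc["b", "x"], df2.loc["b", "x"])
```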
