Handling duplicate index values in pandas


羸弱的穷酸书生 · published 2022-09-09 17:20:28

1. A problem with the pandas reindex method. The code is as follows:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

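df = pd.DataFrame([[1,2],[1,2],[1,2]],columns=['a','a'])  # two columns share the label 'a'
df = df.T  # after transposing, the row index is ['a', 'a']
df.reindex(index=['a','b'])  # raises ValueError because the index has duplicate labels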
df

If the index contains duplicate values, calling reindex raises an exception:

C:\Users\gw00305123\AppData\Local\Temp\ipykernel_8908\2679121137.py:7: FutureWarning: reindexing with a non-unique Index is deprecated and will raise in a future version.
  df.reindex(index=['a','b'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_8908\2679121137.py in <cell line: 7>()
      5 df = pd.DataFrame([[1,2],[1,2],[1,2]],columns=['a','a'])
      6 df = df.T
----> 7 df.reindex(index=['a','b'])
      8 df

c:\Users\gw00305123\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    322         @wraps(func)
    323         def wrapper(*args, **kwargs) -> Callable[..., Any]:
--> 324             return func(*args, **kwargs)
    325 
    326         kind = inspect.Parameter.POSITIONAL_OR_KEYWORD

c:\Users\gw00305123\Anaconda3\lib\site-packages\pandas\core\frame.py in reindex(self, *args, **kwargs)
   4802         kwargs.pop("axis", None)
   4803         kwargs.pop("labels", None)
-> 4804         return super().reindex(**kwargs)
   4805 
   4806     @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])

c:\Users\gw00305123\Anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4964 
   4965         # perform the reindex on the axes
...
-> 4107             raise ValueError("cannot reindex on an axis with duplicate labels")
   4108 
   4109     def reindex(

ValueError: cannot reindex on an axis with duplicate labels

The warning also states that reindexing with a non-unique index is deprecated and will raise in a future version.
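One way to avoid the traceback is to check `Index.is_unique` before reindexing. The following is only a minimal sketch: the helper name `safe_reindex` is hypothetical (it is not part of pandas), and it assumes that failing early with a clearer message is what you want.

```python
import pandas as pd

def safe_reindex(frame, labels):
    """Hypothetical helper: reindex only when the row index has no duplicates."""
    if not frame.index.is_unique:
        # Report which labels are duplicated instead of letting reindex fail.
        dupes = frame.index[frame.index.duplicated()].unique().tolist()
        raise ValueError(f"index has duplicate labels: {dupes}")
    return frame.reindex(index=labels)

# Same data as in the example above: after transposing, the row index is ['a', 'a'].
df = pd.DataFrame([[1, 2], [1, 2], [1, 2]], columns=['a', 'a']).T

try:
    safe_reindex(df, ['a', 'b'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

With a unique index the helper simply forwards to `reindex`, so labels that are missing from the original index come back as NaN rows.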

2. pandas sometimes raises this error in concat as well. Example code:

df1 = pd.DataFrame({
    'a':np.arange(10)
},index=[1,1,2,3,4,5,6,7,8,9])
df2 = pd.DataFrame({
    'b':np.arange(10)
},index=[0,1,2,3,4,5,6,7,8,9])

df = pd.concat([df1, df2], axis=1)
df

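This code also raises an error. However, when both DataFrames have duplicate index values and the two indexes are identical, no error is raised, as in the following example: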

df1 = pd.DataFrame({
    'a':np.arange(10)
},index=[1,1,2,3,4,5,6,7,8,9])
df2 = pd.DataFrame({
    'b':np.arange(10)
},index=[1,1,2,3,4,5,6,7,8,9])

df = pd.concat([df1, df2], axis=1)
df

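Result: the index of the concatenated DataFrame is left exactly as it was. My guess is that when the indexes of the two DataFrames match, pandas does not call reindex internally, so no error is raised. If reindex were called, it would presumably still fail.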
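The guess above can be checked with a small experiment. This sketch reuses the data from the two concat examples; the variable names `left`, `same` and `other` are made up for illustration. The exact exception class raised in the failing case differs between pandas versions, so the sketch catches `Exception` rather than assuming one.

```python
import numpy as np
import pandas as pd

dup_index = [1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
left = pd.DataFrame({'a': np.arange(10)}, index=dup_index)
same = pd.DataFrame({'b': np.arange(10)}, index=dup_index)
other = pd.DataFrame({'b': np.arange(10)}, index=range(10))

# Identical indexes: the duplicated labels pass through unchanged.
aligned = pd.concat([left, same], axis=1)
print(aligned.index.tolist())

# Different indexes: pandas has to align the rows, which fails on duplicates.
# The exception type is version dependent, so it is caught broadly here.
try:
    pd.concat([left, other], axis=1)
    concat_failed = False
except Exception as exc:
    concat_failed = True
    print(type(exc).__name__, exc)
```

`left.index.equals(same.index)` is a quick way to tell in advance which of the two cases applies.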

3. Following on from the previous topic: how do you drop the rows whose index value is duplicated? Here is a method I found today:

df = df[~df.index.duplicated()]

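This selects only the rows whose index value has not already appeared. By default, duplicated() marks every occurrence after the first, so the first row for each label is kept. The same approach works on a Series, which is convenient.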
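As a rough illustration of the filter above, the sketch below compares the `keep='first'` and `keep='last'` options of `Index.duplicated`, applies the same mask to a Series, and shows that `reindex` works once the index is unique. The data is borrowed from the concat example, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(10)}, index=[1, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# keep='first' is the default: the first row for each label survives.
first = df[~df.index.duplicated(keep='first')]
# keep='last' keeps the last row for each label instead.
last = df[~df.index.duplicated(keep='last')]

# The same boolean mask works on a Series.
s = df['a']
s_unique = s[~s.index.duplicated()]

# With a unique index, reindex no longer raises; label 0 is absent, so it becomes NaN.
reindexed = first.reindex(index=[0, 1, 2])
print(reindexed)
```

Which row to keep depends on the data. If neither the first nor the last occurrence is the right one, aggregating the duplicated rows may be a better fit than dropping them.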
