Array deduplication: very fast string deduplication (including an order-preserving method)

2020-06-27

Notes

Algorithms / Strings / Lists / csharp

1. What you need to know

The code applies to:

Removing duplicate elements from a string array so that only one copy of each remains

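Use case:

You have a very long text file in which every line is one record, for example search keywords collected by a crawler. The data may contain duplicates and you need to remove them. The algorithms here let you do that quickly.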
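As a rough sketch of this scenario, the helper below reads a file line by line, removes duplicate lines with the sort-and-compare approach described in section 2, and writes the survivors to a second file. The class name `KeywordFileDedup`, the method name `DedupLines`, and both path parameters are hypothetical and chosen only for illustration. It assumes the whole file fits in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class KeywordFileDedup
{
    // Reads every line of inputPath, removes duplicate lines, and writes the
    // remaining lines to outputPath. Returns the number of lines written.
    // Both paths are placeholders supplied by the caller.
    public static int DedupLines(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);

        // An ordinal sort puts identical lines next to each other.
        // The original line order is lost.
        Array.Sort(lines, StringComparer.Ordinal);

        var unique = new List<string>();
        foreach (string line in lines)
        {
            // A line is new if it differs from the last line that was kept.
            if (unique.Count == 0 || unique[unique.Count - 1] != line)
            {
                unique.Add(line);
            }
        }

        File.WriteAllLines(outputPath, unique);
        return unique.Count;
    }
}
```

A call might look like `KeywordFileDedup.DedupLines("keywords.txt", "keywords.unique.txt");`, where both file names are made up for the example.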

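Limitations:

Two algorithms are presented below.
The first sorts the array so that equal elements sit next to each other, then removes duplicates in one pass. Apart from the sorting time, a single pass is all it needs, so it is fast, but the original order is lost.
The second is a less conventional algorithm that relies on reference-type semantics to keep the original order. It needs a dictionary and a list, so it uses somewhat more memory.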

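2. The algorithm in detail

Core idea

Sorting brings equal elements together, so comparing each element with one neighbour is enough to tell whether it is a duplicate. A single pass over the sorted data therefore removes every duplicate.
Building on this, the second method uses the fast key-to-value lookup of Dictionary: it stores the original index of each element together with a reference to the element, so the result can be put back into the original order by index. See the code for details.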

Example code

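using System;
using System.Collections.Generic;
using System.Linq;

// Method 1: sort, then drop adjacent duplicates. The original order is lost.
string[] RemoveSameElement(string[] source)
{
    List<string> result = source.ToList();
    // Ordinal sort, so strings that are equal under == are guaranteed to be adjacent
    result.Sort(StringComparer.Ordinal);
    // Bound the loop by result.Count, because the list shrinks as duplicates are removed
    for (int i = 1; i < result.Count;)
    {
        if (result[i] == result[i - 1])
        {
            result.RemoveAt(i);
        }
        else
        {
            i++;
        }
    }
    return result.ToArray();
}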
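`List<string>.RemoveAt` shifts every later element one position down, so on inputs with many duplicates the removal loop can cost noticeably more than one pass. The following is a possible variant, not the author's code, that appends the survivors to a new list instead. The names `SortedDedup` and `RemoveSameElementSinglePass` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class SortedDedup
{
    // Sort-based deduplication that copies survivors into a new list
    // instead of removing duplicates in place. The original order is lost.
    public static string[] RemoveSameElementSinglePass(string[] source)
    {
        // Work on a copy so the caller's array is left untouched.
        string[] sorted = (string[])source.Clone();
        Array.Sort(sorted, StringComparer.Ordinal);

        var result = new List<string>(sorted.Length);
        for (int i = 0; i < sorted.Length; i++)
        {
            // Keep only the first element of each run of equal strings.
            if (i == 0 || sorted[i] != sorted[i - 1])
            {
                result.Add(sorted[i]);
            }
        }
        return result.ToArray();
    }
}
```

The trade-off is one extra list of at most the input size in exchange for avoiding the repeated element shifts.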

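using System;
using System.Collections.Generic;

// Method 2: keeps the original order (the first occurrence of each string survives)
string[] RemoveSameElement(string[] source)
{
    // d maps each original index to its wrapper; temp holds the same wrapper objects for sorting
    Dictionary<int, StringInfo> d = new Dictionary<int, StringInfo>();
    List<StringInfo> temp = new List<StringInfo>();
    List<string> result = new List<string>();
    for (int i = 0; i < source.Length; i++)
    {
        d.Add(i, new StringInfo(source[i], i));
        temp.Add(d[i]);
    }
    temp.Sort();
    // After sorting, equal strings are adjacent and the earliest occurrence comes first.
    // Mark the later ones; d sees the mark because it shares the same objects.
    for (int i = 1; i < temp.Count; i++)
    {
        if (temp[i].Value == temp[i - 1].Value)
        {
            temp[i].IsDuplicate = true;
        }
    }
    // Walk the indices in their original order and keep the unmarked elements
    for (int i = 0; i < source.Length; i++)
    {
        if (!d[i].IsDuplicate)
        {
            result.Add(d[i].Value);
        }
    }
    return result.ToArray();
}
// The StringInfo class wraps a string in a reference type, so the dictionary and the list share one object per element
class StringInfo : IComparable
{
    public StringInfo(string value, int index)
    {
        Value = value;
        Index = index;
    }
    public string Value;
    public int Index;          // position in the source array
    public bool IsDuplicate;   // set during the pass over the sorted list
    public int CompareTo(object obj)
    {
        if (obj is StringInfo other)
        {
            // Ordinal comparison matches ==; ties are broken by original index
            int c = string.CompareOrdinal(Value, other.Value);
            return c != 0 ? c : Index.CompareTo(other.Index);
        }
        else
        {
            throw new ArgumentException("The argument must be of type StringInfo");
        }
    }
}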
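The idea of sorting while remembering each element's original position can also be sketched without a wrapper class or a Dictionary, by sorting an array of indices instead. This is an alternative formulation under that assumption, not the author's implementation, and the names `OrderedDedup` and `RemoveSameElementKeepOrder` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class OrderedDedup
{
    // Order-preserving deduplication that sorts indices instead of wrapper objects.
    // The first occurrence of each string is the one that is kept.
    public static string[] RemoveSameElementKeepOrder(string[] source)
    {
        int[] order = new int[source.Length];
        for (int i = 0; i < order.Length; i++) order[i] = i;

        // Sort the indices by the string they point at. Ties go to the smaller
        // index, so the earliest occurrence leads each run of equal strings.
        Array.Sort(order, (x, y) =>
        {
            int c = string.CompareOrdinal(source[x], source[y]);
            return c != 0 ? c : x.CompareTo(y);
        });

        // Mark every index whose string equals the one just before it in sorted order.
        bool[] duplicate = new bool[source.Length];
        for (int k = 1; k < order.Length; k++)
        {
            if (source[order[k]] == source[order[k - 1]]) duplicate[order[k]] = true;
        }

        // Collect the unmarked elements in their original order.
        var result = new List<string>();
        for (int i = 0; i < source.Length; i++)
        {
            if (!duplicate[i]) result.Add(source[i]);
        }
        return result.ToArray();
    }
}
```

For the input `{ "b", "a", "b", "c", "a" }` this keeps the first `"b"` and the first `"a"`, giving `b, a, c`, whereas the sort-only method would return `a, b, c`.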