博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Huffman Codes
阅读量:6944 次
发布时间:2019-06-27

本文共 5148 字,大约阅读时间需要 17 分钟。

Huffman Codes (30)

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a string of '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Sample Input:
7A 1 B 1 C 1 D 3 E 3 F 6 G 64A 00000B 00001C 0001D 001E 01F 10G 11A 01010B 01011C 0100D 011E 10F 11G 00A 000B 001C 010D 011E 100F 101G 110A 00000B 00001C 0001D 001E 00F 10G 11
Sample Output:
YesYesNoNo
#include 
#include
#include
#include
#include
#include
using namespace::std;struct HuffmanCodesNode { bool isLeaf; int frequence; char c; int d; HuffmanCodesNode *Child[2]; HuffmanCodesNode(char ch,int f){ Child[0]=NULL; Child[1]=NULL; c=ch; frequence=f; isLeaf=true; d=0; } HuffmanCodesNode(HuffmanCodesNode *h1=NULL,HuffmanCodesNode *h2=NULL):isLeaf(false),frequence(0),c(0),d(0){ Child[0]=h1; Child[1]=h2; } void clear(){ if(Child[0]!=NULL){Child[0]->clear();delete Child[0];} if(Child[1]!=NULL){Child[1]->clear();delete Child[1];} } int computeWPL(int depth,unsigned int &WPL){ if(Child[0]==NULL && Child[1]==NULL)WPL+=depth*this->frequence; else{ Child[0]->computeWPL(depth+1, WPL); Child[1]->computeWPL(depth+1, WPL); } return 0; } friend bool operator<(HuffmanCodesNode a,HuffmanCodesNode b){ return a.frequence>b.frequence; }};unsigned int computeWPLwithFrequencyMap(map
&f){ priority_queue
q; HuffmanCodesNode *p1,*p2; unsigned int WPL=0; for (map
::iterator iter=f.begin(); iter!=f.end(); ++iter) { HuffmanCodesNode temp(iter->first, iter->second); q.push(temp); } while (q.size()>1) { p1=new HuffmanCodesNode(q.top()); q.pop(); p2=new HuffmanCodesNode(q.top()); q.pop(); HuffmanCodesNode temp(p1,p2); temp.frequence=p1->frequence+p2->frequence; q.push(temp); } p1=new HuffmanCodesNode(q.top()); p1->computeWPL(0, WPL); return WPL;}bool compare(const string& a ,const string& b){ if (a.length() < b.length()) { return true; }else if(a.length() == b.length() && a < b){ return true; } return false;}bool checkPrefix(vector
& s){ sort(s.begin(), s.end(),compare); HuffmanCodesNode *root = new HuffmanCodesNode,*p=root; int t; bool check=false; for (vector
::iterator i=s.begin();i!=s.end();++i){ p=root; for (string::iterator iter = i->begin(); iter!=i->end(); iter++) { t=*iter-'0'; if(p->Child[t]==NULL)p->Child[t]=new HuffmanCodesNode; p=p->Child[t]; if(p->isLeaf){root->clear();return false;} } if(p->Child[0]!=NULL || p->Child[1]!=NULL)check=true; if(check){root->clear();return false;} p->isLeaf = true; } root->clear(); if (!check)return true; else return false;}int main(int argc,const char* argv[]){ ios::sync_with_stdio(false); int N;cin>>N; char c;int frequence; string s; map
f; for (int i=0; i
>c; cin>>frequence; f[c]=frequence; } unsigned int WPL=computeWPLwithFrequencyMap(f); int M;cin>>M; for (int i=0; i
t; for (int j=0; j
>c; cin>>s; t_WPL+=f[c]*s.size(); t.push_back(s); } if ( t_WPL == WPL && checkPrefix(t)){ cout<<"Yes"<

Huffman code的特征有两点

1. WPL(带权路径长度)最小,这个我们可以构建一个Huffman树来计算。

2. 每个编码唯一不具二义性,也就是每个编码都不会是另一个编码的前缀。

我的代码思路是这样

我们首先把输入的权值保存在一个map里面,然后交给

unsigned int computeWPLwithFrequencyMap(map
&f)

来计算出WPL,函数是通过建立一个Huffman树然后遍历得到WPL值的,应该有更好的方法。

之后用得到的WPL来和每个同学的WPL比较,比我们大的肯定不是Huffman编码了。

之后检测是否有二义性有两个办法(我目前想到两个):

  1.按编码长度排序(升序),穷举每个编码是否是后面编码的子串,也就是字符串比较,可以通过kmp比较。

  2.建立trie树,看看每个编码的路径中是否有其他的编码。

我用的是办法2,我先用sort()函数进行了按编码长度的升序排列(偷懒了,也可以不用排序),然后依次建立trie,

把编码最后一位所在的节点进行标记。按照Huffman编码,这个肯定是叶节点,如果以后有编码路径经过它,

那这个肯定不是Huffman编码了。

 

 

 

 

转载于:https://www.cnblogs.com/weierpeng/p/4392536.html

你可能感兴趣的文章
flag标志寄存器
查看>>
Android Notification与Toast(一)
查看>>
搜索引擎lucene
查看>>
UVA11991
查看>>
linux:php配置文件php.ini详解
查看>>
有用和有趣的产品秤砣
查看>>
PhotoShopCS5
查看>>
EVENT 10051:"trace OPI calls"
查看>>
利用Oracle在线重定义Online Redefinition清理历史数据
查看>>
A.3-C# 面向对象编程
查看>>
Linux下高性能网络编程中的几个TCP/IP选项
查看>>
HDU2049:不容易系列之(4)——考新郎
查看>>
mySQL优化, my.ini 配置说明
查看>>
函数、返回-Sql Server常用函数之统计、算数、字符串函数-by小雨
查看>>
windows调试器之Visual C++
查看>>
如何删除存在多个重复记录中的一个
查看>>
Android开发之旅:环境搭建及HelloWorld
查看>>
请求浏览器使用chrome查看http请求
查看>>
数据数据包JAVASE----17----网络编程
查看>>
OC基础知识
查看>>