Sunday, November 2, 2014

nine algorithms that changed the future 筆記

1. search engine indexing

  • indexing, matching and ranking
  • word-location trick
  • NEAR
  • the meta word trick
  • in TITLE
2. PageRank

  • the hyperlink trick
  • the authority trick
  • the random surfer trick
  • web spam, link-based ranking algorithm, use mathematical with less computational expense
3. public key cryptography

  •  Encrypted with a shared secret
  • Establish shared secret in public
  • Paint-mixing trick; public/private color, public-private mixture, shared secret color
  • In real life, mixing: discrete exponential , unmix: discrete logarithm , diffuse-Hellman, (base, clock size)
  • Prime number clock size
  • The base must be primitive root of clock size, power of bade eventuallyh cycle through every possible value on the clock


4. Error correction 
Correct
Repetition trick
Redundancy trick; 7-4 hamming code(first error correcting code) correct 1 error in each 7 digits
Detect
Simple Checksum trick, staircase checksum trick

Detect and correct
Pinpoint trick; two-dimensional parity 

In practice 
Reed-Solomon code
Ethernet CRC32
Md5, 40 digits
sha1( all cryptographic function, protect against malicious alteration) 50digits
Sha256, 75digits
sha512, 150digits

Low density parity check(satellite tv)


5. Pattern Recognition:
[...]

6. Data compression:
lossless: run length encoding, same-as-earlier trick, shorter-symbol trick, huffman
lossy: leave-it-out trick, jpeg(divide by chuck)
LZ77, Shannon-Fano coding, Huffman is Fano's student at MIT


7. database
write-ahead logging (to-do list trick), each entry in the log is "idempotent", can be executed several times.

Replica, replicated databases keeps all copies of the database in sync all the time. Different than "backup".

rollback, we might need it when deadlock.
two-phase commit( prepare-then-commit trick)
virtual table (join)

All is to make sure consistency.

8. Digital signature
lockbox & padlock, a verifiable signature
=> in digital:
keys: numbers
act of locking/unlocking: multiplication in clock arithmetic (in fact, exponentiation)

(multiplicative padlock trick)
large message can applied a transformation called "cryptographic hash function" to reduce the size/length of the message.

locked value => Euclid's algorithm => key value:
Because integer factorization is hard. (quantum computing, probability)
In practice, verification of downloaded content. CA(trusted back): certificated authority

chain of digital certificates (inspected by your web browser)

9. What's computable?
Alan turing.
Halting problem: can't write a program output yes if the program will be terminated.
Undecidability

maybe you'll be musing to yourself one day, "Now wouldn't it be nice if my computer could do this for me" only to realize that it's an impossibility, because you wished-for task can be prove undecidable using the same method as for our crash-finding program

-End-

"knowledge of a programming language is essential for computer scientists, but it is merely a prerequisite: the main challenge is toe invent, adapt, and understand algorithms."

Sunday, June 15, 2014

A random walk down wall street

A practical guide for random walkers, a life cycle guide to investing

3. Dollar-Cost averaging can reduce the risks of investing in stocks and bonds.
Dollar-Cost averaging simply means investing the same fixed amount of money in at regular intervals over a long period of time.

Ex 每年投資$1000,在不穩定的市場跟穩定上漲的市場比較圖

If you expect to be a net saver during the next five years, should you hope for a higher or lower stock market during that period?

Many investor are elated when stock prices rise and depressed when they fall. This reaction makes no sense. Only those who will be sellers of equities in the near future should be happy at seeing stocks rise. Prospective purchasers should much prefer singing prices.

Ex 最初放$500, 之後每個月$100投資在Vanguard 500 Index mutual fund. $39,000 -> $265,000.

If possible, keep a small reserve (in a money fund) to take advantage of market declines and buy a few extra shares if the market is down sharply...
For the stock market as a whole, Newton's law has always worked in reverse: What goes down has come back up.


For those in their twenties, a very aggressive investment pro folio is recommended. At this stage, there is lots of time to ride out the peaks and valleys of investment cycles.





Tuesday, June 10, 2014

Coders at work 2

Douglas Crockford(Yahoo!):

"每次開會都會讓一些人閱讀他們各自的程式碼,...
如果你讓能力很強的同事閱讀程式碼,那麼他們周圍的新手們就會學到很多東西,而這一切是無法透過其他方式獲得的; 如果新手來閱讀程式碼,那他會得到很多極具價值的建議。
這需要給予團隊成員充分的信任,如果團隊不和睦,那就別指望這麼做了"

在招聘程式師時,如何識別優秀人才?
"我會讓應聘者帶來他們寫過的優秀程式碼並帶領我們閱讀這些程式碼。
我想考察這些人的展示能力,我想看看他們引以為傲的東西,我想知道這些程式碼是不是他們自己寫的。"

"好好學習寫作和閱讀"


Brendan Eich(Mozilla):

yacc, parser generator, context-free grammar

Thursday, May 15, 2014

Coders at work 1

Fitzpatrick 提到他為LiveJournal/Google招聘程式設計師時提到

"我總是找這種型別的人,他們會做很多別人沒要求他做的事,不僅是學校的專案或者前僱主要求他做的,他們對某些事情充滿熱情,有額外的專案。”

”HR的經驗中有這麼一條--- 一些人只會做讓他們做的事,沒有那種追求極致的熱情。他們會說:做好了,接下來幹什麼? 他們甚至不告訴你,自己就上網玩去了。“

他們在面試你時問了什麼? “任意長度的十進位數字字串相乘”
many solutions!
#!/usr/bin/env python
 
def digits(x):
    return [int(c) for c in str(x)]
 
def mult_table(xs, ys):
    return [[x * y for x in xs] for y in ys]
 
def polymul(xs, ys):
    return map(lambda *vs: sum(filter(None, vs)),
               *[[0] * i + zs for i, zs in enumerate(mult_table(xs, ys))])
 
def longmult(x, y):
    result = 0
    for v in polymul(digits(x), digits(y)):
        result = result * 10 + v
    return result
 
if __name__ == "__main__":
    print longmult(2**64, 2**64)

“Callgrind"
Profile你的程式,哪些function用去了最多的instruction
670,486  src/lcthw/bstrlib.c:bstrcmp [tests/bstree_tests]
  194,377  src/lcthw/bstree.c:BSTree_get [tests/bstree_tests]
   65,580  src/lcthw/bstree.c:default_compare [tests/bstree_tests]
   16,338  src/lcthw/bstree.c:BSTree_delete [tests/bstree_tests]
   13,000  src/lcthw/bstrlib.c:bformat [tests/bstree_tests]
   11,000  src/lcthw/bstrlib.c:bfromcstralloc [tests/bstree_tests]
    7,774  src/lcthw/bstree.c:BSTree_set [tests/bstree_tests]
    5,800  src/lcthw/bstrlib.c:bdestroy [tests/bstree_tests]
    2,323  src/lcthw/bstree.c:BSTreeNode_create [tests/bstree_tests]
    1,183  /private/tmp/pkg-build/coregrind//vg_preloaded.c:vg_cleanup_env [/usr/local/lib/valgrind/vgpreload_core-amd64-darwin.so]
,甚至清楚地指出哪一行的程式碼呢!
2,453  static inline BSTreeNode *BSTree_getnode(BSTree *map, BSTreeNode *node, void *key)
     .  {
61,853      int cmp = map->compare(node->key, key);
663,908  => src/lcthw/bstree.c:default_compare (14850x)
     .
14,850      if(cmp == 0) {
     .          return node;
24,794      } else if(cmp < 0) {
30,623          if(node->left) {
     .              return BSTree_getnode(map, node->left, key);
     .          } else {
     .              return NULL;
     .          }
     .      } else {
13,146          if(node->right) {
     .              return BSTree_getnode(map, node->right, key);
     .          } else {
     .              return NULL;
     .          }
     .      }
     .  }
”我覺得有些人只是在工作,而不是享受城市設計帶來的樂趣,這沒什麼問題。但把他們和那些核心程式師做比較就不行了,當一個人花的時間多十倍,不停地考慮怎麼寫好這程式,另一個人只為了工作而工作,那兩者的生產力上的差距又豈止十倍呢?“
你是因為覺得有趣才設計程式的? “是的,我還是會繼續做些小玩意,我的手機上有個無聊的棋類遊戲,我對她有點厭煩了,所以就寫了一個求解程式,其中試用了一些動態程式設計,處理不同的棋盤大小,還有很多隨機棋盤,解決棋局所需步數的長條圖。我把它發給作者..這是剪很有意思的事情,我想我可以退休後整天做這些事。”
src: 
http://c.learncodethehardway.org/book/ex41.html
http://rosettacode.org/wiki/Long_multiplication#Python