Categorized Abstractions

My experience learning Haskell

2019-05-03 :: haskell, functionalprogramming

tl;dr Haskell will help you in other languages even if you don’t use it.

I’ve always been fascinated by the Haskell code I’ve seen [1], and I tried learning Haskell briefly in 2010 and when configuring Xmonad. In 2015, I started learning Haskell seriously by writing a significant piece of software, a Scheme interpreter, with the help of a tutorial. After few months in, I found that my style of programming in Python has improved somewhat. I discovered some patterns where functions returning functions greatly reduce duplication. In fact, functions returning functions are one type of the decorator syntax. Also, separating side-effect producing functions from pure functions was a good practice - it lead to less bugs and more testable code. More testable because of the lesser reliance on an external environment. Now with type hints in Python 3, one can also benefit from defining types and using algebraic data types NewType. Python’s type hints are nowhere near as mature as Haskell’s type system though, and their optional nature in Python changes the experience of using them.

[1] I should note that some of the very succinct and elegant examples are not accurate, such as the Sieve of Eratosthenes two-liner:

1 2	primes = sieve [2..] sieve (p : xs) = p : sieve [x \| x <− xs, x `mod` p > 0]

How is Haskell different from lisp/racket/clojure?

Haskell belongs to family of functional languages that are statically typed with Hindley–Milner type system.

Besides the ambiguity of calling a language functional. (Is Javascript functional?)

A couple of differences from the classic functional languages:

A superficial difference: Haskell isn’t littered with parenthesis or the s-expression format
Haskell is strongly and statically typed, with type inference. The Hinley-Milner type system is arguably one of the most advanced type systems in any language. A monad is a typeclass.
It’s purely functional, meaning functions always return the same value, functions can’t produce side effects, and variables are immutable. Some languages related to Haskell like SML, OCaml and F# achieve different degrees of these.

>>>>>> b338feaa4ad3eebe43f972fa8966a948239fd22e

Don’t fear the monad

One major feature of Haskell is its monads. There’s a joke that once one understands the monad, you loose the ability to explain it to others.

It is based on relatively simple concepts.

My view is that it’s a typeclass that can be used to chain function calls, and encapsulate other values. There’s also some syntactic sugar around monad chaining built into Haskell’s syntax, like the do notation and list comprehension.

It’s a pretty big deal - it allows most parts of Haskell to remain purely functional while allowing things like IO. It does this by deferring the read/write actions to the runtime. It’s more subtle than this, but the idea is that Haskell code that manipulates instructions to perform IO can be purely functional.

Some misconceptions and myths

There are no runtime errors. Runtime errors are still possible. For example, the error function that can typematch to any function and simply exits.
You need to be a genius with a Computer Science PhD to learn/use Haskell. It’s not true. The core language is fairly simple. What’s hard about learning Haskell are all the habits and mental model of what a programming language is. Coming from a procedural or C-like language background, a lot of concepts would seem alien or weird. Haskell does come from an academic background, and it shows in a lot of the terminology, but since then it has escaped and seen uses all over the place. For instance, a Haskell music livecoding environment called TidalCycles is used by artists and musicians who don’t know programming, let alone Haskell. Still, it’s syntactically correct Haskell code executed in GHCi.

References

In-depth: Functional programming in C++ by John Carmack

An old IDEA

2019-02-18 :: security, encryption

Spoiler alert: I didn’t find the password.

Around 2002, I came across a tool that encrypts and decrypts files using the IDEA cipher. IDEA was created in 1991 by James Massey Xuejia Lai, and it seems still relatively secure. It’s still used in a recent version of opengpg.

I had encrypted few zip files using this, in 2002 and 2003. (Nothing serious, just wanted to try out encrypting things.) However, I forgot the passwords.

Looking at the implementation, it says it uses a maximum of 8 characters for a passphrase, which is about half of IDEA’s 128-bit keylength (not counting non-ascii characters.)

It was a tool for DOS in 16-bit mode.

Given this limitation, is bruteforcing a password possible? If I used only lowercase letters for the password, the number of passwords to try would be 26^8 = 208,827,064,576, or about 200 billion combinations. Alphanumeric combinations would be (26*2 + 10)^8 = 218,340,105,584,896, or about 218 trillion. Some simple math can lead us to an estimate of the time to go through the search space. It depends on the elements and elements/s.

One challenge with finding the correct decryption password is there’s no easy way of making sure we hae the correct password. The way this idea program worked was to only encrypt/decrypt fixed blocks, with no integrity checking. If we decrypt a file with the wrong password, it would output random data. So one way to find the correct password is to find a pattern in the output if we know what the output would contain. Since it was a zip file, a zip file has a known header in the first block (16 bytes).

I modified the main file to read words from a wordlist, decrypt the first block, and test if it’s a zip header. Instead of parallizing the solution in C, I wrote simple bash scripts to run it parallelly after splitting the wordlist into 4, the number of cores I have. I was able to process 700k passwords/s. I wasn’t able to find the password using the wordlists I had. I suppose I need to investigate more techniques and maybe harness my GPU.

Few more things come to my mind for future work:

Are there any other weaknesses in the implementation?
Could this be integrated into the “John the Ripper” password cracker?