Lispy lists continued

In a previous post I wrote about data processing in Python using functional programming virtues like explicit state and immutability. I also played around with a rather unpythonic syntax using list, map, filter and zip and claimed it looked quite “lispy”.

As a logical consequence I thought it would be fun to implement that solution in real functional programming languages: Haskell and Clojure. Clojure is even a real Lisp dialect.


As was the case in the original Python post, there is an alternative representation of the program. Haskell also supports list comprehensions, very similar to those in Python. I am not sure, but I think I even read somewhere that Python borrowed the list comprehension idea from Haskell.
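As a reminder of what the two representations look like on the Python side, here is a minimal sketch of the time-difference and threshold steps written both ways. The time stamps are made up for illustration and the variable names are my own, not taken from the original program. The Haskell comprehension form, `[f x | x <- xs, p x]`, reads much the same as the Python one.

```python
# Hypothetical time stamps, made up for illustration only.
times = [0.0, 0.00001, 0.00002, 0.5, 0.50001]
threshold = 1e-4

# "Lispy" style: zip pairs each value with its predecessor, map subtracts.
diffs_lispy = list(map(lambda pair: pair[0] - pair[1],
                       zip(times[1:], times[:-1])))

# List comprehension style: the same computation.
diffs_comp = [later - earlier
              for later, earlier in zip(times[1:], times[:-1])]

# Start index of every package that follows a gap, again in both styles.
starts_lispy = list(map(lambda p: p[0] + 1,
                        filter(lambda p: p[1] > threshold,
                               enumerate(diffs_lispy))))
starts_comp = [i + 1 for i, d in enumerate(diffs_comp) if d > threshold]

print(starts_lispy, starts_comp)
```

Both styles compute the same lists; the comprehension just folds the map and the filter into one expression.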

Note: In my first draft mkDouble was called mkFloat and I used Float instead of Double. It turned out that the precision of Float was not enough to produce correct results. This is actually a flaw in the whole “design” of this demo task: the data evaluation logic is way too fragile with respect to the floating point values of the data. Just changing the threshold from 1e-4 to 1.1e-4 totally changes the results. This, however, does not change the general ideas presented in this and the previous post.
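To get a feeling for how single precision can flip the threshold decision, here is a small sketch in Python. It is only an illustration under assumptions: the two time stamps are hypothetical and not taken from the real data, and `to_float32` is a helper I made up that rounds a value to 32 bits via the standard `struct` module.

```python
import struct

THRESHOLD = 1e-4  # the gap threshold used in the post


def to_float32(x):
    """Round a Python float (64 bit) to the nearest 32 bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical neighbouring time stamps, not from the real data set.
t_prev, t_next = 1000.0, 1000.000095

# Double precision keeps the gap at roughly 9.5e-5.
diff64 = t_next - t_prev

# Near 1000 a 32 bit float can only step in multiples of 2**-14 (about
# 6.1e-5), so the gap gets rounded to a multiple of that step.
diff32 = to_float32(t_next) - to_float32(t_prev)

print(diff64 > THRESHOLD)
print(diff32 > THRESHOLD)
```

With these made-up values the double precision gap stays below the threshold while the single precision gap ends up above it, so the same pair of rows would be split into two packages in one case and kept together in the other.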


(ns post41-clojure.core
  (:require [clojure.string :as str]))

(defn convert-row
  "Converts one row (vector of strings) to a vector [Double Int Int Int].
  It is expected that the vector consists of exactly 4 elements."
  [vec]
  (let [t  (Double/parseDouble (get vec 0))
        d1 (Integer/parseInt   (get vec 1))
        d2 (Integer/parseInt   (get vec 2))
        v  (Integer/parseInt   (get vec 3))]
    [t d1 d2 v]))

(defn get-col
  "Extract the n-th column from a vector of vectors"
  [n vec]
  (mapv (fn [v] (get v n)) vec))

(defn read-data
  "Reads the contents of the given file, splits it into lines and then splits
  each row into its column values (still strings). Finally the values are cast
  (converted) to the corresponding types."
  [filename]
  (map convert-row
       (map #(str/split % #",")
            (str/split-lines (slurp filename)))))

(defn valid?
  "Returns true if the fourth value of the row is greater than zero."
  [row]
  (> (get row 3) 0))

(defn make-time-diffs
  "Create a new sequence from a vector of time values where the previous time is
  subtracted from its successor, yielding indexed time differences [index diff]."
  [times]
  (map vector (range)
    (map #(- (first %) (second %))
      (map list (subvec times 1) (subvec times 0 (- (count times) 1))))))

(defn make-package-indices
  "Finds the start and end indices of the individual data packages based on the
  threshold logic."
  [time_diffs]
  (into [0]
        (mapv #(+ (first %) 1)
              (filterv (fn [v] (> (second v) 1e-4)) time_diffs))))

(defn make-package-bounds
  "Creates the data package boundary pairs (start, end) which delimit the range
  of one data package in the data stream."
  [pkg_idxs]
  (map vector (subvec pkg_idxs 0 (- (count pkg_idxs) 1)) (subvec pkg_idxs 1)))

(defn extract-relevant-data
  "Extract the relevant package boundary values based on the given bounds and a
  predicate to filter the bounds"
  [data pred bounds]
  (let [data_bounds (keep-indexed #(if (pred %1) %2) bounds)]
    (for [[start end] data_bounds]
      (for [idx (range start end)]
        (get data idx)))))

(defn extract-data
  "Extract the data from the given data rows"
  [idx pred rows]
  (let [times (get-col 0 rows)
        data (get-col idx rows)]
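    ;; Compose the pipeline right to left: time diffs -> package indices ->
    ;; package bounds -> relevant data.
    ((comp
      #(extract-relevant-data data pred %)
      make-package-bounds
      make-package-indices
      make-time-diffs) times)))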

(defn -main
  "Main program for the demo project."
  [& args]
  (let [filename (first args)
        rows (keep #(if (valid? %) %) (read-data filename))
        data1 (extract-data 1 even? rows)
        data2 (extract-data 2 odd? rows)]
    (doseq [string ["Data1" data1 "Data2" data2]]
      (println string))))