Lispy lists continued

In a previous post a wrote about data processing in Python using functional programming virtues like explicit state and immutability. I also played around with a rather unpythonic syntax using list, map, filter and zip and claimed it looks quite “lispy”.

As a logical consequence I thought it would be fun to implement yonder solution in real functional programming languages: Haskell and Clojure. Closure even is a real Lisp dialect.

Haskell

import           System.IO
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import           Control.Monad.Trans.Cont
import qualified Data.Vector  as V


mkValues rows = return (V.fromList (map (mkTuple . (T.split (== ','))) rows))
  where
    mkTuple [w,x,y,z] = (toDouble w, toInt x, toInt y, toInt z) :: (Double, Int, Int, Int)
    mkTuple _ = error "unexpected"
    toInt = read . T.unpack
    toDouble = read . T.unpack

filterValid values = return $ V.filter (\(_,_,_,x) -> x == 1) values

mkTimeDiffs times = return $ V.indexed $ V.zipWith (-) t i
  where
    i = V.init times
    t = V.tail times

mkBounds tdiffs = do
  idxs <- return (
            V.cons 0 (
            V.map ((+1) . fst) (
            V.filter (\t -> snd t > 1e-4) tdiffs)))
  return (V.zip (V.init idxs) (V.tail idxs))

extractData pred bounds vec =
  let bounds' = V.ifilter (\i _ -> pred i) bounds
  in V.map (\(s,e) -> V.slice s (e-s) vec) bounds'

process rows = do
  values <- mkValues rows
  valid_values <- filterValid values
  let (times, data1_row, data2_row, _) = V.unzip4 valid_values
  tdiffs <- mkTimeDiffs times
  bounds <- mkBounds tdiffs
  let
    data1 = extractData even bounds data1_row
    data2 = extractData odd bounds data2_row
  return (data1,data2)

main = do
  rows <- (tail . T.lines) <$> TIO.readFile "post38_example.csv"
  let (data1,data2) = evalCont (process rows)
  putStrLn "Data1"
  print $ data1
  putStrLn "Data2"
  print $ data2

As was the case in the original Python post, there exists an alternative representation of the program. Haskell also does have support for list comprehensions, very similar to those in Python. I am not sure, but I think I even read somewhere that Python lend the list comprehension idea from Haskell.

import           Data.List
import           System.IO
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import qualified Data.Vector  as V


mkTuple [w,x,y,z] = (toDouble w, toInt x, toInt y, toInt z) :: (Double, Int, Int, Int)
  where
    toInt = read . T.unpack
    toDouble = read . T.unpack
mkTuple _ = error "unexpected"

extractData pred bounds vec =
  let bounds' = V.ifilter (\i _ -> pred i) bounds
  in V.map (\(s,e) -> V.slice s (e-s) vec) bounds'

isValid (_,_,_,v) = v == 1

main = do
  lines <- (tail . T.lines) <$> TIO.readFile "post38_example.csv"
  let
    rows = [mkTuple (T.split (== ',') line) | line <- lines]
    valid_rows = [row | row <- rows, isValid row]
    (times,d1,d2,_) = unzip4 valid_rows
    time_diffs = zip [1..] (zipWith (-) (tail times) (init times))
    package_idxs = [0] ++ [fst x | x <- time_diffs, (snd x) > 1e-4]
    package_bounds = V.fromList $ zip (init package_idxs) (tail package_idxs)
    d1_vec = V.fromList d1
    d2_vec = V.fromList d2
    data1 = extractData even package_bounds d1_vec
    data2 = extractData odd package_bounds d2_vec
  putStrLn "Data1"
  print $ data1
  putStrLn "Data2"
  print $ data2

Note: In my first draft mkDouble was called mkFloat and I used Float instead of Double. It turned out that this precision was not enough to produce correct results. This is actually a flaw of the whole “design” of this demo task. The whole data evaluation logic is way to fragile regarding the floating point values of the data. Just changing the threshold from 1e-4 to 1.1e-4 totally changes the results. This, however, does not change the general ideas presented in this and the previous post.

Clojure

(ns post41-clojure.core
  (:require [clojure.string :as str])
  (:gen-class))


(defn convert-row
  "Converts one row (vector of string) to a vector [Float Int Int Int]
  It is expected that the vector consists of exactly 4 elements."
  [vec]
  (let [t  (Double/parseDouble (get vec 0))
        d1 (Integer/parseInt   (get vec 1))
        d2 (Integer/parseInt   (get vec 2))
        v  (Integer/parseInt   (get vec 3))]
    [t d1 d2 v]))

(defn get-col
  "Extract the n-th column from a vector of vectors"
  [n vec]
  (mapv (fn [v] (get v n)) vec))

(defn read-data
  "Reads the contents of the given file, splits it into lines and then splits
  each row into its column values (still strings). Finally the values are cast
  (converted) to the corresponding types."
  [filename]
  (map convert-row
    (rest
      (map #(str/split % #",")
           (str/split-lines (slurp filename))))))

(defn valid?
  [row]
  (> (get row 3) 0))

(defn make-time-diffs
  "Create a new vector from a vector of time values where the previous time is
  subtracted from the successor, thus leading a time difference vector."
  [times]
  (map vector (range)
    (map #(- (first %) (second %))
      (map list (subvec times 1) (subvec times 0 (- (count times) 1))))))

(defn make-package-indices
  "Finds the start and end indices of the individual data packages based on the
  threshold logic."
  [time_diffs]
  (into [0]
    (vec
      (mapv #(+ (first %) 1)
        (filterv (fn [v] (> (second v) 1e-4)) time_diffs)))))

(defn make-package-bounds
  "Creates the data package boundary pairs (start, end) which delimit the range
  of one data package in the data stream"
  [pkg_idxs]
  (map vector (subvec pkg_idxs 0 (- (count pkg_idxs) 1)) (subvec pkg_idxs 1)))

(defn extract-relevant-data
  "Extract the relevant package boundary values based on the given bounds and a
  predicate to filter the bounds"
  [data pred bounds]
  (let [data_bounds (keep-indexed #(if (pred %1) %2) bounds)]
    (for [[start end] data_bounds]
      (for [idx (range start end)]
        (get data idx)))))

(defn extract-data
  "Extract the data from the given data rows"
  [idx pred rows]
  (let [times (get-col 0 rows)
        data (get-col idx rows)]
    ((comp
      #(extract-relevant-data data pred %)
      make-package-bounds
      make-package-indices
      make-time-diffs) times)))

(defn -main
  "Main program for the demo project."
  [& args]
  (let [filename (first args)
        rows (keep #(if (valid? %) %) (read-data filename))
        data1 (extract-data 1 even? rows)
        data2 (extract-data 2 odd? rows)]
    (doseq [string ["Data1" data1 "Data2" data2]]
      (println string))))