# Lispy lists continued

In a previous post a wrote about data processing in Python using functional programming virtues like explicit state and immutability. I also played around with a rather unpythonic syntax using list, map, filter and zip and claimed it looks quite “lispy”.

As a logical consequence I thought it would be fun to implement yonder solution in real functional programming languages: Haskell and Clojure. Closure even is a real Lisp dialect.

``````import           System.IO
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import qualified Data.Vector  as V

mkValues rows = return (V.fromList (map (mkTuple . (T.split (== ','))) rows))
where
mkTuple [w,x,y,z] = (toDouble w, toInt x, toInt y, toInt z) :: (Double, Int, Int, Int)
mkTuple _ = error "unexpected"

filterValid values = return \$ V.filter (\(_,_,_,x) -> x == 1) values

mkTimeDiffs times = return \$ V.indexed \$ V.zipWith (-) t i
where
i = V.init times
t = V.tail times

mkBounds tdiffs = do
idxs <- return (
V.cons 0 (
V.map ((+1) . fst) (
V.filter (\t -> snd t > 1e-4) tdiffs)))
return (V.zip (V.init idxs) (V.tail idxs))

extractData pred bounds vec =
let bounds' = V.ifilter (\i _ -> pred i) bounds
in V.map (\(s,e) -> V.slice s (e-s) vec) bounds'

process rows = do
values <- mkValues rows
valid_values <- filterValid values
let (times, data1_row, data2_row, _) = V.unzip4 valid_values
tdiffs <- mkTimeDiffs times
bounds <- mkBounds tdiffs
let
data1 = extractData even bounds data1_row
data2 = extractData odd bounds data2_row
return (data1,data2)

main = do
rows <- (tail . T.lines) <\$> TIO.readFile "post38_example.csv"
let (data1,data2) = evalCont (process rows)
putStrLn "Data1"
print \$ data1
putStrLn "Data2"
print \$ data2``````

As was the case in the original Python post, there exists an alternative representation of the program. Haskell also does have support for list comprehensions, very similar to those in Python. I am not sure, but I think I even read somewhere that Python lend the list comprehension idea from Haskell.

``````import           Data.List
import           System.IO
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import qualified Data.Vector  as V

mkTuple [w,x,y,z] = (toDouble w, toInt x, toInt y, toInt z) :: (Double, Int, Int, Int)
where
mkTuple _ = error "unexpected"

extractData pred bounds vec =
let bounds' = V.ifilter (\i _ -> pred i) bounds
in V.map (\(s,e) -> V.slice s (e-s) vec) bounds'

isValid (_,_,_,v) = v == 1

main = do
lines <- (tail . T.lines) <\$> TIO.readFile "post38_example.csv"
let
rows = [mkTuple (T.split (== ',') line) | line <- lines]
valid_rows = [row | row <- rows, isValid row]
(times,d1,d2,_) = unzip4 valid_rows
time_diffs = zip [1..] (zipWith (-) (tail times) (init times))
package_idxs =  ++ [fst x | x <- time_diffs, (snd x) > 1e-4]
package_bounds = V.fromList \$ zip (init package_idxs) (tail package_idxs)
d1_vec = V.fromList d1
d2_vec = V.fromList d2
data1 = extractData even package_bounds d1_vec
data2 = extractData odd package_bounds d2_vec
putStrLn "Data1"
print \$ data1
putStrLn "Data2"
print \$ data2``````

Note: In my first draft mkDouble was called mkFloat and I used Float instead of Double. It turned out that this precision was not enough to produce correct results. This is actually a flaw of the whole “design” of this demo task. The whole data evaluation logic is way to fragile regarding the floating point values of the data. Just changing the threshold from 1e-4 to 1.1e-4 totally changes the results. This, however, does not change the general ideas presented in this and the previous post.

## Clojure

``````(ns post41-clojure.core
(:require [clojure.string :as str])
(:gen-class))

(defn convert-row
"Converts one row (vector of string) to a vector [Float Int Int Int]
It is expected that the vector consists of exactly 4 elements."
[vec]
(let [t  (Double/parseDouble (get vec 0))
d1 (Integer/parseInt   (get vec 1))
d2 (Integer/parseInt   (get vec 2))
v  (Integer/parseInt   (get vec 3))]
[t d1 d2 v]))

(defn get-col
"Extract the n-th column from a vector of vectors"
[n vec]
(mapv (fn [v] (get v n)) vec))

"Reads the contents of the given file, splits it into lines and then splits
each row into its column values (still strings). Finally the values are cast
(converted) to the corresponding types."
[filename]
(map convert-row
(rest
(map #(str/split % #",")
(str/split-lines (slurp filename))))))

(defn valid?
[row]
(> (get row 3) 0))

(defn make-time-diffs
"Create a new vector from a vector of time values where the previous time is
subtracted from the successor, thus leading a time difference vector."
[times]
(map vector (range)
(map #(- (first %) (second %))
(map list (subvec times 1) (subvec times 0 (- (count times) 1))))))

(defn make-package-indices
"Finds the start and end indices of the individual data packages based on the
threshold logic."
[time_diffs]
(into 
(vec
(mapv #(+ (first %) 1)
(filterv (fn [v] (> (second v) 1e-4)) time_diffs)))))

(defn make-package-bounds
"Creates the data package boundary pairs (start, end) which delimit the range
of one data package in the data stream"
[pkg_idxs]
(map vector (subvec pkg_idxs 0 (- (count pkg_idxs) 1)) (subvec pkg_idxs 1)))

(defn extract-relevant-data
"Extract the relevant package boundary values based on the given bounds and a
predicate to filter the bounds"
[data pred bounds]
(let [data_bounds (keep-indexed #(if (pred %1) %2) bounds)]
(for [[start end] data_bounds]
(for [idx (range start end)]
(get data idx)))))

(defn extract-data
"Extract the data from the given data rows"
[idx pred rows]
(let [times (get-col 0 rows)
data (get-col idx rows)]
((comp
#(extract-relevant-data data pred %)
make-package-bounds
make-package-indices
make-time-diffs) times)))

(defn -main
"Main program for the demo project."
[& args]
(let [filename (first args)
rows (keep #(if (valid? %) %) (read-data filename))
data1 (extract-data 1 even? rows)
data2 (extract-data 2 odd? rows)]
(doseq [string ["Data1" data1 "Data2" data2]]
(println string))))``````