Fix the slow bootstrapping with `groupUpdates`

Fixes #1
author: Ben Sima <ben@bsima.me> 2018-01-24 18:31:41 -0800
committer: Ben Sima <ben@bsima.me> 2018-01-24 18:31:41 -0800
commit: 8eeda4aa133e6aa3e068b1a9f7d31cd9001475d4
tree: f20c96ba90eeed88cee92649929f20bbc9ea1634
parent: 57bcca93b1ed1e31d8837ba1a43a872c9e4f757f
2 files changed, 8 insertions, 38 deletions
diff --git a/main.hs b/main.hs
index a09c9ce..04acafc 100755
--- a/main.hs
+++ b/main.hs
@@ -28,6 +28,7 @@
 {-# LANGUAGE RecordWildCards #-}
 
 import Data.Acid
+import Data.Acid.Advanced (groupUpdates)
 import Data.Acid.Local (createCheckpointAndClose)
 import qualified Data.ByteString.Lazy as BSL
 import qualified Data.Csv as Csv
@@ -240,21 +241,9 @@ postNumberR = do
       caller <- liftIO $ update db $ AddCaller _name _number _context
       sendStatusJSON status200 $ caller
 
--- | This takes a while; on my machine it averages 181 records per second. It's
--- IO bound, and in an un-optimized program GHC on Linux uses a single, blocking
--- IO manager thread (on Windows it's non-blocking, apparently). This can be
--- improved with the Control.Concurrent module, in which case we could launch as
--- many IO threads as we want, and do probably 10k records per second. There's
--- definitely an optimal amount of threads here, we'd have to test to find that.
---
--- HOWEVER, you can watch it bootstrap. Hit this endpoint, then use "GET /count"
--- to see it updating. New POSTs will also work and update the database, even
--- while it is bootstrapping, which is kinda cool.
---
--- Try this in bash:
---
---    while sleep 1; do curl -s "localhost:3000/count" | jq '.count'; done
---
+
+callerFromCsv (number, context, name) = AddCaller name number context
+
 postBootstrapR :: Handler RepJson
 postBootstrapR = do
   $logInfo "Initializing the database."
@@ -264,11 +253,7 @@ postBootstrapR = do
   seedData <- liftIO $ BSL.readFile "interview-callerid-data.csv"
   callers <- case Csv.decode Csv.NoHeader seedData of
          Left err -> fail err
-         Right v ->
-           Vector.forM_ (Vector.indexed v) $ \(callerId, record) -> do
-           let (number, context, name) = record
-           c <- liftIO $ update db $ AddCaller name number context
-           return c
+         Right v -> liftIO $ groupUpdates db $ Vector.toList $ Vector.map callerFromCsv v
   sendStatusJSON status200 $ ("Bootstrap complete." :: Text)
 
 
diff --git a/readme.md b/readme.md
index 31e0edf..b7b9c4e 100644
--- a/readme.md
+++ b/readme.md
@@ -2,22 +2,7 @@
 2. Optional: have [nix](https://nixos.org/nix/) installed. If you *don't* use nix, delete line 3 of `main.hs`.
 3. Run `./main.hs` and the server will start up
 
-To bootstrap the db, do `curl -XPOST "localhost:3000/bootstrap"`.
+To bootstrap the db, do `curl -XPOST "localhost:3000/bootstrap"`. This takes
+like 3 seconds or so.
 
-This takes a while; on my machine it averages 181 records per second. It's IO
-bound, and in an un-optimized program GHC on Linux uses a single, blocking IO
-manager thread (on Windows it's non-blocking, [apparently][1]). This can be
-improved with the [Control.Concurrent][1] module, in which case we could launch
-as many IO threads as we want, and do probably 10k records per second. There's
-definitely an optimal amount of threads here, we'd have to test to find that.
-
-*However*, you can watch it bootstrap. In a separate terminal, do `curl
-"localhost:3000/count"` to see it updating. New POSTs will also work and update
-the database, even while it is bootstrapping, which is kinda cool.
-
-Try this in bash:
-
-    while sleep 1; do curl -s "localhost:3000/count" | jq '.count'; done
-
-
-[1]: https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/Control-Concurrent.html#g:10
+See `test.http` for some other examples.
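For reference, here is a minimal sketch of the bulk-load pattern this commit switches to. It uses a hypothetical `CallerDb` state instead of the app's real types, which the diff doesn't show. It assumes the `acid-state`, `safecopy`, and `mtl` packages. `Caller`, `CallerDb`, `addCaller`, `countCallers`, `callerFromRow`, the sample rows, and the `state/groupUpdates-demo` path are all made up for illustration.

The relevant difference from the removed `Vector.forM_` loop is this: `update` blocks until each event has been written before the next one is issued. `groupUpdates` (from `Data.Acid.Advanced`) schedules the whole list and, as I understand it, waits only for the last event.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
-- Sketch only: hypothetical types standing in for the app's real ones.
-- Needs the acid-state, safecopy, and mtl packages.
module Main (main) where

import Control.Monad.Reader (ask)
import Control.Monad.State (modify)
import Data.Acid
import Data.Acid.Advanced (groupUpdates)
import Data.SafeCopy (base, deriveSafeCopy)

-- Hypothetical caller record: name, number, context.
data Caller = Caller String String String

$(deriveSafeCopy 0 'base ''Caller)

-- Hypothetical database state: a plain list of callers.
newtype CallerDb = CallerDb [Caller]

$(deriveSafeCopy 0 'base ''CallerDb)

-- Same argument order as AddCaller in the diff: name, number, context.
addCaller :: String -> String -> String -> Update CallerDb ()
addCaller name number context =
  modify (\(CallerDb cs) -> CallerDb (Caller name number context : cs))

-- Read-only event returning how many callers are stored.
countCallers :: Query CallerDb Int
countCallers = fmap (\(CallerDb cs) -> length cs) ask

-- Generates the AddCaller and CountCallers event types.
$(makeAcidic ''CallerDb ['addCaller, 'countCallers])

-- Like callerFromCsv in the diff: CSV columns are (number, context, name).
callerFromRow :: (String, String, String) -> AddCaller
callerFromRow (number, context, name) = AddCaller name number context

main :: IO ()
main = do
  db <- openLocalStateFrom "state/groupUpdates-demo" (CallerDb [])
  before <- query db CountCallers
  let rows =
        [ ("555-0100", "work", "Alice")
        , ("555-0101", "home", "Bob")
        ]
  -- Per-row version, roughly what the removed loop did:
  --   mapM_ (update db . callerFromRow) rows
  -- Each `update` waits for its event to be logged before the next starts.
  -- groupUpdates submits the whole batch and waits only for the last event.
  groupUpdates db (map callerFromRow rows)
  after <- query db CountCallers
  -- Prints how many callers this run added.
  print (after - before)
  closeAcidState db
```

Running it prints `2`. The state under `state/` persists between runs, as acid-state local state normally does, so the program reports the difference rather than the total.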