Is there a go-to database for storing User Activity data? The data would look something like this:
UserId, Timestamp, Activity (String, up to 255 chars), userGroup (an arbitrary way of dividing users into groups)
Requirements:
1. High write throughput
2. Relatively high availability
3. Reads will only be for dashboards / reports, so they can tolerate higher delays.
4. Allow for huge tables: it could easily reach 100M records over a few days. Reads can get slower, but writes cannot.
The stack I have in mind would look something like this:
WebApp -> Play2App(Scala) -> [Database]
AdminUI <- Play2App(Scala) <- (Spark? or maybe nothing) <- [Database]
What's a good DB tech for this use case? I already have an RDB that drives everything else, but I would like another (most likely NoSQL) database to store user activity data only. Is there a go-to DB in this case?
Current top contenders:
MongoDB, CouchDB, HBase (but I would hate to have to manage it), Cassandra
Based on your requirements, it sounds like Cassandra is the way to go.
Cassandra has a heavily optimised write path and performs very well for write-intensive workloads. It will also easily accommodate a growing number of records, since you are already committed to bucketing your data. The limiting factor is partition size: with userGroup as the partition key, aim to keep each partition to roughly 100 MB, which should be fine as long as you plan how many users go into each userGroup.
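To get a feel for what the ~100 MB guideline means for your numbers, here is a rough back-of-the-envelope sketch in Python. The row size is my assumption (fixed-width ids and timestamps, a full 255 single-byte-character activity, and a guessed 30 bytes of per-row overhead), so treat the output as an order of magnitude, not a measurement.

```python
# Assumed worst-case row size in bytes (hypothetical figures):
# 8 (userId) + 8 (timestamp) + 255 (activity) + ~30 of per-row overhead.
ASSUMED_ROW_BYTES = 8 + 8 + 255 + 30

# The ~100 MB per-partition guideline, expressed in bytes.
TARGET_PARTITION_BYTES = 100 * 1000 * 1000


def max_rows_per_partition(row_bytes=ASSUMED_ROW_BYTES,
                           target_bytes=TARGET_PARTITION_BYTES):
    """Roughly how many rows fit in one partition at the target size."""
    return target_bytes // row_bytes


def min_partitions(total_rows, row_bytes=ASSUMED_ROW_BYTES,
                   target_bytes=TARGET_PARTITION_BYTES):
    """Smallest number of partitions that keeps each one under the target."""
    per_partition = max_rows_per_partition(row_bytes, target_bytes)
    return -(-total_rows // per_partition)  # ceiling division


if __name__ == "__main__":
    print(max_rows_per_partition())     # rows per ~100 MB partition
    print(min_partitions(100_000_000))  # partitions needed for 100M rows
```

Under these assumptions, 100M records need a few hundred partitions, so if you have far fewer userGroups than that, or the data keeps accumulating, userGroup alone is probably too coarse a partition key.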
Note that Cassandra does not have a flexible schema: tables are designed around the queries you know you will run. That is fine for what you are planning, but not ideal if you want to produce more customizable reports in the future.
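As an illustration of that query-first design, here is one possible table layout, held as CQL strings in a small Python sketch. The table name `user_activity`, the column names, and the extra `day` bucket in the partition key are all my assumptions, not something from your setup. The `?` placeholders are meant for prepared statements. No driver is used here; the helper only builds the values you would bind.

```python
from datetime import datetime, timezone

# Hypothetical schema: one partition per (user_group, day), newest rows first.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS user_activity (
    user_group text,
    day text,
    ts timestamp,
    user_id bigint,
    activity text,
    PRIMARY KEY ((user_group, day), ts, user_id)
) WITH CLUSTERING ORDER BY (ts DESC, user_id ASC);
"""

INSERT_ROW = (
    "INSERT INTO user_activity (user_group, day, ts, user_id, activity) "
    "VALUES (?, ?, ?, ?, ?)"
)

# The one query this table is built for: all activity of a group on a day.
SELECT_DAY = (
    "SELECT ts, user_id, activity FROM user_activity "
    "WHERE user_group = ? AND day = ?"
)


def insert_params(user_id, ts, activity, user_group):
    """Build the values for INSERT_ROW; ts must be a timezone-aware datetime."""
    if len(activity) > 255:
        raise ValueError("activity is limited to 255 characters")
    utc = ts.astimezone(timezone.utc)
    return (user_group, utc.strftime("%Y-%m-%d"), utc, user_id, activity)


if __name__ == "__main__":
    when = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)
    print(insert_params(42, when, "login", "group-a"))
```

A report that needs a different access pattern, such as all activity for a single user across groups, would not be served efficiently by this table and would typically need a second table or an external tool such as Spark.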