Bikas Katwal
1 min read · Mar 24, 2020


1. How much time does it take to re-index from Cassandra? Have you done any optimization there?
  • It takes about 10–13 minutes for the number of docs we have right now, i.e. 3 million docs. We have made the rate of ingestion into Solr configurable: right now we have a single shard, and once we have more shards we can increase the write rate.
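For scale: 3 million docs in 10–13 minutes works out to roughly 3,000,000 / 780 ≈ 3,800 up to 3,000,000 / 600 = 5,000 docs per second. The reply does not say how the configurable rate is implemented (the pipeline itself runs on Spark), so the following is only a minimal plain-Python sketch of one way to cap ingestion at a configurable docs-per-second value. The names `reindex`, `write_batch`, `docs_per_second` and `batch_size` are hypothetical, and `write_batch` stands in for whatever actually sends a batch to Solr.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def reindex(
    docs: Iterable[T],
    write_batch: Callable[[List[T]], None],  # hypothetical stand-in for the Solr writer
    docs_per_second: float,                  # hypothetical configurable write rate
    batch_size: int = 500,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs in batches, keeping the average rate at or below docs_per_second."""
    start = clock()
    written = 0
    for batch in batched(docs, batch_size):
        write_batch(batch)
        written += len(batch)
        # Earliest moment at which `written` docs are allowed under the configured rate.
        allowed_at = start + written / docs_per_second
        delay = allowed_at - clock()
        if delay > 0:
            sleep(delay)
    return written
```

Because the limit is computed against the start time, a slow batch lets the following batches catch up; with more shards you would simply raise `docs_per_second`.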

2. How does Spark know which docs are still pending ingestion into Solr? I assume you keep two tables in Cassandra, one for pending items and another for what is currently in Solr, and Spark removes data from the pending-items table and puts it into Solr?

  • For real-time updates, we update both Solr and Cassandra as we read data from the Kafka topics.
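A minimal sketch of that dual write, assuming each Kafka record carries a doc id and document, and assuming (the reply does not say) that Cassandra is written before Solr and the consumer offset is committed only after both writes succeed. `apply_updates`, `write_cassandra` and `write_solr` are hypothetical names; the returned value follows Kafka's convention of committing the offset of the next record to read.

```python
from typing import Callable, Dict, List, Tuple

# Simplified stand-in for a Kafka record: (offset, doc_id, document fields).
Record = Tuple[int, str, Dict[str, str]]


def apply_updates(
    records: List[Record],
    write_cassandra: Callable[[str, Dict[str, str]], None],  # hypothetical
    write_solr: Callable[[str, Dict[str, str]], None],       # hypothetical
    committed: int,
) -> int:
    """Write each record to both stores and return the new offset to commit.

    The offset only advances past records that reached BOTH stores, so after
    a failure the remaining records are consumed again on the next poll.
    """
    for offset, doc_id, doc in records:
        try:
            write_cassandra(doc_id, doc)  # assumed order: source of truth first
            write_solr(doc_id, doc)
        except Exception:
            break  # leave this offset uncommitted so the record is redelivered
        committed = offset + 1
    return committed
```

Since a failed record is replayed, both writes should be idempotent upserts keyed by doc id; a record that reached Cassandra but not Solr is then simply written again.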

You can go with the approach you have mentioned. One benefit is that Solr would always get its updates from Cassandra. Once data from the pending table has been ingested into Solr, you need to delete it from that table.
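The pending-table approach can be sketched as below, with in-memory dicts standing in for the Cassandra pending table and for Solr, and plain Python standing in for the Spark job. `drain_pending` and `index_batch` are hypothetical names. The sketch deletes a row only after its batch has been indexed, and skips the delete if the row was rewritten in the meantime; in Cassandra that check would need to be atomic, for example a conditional `DELETE ... IF` lightweight transaction, which this sketch does not model.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]


def drain_pending(
    pending: Dict[str, Doc],                             # stand-in for the pending table
    index_batch: Callable[[List[Tuple[str, Doc]]], None],  # hypothetical Solr writer
    batch_size: int = 100,
) -> int:
    """Index every row currently pending; delete each row only after it is indexed.

    Returns the number of rows sent to the index.
    """
    snapshot = list(pending.items())  # rows added during the run wait for the next run
    indexed = 0
    for i in range(0, len(snapshot), batch_size):
        chunk = snapshot[i : i + batch_size]
        index_batch(chunk)  # if this raises, the chunk stays pending and is retried later
        for doc_id, doc in chunk:
            # Skip the delete if the row changed since it was read, so the newer
            # version stays pending and is indexed on the next run.
            if pending.get(doc_id) == doc:
                del pending[doc_id]
        indexed += len(chunk)
    return indexed
```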



Written by Bikas Katwal

Coder | Distributed Systems | Search | Software Engineer @Walmartlabs
