≡

wincent.dev

  • Products
  • Blog
  • Wiki
  • Issues
You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.
Home » Issues » Feature request #1142

Feature request #1142: Set-up automated pruning of sessions table

Kind feature request
Product wincent.dev
When Created 2008-09-22T17:26:09Z, updated 2009-01-16T21:34:44Z
Status closed
Reporter Greg Hurrell
Tags no tags

Description

  1. Set up a new database user that has permission to DELETE from the sessions table and not perform any other operation.
  2. Check that the user can actually perform a query like DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY); (ie a DELETE with a WHERE clause) despite not having read access.
  3. Wait for a reasonable number of sessions to accumulate in the database (30 days' worth, for example) and confirm that the query runs fast enough to permit automated execution without causing any lock contention problems.
  4. Set up cron job to perform the pruning once per day.

The use of extremely limited privileges is desirable because the job will be running from cron, and that means either storing the password in plaintext the cron job or allowing passwordless access from the localhost for that user. I think I prefer the latter so that other users on the system can't snoop the password by looking at the output of ps. In the unlikely event that that user is compromised there isn't really any scope for data leakage nor impersonation (because reading isn't permitted and writing isn't either), and deletion of sessions records is basically harmless.

See also "Clearing out old Rails session records from a MySQL database".

Comments

  1. Greg Hurrell 2008-09-23T02:43:00Z

    Ok, I've done steps 1 and 2.

    GRANT DELETE ON the_db.sessions TO pruner@localhost IDENTIFIED BY 'foobar';

    Turns out that DELETE privileges alone are not enough due to the WHERE clause in the query:

    ERROR 1143 (42000): SELECT command denied to user 'pruner'@'localhost' for column 'updated_at' in table 'sessions'

    So we add the minimal SELECT privilege on the required column only.

    GRANT SELECT (updated_at) ON the_db.sessions TO pruner@localhost;
    FLUSH PRIVILEGES;

    Now just have to wait for sufficient records to build up and do some performance testing.

  2. Greg Hurrell 2008-11-06T05:59:56Z

    Ok, so it looks like automated clearing every 30 days is just not going to be an option; it's just too slow:

    mysql> select count(*) from sessions;
    +----------+
    | count(*) |
    +----------+
    |   534861 | 
    +----------+
    1 row in set (1.33 sec)

    So that's about 6 weeks' sessions. Pruning sessions older than 30 days would delete just over 200k sessions:

    mysql> select count(*) from sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY);
    +----------+
    | count(*) |
    +----------+
    |   201177 | 
    +----------+
    1 row in set (0.27 sec)

    So I decided to try the deletion:

    mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY);

    After several minutes the query was still running and so I decided to throw up a maintenance webpage on the website to reduce contention for the database:

    $ mysqladmin -u root -p processlist
    Enter password: 
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | Id | User     | Host      | db       | Command | Time | State    | Info                                                                         |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | 16 | foo      | localhost | foo      | Sleep   | 17   |          |                                                                              |
    | 17 | foo      | localhost | foo      | Sleep   | 18   |          |                                                                              |
    | 18 | foo      | localhost | foo      | Sleep   | 16   |          |                                                                              |
    | 19 | foo      | localhost | foo      | Sleep   | 17   |          |                                                                              |
    | 20 | foo      | localhost | foo      | Query   | 496  | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) |
    | 25 | root     | localhost |          | Query   | 0    |          | show processlist                                                             |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+

    By the time the query had been running for 10 minutes I decided to go even further and shut down the background mongrel processes which were sitting idle (not handling any requests because all queries were being directed to the maintenance page, but still with active connections to the database):

    $ mysqladmin -u root -p processlist
    Enter password: 
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | Id | User     | Host      | db       | Command | Time | State    | Info                                                                         |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | 20 | foo      | localhost | foo      | Query   | 649  | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) |
    | 28 | root     | localhost |          | Query   | 0    |          | show processlist                                                             |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+

    After 1000 seconds the query was still running:

    $ mysqladmin -u root -p processlist
    Enter password: 
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | Id | User     | Host      | db       | Command | Time | State    | Info                                                                         |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
    | 20 | foo      | localhost | foo      | Query   | 1000 | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) |
    | 30 | root     | localhost |          | Query   | 0    |          | show processlist                                                             |
    +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+

    So I decided to abort it:

    Query aborted by Ctrl+C

    But note that the abort didn't take effect immediately; five minutes later it was still in progress:

    $ mysqladmin -u root -p processlist
    Enter password: 
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
    | Id | User     | Host      | db       | Command | Time | State | Info                                                                         |
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
    | 20 | foo      | localhost | foo      | Query   | 1459 | end   | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) |
    | 37 | root     | localhost |          | Query   | 0    |       | show processlist                                                             |
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+

    So I decided to go a step further and kill the corresponding thread:

    $ mysqladmin -u root -p kill 20

    After five or six minutes the thread was still lingering:

    $ mysqladmin -u root -p processlist
    Enter password: 
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
    | Id | User     | Host      | db       | Command | Time | State | Info                                                                         |
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
    | 20 | foo      | localhost | foo      | Killed  | 1839 | end   | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) |
    | 42 | root     | localhost |          | Query   | 0    |       | show processlist                                                             |
    +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+

    But finally exited!

    ERROR 1053 (08S01): Server shutdown in progress

    So now with the maintenance page up and no other threads in the database I tried once again:

    mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY);
    ERROR 2006 (HY000): MySQL server has gone away
    No connection. Trying to reconnect...
    Connection id:    44
    Current database: foo
    
    Query OK, 201177 rows affected (24 min 46.31 sec)

    So as you can see, nearly 25 minutes to execute the query, which is clearly too long to ever turn this into an automated process.

    I need to consider two options now:

    1. Keep fewer sessions in the database and purge them more often (for example, every seven days); will need to test if the lower number of rows in the table leads to sufficiently short queries.
    2. Alternatively, forget about using the MySQL-backed session store and switch to something that either doesn't need purging (ie. the cookie-based store) or is sufficiently fast (not sure what my options might be there).
  3. Greg Hurrell 2009-01-14T18:29:37Z

    A little over two months since my last post and just checked the session count: 1,294,381 records. Of those, about two thirds are old enough to be pruned:

    mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY);
    +----------+
    | count(*) |
    +----------+
    |   865682 | 
    +----------+
    1 row in set (0.90 sec)

    I'm going to shut down the server and prune one more time. I am curious to see if a deletion based on the primary key (the id) is any faster.

    mysql> SELECT id FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) ORDER BY id DESC LIMIT 1;
    +---------+
    | id      |
    +---------+
    | 1066871 | 
    +---------+
    1 row in set (0.89 sec)
    
    mysql> DELETE FROM sessions WHERE id < 1066871;
    Query OK, 865688 rows affected (2 hours 6 min 37.96 sec)

    Evidently no faster. Going to audit the codebase for possible security implications of moving to the cookie-backed session store.

  4. Greg Hurrell 2009-01-14T18:32:17Z

    How would I get a histogram of session sizes in my current database? Would the MySQL LENGTH function work?

    Here's a test done on the staging database:

    mysql> CREATE TEMPORARY TABLE session_lengths (id int, session_length int);
    Query OK, 0 rows affected (0.04 sec)
    
    mysql> INSERT INTO session_lengths SELECT id, length(data) FROM sessions;
    Query OK, 425014 rows affected (53.00 sec)
    Records: 425014  Duplicates: 0  Warnings: 0
    
    mysql> SELECT session_length, COUNT(*) AS frequency FROM session_lengths GROUP BY session_length ORDER BY frequency DESC;
    +----------------+-----------+
    | session_length | frequency |
    +----------------+-----------+
    |             86 |    378495 | 
    |            139 |     46504 | 
    |            106 |         4 | 
    |            212 |         2 | 
    |            159 |         2 | 
    |            171 |         2 | 
    |            122 |         1 | 
    |            143 |         1 | 
    |            114 |         1 | 
    |            167 |         1 | 
    |            163 |         1 | 
    +----------------+-----------+
    11 rows in set (0.21 sec)

    Almost all records are 86 chars (that's the marshalled, encrypted form), with a smallish minority 129 chars, and some outliers. Evidently 86 chars is probably the baseline for an "empty" session; whatever Rails happens to be sticking in there by default, plus the overhead of marshalling, encryption, and encoding.

  5. Greg Hurrell 2009-01-14T18:37:54Z

    Same thing, except done on production database:

    mysql> CREATE TEMPORARY TABLE session_lengths (id int, session_length int);
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> INSERT INTO session_lengths SELECT id, length(data) FROM sessions;
    Query OK, 428845 rows affected (2 min 0.11 sec)
    Records: 428845  Duplicates: 0  Warnings: 0
    
    mysql> SELECT session_length, COUNT(*) AS frequency FROM session_lengths GROUP BY session_length ORDER BY frequency DESC;
    +----------------+-----------+
    | session_length | frequency |
    +----------------+-----------+
    |             86 |    405663 | 
    |            139 |     20927 | 
    |            143 |       451 | 
    |            155 |       339 | 
    |            147 |       302 | 
    |            163 |       183 | 
    |            159 |       142 | 
    |            151 |       131 | 
    |            167 |       119 | 
    |            171 |       114 | 
    |            135 |       107 | 
    |            175 |        83 | 
    |            179 |        60 | 
    |            183 |        56 | 
    |            212 |        28 | 
    |            192 |        24 | 
    |            208 |        21 | 
    |            188 |        21 | 
    |            196 |        18 | 
    |            232 |        16 | 
    |            240 |         7 | 
    |            228 |         6 | 
    |            131 |         5 | 
    |            236 |         4 | 
    |            297 |         4 | 
    |            106 |         3 | 
    |            338 |         2 | 
    |            200 |         2 | 
    |            301 |         1 | 
    |            216 |         1 | 
    |            114 |         1 | 
    |            305 |         1 | 
    |            346 |         1 | 
    |            362 |         1 | 
    |            249 |         1 | 
    +----------------+-----------+
    35 rows in set (0.22 sec)
  6. Greg Hurrell 2009-01-14T19:19:48Z

    See http://fukamachi.org/wp/2008/09/19/couchdb-session-model-for-rails/ for discussion with Sho about Rails session storage.

  7. Greg Hurrell 2009-01-14T19:48:22Z

    I'm going to trial the cookie-based store in my staging environment.

    Not sure whether this issue will come up, but wanted to save the link here just in case I see any TamperedWithCookie errors pop up.

  8. Greg Hurrell 2009-01-14T20:19:45Z

    Yep, first page view was a TamperedWithCookie error, seen by the user as an internal server error. Second page view was fine. Existing login session continued uninterrupted because that's not stored in the Rails session anyway.

  9. Greg Hurrell 2009-01-14T20:24:17Z

    To address the TamperedWithCookie problem when I go live on the production server will do the following:

    1. Deploy as usual, but before going live...
    2. cap deploy:web:disable
    3. Edit session_key (config file on server)
    4. Go live as normal

    This effectively invalidates the session, but given the things that are stored in the session (flashes etc) this is probably not a problem at all.

  10. Greg Hurrell 2009-01-14T20:27:53Z

    Doing all this has made me realize that the stuff I am storing in the session is totally transitory in nature (redirection counts, "original URI" for login etc) and keeping hold of it for a month at a time is totally unnecessary.

    The only thing that I really care about persisting is the login, and that was never stored in the Rails session. So I think I'm going to do some tests on purging the sessions table with really small record counts (1 hour of sessions, for example). Any user preferences which should persist for longer than an hour will go in the user table anyway.

  11. Greg Hurrell 2009-01-14T20:53:51Z

    Will be trying out something like:

    DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 1 HOUR);

    Hoping that it is fast enough to run without slowing down other requests.

  12. Greg Hurrell 2009-01-14T20:57:29Z

    One thing which I've discovered is that if you have a page open with a form, wait long enough for your session to be purged on the database, and then try to submit, you'll get the 422.html error page (internally, InvalidAuthenticityToken is raised).

    I'm going to have to clean up the 422.html error template so as not to confuse users and give them a helpful suggestion.

  13. Greg Hurrell 2009-01-15T15:11:06Z

    I've looked at my web stats to see when the best time of day would be to do this pruning (in the case that I end up doing it once every 24 hours).

    For wincent.dev:

    • quietest hour 5:00 AM EST, traffic (hits) drops to 82% of baseline (average hits per hour)
    • busiest hour 11:00 AM EST, traffic (hits) climbs to 120% of baseline
    • record hour for last month, traffic rose to 301% of baseline

    And for wincent.dev (non-Rails part of the site):

    • quietest, 1:00 AM EST, 75% of baseline
    • busiest, 15:00 PM EST, 128% of baseline
    • record, peaked at 909% of baseline

    So I'll be testing how slow pruning is once 24 hours of records have built up. If its really slow and could interrupt service or impact performance noticeably I'd prefer to take the hit once per day instead of every hour. But if its fast then I'd rather do it every hour (still keeping 24 hours of records).

  14. Greg Hurrell 2009-01-16T09:36:11Z

    More test results:

    mysql> truncate sessions;
    Query OK, 51 rows affected (0.03 sec)
    
    mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR);
    +----------+
    | COUNT(*) |
    +----------+
    |      571 | 
    +----------+
    1 row in set (0.00 sec)
    
    mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR);
    Query OK, 571 rows affected (0.75 sec)
    
    mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR);
    +----------+
    | COUNT(*) |
    +----------+
    |        9 | 
    +----------+
    1 row in set (0.00 sec)
    
    mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR);
    Query OK, 13 rows affected (0.03 sec)
    
    mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR);
    Query OK, 1920 rows affected (0.22 sec)
    
    mysql> quit

    Summarizing the above, you have:

    1. We TRUNCATE the sessions table entirely, emptying it.
    2. After just over 24 hours, we check to see how many "stale" sessions (sessions not active in the last 24 hours) exist (971 sessions).
    3. We DELETE those sessions; the query takes 0.75 seconds.
    4. We immediately perform another DELETE to see if the query runs any faster when the number of stale sessions is lower (13 sessions that had become stale since the previous query); this time it takes 0.03 seconds.
    5. We wait about three hours and do another DELETE; this time we delete 1920 records in 0.22 seconds.

    So on the basis of these figures, looks like it will be perfectly safe, performance-wise, to set up an hourly cron job that will prune old sessions (anything over 24 hours without activity).

  15. Greg Hurrell 2009-01-16T11:59:43Z

    I have two options here for doing this database maintenance from a cronjob:

    Either, set up a limited-privileges account solely for the purposes of cleaning up the database; something like:

    mysql> GRANT SELECT, DELETE ON database_name.sessions TO purge_user@localhost identified by 'kewl_pass';
    mysql> FLUSH PRIVILEGES;

    And run this from cron using:

    mysql -u purge_user -pkewl_pass \
          -e 'DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR)' \
          database_name

    Or, set up a rake task to do the same using the normal database account already used by the application; something which basically does:

    SomeModel.connection.execute 'DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR)'

    Although I like the idea of keeping things all "in the app" and not having to create a separate user, I don't like the fact that breakage in the app could prevent the second solution from executing properly. Will have to think about it a little.

  16. Greg Hurrell 2009-01-16T21:34:44Z

    Ok, in the end going with the second solution (the Rake task) because:

    • keeps the maintenance code in the repository, so all code related to the project is together
    • avoids creation of additional database account
    • avoids duplicate storage of database password
    • although loading the Rails environment is heavyweight just to run a tiny Rake task, the task only need run once per hour
    • although app breakage will prevent the job from running, consequences of not running are insignificant
Add a comment

Comments are now closed for this issue.

  • contact
  • legal

Menu

  • Blog
  • Wiki
  • Issues
  • Snippets