关于文章 postgresql

参数 pg_hba.conf postgresql postgresql.auto.conf postgresql.conf 模板

PostgreSQL 11 参数模板 - 珍藏级 --- digoal

PostgreSQL , 参数 , 模板 , postgresql.conf , pg_hba.conf , postgresql.auto.conf

PostgreSQL 11 参数模板 - 珍藏级

作者

digoal

原文地址 https://github.com/digoal/blog/blob/master/201812/20181203_01.md

日期

2018-12-03

标签

PostgreSQL , 参数 , 模板 , postgresql.conf , pg_hba.conf , postgresql.auto.conf


背景

PostgreSQL 11 postgresql.conf 参数模板

# -----------------------------  
# PostgreSQL configuration file  
# -----------------------------  
#  
# This file consists of lines of the form:  
#  
#   name = value  
#  
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with  
# "#" anywhere on a line.  The complete list of parameter names and allowed  
# values can be found in the PostgreSQL documentation.  
#  
# The commented-out settings shown in this file represent the default values.  
# Re-commenting a setting is NOT sufficient to revert it to the default value;  
# you need to reload the server.  
#  
# This file is read on server startup and when the server receives a SIGHUP  
# signal.  If you edit the file on a running system, you have to SIGHUP the  
# server for the changes to take effect, run "pg_ctl reload", or execute  
# "SELECT pg_reload_conf()".  Some parameters, which are marked below,  
# require a server shutdown and restart to take effect.  
#  
# Any parameter can also be given as a command-line option to the server, e.g.,  
# "postgres -c log_connections=on".  Some parameters can be changed at run time  
# with the "SET" SQL command.  
#  
# Memory units:  kB = kilobytes        Time units:  ms  = milliseconds  
#                MB = megabytes                     s   = seconds  
#                GB = gigabytes                     min = minutes  
#                TB = terabytes                     h   = hours  
#                                                   d   = days  


#------------------------------------------------------------------------------  
# FILE LOCATIONS  
#------------------------------------------------------------------------------  

# The default values of these variables are driven from the -D command-line  
# option or PGDATA environment variable, represented here as ConfigDir.  

#data_directory = 'ConfigDir'           # use data in another directory  
                                        # (change requires restart)  
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file  
                                        # (change requires restart)  
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file  
                                        # (change requires restart)  

# If external_pid_file is not explicitly set, no extra PID file is written.  
#external_pid_file = ''                 # write an extra PID file  
                                        # (change requires restart)  


#------------------------------------------------------------------------------  
# CONNECTIONS AND AUTHENTICATION  
#------------------------------------------------------------------------------  

# - Connection Settings -  

listen_addresses = '0.0.0.0'            # what IP address(es) to listen on;  
                                        # comma-separated list of addresses;  
                                        # defaults to 'localhost'; use '*' for all  
                                        # (change requires restart)  
# 根据业务需求设定监听  
port = 1921                             # (change requires restart)  

# 建议不要大于 200 * 四分之一物理内存(GB), 例如四分之一物理内存为16G,则建议不要超过3200.      
# (假设一个连接耗费5MB,实际上syscache很大时,可能更多)   
# [《PostgreSQL relcache在长连接应用中的内存霸占"坑"》](201607/20160709_01.md)   
max_connections = 2000                  # (change requires restart)  
superuser_reserved_connections = 13      # (change requires restart)  

# $PGDATA, /tmp中 创建unix socket监听  
unix_socket_directories = '., /tmp'        # comma-separated list of directories  
                                        # (change requires restart)  
#unix_socket_group = ''                 # (change requires restart)  

# 除了OWNER和超级用户,其他用户无法从/tmp unix socket连接该实例  
unix_socket_permissions = 0700          # begin with 0 to use octal notation     
                                        # (change requires restart)  
#bonjour = off                          # advertise server via Bonjour  
                                        # (change requires restart)  
#bonjour_name = ''                      # defaults to the computer name  
                                        # (change requires restart)  

# - TCP Keepalives -  
# see "man 7 tcp" for details  

# 如果你连接数据库空闲一段时间会端口,可能是网络中存在会话超时的设备,建议可以设置一下这个心跳时间,TCP心跳间隔会缩短到60秒。  
tcp_keepalives_idle = 60                # TCP_KEEPIDLE, in seconds;  
                                        # 0 selects the system default  
tcp_keepalives_interval = 10            # TCP_KEEPINTVL, in seconds;  
                                        # 0 selects the system default  
tcp_keepalives_count = 10               # TCP_KEEPCNT;  
                                        # 0 selects the system default  

# - Authentication -  

#authentication_timeout = 1min          # 1s-600s  

# md5 or scram-sha-256   # 如果MD5会泄露,建议使用scram-sha-256,但是相互不兼容,请注意。   
# [《PostgreSQL 10.0 preview 安全增强 - SASL认证方法 之 scram-sha-256 安全认证机制》](201703/20170309_01.md)    
#password_encryption = md5              # md5 or scram-sha-256  
#db_user_namespace = off  

# GSSAPI using Kerberos  
#krb_server_keyfile = ''  
#krb_caseins_users = off  

# - SSL -  

#ssl = off  
#ssl_ca_file = ''  
#ssl_cert_file = 'server.crt'  
#ssl_crl_file = ''  
#ssl_key_file = 'server.key'  
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers  
#ssl_prefer_server_ciphers = on  
#ssl_ecdh_curve = 'prime256v1'  
#ssl_dh_params_file = ''  
#ssl_passphrase_command = ''  
#ssl_passphrase_command_supports_reload = off  


#------------------------------------------------------------------------------  
# RESOURCE USAGE (except WAL)  
#------------------------------------------------------------------------------  

# - Memory -  

# 1/4 主机内存   
shared_buffers = 24GB                  # min 128kB  
                                        # (change requires restart)  
# 当不使用huge page,并且连接数大于3000时,建议shared buffer不要超过48G  
# 建议shared buffer设置超过32GB时 使用大页,页大小 /proc/meminfo Hugepagesize    
huge_pages = try                # on, off, or try  
                                        # (change requires restart)  
#temp_buffers = 8MB                     # min 800kB  

# 如果用户需要使用两阶段提交,需要设置为大于0,建议与max_connections一样大  
max_prepared_transactions = 2000                # zero disables the feature  
                                        # (change requires restart)  
# Caution: it is not advisable to set max_prepared_transactions nonzero unless  
# you actively intend to use prepared transactions.  

# 可以在会话中设置,如果有大量JOIN,聚合操作,并且期望使用hash agg或hash join。   
# 可以设大一些,但是不建议大于    四分之一内存除以最大连接数  .   
# (一条QUERY中可以使用多倍WORK_MEM,与执行计划中的NODE有关)    
# 建议给一个输入,AP模式TP模式OR混合模式。三种模式使用三种不同的计算公式  
work_mem = 8MB                          # min 64kB  

# min( 2G, (1/4 主机内存)/autovacuum_max_workers )    
maintenance_work_mem = 2GB              # min 1MB  
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem  
#max_stack_depth = 2MB                  # min 100kB  
dynamic_shared_memory_type = posix      # the default is the first option  
                                        # supported by the operating system:  
                                        #   posix  
                                        #   sysv  
                                        #   windows  
                                        #   mmap  
                                        # use none to disable dynamic shared memory  
                                        # (change requires restart)  

# - Disk -  

# 如果需要限制临时文件使用量,可以设置。  
# 例如防止有异常的递归调用,无限使用临时文件。  
#temp_file_limit = -1                   # limits per-process temp file space  
                                        # in kB, or -1 for no limit  

# - Kernel Resources -  

## 如果你的数据库有非常多小文件(比如有几十万以上的表,还有索引等,并且每张表都会被访问到时),  
# 建议FD可以设多一些,避免进程需要打开关闭文件。  
## 但是不要大于前面章节系统设置的ulimit -n(open files)  
# max_files_per_process=655360  

#max_files_per_process = 1000           # min 25  
                                        # (change requires restart)  

# - Cost-Based Vacuum Delay -  

# 如果你的IO非常好,则可以关闭vacuum delay   
vacuum_cost_delay = 0                   # 0-100 milliseconds  
#vacuum_cost_page_hit = 1               # 0-10000 credits  
#vacuum_cost_page_miss = 10             # 0-10000 credits  
#vacuum_cost_page_dirty = 20            # 0-10000 credits  

# io很好,CPU核数很多的机器,设大一些。如果设置了vacuum_cost_delay = 0 ,则这个不需要配置  
vacuum_cost_limit = 10000                # 1-10000 credits  

# - Background Writer -  

bgwriter_delay = 10ms                   # 10-10000ms between rounds  
bgwriter_lru_maxpages = 1000            # max buffers written/round, 0 disables  
bgwriter_lru_multiplier = 10.0          # 0-10.0 multiplier on buffers scanned/round  
bgwriter_flush_after = 512kB            # measured in pages, 0 disables  

# - Asynchronous Behavior -  

effective_io_concurrency = 0            # 1-1000; 0 disables prefetching  

# wal sender, user 动态fork的process, parallel worker等都算作 worker process, 所以你需要设置足够大.   
max_worker_processes = 128              # (change requires restart)  

#  如果需要使用并行创建索引,设置为大于1 ,不建议超过 主机cores-2  
max_parallel_maintenance_workers = 6    # taken from max_parallel_workers  

#  如果需要使用并行查询,设置为大于1 ,不建议超过 主机cores-2  
max_parallel_workers_per_gather = 0     # taken from max_parallel_workers  
parallel_leader_participation = on  

#  如果需要使用并行查询,设置为大于1 ,不建议超过 主机cores-2  
#  必须小于 max_worker_processes   
max_parallel_workers = 32               # maximum number of max_worker_processes that  
                                        # can be used in parallel operations  
#old_snapshot_threshold = -1            # 1min-60d; -1 disables; 0 is immediate  
                                        # (change requires restart)  
#backend_flush_after = 256               # measured in pages, 0 disables  


#------------------------------------------------------------------------------  
# WRITE-AHEAD LOG  
#------------------------------------------------------------------------------  

# - Settings -  

# 需要流复制物理备库、归档、时间点恢复时,设置为replica,需要逻辑订阅或逻辑备库则设置为logical  
wal_level = replica  # minimal, replica, or logical  
                                        # (change requires restart)  
#fsync = on                             # flush data to disk for crash safety  
                                        # (turning this off can cause  
                                        # unrecoverable data corruption)  

# 如果双节点,设置为ON,如果是多副本,同步模式,建议设置为remote_write。   
# 如果磁盘性能很差,并且是OLTP业务。可以考虑设置为off降低COMMIT的RT,提高吞吐(设置为OFF时,可能丢失部分XLOG RECORD)  
synchronous_commit = off                # synchronization level;  
                                        # off, local, remote_write, remote_apply, or on  

# 建议使用pg_test_fsync测试后,决定用哪个最快。通常LINUX下open_datasync比较快。  
#wal_sync_method = fsync                # the default is the first option  
                                        # supported by the operating system:  
                                        #   open_datasync  
                                        #   fdatasync (default on Linux)  
                                        #   fsync  
                                        #   fsync_writethrough  
                                        #   open_sync  

# 如果文件系统支持COW例如ZFS,则建议设置为OFF。 如果文件系统可以保证datafile block size的原子写,在对齐后也可以设置为OFF。  
# 如果底层存储能保证IO的原子写,也可以设置为OFF。  
full_page_writes = on                  # recover from partial page writes  

# 当写FULL PAGE的io是瓶颈时建议开启  
wal_compression = on                  # enable compression of full-page writes  
#wal_log_hints = off                    # also do full page writes of non-critical updates  
                                        # (change requires restart)  
# 建议 min( 512MB, shared_buffers/32 )   
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers  
                                        # (change requires restart)  

# 如果设置了synchronous_commit = off,可以设置wal_writer_delay  
wal_writer_delay = 10ms         # 1-10000 milliseconds  
wal_writer_flush_after = 1MB            # measured in pages, 0 disables  

# 如果synchronous_commit=on, 并且已知业务系统为高并发,对数据库有写操作的小事务,则可以设置commit_delay来实现分组提交,合并WAL FSYNCIO 。  
#commit_delay = 10                       # range 0-100000, in microseconds  
# 同时处于提交状态的事务数超过commit_siblings时,使用分组提交  
#commit_siblings = 5                    # range 1-1000  

# - Checkpoints -  

#  不建议频繁做检查点,否则XLOG会产生很多的FULL PAGE WRITE(when full_page_writes=on)。  
checkpoint_timeout = 30min              # range 30s-1d  

# 建议等于SHARED BUFFER,或2倍。  
# 同时需要考虑崩溃恢复时间, 越大,检查点可能拉越长导致崩溃恢复耗时越长。但是越小,开启FPW时,WAL日志写入量又越大。 建议采用COW文件系统,关闭FPW。  
max_wal_size = 48GB  
# 建议是SHARED BUFFER的2分之一  
min_wal_size = 12GB  

# 硬盘好的情况下,可以让检查点快速结束,恢复时也可以快速达到一致状态。否则建议0.5~0.9  
checkpoint_completion_target = 0.1    # checkpoint target duration, 0.0 - 1.0  

# IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB  
checkpoint_flush_after = 256kB          # measured in pages, 0 disables  
#checkpoint_flush_after = 0             # measured in pages, 0 disables  
#checkpoint_warning = 30s               # 0 disables  

# - Archiving -  

# 建议默认打开,因为修改它需要重启实例  
#archive_mode = off             # enables archiving; off, on, or always  
                                # (change requires restart)  

#  后期再修改,如  'test ! -f /disk1/digoal/arch/%f && cp %p /disk1/digoal/arch/%f'  
#archive_command = ''           # command to use to archive a logfile segment  
                                # placeholders: %p = path of file to archive  
                                #               %f = file name only  
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  
#archive_timeout = 0            # force a logfile segment switch after this  
                                # number of seconds; 0 disables  


#------------------------------------------------------------------------------  
# REPLICATION  
#------------------------------------------------------------------------------  

# - Sending Servers -  

# Set these on the master and on any standby that will send replication data.  

# 同时需要几个流复制连接,根据实际需求设定  
max_wal_senders = 10             # max number of walsender processes  
                                # (change requires restart)  

# 根据实际情况设置保留WAL的数量,主要是防止过早的清除WAL,导致备库因为主库的WAL清除而中断。根据实际情况设定。  
#wal_keep_segments = 0          # in logfile segments; 0 disables  
#wal_sender_timeout = 60s       # in milliseconds; 0 disables  


# 根据实际情况设置需要创建多少replication slot  
# 使用slot,可以保证流复制下游没有接收的WAL会在当前节点永久保留。所以必须留意下游的接收情况,否则可能导致WAL爆仓  
# 建议大于等于max_wal_senders  
#max_replication_slots = 10     # max number of replication slots  
                                # (change requires restart)  
#track_commit_timestamp = off   # collect timestamp of transaction commit  
                                # (change requires restart)  

# - Master Server -  

# These settings are ignored on a standby server.  


# 如果有2个或2个以上的备库,可以考虑使用同步多副本模式。 根据实际情况设置  
[PostgreSQL 一主多从(多副本,强同步)简明手册 - 配置、压测、监控、切换、防脑裂、修复、0丢失 - 珍藏级》](201803/20180326_01.md)    
#synchronous_standby_names = '' # standby servers that provide sync rep  
                                # method to choose sync standbys, number of sync standbys,  
                                # and comma-separated list of application_name  
                                # from standby(s); '*' = all  

# 注意,容易导致膨胀,容易导致VACUUM进程空转,导致IO和CPU飙升。(特别是autovacuum naptime配置很小时)  
#vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed  

# - Standby Servers -  

# These settings are ignored on a master server.  

#hot_standby = on                       # "off" disallows queries during recovery  
                                        # (change requires restart)  
#max_standby_archive_delay = 30s        # max delay before canceling queries  
                                        # when reading WAL from archive;  
                                        # -1 allows indefinite delay  
#max_standby_streaming_delay = 30s      # max delay before canceling queries  
                                        # when reading streaming WAL;  
                                        # -1 allows indefinite delay  
#wal_receiver_status_interval = 10s     # send replies at least this often  
                                        # 0 disables  

# 建议关闭,以免备库长事务导致 主库无法回收垃圾而膨胀。  
[PostgreSQL物理"备库"的哪些操作或配置,可能影响"主库"的性能、垃圾回收、IO波动](201704/20170410_03.md)    
#hot_standby_feedback = off             # send info from standby to prevent  
                                        # query conflicts  
#wal_receiver_timeout = 60s             # time that receiver waits for  
                                        # communication from master  
                                        # in milliseconds; 0 disables  
#wal_retrieve_retry_interval = 5s       # time to wait before retrying to  
                                        # retrieve WAL after a failed attempt  

# - Subscribers -  

# These settings are ignored on a publisher.  

# [《PostgreSQL 10.0 preview 逻辑订阅 - 原理与最佳实践》](201702/20170227_01.md)    
# These settings are ignored on a publisher.   
# 必须小于  max_worker_processes  
#max_logical_replication_workers = 4    # taken from max_worker_processes  
                                        # (change requires restart)  
#max_sync_workers_per_subscription = 2  # taken from max_logical_replication_workers  


#------------------------------------------------------------------------------  
# QUERY TUNING  
#------------------------------------------------------------------------------  

# - Planner Method Configuration -  

#enable_bitmapscan = on  
#enable_hashagg = on  
#enable_hashjoin = on  
#enable_indexscan = on  
#enable_indexonlyscan = on  
#enable_material = on  
#enable_mergejoin = on  
#enable_nestloop = on  
#enable_parallel_append = on  
#enable_seqscan = on  
#enable_sort = on  
#enable_tidscan = on  
#enable_partitionwise_join = off  
#enable_partitionwise_aggregate = off  
#enable_parallel_hash = on  
#enable_partition_pruning = on  

# - Planner Cost Constants -  

#seq_page_cost = 1.0                    # measured on an arbitrary scale  
# 离散IO很好的机器(例如ssd, nvme ssd),不需要考虑离散和顺序扫描的成本差异   
random_page_cost = 1.1                 # same scale as above  
#cpu_tuple_cost = 0.01                  # same scale as above  
#cpu_index_tuple_cost = 0.005           # same scale as above  
#cpu_operator_cost = 0.0025             # same scale as above  
#parallel_tuple_cost = 0.1              # same scale as above  
#parallel_setup_cost = 1000.0   # same scale as above  

#jit_above_cost = 100000                # perform JIT compilation if available  
                                        # and query more expensive, -1 disables  
#jit_optimize_above_cost = 500000       # optimize JITed functions if query is  
                                        # more expensive, -1 disables  
#jit_inline_above_cost = 500000         # attempt to inline operators and  
                                        # functions if query is more expensive,  
                                        # -1 disables  

#min_parallel_table_scan_size = 8MB  
#min_parallel_index_scan_size = 512kB  

# 扣掉会话连接RSS,shared buffer, autovacuum worker, 剩下的都是OS可用的CACHE。  
effective_cache_size = 80GB  

# - Genetic Query Optimizer -  

#geqo = on  
#geqo_threshold = 12  
#geqo_effort = 5                        # range 1-10  
#geqo_pool_size = 0                     # selects default based on effort  
#geqo_generations = 0                   # selects default based on effort  
#geqo_selection_bias = 2.0              # range 1.5-2.0  
#geqo_seed = 0.0                        # range 0.0-1.0  

# - Other Planner Options -  

#default_statistics_target = 100        # range 1-10000  
#constraint_exclusion = partition       # on, off, or partition  
#cursor_tuple_fraction = 0.1            # range 0.0-1.0  
#from_collapse_limit = 8  
#join_collapse_limit = 8                # 1 disables collapsing of explicit  
                                        # JOIN clauses  
#force_parallel_mode = off  


#------------------------------------------------------------------------------  
# REPORTING AND LOGGING  
#------------------------------------------------------------------------------  

# - Where to Log -  

log_destination = 'csvlog'              # Valid values are combinations of  
                                        # stderr, csvlog, syslog, and eventlog,  
                                        # depending on platform.  csvlog  
                                        # requires logging_collector to be on.  

# This is used when logging to stderr:  
logging_collector = on                  # Enable capturing of stderr and csvlog  
                                        # into log files. Required to be on for  
                                        # csvlogs.  
                                        # (change requires restart)  

# These are only used if logging_collector is on:  
log_directory = 'log'                   # directory where log files are written,  
                                        # can be absolute or relative to PGDATA  
log_filename = 'postgresql-%a.log'      # log file name pattern,  
                                        # can include strftime() escapes  
#log_file_mode = 0600                   # creation mode for log files,  
                                        # begin with 0 to use octal notation  
log_truncate_on_rotation = on           # If on, an existing log file with the  
                                        # same name as the new log file will be  
                                        # truncated rather than appended to.  
                                        # But such truncation only occurs on  
                                        # time-driven rotation, not on restarts  
                                        # or size-driven rotation.  Default is  
                                        # off, meaning append to existing files  
                                        # in all cases.  
log_rotation_age = 1d                   # Automatic rotation of logfiles will  
                                        # happen after that time.  0 disables.  
log_rotation_size = 0                   # Automatic rotation of logfiles will  
                                        # happen after that much log output.  
                                        # 0 disables.  

# These are relevant when logging to syslog:  
#syslog_facility = 'LOCAL0'  
#syslog_ident = 'postgres'  
#syslog_sequence_numbers = on  
#syslog_split_messages = on  

# This is only relevant when logging to eventlog (win32):  
# (change requires restart)  
#event_source = 'PostgreSQL'  

# - When to Log -  

#client_min_messages = notice           # values in order of decreasing detail:  
                                        #   debug5  
                                        #   debug4  
                                        #   debug3  
                                        #   debug2  
                                        #   debug1  
                                        #   log  
                                        #   notice  
                                        #   warning  
                                        #   error  

#log_min_messages = warning             # values in order of decreasing detail:  
                                        #   debug5  
                                        #   debug4  
                                        #   debug3  
                                        #   debug2  
                                        #   debug1  
                                        #   info  
                                        #   notice  
                                        #   warning  
                                        #   error  
                                        #   log  
                                        #   fatal  
                                        #   panic  

#log_min_error_statement = error        # values in order of decreasing detail:  
                                        #   debug5  
                                        #   debug4  
                                        #   debug3  
                                        #   debug2  
                                        #   debug1  
                                        #   info  
                                        #   notice  
                                        #   warning  
                                        #   error  
                                        #   log  
                                        #   fatal  
                                        #   panic (effectively off)  

# 根据实际情况设定,例如业务上认为5秒以上是慢SQL,那么就设置为5秒。  
log_min_duration_statement = 5s        # -1 is disabled, 0 logs all statements  
                                        # and their durations, > 0 logs only  
                                        # statements running at least this number  
                                        # of milliseconds  


# - What to Log -  

#debug_print_parse = off  
#debug_print_rewritten = off  
#debug_print_plan = off  
#debug_pretty_print = on  
log_checkpoints = on   

# 如果业务是短连接,建议设置为OFF,否则建议设置为ON  
log_connections = on  

# 如果业务是短连接,建议设置为OFF,否则建议设置为ON  
log_disconnections = on  
#log_duration = off  
log_error_verbosity = verbose    # terse, default, or verbose messages  
#log_hostname = off  
log_line_prefix = '%m [%p] '            # special values:  
                                        #   %a = application name  
                                        #   %u = user name  
                                        #   %d = database name  
                                        #   %r = remote host and port  
                                        #   %h = remote host  
                                        #   %p = process ID  
                                        #   %t = timestamp without milliseconds  
                                        #   %m = timestamp with milliseconds  
                                        #   %n = timestamp with milliseconds (as a Unix epoch)  
                                        #   %i = command tag  
                                        #   %e = SQL state  
                                        #   %c = session ID  
                                        #   %l = session line number  
                                        #   %s = session start timestamp  
                                        #   %v = virtual transaction ID  
                                        #   %x = transaction ID (0 if none)  
                                        #   %q = stop here in non-session  
                                        #        processes  
                                        #   %% = '%'  
                                        # e.g. '<%u%%%d> '  
#log_lock_waits = off                   # log lock waits >= deadlock_timeout  

# 如果需要审计SQL,则可以设置为all  
#log_statement = 'none'                 # none, ddl, mod, all  
#log_replication_commands = off  
#log_temp_files = -1                    # log temporary files equal or larger  
                                        # than the specified size in kilobytes;  
                                        # -1 disables, 0 logs all temp files  
log_timezone = 'PRC'    

#------------------------------------------------------------------------------  
# PROCESS TITLE  
#------------------------------------------------------------------------------  

#cluster_name = ''                      # added to process titles if nonempty  
                                        # (change requires restart)  
#update_process_title = on  


#------------------------------------------------------------------------------  
# STATISTICS  
#------------------------------------------------------------------------------  

# - Query and Index Statistics Collector -  

#track_activities = on  
#track_counts = on  

# 跟踪IO耗时会带来一定的性能影响,默认是关闭的  
# 如果需要统计IO的时间开销,设置为ON  
# 建议用pg_test_timing测试一下获取时间的开销,如果开销很大,建议关闭这个时间跟踪。  
#track_io_timing = off  
#track_functions = none                 # none, pl, all  
#track_activity_query_size = 1024       # (change requires restart)  
#stats_temp_directory = 'pg_stat_tmp'  


# - Monitoring -  

#log_parser_stats = off  
#log_planner_stats = off  
#log_executor_stats = off  
#log_statement_stats = off  


#------------------------------------------------------------------------------  
# AUTOVACUUM  
#------------------------------------------------------------------------------  

#autovacuum = on                        # Enable autovacuum subprocess?  'on'  
                                        # requires track_counts to also be on.  
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and  
                                        # their durations, > 0 logs only  
                                        # actions running at least this number  
                                        # of milliseconds.  

# CPU核多,并且IO好的情况下,可多点,但是注意最多可能消耗这么多内存:   
# autovacuum_max_workers * autovacuum mem(autovacuum_work_mem),  
# 会消耗较多内存,所以内存也要有基础。       
# 当DELETE\UPDATE非常频繁时,建议设置多一点,防止膨胀严重      
autovacuum_max_workers = 8              # max number of autovacuum subprocesses  
                                        # (change requires restart)  

# 建议不要太高频率,否则会因为vacuum产生较多的XLOG。或者在某些垃圾回收不掉的情况下(例如长事务、feed back on,等),导致一直触发vacuum,CPU和IO都会升高  
[PostgreSQL垃圾回收代码分析 - why postgresql cann't reclaim tuple is HEAPTUPLE_RECENTLY_DEAD》](201505/20150503_01.md)    
[PostgreSQL物理"备库"的哪些操作或配置,可能影响"主库"的性能、垃圾回收、IO波动](201704/20170410_03.md)    
#autovacuum_naptime = 1min              # time between autovacuum runs  
#autovacuum_vacuum_threshold = 50       # min number of row updates before  
                                        # vacuum  
#autovacuum_analyze_threshold = 50      # min number of row updates before  
                                        # analyze  
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum  
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze  

# 除了设置较大的FREEZE值。  
# 还是需要注意FREEZE风暴  [《PostgreSQL Freeze 风暴预测续 - 珍藏级SQL》](201804/20180411_01.md)    
# 表级定制freeze  
autovacuum_freeze_max_age = 1200000000  # maximum XID age before forced vacuum  
                                        # (change requires restart)  
autovacuum_multixact_freeze_max_age = 1400000000        # maximum multixact age  
                                        # before forced vacuum  
                                        # (change requires restart)  

# 如果数据库UPDATE非常频繁,建议设置为0。并且建议使用SSD  
autovacuum_vacuum_cost_delay = 0ms      # default vacuum cost delay for  
                                        # autovacuum, in milliseconds;  
                                        # -1 means use vacuum_cost_delay  
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for  
                                        # autovacuum, -1 means use  
                                        # vacuum_cost_limit  


#------------------------------------------------------------------------------  
# CLIENT CONNECTION DEFAULTS  
#------------------------------------------------------------------------------  

# - Statement Behavior -  

#search_path = '"$user", public'        # schema names  
#row_security = on  
#default_tablespace = ''                # a tablespace name, '' uses the default  
#temp_tablespaces = ''                  # a list of tablespace names, '' uses  
                                        # only default tablespace  
#check_function_bodies = on  
#default_transaction_isolation = 'read committed'  
#default_transaction_read_only = off  
#default_transaction_deferrable = off  
#session_replication_role = 'origin'  

# 可以用来防止风暴,但是不建议全局设置  
#statement_timeout = 0                  # in milliseconds, 0 is disabled  

# 执行DDL时,建议加上超时  
#lock_timeout = 0                       # in milliseconds, 0 is disabled  

# 空闲中事务自动清理,根据业务实际情况设置  
#idle_in_transaction_session_timeout = 0        # in milliseconds, 0 is disabled  

#vacuum_freeze_min_age = 50000000  
vacuum_freeze_table_age = 1150000000  
#vacuum_multixact_freeze_min_age = 5000000  
vacuum_multixact_freeze_table_age = 1150000000  
#vacuum_cleanup_index_scale_factor = 0.1        # fraction of total number of tuples  
                                                # before index cleanup, 0 always performs  
                                                # index cleanup  
#bytea_output = 'hex'                   # hex, escape  
#xmlbinary = 'base64'  
#xmloption = 'content'  

# 限制GIN扫描的返回结果集大小,在想限制超多匹配的返回时可以设置  
#gin_fuzzy_search_limit = 0  

# GIN索引pending list的大小  
#gin_pending_list_limit = 4MB  

# - Locale and Formatting -  

datestyle = 'iso, mdy'  
#intervalstyle = 'postgres'  
timezone = 'PRC'  
#timezone_abbreviations = 'Default'     # Select the set of available time zone  
                                        # abbreviations.  Currently, there are  
                                        #   Default  
                                        #   Australia (historical usage)  
                                        #   India  
                                        # You can create your own file in  
                                        # share/timezonesets/.  
#extra_float_digits = 0                 # min -15, max 3  
#client_encoding = sql_ascii            # actually, defaults to database  
                                        # encoding  

# These settings are initialized by initdb, but they can be changed.  
lc_messages = 'C'                       # locale for system error message  
                                        # strings  
lc_monetary = 'C'                       # locale for monetary formatting  
lc_numeric = 'C'                        # locale for number formatting  
lc_time = 'C'                           # locale for time formatting  

# default configuration for text search  
default_text_search_config = 'pg_catalog.english'  

# - Shared Library Preloading -  

# 需要加载什么LIB,预先加载,对于经常访问的库也建议预加载,例如postgis  
#shared_preload_libraries = 'pg_jieba,pipelinedb'        # (change requires restart)  
#local_preload_libraries = ''  
#session_preload_libraries = ''  

# - Other Defaults -  

#dynamic_library_path = '$libdir'  

jit = off                               # allow JIT compilation  
#jit_provider = 'llvmjit'               # JIT implementation to use  

#------------------------------------------------------------------------------  
# LOCK MANAGEMENT  
#------------------------------------------------------------------------------  

#deadlock_timeout = 1s  
#max_locks_per_transaction = 64         # min 10  
                                        # (change requires restart)  
#max_pred_locks_per_transaction = 64    # min 10  
                                        # (change requires restart)  
#max_pred_locks_per_relation = -2       # negative values mean  
                                        # (max_pred_locks_per_transaction  
                                        #  / -max_pred_locks_per_relation) - 1  
#max_pred_locks_per_page = 2            # min 0  


#------------------------------------------------------------------------------  
# VERSION AND PLATFORM COMPATIBILITY  
#------------------------------------------------------------------------------  

# - Previous PostgreSQL Versions -  

#array_nulls = on  
#backslash_quote = safe_encoding        # on, off, or safe_encoding  
#default_with_oids = off  

# [《PostgreSQL 转义、UNICODE、与SQL注入》](201704/20170402_01.md)    
#escape_string_warning = on  
#lo_compat_privileges = off  
#operator_precedence_warning = off  
#quote_all_identifiers = off  
#standard_conforming_strings = on  
#synchronize_seqscans = on  

# - Other Platforms and Clients -  

#transform_null_equals = off  


#------------------------------------------------------------------------------  
# ERROR HANDLING  
#------------------------------------------------------------------------------  

#exit_on_error = off                    # terminate session on any error?  
#restart_after_crash = on               # reinitialize after backend crash?  


#------------------------------------------------------------------------------  
# CONFIG FILE INCLUDES  
#------------------------------------------------------------------------------  

# These options allow settings to be loaded from files other than the  
# default postgresql.conf.  

#include_dir = 'conf.d'                 # include files ending in '.conf' from  
                                        # directory 'conf.d'  
#include_if_exists = 'exists.conf'      # include file only if it exists  
#include = 'special.conf'               # include file  


#------------------------------------------------------------------------------  
# CUSTOMIZED OPTIONS  
#------------------------------------------------------------------------------  

# Add settings for extensions here  

64G内存,16核,SSD机器的配置例子

listen_addresses = '0.0.0.0'  
port = 1921  
max_connections = 2000  
superuser_reserved_connections = 13  
unix_socket_directories = '/tmp, .'  
unix_socket_permissions = 0700  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 8GB  
max_prepared_transactions = 2000  
maintenance_work_mem = 1GB  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_maintenance_workers = 8  
max_parallel_workers_per_gather = 8  
max_parallel_workers = 10  
wal_level = replica  
synchronous_commit = off  
full_page_writes = on  
wal_compression = on  
wal_buffers = 64MB  
wal_writer_delay = 10ms  
checkpoint_timeout = 30min  
max_wal_size = 16GB  
min_wal_size = 4GB  
checkpoint_completion_target = 0.1  
archive_mode = on  
archive_command = '/bin/date'  
max_wal_senders = 16  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
hot_standby_feedback = off  
max_logical_replication_workers = 10  

# 以下两个参数默认关闭, 分区表JOIN时使用并行JOIN
# [《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》](201802/20180202_02.md)  
enable_partitionwise_join = on  
enable_partitionwise_aggregate = on  

enable_parallel_hash = on  
enable_partition_pruning = on  
random_page_cost = 1.1  
effective_cache_size = 48GB  

# 如果使用LLVM编译的话,并且业务有复杂SQL的情况下。建议开启JIT。
# jit = on  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'log'  
log_filename = 'postgresql-%a.log'  
log_truncate_on_rotation = on  
log_rotation_age = 1d  
log_rotation_size = 0  
log_min_duration_statement = 5s  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose     
log_line_prefix = '%m [%p] '  
log_lock_waits = on  
log_statement = 'ddl'  
track_activity_query_size = 2048  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_max_workers = 8  
autovacuum_freeze_max_age = 1200000000  
autovacuum_multixact_freeze_max_age = 1400000000  
autovacuum_vacuum_cost_delay = 0ms  
# statement_timeout = 45min  
lock_timeout = 15s                                
idle_in_transaction_session_timeout = 60s  
vacuum_freeze_table_age = 1150000000  
vacuum_multixact_freeze_table_age = 1150000000  
# shared_preload_libraries = 'pg_stat_statements'  
deadlock_timeout = 1s  

pg_hba.conf 数据库防火墙配置模板

# TYPE  DATABASE        USER            ADDRESS                 METHOD  

# "local" is for Unix domain socket connections only  
local   all             all                                     trust  
# IPv4 local connections:  
host    all             all             127.0.0.1/32            trust  
# IPv6 local connections:  
host    all             all             ::1/128                 trust  
# Allow replication connections from localhost, by a user with the  
# replication privilege.  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  

# 禁止超级用户从远程连接  
host all postgres 0.0.0.0/0 reject  

# 应用连接配置:哪个用户,从哪里来,连接什么数据库。规则为使用何种认证方法,或拒绝?  
# TYPE  DATABASE        USER            ADDRESS                 METHOD  

# 如果不想挨个配置,可以使用如下配置,允许所有来源,通过任意用户访问任意数据库  
host all all 0.0.0.0/0 md5  

Flag Counter

digoal's 大量PostgreSQL文章入口

免费领取阿里云RDS PostgreSQL实例、ECS虚拟机

linux master password postgresql 主从 安装 配置

从入门到差点放弃,postgresql极简安装+主从配置

postgresql安装,配置主从

安装

解压后

pgsql/bin/initdb -D /usr/local/pgsql/data
local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
local/pgsql/bin/createdb test
local/pgsql/bin/psql test

账户设置

创建用户

CREATE USER oschina WITH PASSWORD 'oschina123';
CREATE ROLE adam WITH LOGIN CREATEDB PASSWORD '654321';  记得添加login权限

改密码

ALTER ROLE davide WITH PASSWORD 'hu8jmn3';

让一个角色能够创建其他角色和新的数据库:

ALTER ROLE miriam CREATEROLE CREATEDB;

查看所有数据库

psql -l 

删除数据库

dropdb mydb

使用数据库

psql mydb

创建数据库

createdb mydb

配置主从

1.主创建同步账号

CREATE USER replica replication LOGIN CONNECTION LIMIT 3 ENCRYPTED PASSWORD 'replica';

2,postgresql.conf

wal_level = hot_standby  # 这个是设置主为wal的主机

max_wal_senders = 32 # 这个设置了可以最多有几个流复制连接,差不多有几个从,就设置几个
wal_keep_segments = 256 # 设置流复制保留的最多的xlog数目
wal_sender_timeout = 60s # 设置流复制主机发送数据的超时时间
max_connections = 100 # 这个设置要注意下,从库的max_connections必须要大于主库的

3,

pg_hda.conf

host    all     all     0.0.0.0/0       md5

4,

pg_basebackup -F p --progress -D /data/replica -h 192.168.1.12 -p 5432 -U replica --password 

5,复制recovery.conf

6,re的内容

standby_mode = on  # 这个说明这台机器为从库
primary_conninfo = 'host=10.12.12.10 port=5432 user=replica password=replica'  # 这个说明这台机器对应主库的信息

recovery_target_timeline = 'latest' # 这个说明这个流复制同步到最新的数据

postgresql。conf

max_connections = 1000 # 一般查多于写的应用从库的最大连接数要比较大

hot_standby = on  # 说明这台机器不仅仅是用于数据归档,也用于数据查询
max_standby_streaming_delay = 30s # 数据流备份的最大延迟时间
wal_receiver_status_interval = 1s  # 多久向主报告一次从的状态,当然从每次数据复制都会向主报告状态,这里只是设置最长的间隔时间
hot_standby_feedback = on # 如果有错误的数据复制,是否向主进行反馈

测试成果

主的机器上sender进程 从的机器上receiver进程

主的机器上

select * from pg_stat_replication;
pid              | 8467       # sender的进程
usesysid         | 44673      # 复制的用户id
usename          | replica    # 复制的用户用户名
application_name | walreceiver  
client_addr      | 10.12.12.12 # 复制的客户端地址
client_hostname  |
client_port      | 55804  # 复制的客户端端口
backend_start    | 2015-05-12 07:31:16.972157+08  # 这个主从搭建的时间
backend_xmin     |
state            | streaming  # 同步状态 startup: 连接中、catchup: 同步中、streaming: 同步
sent_location    | 3/CF123560 # Master传送WAL的位置
write_location   | 3/CF123560 # Slave接收WAL的位置
flush_location   | 3/CF123560 # Slave同步到磁盘的WAL位置
replay_location  | 3/CF123560 # Slave同步到数据库的WAL位置
sync_priority    | 0  #同步Replication的优先度
                      0: 异步、1~?: 同步(数字越小优先度越高)
sync_state       | async  # 有三个值,async: 异步、sync: 同步、potential: 虽然现在是异步模式,但是有可能升级到同步模式

最后注意几个坑

  • systemd 启动的话 配置文件可能在etc下

  • hba配置文件放开ip

  • 创建用户时给的权限

  • 记得给文件夹postgres的 用户组和用户身份

install linux postgresql 最佳部署

【转】PostgreSQL on Linux 最佳部署手册

PostgreSQL其实安装很简单,但是那仅仅是可用,并不是好用。

作者

digoal

日期

2016-11-21

标签

Linux , PostgreSQL , Install , 最佳部署


背景

数据库的安装一直以来都挺复杂的,特别是Oracle,现在身边都还有安装Oracle数据库赚外快的事情。

PostgreSQL其实安装很简单,但是那仅仅是可用,并不是好用。很多用户使用默认的方法安装好数据库之后,然后测试一通性能,发现性能不行就不用了。

原因不用说,多方面没有优化的结果。

PostgreSQL数据库为了适应更多的场景能使用,默认的参数都设得非常保守,通常需要优化,比如检查点,SHARED BUFFER等。

本文将介绍一下PostgreSQL on Linux的最佳部署方法,其实在我的很多文章中都有相关的内容,但是没有总结成一篇文档。

OS与硬件认证检查

目的是确认服务器与OS通过certification

Intel Xeon v3和v4的cpu,能支持的RHEL的最低版本是不一样的,

详情请见:https://access.redhat.com/support/policy/intel

Intel Xeon v3和v4的cpu,能支持的Oracle Linux 的最低版本是不一样的,

详情请见:http://linux.oracle.com/pls/apex/f?p=117:1

第一:RedHat生态系统--来自RedHat的认证列表https://access.redhat.com/ecosystem

第二:Oracle Linux 对服务器和存储的硬件认证列表 http://linux.oracle.com/pls/apex/f?p=117:1

安装常用包

# yum -y install coreutils glib2 lrzsz mpstat dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex  openjade bzip2

配置OS内核参数

1. sysctl

注意某些参数,根据内存大小配置(已说明)

含义详见

《DBA不可不知的操作系统内核参数》

# vi /etc/sysctl.conf

# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
kernel.core_pattern= /data01/corefiles/core_%e_%u_%t_%s.%p         
# /data01/corefiles事先建好,权限777,如果是软链接,对应的目录修改为777
kernel.sem = 4096 2147483647 2147483646 512000    
# 信号量, ipcs -l-u 查看,每16个进程一组,每组信号量需要17个信号量。
kernel.shmall = 107374182      
# 所有共享内存段相加大小限制(建议内存的80%)
kernel.shmmax = 274877906944   
# 最大单个共享内存段大小(建议为内存一半), >9.2的版本已大幅降低共享内存的使用
kernel.shmmni = 819200         
# 一共能生成多少共享内存段,每个PG数据库集群至少2个共享内存段
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144       
# The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 4194304          
# The maximum receive socket buffer size in bytes
net.core.wmem_default = 262144       
# The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304          
# The maximum send socket buffer size in bytes.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1    
# 开启SYN Cookies。当出现SYN等待队列溢出时,启用cookie来处理,可防范少量的SYN攻击
net.ipv4.tcp_timestamps = 1    
# 减少time_wait
net.ipv4.tcp_tw_recycle = 0    
# 如果=1则开启TCP连接中TIME-WAIT套接字的快速回收,但是NAT环境可能导致连接失败,建议服务端关闭它
net.ipv4.tcp_tw_reuse = 1      
# 开启重用。允许将TIME-WAIT套接字重新用于新的TCP连接
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000       
#  系统脏页到达这个值,系统后台刷脏页调度进程 pdflush(或其他) 自动将(dirty_expire_centisecs/100)秒前的脏页刷到磁盘
vm.dirty_expire_centisecs = 3000             
#  比这个值老的脏页,将被刷到磁盘。3000表示30秒。
vm.dirty_ratio = 95                          
#  如果系统进程刷脏页太慢,使得系统脏页超过内存 95 % 时,则用户进程如果有写磁盘的操作(如fsync, fdatasync等调用),则需要主动把系统脏页刷出。
#  有效防止用户进程刷脏页,在单机多实例,并且使用CGROUP限制单实例IOPS的情况下非常有效。  
vm.dirty_writeback_centisecs = 100            
#  pdflush(或其他)后台刷脏页进程的唤醒间隔, 100表示1秒。
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0     
#  在分配内存时,允许少量over malloc, 如果设置为 1, 则认为总是有足够的内存,内存较少的测试环境可以使用 1 .  
vm.overcommit_ratio = 90     
#  当overcommit_memory = 2 时,用于参与计算允许指派的内存大小。
vm.swappiness = 0            
#  关闭交换分区
vm.zone_reclaim_mode = 0     
# 禁用 numa, 或者在vmlinux中禁止. 
net.ipv4.ip_local_port_range = 40000 65535    
# 本地自动分配的TCP, UDP端口号范围
fs.nr_open=20480000
# 单个进程允许打开的文件句柄上限

# 以下参数请注意
# vm.extra_free_kbytes = 4096000
# vm.min_free_kbytes = 2097152
# 如果是小内存机器,以上两个值不建议设置
# vm.nr_hugepages = 66536    
#  建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize
# vm.lowmem_reserve_ratio = 1 1 1
# 对于内存大于64G时,建议设置,否则建议默认值 256 256 32

2. 生效配置

sysctl -p

配置OS资源限制

# vi /etc/security/limits.conf

# nofile超过1048576的话,一定要先将sysctl的fs.nr_open设置为更大的值,并生效后才能继续设置nofile.

* soft    nofile  1024000
* hard    nofile  1024000
* soft    nproc   unlimited
* hard    nproc   unlimited
* soft    core    unlimited
* hard    core    unlimited
* soft    memlock unlimited
* hard    memlock unlimited

最好在关注一下/etc/security/limits.d目录中的文件内容,会覆盖limits.conf的配置。

已有进程的ulimit请查看/proc/pid/limits,例如

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             11286                11286                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       11286                11286                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

如果你要启动其他进程,建议退出SHELL再进一遍,确认ulimit环境配置已生效,再启动。

配置OS防火墙

(建议按业务场景设置,我这里先清掉)

iptables -F

配置范例

# 私有网段
-A INPUT -s 192.168.0.0/16 -j ACCEPT
-A INPUT -s 10.0.0.0/8 -j ACCEPT
-A INPUT -s 172.16.0.0/16 -j ACCEPT

selinux

如果没有这方面的需求,建议禁用

# vi /etc/sysconfig/selinux 

SELINUX=disabled
SELINUXTYPE=targeted

关闭不必要的OS服务

chkconfig --list|grep on  
关闭不必要的,例如 
chkconfig iscsi off

部署文件系统

注意SSD对齐,延长寿命,避免写放大。

parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart primary 1MiB 100%

格式化(如果你选择ext4的话)

mkfs.ext4 /dev/sda1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -T largefile -L u01

建议使用的ext4 mount选项

# vi /etc/fstab

LABEL=u01 /u01     ext4        defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback    0 0

# mkdir /u01
# mount -a

为什么需要data=writeback?

pic

建议pg_xlog放到独立的IOPS性能贼好的块设备中。

设置SSD盘的调度为deadline

如果不是SSD的话,还是使用CFQ,否则建议使用DEADLINE。

临时设置(比如sda盘)

echo deadline > /sys/block/sda/queue/scheduler

永久设置

编辑grub文件修改块设备调度策略

vi /boot/grub.conf

elevator=deadline

注意,如果既有机械盘,又有SSD,那么可以使用/etc/rc.local,对指定磁盘修改为对应的调度策略。

关闭透明大页、numa

加上前面的默认IO调度,如下

vi /boot/grub.conf

elevator=deadline numa=off transparent_hugepage=never 

编译器

建议使用较新的编译器,安装 gcc 6.2.0 参考

《PostgreSQL clang vs gcc 编译》

如果已安装好,可以分发给不同的机器。

cd ~
tar -jxvf gcc6.2.0.tar.bz2
tar -jxvf python2.7.12.tar.bz2


# vi /etc/ld.so.conf

/home/digoal/gcc6.2.0/lib
/home/digoal/gcc6.2.0/lib64
/home/digoal/python2.7.12/lib

# ldconfig

环境变量

# vi ~/env_pg.sh

export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=$1
export PGDATA=/$2/digoal/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql9.6
export LD_LIBRARY_PATH=/home/digoal/gcc6.2.0/lib:/home/digoal/gcc6.2.0/lib64:/home/digoal/python2.7.12/lib:$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export PATH=/home/digoal/gcc6.2.0/bin:/home/digoal/python2.7.12/bin:/home/digoal/cmake3.6.3/bin:$PGHOME/bin:$PATH:.
export DATE=`date +"%Y%m%d%H%M"`
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

icc, clang

如果你想使用ICC或者clang编译PostgreSQL,请参考

《[转载]用intel编译器icc编译PostgreSQL》

《PostgreSQL clang vs gcc 编译》

编译PostgreSQL

建议使用NAMED_POSIX_SEMAPHORES

src/backend/port/posix_sema.c

create sem : 
named :
mySem = sem_open(semname, O_CREAT | O_EXCL,
(mode_t) IPCProtection, (unsigned) 1);


unamed :
/*
* PosixSemaphoreCreate
*
* Attempt to create a new unnamed semaphore.
*/
static void
PosixSemaphoreCreate(sem_t * sem)
{
if (sem_init(sem, 1, 1) < 0)
elog(FATAL, "sem_init failed: %m");
}


remove sem : 

#ifdef USE_NAMED_POSIX_SEMAPHORES
/* Got to use sem_close for named semaphores */
if (sem_close(sem) < 0)
elog(LOG, "sem_close failed: %m");
#else
/* Got to use sem_destroy for unnamed semaphores */
if (sem_destroy(sem) < 0)
elog(LOG, "sem_destroy failed: %m");
#endif

编译项

. ~/env_pg.sh 1921 u01

cd postgresql-9.6.1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" ./configure --prefix=/home/digoal/pgsql9.6
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make world -j 64
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto" make install-world

如果你是开发环境,需要调试,建议这样编译。

cd postgresql-9.6.1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" ./configure --prefix=/home/digoal/pgsql9.6 --enable-cassert
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make world -j 64
LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O0 -flto -g -ggdb -fno-omit-frame-pointer" make install-world

初始化数据库集群

pg_xlog建议放在IOPS最好的分区。

. ~/env_pg.sh 1921 u01
initdb -D $PGDATA -E UTF8 --locale=C -U postgres -X /u02/digoal/pg_xlog$PGPORT

配置postgresql.conf

以PostgreSQL 9.6, 512G内存主机为例

最佳到文件末尾即可,重复的会以末尾的作为有效值。  

$ vi postgresql.conf

listen_addresses = '0.0.0.0'
port = 1921
max_connections = 5000
unix_socket_directories = '.'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 128GB                      # 1/4 主机内存
maintenance_work_mem = 2GB                  # min( 2G, (1/4 主机内存)/autovacuum_max_workers )
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 0                    # IO很好的机器,不需要考虑平滑调度
max_worker_processes = 128
max_parallel_workers_per_gather = 0         #  如果需要使用并行查询,设置为大于1 ,不建议超过 主机cores-2
old_snapshot_threshold = -1
backend_flush_after = 0  # IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB
wal_level = replica
synchronous_commit = off
full_page_writes = on   # 支持原子写超过BLOCK_SIZE的块设备,在对齐后可以关闭。或者支持cow的文件系统可以关闭。
wal_buffers = 1GB       # min( 2047MB, shared_buffers/32 ) = 512MB
wal_writer_delay = 10ms
wal_writer_flush_after = 0  # IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB
checkpoint_timeout = 30min  # 不建议频繁做检查点,否则XLOG会产生很多的FULL PAGE WRITE(when full_page_writes=on)。
max_wal_size = 256GB       # 建议是SHARED BUFFER的2倍
min_wal_size = 64GB        # max_wal_size/4
checkpoint_completion_target = 0.05          # 硬盘好的情况下,可以让检查点快速结束,恢复时也可以快速达到一致状态。否则建议0.5~0.9
checkpoint_flush_after = 0                   # IO很好的机器,不需要考虑平滑调度, 否则建议128~256kB
archive_mode = on
archive_command = '/bin/date'      #  后期再修改,如  'test ! -f /disk1/digoal/arch/%f && cp %p /disk1/digoal/arch/%f'
max_wal_senders = 8
random_page_cost = 1.3  # IO很好的机器,不需要考虑离散和顺序扫描的成本差异
parallel_tuple_cost = 0
parallel_setup_cost = 0
min_parallel_relation_size = 0
effective_cache_size = 300GB                          # 看着办,扣掉会话连接RSS,shared buffer, autovacuum worker, 剩下的都是OS可用的CACHE。
force_parallel_mode = off
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
vacuum_defer_cleanup_age = 0
hot_standby_feedback = off                             # 建议关闭,以免备库长事务导致 主库无法回收垃圾而膨胀。
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 16                            # CPU核多,并且IO好的情况下,可多点,但是注意16*autovacuum mem,会消耗较多内存,所以内存也要有基础。  
autovacuum_naptime = 45s                               # 建议不要太高频率,否则会因为vacuum产生较多的XLOG。
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 1600000000
autovacuum_multixact_freeze_max_age = 1600000000
vacuum_freeze_table_age = 1500000000
vacuum_multixact_freeze_table_age = 1500000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries='pg_stat_statements'

## 如果你的数据库有非常多小文件(比如有几十万以上的表,还有索引等,并且每张表都会被访问到时),建议FD可以设多一些,避免进程需要打开关闭文件。
## 但是不要大于前面章节系统设置的ulimit -n(open files)
max_files_per_process=655360

配置pg_hba.conf

避免不必要的访问,开放允许的访问,建议务必使用密码访问。

$ vi pg_hba.conf

host replication xx 0.0.0.0/0 md5  # 流复制

host all postgres 0.0.0.0/0 reject # 拒绝超级用户从网络登录
host all all 0.0.0.0/0 md5  # 其他用户登陆

启动数据库

pg_ctl start

好了,你的PostgreSQL数据库基本上部署好了,可以愉快的玩耍了。

Count

linux postgresql 内核参数

[转] DBA不可不知的操作系统内核参数

DBA不可不知的操作系统内核参数

作者

digoal

日期

2016-08-03

标签

PostgreSQL , 内核参数 , Linux


背景

操作系统为了适应更多的硬件环境,许多初始的设置值,宽容度都很高。

如果不经调整,这些值可能无法适应HPC,或者硬件稍好些的环境。

无法发挥更好的硬件性能,甚至可能影响某些应用软件的使用,特别是数据库。

数据库关心的OS内核参数

512GB 内存为例

1.

参数

fs.aio-max-nr  

支持系统

CentOS 6, 7       

参数解释

aio-nr & aio-max-nr:    
.  
aio-nr is the running total of the number of events specified on the    
io_setup system call for all currently active aio contexts.    
.  
If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.    
.  
Note that raising aio-max-nr does not result in the pre-allocation or re-sizing    
of any kernel data structures.    
.  
aio-nr & aio-max-nr:    
.  
aio-nr shows the current system-wide number of asynchronous io requests.    
.  
aio-max-nr allows you to change the maximum value aio-nr can grow to.    

推荐设置

fs.aio-max-nr = 1xxxxxx  
.  
PostgreSQL, Greenplum 均未使用io_setup创建aio contexts. 无需设置。    
如果Oracle数据库,要使用aio的话,需要设置它。    
设置它也没什么坏处,如果将来需要适应异步IO,可以不需要重新修改这个设置。   

2.

参数

fs.file-max  

支持系统

CentOS 6, 7       

参数解释

file-max & file-nr:    
.  
The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate.   
.  
When you get lots of error messages about running out of file handles,   
you might want to increase this limit.    
.  
Historically, the kernel was able to allocate file handles dynamically,   
but not to free them again.     
.  
The three values in file-nr denote :      
the number of allocated file handles ,     
the number of allocated but unused file handles ,     
the maximum number of file handles.     
.  
Linux 2.6 always reports 0 as the number of free    
file handles -- this is not an error, it just means that the    
number of allocated file handles exactly matches the number of    
used file handles.    
.  
Attempts to allocate more file descriptors than file-max are reported with printk,   
look for "VFS: file-max limit <number> reached".    

推荐设置

fs.file-max = 7xxxxxxx  
.  
PostgreSQL 有一套自己管理的VFS,真正打开的FD与内核管理的文件打开关闭有一套映射的机制,所以真实情况不需要使用那么多的file handlers。     
max_files_per_process 参数。     
假设1GB内存支撑100个连接,每个连接打开1000个文件,那么一个PG实例需要打开10万个文件,一台机器按512G内存来算可以跑500个PG实例,则需要5000万个file handler。     
以上设置绰绰有余。     

3.

参数

kernel.core_pattern  

支持系统

CentOS 6, 7       

参数解释

core_pattern:    
.  
core_pattern is used to specify a core dumpfile pattern name.    
. max length 128 characters; default value is "core"    
. core_pattern is used as a pattern template for the output filename;    
certain string patterns (beginning with '%') are substituted with    
their actual values.    
. backward compatibility with core_uses_pid:    
If core_pattern does not include "%p" (default does not)    
and core_uses_pid is set, then .PID will be appended to    
the filename.    
. corename format specifiers:    
%<NUL>  '%' is dropped    
%%      output one '%'    
%p      pid    
%P      global pid (init PID namespace)    
%i      tid    
%I      global tid (init PID namespace)    
%u      uid    
%g      gid    
%d      dump mode, matches PR_SET_DUMPABLE and    
/proc/sys/fs/suid_dumpable    
%s      signal number    
%t      UNIX time of dump    
%h      hostname    
%e      executable filename (may be shortened)    
%E      executable path    
%<OTHER> both are dropped    
. If the first character of the pattern is a '|', the kernel will treat    
the rest of the pattern as a command to run.  The core dump will be    
written to the standard input of that program instead of to a file.    

推荐设置

kernel.core_pattern = /xxx/core_%e_%u_%t_%s.%p    
.  
这个目录要777的权限,如果它是个软链,则真实目录需要777的权限  
mkdir /xxx  
chmod 777 /xxx  
留足够的空间  

4.

参数

kernel.sem   

支持系统

CentOS 6, 7       

参数解释

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096 每组多少信号量 (>=17, PostgreSQL 每16个进程一组, 每组需要17个信号量) ,     
2147483647 总共多少信号量 (2^31-1 , 且大于4096*512000 ) ,     
2147483646 每个semop()调用支持多少操作 (2^31-1),     
512000 多少组信号量 (假设每GB支持100个连接, 512GB支持51200个连接, 加上其他进程, > 51200*2/16 绰绰有余)     
.  
# sysctl -w kernel.sem="4096 2147483647 2147483646 512000"    
.  
# ipcs -s -l    
------ Semaphore Limits --------    
max number of arrays = 512000    
max semaphores per array = 4096    
max semaphores system wide = 2147483647    
max ops per semop call = 2147483646    
semaphore max value = 32767    

推荐设置

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096可能能够适合更多的场景, 所以大点无妨,关键是512000 arrays也够了。    

5.

参数

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200    

支持系统

CentOS 6, 7        

参数解释

假设主机内存 512GB    
.  
shmmax 单个共享内存段最大 256GB (主机内存的一半,单位字节)      
shmall 所有共享内存段加起来最大 (主机内存的80%,单位PAGE)      
shmmni 一共允许创建819200个共享内存段 (每个数据库启动需要2个共享内存段。  将来允许动态创建共享内存段,可能需求量更大)     
.  
# getconf PAGE_SIZE    
4096    

推荐设置

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200    
.  
9.2以及以前的版本,数据库启动时,对共享内存段的内存需求非常大,需要考虑以下几点  
Connections:    (1800 + 270 * max_locks_per_transaction) * max_connections  
Autovacuum workers: (1800 + 270 * max_locks_per_transaction) * autovacuum_max_workers  
Prepared transactions:  (770 + 270 * max_locks_per_transaction) * max_prepared_transactions  
Shared disk buffers:    (block_size + 208) * shared_buffers  
WAL buffers:    (wal_block_size + 8) * wal_buffers  
Fixed space requirements:   770 kB  
.  
以上建议参数根据9.2以前的版本设置,后期的版本同样适用。  

6.

参数

net.core.netdev_max_backlog  

支持系统

CentOS 6, 7     

参数解释

netdev_max_backlog    
------------------    
Maximum number  of  packets,  queued  on  the  INPUT  side,    
when the interface receives packets faster than kernel can process them.    

推荐设置

net.core.netdev_max_backlog=1xxxx    
.  
INPUT链表越长,处理耗费越大,如果用了iptables管理的话,需要加大这个值。    

7.

参数

net.core.rmem_default  
net.core.rmem_max  
net.core.wmem_default  
net.core.wmem_max  

支持系统

CentOS 6, 7     

参数解释

rmem_default    
------------    
The default setting of the socket receive buffer in bytes.    
.  
rmem_max    
--------    
The maximum receive socket buffer size in bytes.    
.  
wmem_default    
------------    
The default setting (in bytes) of the socket send buffer.    
.  
wmem_max    
--------    
The maximum send socket buffer size in bytes.    

推荐设置

net.core.rmem_default = 262144    
net.core.rmem_max = 4194304    
net.core.wmem_default = 262144    
net.core.wmem_max = 4194304    

8.

参数

net.core.somaxconn   

支持系统

CentOS 6, 7        

参数解释

somaxconn - INTEGER    
Limit of socket listen() backlog, known in userspace as SOMAXCONN.    
Defaults to 128.    
See also tcp_max_syn_backlog for additional tuning for TCP sockets.    

推荐设置

net.core.somaxconn=4xxx    

9.

参数

net.ipv4.tcp_max_syn_backlog  

支持系统

CentOS 6, 7         

参数解释

tcp_max_syn_backlog - INTEGER    
Maximal number of remembered connection requests, which have not    
received an acknowledgment from connecting client.    
The minimal value is 128 for low memory machines, and it will    
increase in proportion to the memory of machine.    
If server suffers from overload, try increasing this number.    

推荐设置

net.ipv4.tcp_max_syn_backlog=4xxx    
pgpool-II 使用了这个值,用于将超过num_init_child以外的连接queue。     
所以这个值决定了有多少连接可以在队列里面等待。    

10.

参数

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60     

支持系统

CentOS 6, 7        

参数解释

tcp_keepalive_time - INTEGER    
How often TCP sends out keepalive messages when keepalive is enabled.    
Default: 2hours.    
.  
tcp_keepalive_probes - INTEGER    
How many keepalive probes TCP sends out, until it decides that the    
connection is broken. Default value: 9.    
.  
tcp_keepalive_intvl - INTEGER    
How frequently the probes are send out. Multiplied by    
tcp_keepalive_probes it is time to kill not responding connection,    
after probes started. Default value: 75sec i.e. connection    
will be aborted after ~11 minutes of retries.    

推荐设置

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60    
.  
连接空闲60秒后, 每隔20秒发心跳包, 尝试3次心跳包没有响应,关闭连接。 从开始空闲,到关闭连接总共历时120秒。    

11.

参数

net.ipv4.tcp_mem=8388608 12582912 16777216    

支持系统

CentOS 6, 7    

参数解释

tcp_mem - vector of 3 INTEGERs: min, pressure, max    
单位 page    
min: below this number of pages TCP is not bothered about its    
memory appetite.    
.  
pressure: when amount of memory allocated by TCP exceeds this number    
of pages, TCP moderates its memory consumption and enters memory    
pressure mode, which is exited when memory consumption falls    
under "min".    
.  
max: number of pages allowed for queueing by all TCP sockets.    
.  
Defaults are calculated at boot time from amount of available    
memory.    
64GB 内存,自动计算的值是这样的    
net.ipv4.tcp_mem = 1539615      2052821 3079230    
.  
512GB 内存,自动计算得到的值是这样的    
net.ipv4.tcp_mem = 49621632     66162176        99243264    
.  
这个参数让操作系统启动时自动计算,问题也不大  

推荐设置

net.ipv4.tcp_mem=8388608 12582912 16777216    
.  
这个参数让操作系统启动时自动计算,问题也不大  

12.

参数

net.ipv4.tcp_fin_timeout  

支持系统

CentOS 6, 7        

参数解释

tcp_fin_timeout - INTEGER    
The length of time an orphaned (no longer referenced by any    
application) connection will remain in the FIN_WAIT_2 state    
before it is aborted at the local end.  While a perfectly    
valid "receive only" state for an un-orphaned connection, an    
orphaned connection in FIN_WAIT_2 state could otherwise wait    
forever for the remote to close its end of the connection.    
Cf. tcp_max_orphans    
Default: 60 seconds    

推荐设置

net.ipv4.tcp_fin_timeout=5    
.  
加快僵尸连接回收速度   

13.

参数

net.ipv4.tcp_synack_retries  

支持系统

CentOS 6, 7         

参数解释

tcp_synack_retries - INTEGER    
Number of times SYNACKs for a passive TCP connection attempt will    
be retransmitted. Should not be higher than 255. Default value    
is 5, which corresponds to 31seconds till the last retransmission    
with the current initial RTO of 1second. With this the final timeout    
for a passive TCP connection will happen after 63seconds.    

推荐设置

net.ipv4.tcp_synack_retries=2    
.  
缩短tcp syncack超时时间  

14.

参数

net.ipv4.tcp_syncookies  

支持系统

CentOS 6, 7         

参数解释

tcp_syncookies - BOOLEAN    
Only valid when the kernel was compiled with CONFIG_SYN_COOKIES    
Send out syncookies when the syn backlog queue of a socket    
overflows. This is to prevent against the common 'SYN flood attack'    
Default: 1    
.  
Note, that syncookies is fallback facility.    
It MUST NOT be used to help highly loaded servers to stand    
against legal connection rate. If you see SYN flood warnings    
in your logs, but investigation shows that they occur    
because of overload with legal connections, you should tune    
another parameters until this warning disappear.    
See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.    
.  
syncookies seriously violate TCP protocol, do not allow    
to use TCP extensions, can result in serious degradation    
of some services (f.e. SMTP relaying), visible not by you,    
but your clients and relays, contacting you. While you see    
SYN flood warnings in logs not being really flooded, your server    
is seriously misconfigured.    
.  
If you want to test which effects syncookies have to your    
network connections you can set this knob to 2 to enable    
unconditionally generation of syncookies.    

推荐设置

net.ipv4.tcp_syncookies=1    
.  
防止syn flood攻击   

15.

参数

net.ipv4.tcp_timestamps  

支持系统

CentOS 6, 7         

参数解释

tcp_timestamps - BOOLEAN    
Enable timestamps as defined in RFC1323.    

推荐设置

net.ipv4.tcp_timestamps=1    
.  
tcp_timestamps 是 tcp 协议中的一个扩展项,通过时间戳的方式来检测过来的包以防止 PAWS(Protect Against Wrapped  Sequence numbers),可以提高 tcp 的性能。  

16.

参数

net.ipv4.tcp_tw_recycle  
net.ipv4.tcp_tw_reuse  
net.ipv4.tcp_max_tw_buckets  

支持系统

CentOS 6, 7         

参数解释

tcp_tw_recycle - BOOLEAN    
Enable fast recycling TIME-WAIT sockets. Default value is 0.    
It should not be changed without advice/request of technical    
experts.    
.  
tcp_tw_reuse - BOOLEAN    
Allow to reuse TIME-WAIT sockets for new connections when it is    
safe from protocol viewpoint. Default value is 0.    
It should not be changed without advice/request of technical    
experts.    
.  
tcp_max_tw_buckets - INTEGER  
Maximal number of timewait sockets held by system simultaneously.  
If this number is exceeded time-wait socket is immediately destroyed  
and warning is printed.   
This limit exists only to prevent simple DoS attacks,   
you _must_ not lower the limit artificially,   
but rather increase it (probably, after increasing installed memory),    
if network conditions require more than default value.   

推荐设置

net.ipv4.tcp_tw_recycle=0    
net.ipv4.tcp_tw_reuse=1    
net.ipv4.tcp_max_tw_buckets = 2xxxxx    
.  
net.ipv4.tcp_tw_recycle和net.ipv4.tcp_timestamps不建议同时开启    

17.

参数

net.ipv4.tcp_rmem  
net.ipv4.tcp_wmem  

支持系统

CentOS 6, 7         

参数解释

tcp_wmem - vector of 3 INTEGERs: min, default, max    
min: Amount of memory reserved for send buffers for TCP sockets.    
Each TCP socket has rights to use it due to fact of its birth.    
Default: 1 page    
.  
default: initial size of send buffer used by TCP sockets.  This    
value overrides net.core.wmem_default used by other protocols.    
It is usually lower than net.core.wmem_default.    
Default: 16K    
.  
max: Maximal amount of memory allowed for automatically tuned    
send buffers for TCP sockets. This value does not override    
net.core.wmem_max.  Calling setsockopt() with SO_SNDBUF disables    
automatic tuning of that socket's send buffer size, in which case    
this value is ignored.    
Default: between 64K and 4MB, depending on RAM size.    
.  
tcp_rmem - vector of 3 INTEGERs: min, default, max    
min: Minimal size of receive buffer used by TCP sockets.    
It is guaranteed to each TCP socket, even under moderate memory    
pressure.    
Default: 1 page    
.  
default: initial size of receive buffer used by TCP sockets.    
This value overrides net.core.rmem_default used by other protocols.    
Default: 87380 bytes. This value results in window of 65535 with    
default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit    
less for default tcp_app_win. See below about these variables.    
.  
max: maximal size of receive buffer allowed for automatically    
selected receiver buffers for TCP socket. This value does not override    
net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables    
automatic tuning of that socket's receive buffer size, in which    
case this value is ignored.    
Default: between 87380B and 6MB, depending on RAM size.    

推荐设置

net.ipv4.tcp_rmem=8192 87380 16777216    
net.ipv4.tcp_wmem=8192 65536 16777216    
.  
许多数据库的推荐设置,提高网络性能  

18.

参数

net.nf_conntrack_max  
net.netfilter.nf_conntrack_max  

支持系统

CentOS 6    

参数解释

nf_conntrack_max - INTEGER    
Size of connection tracking table.    
Default value is nf_conntrack_buckets value * 4.    

推荐设置

net.nf_conntrack_max=1xxxxxx    
net.netfilter.nf_conntrack_max=1xxxxxx    

19.

参数

vm.dirty_background_bytes   
vm.dirty_expire_centisecs   
vm.dirty_ratio   
vm.dirty_writeback_centisecs   

支持系统

CentOS 6, 7        

参数解释

==============================================================    
.  
dirty_background_bytes    
.  
Contains the amount of dirty memory at which the background kernel    
flusher threads will start writeback.    
.  
Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only    
one of them may be specified at a time. When one sysctl is written it is    
immediately taken into account to evaluate the dirty memory limits and the    
other appears as 0 when read.    
.  
==============================================================    
.  
dirty_background_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
the background kernel flusher threads will start writing out dirty data.    
.  
==============================================================    
.  
dirty_bytes    
.  
Contains the amount of dirty memory at which a process generating disk writes    
will itself start writeback.    
.  
Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be    
specified at a time. When one sysctl is written it is immediately taken into    
account to evaluate the dirty memory limits and the other appears as 0 when    
read.    
.  
Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any    
value lower than this limit will be ignored and the old configuration will be    
retained.    
.  
==============================================================    
.  
dirty_expire_centisecs    
.  
This tunable is used to define when dirty data is old enough to be eligible    
for writeout by the kernel flusher threads.  It is expressed in 100'ths    
of a second.  Data which has been dirty in-memory for longer than this    
interval will be written out next time a flusher thread wakes up.    
.  
==============================================================    
.  
dirty_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
a process which is generating disk writes will itself start writing out dirty    
data.    
.  
==============================================================    
.  
dirty_writeback_centisecs    
.  
The kernel flusher threads will periodically wake up and write `old' data    
out to disk.  This tunable expresses the interval between those wakeups, in    
100'ths of a second.    
.  
Setting this to zero disables periodic writeback altogether.    
.  
==============================================================    

推荐设置

vm.dirty_background_bytes = 4096000000    
vm.dirty_expire_centisecs = 6000    
vm.dirty_ratio = 80    
vm.dirty_writeback_centisecs = 50    
.  
减少数据库进程刷脏页的频率,dirty_background_bytes根据实际IOPS能力以及内存大小设置    

20.

参数

vm.extra_free_kbytes  

支持系统

CentOS 6    

参数解释

extra_free_kbytes    
.  
This parameter tells the VM to keep extra free memory   
between the threshold where background reclaim (kswapd) kicks in,   
and the threshold where direct reclaim (by allocating processes) kicks in.    
.  
This is useful for workloads that require low latency memory allocations    
and have a bounded burstiness in memory allocations,   
for example a realtime application that receives and transmits network traffic    
(causing in-kernel memory allocations) with a maximum total message burst    
size of 200MB may need 200MB of extra free memory to avoid direct reclaim    
related latencies.    
.  
目标是尽量让后台进程回收内存,比用户进程提早多少kbytes回收,因此用户进程可以快速分配内存。    

推荐设置

vm.extra_free_kbytes=4xxxxxx    

21.

参数

vm.min_free_kbytes  

支持系统

CentOS 6, 7         

参数解释

min_free_kbytes:    
.  
This is used to force the Linux VM to keep a minimum number    
of kilobytes free.  The VM uses this number to compute a    
watermark[WMARK_MIN] value for each lowmem zone in the system.    
Each lowmem zone gets a number of reserved free pages based    
proportionally on its size.    
.  
Some minimal amount of memory is needed to satisfy PF_MEMALLOC    
allocations; if you set this to lower than 1024KB, your system will    
become subtly broken, and prone to deadlock under high loads.    
.  
Setting this too high will OOM your machine instantly.    

推荐设置

vm.min_free_kbytes = 2xxxxxx    
.  
防止在高负载时系统无响应,减少内存分配死锁概率。    

22.

参数

vm.mmap_min_addr  

支持系统

CentOS 6, 7       

参数解释

mmap_min_addr    
.  
This file indicates the amount of address space  which a user process will    
be restricted from mmapping.  Since kernel null dereference bugs could    
accidentally operate based on the information in the first couple of pages    
of memory userspace processes should not be allowed to write to them.  By    
default this value is set to 0 and no protections will be enforced by the    
security module.  Setting this value to something like 64k will allow the    
vast majority of applications to work correctly and provide defense in depth    
against future potential kernel bugs.    

推荐设置

vm.mmap_min_addr=6xxxx    
.  
防止内核隐藏的BUG导致的问题  

23.

参数

vm.overcommit_memory   
vm.overcommit_ratio   

支持系统

CentOS 6, 7         

参数解释

==============================================================    
.  
overcommit_kbytes:    
.  
When overcommit_memory is set to 2, the committed address space is not    
permitted to exceed swap plus this amount of physical RAM. See below.    
.  
Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one    
of them may be specified at a time. Setting one disables the other (which    
then appears as 0 when read).    
.  
==============================================================    
.  
overcommit_memory:    
.  
This value contains a flag that enables memory overcommitment.    
.  
When this flag is 0,   
the kernel attempts to estimate the amount    
of free memory left when userspace requests more memory.    
.  
When this flag is 1,   
the kernel pretends there is always enough memory until it actually runs out.    
.  
When this flag is 2,   
the kernel uses a "never overcommit"    
policy that attempts to prevent any overcommit of memory.    
Note that user_reserve_kbytes affects this policy.    
.  
This feature can be very useful because there are a lot of    
programs that malloc() huge amounts of memory "just-in-case"    
and don't use much of it.    
.  
The default value is 0.    
.  
See Documentation/vm/overcommit-accounting and    
security/commoncap.c::cap_vm_enough_memory() for more information.    
.  
==============================================================    
.  
overcommit_ratio:    
.  
When overcommit_memory is set to 2,   
the committed address space is not permitted to exceed   
swap + this percentage of physical RAM.    
See above.    
.  
==============================================================    

推荐设置

vm.overcommit_memory = 0    
vm.overcommit_ratio = 90    
.  
vm.overcommit_memory = 0 时 vm.overcommit_ratio可以不设置   

24.

参数

vm.swappiness   

支持系统

CentOS 6, 7         

参数解释

swappiness    
.  
This control is used to define how aggressive the kernel will swap    
memory pages.    
Higher values will increase agressiveness, lower values    
decrease the amount of swap.    
.  
The default value is 60.    

推荐设置

vm.swappiness = 0    

25.

参数

vm.zone_reclaim_mode   

支持系统

CentOS 6, 7         

参数解释

zone_reclaim_mode:    
.  
Zone_reclaim_mode allows someone to set more or less aggressive approaches to    
reclaim memory when a zone runs out of memory. If it is set to zero then no    
zone reclaim occurs. Allocations will be satisfied from other zones / nodes    
in the system.    
.  
This is value ORed together of    
.  
1       = Zone reclaim on    
2       = Zone reclaim writes dirty pages out    
4       = Zone reclaim swaps pages    
.  
zone_reclaim_mode is disabled by default.  For file servers or workloads    
that benefit from having their data cached, zone_reclaim_mode should be    
left disabled as the caching effect is likely to be more important than    
data locality.    
.  
zone_reclaim may be enabled if it's known that the workload is partitioned    
such that each partition fits within a NUMA node and that accessing remote    
memory would cause a measurable performance reduction.  The page allocator    
will then reclaim easily reusable pages (those page cache pages that are    
currently not used) before allocating off node pages.    
.  
Allowing zone reclaim to write out pages stops processes that are    
writing large amounts of data from dirtying pages on other nodes. Zone    
reclaim will write out dirty pages if a zone fills up and so effectively    
throttle the process. This may decrease the performance of a single process    
since it cannot use all of system memory to buffer the outgoing writes    
anymore but it preserve the memory on other nodes so that the performance    
of other processes running on other nodes will not be affected.    
.  
Allowing regular swap effectively restricts allocations to the local    
node unless explicitly overridden by memory policies or cpuset    
configurations.    

推荐设置

vm.zone_reclaim_mode=0    
.  
不使用NUMA  

26.

参数

net.ipv4.ip_local_port_range  

支持系统

CentOS 6, 7         

参数解释

ip_local_port_range - 2 INTEGERS  
Defines the local port range that is used by TCP and UDP to  
choose the local port. The first number is the first, the  
second the last local port number. The default values are  
32768 and 61000 respectively.  
.  
ip_local_reserved_ports - list of comma separated ranges  
Specify the ports which are reserved for known third-party  
applications. These ports will not be used by automatic port  
assignments (e.g. when calling connect() or bind() with port  
number 0). Explicit port allocation behavior is unchanged.  
.  
The format used for both input and output is a comma separated  
list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and  
10). Writing to the file will clear all previously reserved  
ports and update the current list with the one given in the  
input.  
.  
Note that ip_local_port_range and ip_local_reserved_ports  
settings are independent and both are considered by the kernel  
when determining which ports are available for automatic port  
assignments.  
.  
You can reserve ports which are not in the current  
ip_local_port_range, e.g.:  
.  
$ cat /proc/sys/net/ipv4/ip_local_port_range  
32000   61000  
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports  
8080,9148  
.  
although this is redundant. However such a setting is useful  
if later the port range is changed to a value that will  
include the reserved ports.  
.  
Default: Empty  

推荐设置

net.ipv4.ip_local_port_range=40000 65535    
.  
限制本地动态端口分配范围,防止占用监听端口。  

27.

参数

vm.nr_hugepages  

支持系统

CentOS 6, 7  

参数解释

==============================================================  
nr_hugepages  
Change the minimum size of the hugepage pool.  
See Documentation/vm/hugetlbpage.txt  
==============================================================  
nr_overcommit_hugepages  
Change the maximum size of the hugepage pool. The maximum is  
nr_hugepages + nr_overcommit_hugepages.  
See Documentation/vm/hugetlbpage.txt  
.  
The output of "cat /proc/meminfo" will include lines like:  
......  
HugePages_Total: vvv  
HugePages_Free:  www  
HugePages_Rsvd:  xxx  
HugePages_Surp:  yyy  
Hugepagesize:    zzz kB  
.  
where:  
HugePages_Total is the size of the pool of huge pages.  
HugePages_Free  is the number of huge pages in the pool that are not yet  
allocated.  
HugePages_Rsvd  is short for "reserved," and is the number of huge pages for  
which a commitment to allocate from the pool has been made,  
but no allocation has yet been made.  Reserved huge pages  
guarantee that an application will be able to allocate a  
huge page from the pool of huge pages at fault time.  
HugePages_Surp  is short for "surplus," and is the number of huge pages in  
the pool above the value in /proc/sys/vm/nr_hugepages. The  
maximum number of surplus huge pages is controlled by  
/proc/sys/vm/nr_overcommit_hugepages.  
.  
/proc/filesystems should also show a filesystem of type "hugetlbfs" configured  
in the kernel.  
.  
/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge  
pages in the kernel's huge page pool.  "Persistent" huge pages will be  
returned to the huge page pool when freed by a task.  A user with root  
privileges can dynamically allocate more or free some persistent huge pages  
by increasing or decreasing the value of 'nr_hugepages'.  

推荐设置

如果要使用PostgreSQL的huge page,建议设置它。    
大于数据库需要的共享内存即可。    

28.

参数

fs.nr_open

支持系统

CentOS 6, 7

参数解释

nr_open:

This denotes the maximum number of file-handles a process can
allocate. Default value is 1024*1024 (1048576) which should be
enough for most machines. Actual limit depends on RLIMIT_NOFILE
resource limit.

它还影响security/limits.conf 的文件句柄限制,单个进程的打开句柄不能大于fs.nr_open,所以要加大文件句柄限制,首先要加大nr_open

推荐设置

对于有很多对象(表、视图、索引、序列、物化视图等)的PostgreSQL数据库,建议设置为2000万,
例如fs.nr_open=20480000

数据库关心的资源限制

1. 通过/etc/security/limits.conf设置,或者ulimit设置

2. 通过/proc/$pid/limits查看当前进程的设置

#        - core - limits the core file size (KB)  
#        - memlock - max locked-in-memory address space (KB)  
#        - nofile - max number of open files  建议设置为1000万 , 但是必须设置sysctl, fs.nr_open大于它,否则会导致系统无法登陆。
#        - nproc - max number of processes  
以上四个是非常关心的配置  
....  
#        - data - max data size (KB)  
#        - fsize - maximum filesize (KB)  
#        - rss - max resident set size (KB)  
#        - stack - max stack size (KB)  
#        - cpu - max CPU time (MIN)  
#        - as - address space limit (KB)  
#        - maxlogins - max number of logins for this user  
#        - maxsyslogins - max number of logins on the system  
#        - priority - the priority to run user process with  
#        - locks - max number of file locks the user can hold  
#        - sigpending - max number of pending signals  
#        - msgqueue - max memory used by POSIX message queues (bytes)  
#        - nice - max nice priority allowed to raise to values: [-20, 19]  
#        - rtprio - max realtime priority  

数据库关心的IO调度规则

1. 目前操作系统支持的IO调度策略包括cfq, deadline, noop 等。

/kernel-doc-xxx/Documentation/block  
-r--r--r-- 1 root root   674 Apr  8 16:33 00-INDEX  
-r--r--r-- 1 root root 55006 Apr  8 16:33 biodoc.txt  
-r--r--r-- 1 root root   618 Apr  8 16:33 capability.txt  
-r--r--r-- 1 root root 12791 Apr  8 16:33 cfq-iosched.txt  
-r--r--r-- 1 root root 13815 Apr  8 16:33 data-integrity.txt  
-r--r--r-- 1 root root  2841 Apr  8 16:33 deadline-iosched.txt  
-r--r--r-- 1 root root  4713 Apr  8 16:33 ioprio.txt  
-r--r--r-- 1 root root  2535 Apr  8 16:33 null_blk.txt  
-r--r--r-- 1 root root  4896 Apr  8 16:33 queue-sysfs.txt  
-r--r--r-- 1 root root  2075 Apr  8 16:33 request.txt  
-r--r--r-- 1 root root  3272 Apr  8 16:33 stat.txt  
-r--r--r-- 1 root root  1414 Apr  8 16:33 switching-sched.txt  
-r--r--r-- 1 root root  3916 Apr  8 16:33 writeback_cache_control.txt  

如果你要详细了解这些调度策略的规则,可以查看WIKI或者看内核文档。

从这里可以看到它的调度策略

cat /sys/block/vdb/queue/scheduler   
noop [deadline] cfq   

修改

echo deadline > /sys/block/hda/queue/scheduler  

或者修改启动参数

grub.conf  
elevator=deadline  

从很多测试结果来看,数据库使用deadline调度,性能会更稳定一些。

其他

1. 关闭透明大页

2. 禁用NUMA

3. SSD的对齐

Count

linux mysql pg postgresql 数据库

postgresql安装

postgresql安装

pgsql/bin/initdb -D /usr/local/pgsql/data
local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
local/pgsql/bin/createdb test
local/pgsql/bin/psql test

账户设置

创建用户
CREATE USER davide WITH PASSWORD 'jw8s0F4';
改密码
ALTER ROLE davide WITH PASSWORD 'hu8jmn3';
让一个角色能够创建其他角色和新的数据库:
ALTER ROLE miriam CREATEROLE CREATEDB;