Xefee Hosting

The evolution of large-scale website architecture and the knowledge it requires


Earlier articles have described the architectural evolution of some large sites, such as LiveJournal and eBay, and they are valuable references. However, they mostly describe the result of each evolution and say little about why each change was needed. Recently, too, many people have found it hard to understand why a website needs such complex technology. That prompted this article, which describes a typical path by which an ordinary website grows into a large-scale site, how its architecture evolves along the way, and the knowledge required at each stage. I hope it gives people who want to work in the Internet industry some preliminary concepts :). If you find mistakes, please offer suggestions so that the article can better serve as a starting point for discussion.

Architecture evolution step 1: physically separating the web server and the database
At the beginning, you have an idea and build a website on the Internet. At this point the host may even be rented, but since this article is concerned only with the evolution of the architecture, assume that you already have one hosted machine and a certain amount of bandwidth. Because the site has some distinctive features, it attracts visitors, and gradually you find that the load on the system keeps rising and responses get slower and slower. What stands out is how the database and the application affect each other: when the application has problems, the database easily runs into problems too, and when the database has problems, so does the application. So the site enters the first stage of its evolution: the application and the database are physically separated onto two machines. This requires no new technology, yet it works remarkably well: the system returns to its earlier response speed, supports higher traffic, and the database and the application no longer interfere with each other.
[Figure: system architecture after this step]

Knowledge involved in this step:
This step of the architecture's evolution requires essentially no new technical knowledge.

Architecture evolution step 2: adding a page cache
The good times do not last. As more and more people visit, responses start to slow down again. Investigating, you find that too many operations hit the database, so competition for database connections is fierce and responses are slow. You cannot simply open more connections, because that would put the database machine under heavy load. So you consider caching, to reduce both the competition for database connections and the read pressure on the database. The first choice is likely to be squid or a similar mechanism, used to cache the relatively static pages in the system (for example, pages that are updated only once every day or two); generating static pages is another option. This approach requires no changes to the application, and it relieves pressure on the web server as well as competition for database connections. So you start using squid to cache the relatively static pages.
[Figure: system architecture after this step]

Knowledge involved in this step:
Front-end page caching techniques, such as squid. To use them well, you need a thorough understanding of how squid is implemented and of its cache invalidation algorithms.
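
The idea behind a front-end page cache can be sketched in a few lines of Python. This is only a minimal illustration of time-based expiry, not how squid itself works: the class name `PageCache`, the `render` callback, and the fixed time-to-live are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory stand-in for a front-end page cache (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str, render: Callable[[str], str]) -> str:
        """Return the cached page for url, rendering it only when needed."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: ask the application (and so the database) once.
        self.misses += 1
        body = render(url)
        self._entries[url] = (now + self.ttl_seconds, body)
        return body
```

With a page that changes only every day or two, a long time-to-live means that almost every request is served without touching the web application or the database.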

Architecture evolution step 3: adding a page fragment cache
Adding squid as a cache does speed up the system as a whole, and the pressure on the web server begins to drop. But as traffic grows, the system starts to slow down again. Having tasted the benefits of caches such as squid, you begin to wonder whether the relatively static parts of dynamic pages could be cached as well. So you consider a page fragment caching strategy such as ESI, and start using ESI to cache the relatively static fragments of dynamic pages.
[Figure: system architecture after this step]

Knowledge involved in this step:
Page fragment caching techniques, such as ESI. To use them well, you likewise need to understand how ESI is implemented.
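
Fragment caching can be illustrated with a small Python sketch. It handles only the basic `<esi:include src="..."/>` tag and keeps fragments in a plain dictionary; a real ESI processor fetches fragments over HTTP and applies its own expiry rules. The function name `assemble` and the `fetch` callback are assumptions made for this example.

```python
import re
from typing import Callable, Dict

# Matches the basic ESI include tag, e.g. <esi:include src="/frag/menu"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str,
             fragment_cache: Dict[str, str],
             fetch: Callable[[str], str]) -> str:
    """Fill each include tag with its fragment, fetching a fragment only
    the first time it is needed and reusing the cached copy afterwards."""

    def replace(match):
        src = match.group(1)
        if src not in fragment_cache:
            fragment_cache[src] = fetch(src)
        return fragment_cache[src]

    return ESI_INCLUDE.sub(replace, template)
```

The dynamic part of the page (the template) is still produced per request, while relatively static fragments such as a menu or footer are generated once and reused.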

Architecture evolution step 4: data caching
Using ESI and similar techniques improves the effect of caching once more, and the load on the system does drop further. But again, as traffic increases, the system starts to slow down. Investigating, you may find places that fetch the same data repeatedly, such as user information. You start to wonder whether this data could be cached too, so you cache it in local memory. Once the change is made, the result fully meets expectations: response speed recovers and the pressure on the database drops considerably.
[Figure: system architecture after this step]

Knowledge involved in this step:
Caching techniques, including data structures such as Map, caching algorithms, and the implementation mechanism of the chosen caching framework.
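
As one example of a map-based cache with a caching algorithm, here is a minimal least-recently-used (LRU) cache in Python. LRU is just one possible eviction policy, chosen here for illustration; the class name `LRUCache` and the `loader` callback (standing in for a database query) are assumptions.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Local in-memory data cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader (e.g. a database query)
        only on a miss."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used
        return value
```

Repeated lookups of the same user information then cost one database query instead of one per request, as long as the entry stays in the cache.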

Framework for the evolution of the fifth step: increase webserver
Did not last long, was found again with the increase in the amount of system access, webserver machine's pressure to rise to higher peak, this time to start thinking about adding a webserver, which is to simultaneously solve the availability problem, the webserver to avoid single down machine, then they will not be used, in doing these considerations, the decision to increase a webserver, to increase a webserver, it will run into some problems, typical are:
1, how to access assigned to the two machines, this time the program is usually considered native Apache load balancing, or LVS load balancing software such programs;
2, how to maintain state information synchronization, such as user session so the program will be considered at this time have written to the database, write memory, cookie or session information synchronization mechanism;
3, how to maintain the data cache synchronization information, for example, the user data before the cache, this time the mechanism is usually considered a distributed cache synchronization or cache;
4, how to upload files of these similar features to normal, this time the mechanism will be considered is the use of shared file system or storage;
In addressing these issues, the last is to increase the webserver to two, the system finally goes back to the previous rate.
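
Of the session options in item 2, keeping state in a cookie is the easiest to sketch, because it needs no synchronization between web servers at all. The Python example below signs the session data so that any server sharing the key can verify it. The key value and the function names `encode_session` and `decode_session` are assumptions for illustration, and the data is signed but not encrypted, so it must not contain secrets.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key shared by all web servers


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC signature."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    return payload + "." + _sign(payload)


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session data, or None if the cookie is malformed or tampered with."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None
    expected = _sign(payload)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode("ascii")))
```

Whichever web server receives the request can rebuild the session from the cookie alone, at the cost of sending the data with every request.
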
[Figure: system architecture after this step]

Knowledge involved in this step:
Load balancing techniques (including but not limited to hardware load balancing, software load balancing, load-balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology); primary/standby failover techniques (including but not limited to ARP spoofing and Linux heartbeat); state and cache synchronization techniques (including but not limited to cookies, the UDP protocol, broadcasting of state information, and the implementation details of the chosen cache synchronization technology); file sharing techniques (including but not limited to NFS); and storage technology (including but not limited to storage devices).
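
The simplest load-balancing algorithm, round robin, can be sketched as follows. This is a toy illustration of the algorithm only, not of how Apache or LVS implement it; the class name `RoundRobinBalancer` and the server names used with it are assumptions.

```python
from typing import List, Sequence, Set


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers: Sequence[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers: List[str] = list(servers)
        self.healthy: Set[str] = set(self.servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

Skipping servers that are marked down is what gives the availability benefit described in this step: when one web server fails, requests continue to flow to the other.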

Architecture evolution step 6: splitting the database
After enjoying a period of rapid traffic growth, you find the system starting to slow down again. What is it this time? Investigation shows that competition for database connections among write and update operations is extremely fierce, which slows the system down. What now? The available options are a database cluster or a database-splitting strategy. Because some databases do not support clustering very well, splitting the database becomes the more common choice. Splitting the database means modifying the existing application. Once the changes are made and the split is in place, the goal is achieved, and the system is even faster than before.
[Figure: system architecture after this step]

Knowledge involved in this step:
This step is mostly about dividing the business sensibly so that the database can be split; it has no specific technical requirements.
However, as the amount of data grows and the database is split, database design, tuning, and maintenance must be done better, so the technical demands in these areas are high.
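
Splitting the database along business lines amounts to a routing table that the application consults before opening a connection. The sketch below is purely illustrative: the table names, database names, and the `database_for` function are all hypothetical, and the right division depends entirely on the site's business.

```python
# Hypothetical mapping from table to the database that now holds it.
# Tables that belong to the same business area stay together so that
# queries joining them still run inside a single database.
DATABASE_FOR_TABLE = {
    "users": "db_user",
    "profiles": "db_user",
    "orders": "db_trade",
    "payments": "db_trade",
    "posts": "db_content",
}


def database_for(table: str) -> str:
    """Return the name of the database that holds the given table."""
    try:
        return DATABASE_FOR_TABLE[table]
    except KeyError:
        raise LookupError(f"no database configured for table {table!r}") from None
```

The application changes mentioned in this step largely consist of replacing the single shared connection with a lookup like this one, and of removing queries that join tables which now live in different databases.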

Architecture evolution step 7: splitting tables, DAL, and distributed caching
As the system keeps running, the amount of data grows significantly, and you find that queries are still somewhat slow even after the database split. So, following the same approach as the database split, you start splitting tables. This inevitably requires further changes to the application. At this point you may find that having the application itself keep track of the rules for splitting databases and tables is rather complicated, which raises the question of whether a general framework could handle data access across the split databases and tables. In eBay's architecture the corresponding component is the DAL. This evolution takes a relatively long time, and it is also possible that the general framework is built only after the table split has been done. At this stage you may also find that the earlier cache synchronization scheme is causing problems: the data volume is now too large to cache locally and then synchronize, so a distributed caching scheme is needed. After another round of investigation and suffering, the large volume of cached data is finally moved to a distributed cache.
[Figure: system architecture after this step]

Knowledge involved in this step:
Splitting tables is again mostly a matter of dividing the business; technically it involves dynamic hashing algorithms, consistent hashing algorithms, and so on.
The DAL involves more complex techniques, such as database connection management (timeouts, exceptions), control of database operations (timeouts, exceptions), and encapsulation of the database- and table-splitting rules.
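
Consistent hashing, mentioned above, can be sketched as a hash ring with virtual nodes. This is a minimal illustration under assumptions of my own: the class name `ConsistentHashRing`, the use of SHA-256, and 100 virtual nodes per real node are choices made for the example, not something the article prescribes.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Maps keys to nodes so that adding or removing a node moves few keys."""

    def __init__(self, nodes: Iterable[str] = (), replicas: int = 100):
        self.replicas = replicas            # virtual nodes per real node
        self._positions: List[int] = []     # sorted positions on the ring
        self._owners: Dict[int, str] = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if position not in self._owners:
                bisect.insort(self._positions, position)
            self._owners[position] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if self._owners.get(position) == node:
                del self._owners[position]
                self._positions.remove(position)

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's ring position."""
        if not self._positions:
            raise LookupError("ring is empty")
        index = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[index]]
```

The same ring can route keys to cache servers or rows to split tables. Compared with taking the key modulo the number of nodes, adding a node only moves the keys that the new node takes over, so most cached data stays where it is.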

Architecture evolution step 8: adding more web servers
With the database and table splitting done, the pressure on the database has dropped to a fairly low level, and you settle into the happy life of watching traffic surge every day. Then, one day, you find that access to the system is slowing down again. You check the database first, and its load is normal. Then you check the web servers and find that Apache is blocking a lot of requests, while the application server handles each request fairly quickly. It appears that the number of requests is so high that they have to queue, which slows responses. This is relatively easy to handle: by this time there is generally some money available, so you add more web servers. Adding them may bring several challenges:
1. Apache's software load balancing or LVS software load balancing can no longer handle scheduling for the huge volume of web traffic (request connections, network traffic, and so on). If funds allow, the solution is to buy hardware load balancers such as F5, NetScaler, or Alteon. If funds do not allow, the solution is to divide the application into logical categories and spread them across separate software load-balanced clusters.
2. Some of the existing state synchronization and file sharing schemes may become bottlenecks and need improvement. At this point you may write a distributed file system tailored to the site's business needs.
After this work is done, you enter what seems to be a perfect era of unlimited scaling: whenever site traffic increases, the solution is simply to keep adding web servers.
[Figure: system architecture after this step]

Knowledge involved in this step:
By this step, as the number of machines grows, the amount of data increases, and the availability requirements on the system rise, you need a deeper understanding of the technologies in use and need to build more customized products based on the site's requirements.

Architecture evolution step 9: read/write splitting and low-cost storage
Suddenly, one day, you find that the perfect era is over and the database nightmare is back. Because so many web servers have been added, database connections are once again insufficient, even though the databases and tables have already been split. Analyzing the load on the database, you may find that the ratio of reads to writes is very high, and at this point the usual idea is to separate reads from writes. This is not easy to implement. You may also find that storing some data in the database is wasteful, or consumes too many database resources. So at this stage the architecture may evolve to separate reads from writes and, at the same time, to develop some lower-cost storage solutions along the lines of BigTable.
[Figure: system architecture after this step]

Knowledge involved in this step:
Read/write splitting requires a deep understanding of database replication, standby, and related strategies, and may require implementing some of the technology yourself.
Low-cost storage requires a deep understanding of how the operating system stores files, as well as of file handling in the implementation language.
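
The routing half of read/write splitting can be sketched as below. The replication half, which is the hard part, is assumed to exist already. The class name `ReadWriteRouter`, the server names, and the rule that only plain SELECT statements outside a transaction go to replicas are assumptions made for this example.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Sends writes to the primary database and spreads reads over replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Rotate through the replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(list(replicas)) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        """Return the name of the server that should execute this statement."""
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        # Anything that is not a plain read, or that runs inside a
        # transaction, must see the latest data and goes to the primary.
        if not is_read or in_transaction or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

Because replicas lag behind the primary, a read issued immediately after a write may not see that write. Handling this is one reason the approach is harder to implement than the routing code suggests.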

Architecture evolution step 10: entering the era of large-scale distributed applications and the dream era of cheap server clusters
After the long and painful process above, the perfect era finally returns: adding more web servers keeps supporting more and more traffic. For a large site, popularity is undoubtedly important, and as popularity grows, all kinds of feature requirements start to grow explosively. You suddenly find that the web application deployed on the web servers has become very large. When multiple teams start changing it, this is really inconvenient, and reuse is poor: essentially every team ends up duplicating some work. Deployment and maintenance are also troublesome, because copying a huge application package to N machines and starting it takes a lot of time, and problems are hard to track down. Worse, a bug in one part of the application can make the whole site unavailable. There are other factors too, such as the difficulty of tuning (because the application deployed on each machine does everything, targeted tuning is impossible). Based on this analysis, you make the determined decision to split the system by responsibility, and a large distributed application is born. This step usually takes a long time, because it brings many challenges:
1. After the split into a distributed system, you need to provide a high-performance, stable communication framework that supports a variety of communication and remote invocation styles.
2. Splitting a huge application takes a long time and requires sorting out the business and controlling the dependencies between systems.
3. How to operate and maintain this huge distributed application well (dependency management, runtime health management, error tracking, tuning, monitoring and alerting, and so on).
After this step, the architecture of the system enters a relatively stable phase, and you can also start using large numbers of cheap machines to support the huge volume of traffic and data. Combined with this architecture and the experience gained over the course of its evolution, various other methods can be used to support ever-growing traffic.
[Figure: system architecture after this step]

Knowledge involved in this step:
This step involves a great deal of knowledge. It requires a deep understanding and mastery of communication, remote invocation, and messaging mechanisms, with a clear grasp of the theory as well as of the implementation at the hardware, operating system, and language levels.
Operations also involves a lot of knowledge; in most cases you need to master distributed parallel computing, reporting, monitoring techniques, rules and policies, and so on.

It does not take much effort to describe, and the classic evolution of a whole site's architecture is broadly similar to the above. Of course, the solutions adopted at each step and the order of the steps may differ, and because different sites run different businesses, their technical needs differ too. This article explains the process mainly from the perspective of architecture, and many technologies are not mentioned here, such as database clustering, data mining, and search. In a real evolution, sites also support more traffic by upgrading hardware, improving the network environment, modifying the operating system, and using CDN mirroring, so real development paths vary widely. A large site also has far more to do than the above, including security, operations and maintenance, business operations, services, and storage. Building a large site is really not easy. I wrote this article hoping to prompt more accounts of how the architecture of large sites evolves :).


Please include this attribution when reproducing this article:
Xefee hosting<<The evolution of large-scale site architecture and knowledge>>:http://www.xefee.com/article-1323-1.html
Editor: luis